[00:12:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Tue May 13 21:03:26 2014
[00:15:46] re: keeprefreshing.com, has anyone asked them to not use us as their default in the text box on their main page? that would probably cut out a lot of it.
[00:21:23] bblack: i just did
[00:22:01] mwalker: the mediawiki-core team does some coding sometimes too
[00:23:30] ori, true; I forgot about them; they're all roots though
[00:23:33] mwalker: ops! :)
[00:24:24] mwalker: Actually only Tim and Ori are roots.
[00:24:29] As long as we're all agreed that SWAT is a stupid name.
[00:25:23] hey, it's better than most names we invent around 'ere.
[00:25:46] "EventLogging" is pretty horrible, I regret that daily
[00:26:15] EventLogging seems significantly better than SWAT.
[00:26:22] what's wrong with SWAT? Setting Wiki's Ablaze Team is awesome
[00:26:29] Well, Wikis.
[00:26:38] You don't add an apostrophe to create a plural like that.
[00:26:43] sure... /me has bad grammar
[00:26:47] :-)
[00:27:09] SWAT is a well-known pre-existing acronym.
[00:27:21] WKPEA?
[00:27:22] And the name is almost completely non-obvious.
[00:27:36] But it's not a real issue.
[00:27:38] special weapons and tactics seems appropriate as well
[00:27:55] PHP may well be more dangerous than guns and explosives.
[00:27:57] Time will tell.
[00:28:10] you could say that scap is a special weapon, and its tactical intent is to set wikis ablaze
[00:28:36] I'm not sure we're going to war with the servers.
[00:28:46] Or the wikis.
[00:28:58] oh; it's a war against the servers
[00:29:19] * mwalker hates computers with the fiery passion of 1,000 burning suns
[00:29:23] it's why I'm in the field :D
[00:29:46] My computers generally seem to behave themselves. It's developers and their code that I usually hate.
[00:30:50] Software development is more art than science. It's pretty much impossible to write bug-free code for anything non-trivial.
[00:31:00] much more so in PHP, of course.
[00:31:16] Completely agreed.
[00:46:19] (PS1) BryanDavis: Logstash: do not use `date` on json event @timestamp [operations/puppet] - https://gerrit.wikimedia.org/r/133402
[00:47:55] PROBLEM - Ubuntu mirror in sync with upstream on carbon is CRITICAL: /srv/ubuntu/project/trace/carbon.wikimedia.org is over 12 hours old.
[00:48:30] (CR) Ori.livneh: [C: -1] "looks good; small quibble inline" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/133393 (owner: BryanDavis)
[00:48:44] (CR) BryanDavis: "Applied via cherry-pick in beta. This has eliminated the warnings that were being logged for each scap log event on deployment-logstash1." [operations/puppet] - https://gerrit.wikimedia.org/r/133402 (owner: BryanDavis)
[00:48:55] RECOVERY - Ubuntu mirror in sync with upstream on carbon is OK: /srv/ubuntu/project/trace/carbon.wikimedia.org is over 0 hours old.
[00:49:06] (PS2) Ori.livneh: Logstash: do not use `date` on json event @timestamp [operations/puppet] - https://gerrit.wikimedia.org/r/133402 (owner: BryanDavis)
[00:49:26] (CR) Ori.livneh: [C: 2 V: 2] Logstash: do not use `date` on json event @timestamp [operations/puppet] - https://gerrit.wikimedia.org/r/133402 (owner: BryanDavis)
[00:53:17] (PS2) Ori.livneh: Add CirrusSearch general debug log to whitelist [operations/puppet] - https://gerrit.wikimedia.org/r/133354 (owner: Chad)
[00:55:38] (PS2) BryanDavis: Logstash: Add support for logging irc !log messages [operations/puppet] - https://gerrit.wikimedia.org/r/133393
[00:56:12] (CR) Ori.livneh: [C: 2] Add CirrusSearch general debug log to whitelist [operations/puppet] - https://gerrit.wikimedia.org/r/133354 (owner: Chad)
[01:01:37] (CR) Ori.livneh: [C: 2] Logstash: Add support for logging irc !log messages [operations/puppet] - https://gerrit.wikimedia.org/r/133393 (owner: BryanDavis)
[01:03:25] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Thu May 15 01:03:20 UTC 2014
[01:03:37] ori: Thanks for all those merges. Beta's back down to only 2 cherry-picks ahead of the production puppet branch.
[01:03:51] my pleasure, happy to see this work coming along
[01:08:31] (PS2) BryanDavis: parsoid: systemuser is only for production [operations/puppet] - https://gerrit.wikimedia.org/r/123212 (owner: Hashar)
[01:34:31] (PS2) Withoutaname: Create 'noratelimit' user group on dewiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/130809 (https://bugzilla.wikimedia.org/57819)
[01:39:06] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 14.29% of data exceeded the critical threshold [500.0]
[01:52:06] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% data above the threshold [250.0]
[02:14:06] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3800 MB (3% inode=99%):
[02:15:34] !log LocalisationUpdate completed (1.24wmf3) at 2014-05-15 02:14:31+00:00
[02:15:46] Logged the message, Master
[02:23:05] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3435 MB (3% inode=99%):
[02:27:12] !log LocalisationUpdate completed (1.24wmf4) at 2014-05-15 02:26:09+00:00
[02:27:16] Logged the message, Master
[03:00:05] RECOVERY - Disk space on virt0 is OK: DISK OK
[03:13:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Tue May 13 21:03:26 2014
[03:15:10] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu May 15 03:14:04 UTC 2014 (duration 14m 3s)
[03:15:15] Logged the message, Master
[04:20:10] !log installed db1073
[04:20:14] Logged the message, Master
[04:51:28] TimStarling: do you know if there's a reason for twemproxy-{eqiad,labs}.yaml to be in mediawiki-config/wmf-config as opposed to being managed by puppet?
[04:56:01] probably just because the file it replaced was in wmf-config also
[04:57:36] most server lists used by MediaWiki are in wmf-config, so maybe it makes sense from that perspective
[04:58:15] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 0 below the confidence bounds
[05:01:22] Asher introduced it
[05:01:34] IIRC
[05:01:57] yeah.
[05:02:01] is it worth the trouble?
[05:02:37] it hasn't been packaged for trusty; the last commit to the git repository was made more than a year ago; there are >50 issues reported
[05:02:59] OTOH, the basic idea seems valid, and there doesn't seem to be a better alternative
[05:04:04] well, if you remember, the main reason asher switched to twemproxy was to the retry timeout errors
[05:04:17] but I submitted a libmemcached patch upstream to fix the same issue
[05:04:26] *was to fix the
[05:04:51] yes, but yours didn't have a yaml config interface
[05:06:16] I'm not really aware of good reasons to use twemproxy, but maybe others know of some
[05:08:19] "Nutcracker enables proxying multiple client connections onto one or few server connections. This architectural setup makes it ideal for pipelining requests and responses and hence saving on the round trip time."
[05:08:28] this seems a bit dangerous, tho
[05:08:37] https://github.com/twitter/twemproxy/blob/master/notes/recommendation.md says:
[05:09:08] "By design, twemproxy multiplexes several client connections over few server connections. It is important to note that "read my last write" constraint doesn't necessarily hold true when twemproxy is configured with server_connections: > 1. [...] with configuration of two server connections it is possible that write and read request are sent on different server connections which would mean that their completion could race
[05:09:08] with one another. In summary, if the client expects "read my last write" constraint, you either configure twemproxy to use server_connections:1 or use clients that only make synchronous requests to twemproxy."
[05:09:38] we set server_connections to 2
[05:10:57] i can't imagine someone other than you or aaron would be in a position to certify that as acceptable without a lot of careful thinking
[05:16:25] (PS1) Springle: move s4 commonswiki api traffic to db1042 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133410
[05:18:01] (CR) Springle: [C: 2] move s4 commonswiki api traffic to db1042 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133410 (owner: Springle)
[05:18:08] (Merged) jenkins-bot: move s4 commonswiki api traffic to db1042 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133410 (owner: Springle)
[05:19:59] !log springle synchronized wmf-config/db-eqiad.php 'move s4 commonswiki api traffic to db1042'
[05:20:03] Logged the message, Master
[05:26:18] ok, yeah, i think we should get rid of it
[05:28:23] both twemproxy and MemcachedPeclBagOStuff.php use libketama for hashing so the distribution *should* remain the same but i'd want to test that
[05:57:05] PROBLEM - Disk space on dataset1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
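The check mentioned at 05:28 (does the key-to-server distribution stay the same?) could be approached by hashing a sample of keys through both implementations and counting disagreements. The sketch below is only an illustration under stated assumptions: `build_ring` and `lookup` are a simplified ketama-style ring (MD5, 160 points per server), not a byte-exact reimplementation of libketama or twemproxy, and the server names and key format are made up. In a real test the two lookup functions would wrap the actual clients.

```python
import bisect
import hashlib


def build_ring(servers, points_per_server=160):
    """Return a sorted list of (hash_point, server) pairs (simplified ketama-style ring)."""
    ring = []
    for server in servers:
        for i in range(points_per_server):
            digest = hashlib.md5(f"{server}-{i}".encode()).digest()
            ring.append((int.from_bytes(digest[:4], "little"), server))
    ring.sort()
    return ring


def lookup(ring, key):
    """Map a key to the first ring point at or after its hash, wrapping around at the end."""
    digest = hashlib.md5(key.encode()).digest()
    point = int.from_bytes(digest[:4], "little")
    idx = bisect.bisect_left(ring, (point, ""))
    if idx == len(ring):
        idx = 0
    return ring[idx][1]


def count_mismatches(lookup_a, lookup_b, keys):
    """Count keys that two key->server functions place on different servers."""
    return sum(1 for key in keys if lookup_a(key) != lookup_b(key))


# Hypothetical server list and key sample, for illustration only.
servers = ["mc1001:11211", "mc1002:11211", "mc1003:11211"]
sample_keys = [f"enwiki:page:{n}" for n in range(2000)]
ring = build_ring(servers)

# Stand-ins for "via twemproxy" and "via the PHP client": here both use the same ring.
via_proxy = lambda key: lookup(ring, key)
via_client = lambda key: lookup(ring, key)
mismatches = count_mismatches(via_proxy, via_client, sample_keys)
```

A non-zero mismatch count would mean some keys land on a different memcached host after the switch, i.e. an effective partial cache flush.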
[06:14:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Tue May 13 21:03:26 2014
[06:24:44] (PS1) Ori.livneh: Factor twemproxy out of mediawiki module [operations/puppet] - https://gerrit.wikimedia.org/r/133414
[06:27:31] (CR) Ori.livneh: [C: 2] Factor twemproxy out of mediawiki module [operations/puppet] - https://gerrit.wikimedia.org/r/133414 (owner: Ori.livneh)
[06:30:15] (PS1) Ori.livneh: Apply twemproxy module from I6f1f1039b on mw1063 [operations/puppet] - https://gerrit.wikimedia.org/r/133415
[06:31:36] (CR) Ori.livneh: [C: 2 V: 2] Apply twemproxy module from I6f1f1039b on mw1063 [operations/puppet] - https://gerrit.wikimedia.org/r/133415 (owner: Ori.livneh)
[06:33:18] (PS1) Ori.livneh: Follow-up to I3f57347d2: Qualify class path [operations/puppet] - https://gerrit.wikimedia.org/r/133416
[06:33:35] (CR) Ori.livneh: [C: 2 V: 2] Follow-up to I3f57347d2: Qualify class path [operations/puppet] - https://gerrit.wikimedia.org/r/133416 (owner: Ori.livneh)
[06:36:25] (PS1) Springle: move s5 api traffic to db1005 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133417
[06:37:15] PROBLEM - twemproxy port on mw1063 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 65534 (nobody), command name nutcracker
[06:37:44] that's me, will be fine in a moment
[06:38:15] RECOVERY - twemproxy port on mw1063 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker
[06:41:34] (CR) Springle: [C: 2] move s5 api traffic to db1005 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133417 (owner: Springle)
[06:42:15] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 0 below the confidence bounds
[06:44:00] !log springle synchronized wmf-config/db-eqiad.php 'move s5 api traffic to db1005'
[06:44:05] Logged the message, Master
[06:44:21] (PS1) Ori.livneh: Amend twemproxy config_file path [operations/puppet] - https://gerrit.wikimedia.org/r/133418
[06:46:23] (PS2) Ori.livneh: Amend twemproxy config_file path [operations/puppet] - https://gerrit.wikimedia.org/r/133418
[06:47:36] (CR) Ori.livneh: [C: 2 V: 2] Amend twemproxy config_file path [operations/puppet] - https://gerrit.wikimedia.org/r/133418 (owner: Ori.livneh)
[07:08:55] PROBLEM - Disk space on labstore1001 is CRITICAL: DISK CRITICAL - free space: /exp/dumps 379802 MB (3% inode=99%):
[07:36:03] good morning
[07:38:05] good morning hashar!
[07:42:47] hey :)
[07:43:03] was trying to get the packaging mailing list added to gmane but got some weird email bounce. Going to be fun time this morning
[07:50:29] (PS6) Nuria: Make upstart track the wikimetrics PID [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/130588 (https://bugzilla.wikimedia.org/63819)
[08:01:06] (PS7) Ottomata: Make upstart track the wikimetrics PID [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/130588 (https://bugzilla.wikimedia.org/63819) (owner: Nuria)
[08:01:20] (CR) Ottomata: [C: 2 V: 2] Make upstart track the wikimetrics PID [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/130588 (https://bugzilla.wikimedia.org/63819) (owner: Nuria)
[08:15:17] oh yet another puppet repo
[08:15:24] with no jobs!! ottomata !
[08:15:48] haha
[08:15:52] wikimetrics?
[08:15:56] yeah
[08:16:00] <_joe_> commit while you can!
[08:16:15] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 4 below the confidence bounds
[08:16:32] <_joe_> hashar: If I write some unit tests for a puppet module I write, will they be run by Jenkins automatically?
[08:16:34] ottomata: do you remember how to add jobs in JJB and Zuul ?
[08:16:44] (PS1) Springle: Redirect analytics traffic away from db71 during maintenance. [operations/dns] - https://gerrit.wikimedia.org/r/133428
[08:16:54] UMMmMMMmmmmmmmMM i think so!
[08:16:57] i have 2 repos checked out
[08:16:59] i find the other repos
[08:17:05] copy paste, edit
[08:17:06] _joe_: if you have 45 minutes ahead we can do a hangout together so I can introduce you to Jenkins/Zuul/Jenkins
[08:17:16] (CR) Springle: [C: 2] Redirect analytics traffic away from db71 during maintenance. [operations/dns] - https://gerrit.wikimedia.org/r/133428 (owner: Springle)
[08:17:17] ottomata: yeah that is about it :-]
[08:17:27] _joe_: if you guys do that I might join, I only know a little bit
[08:17:34] _joe_: in short no. We need to define a job in Jenkins that will execute whatever command you might need
[08:17:36] also, would love to know how you are writing puppet unit tests, as I have not really ever done that
[08:18:02] <_joe_> ok, can we wait 1 hour? I am moving apartments and I can be summoned by the phone technician any minute
[08:18:14] <_joe_> and I have to move there in a hurry, in that case
[08:18:30] for puppet, we had some pairings last summer with Andrew Bogott and Alexandros but that did not go very far. We came up with a job executing rspec though (you can see it in gerrit comments on operations/puppet.git)
[08:18:48] <_joe_> actually, given the tradition of punctuality and QoS of utilities in italy, I'd be more confident in the afternoon
[08:18:51] _joe_: sure, we can do it post lunch if you are busy this morning
[08:19:01] haha
[08:19:10] <_joe_> yeah I'm actually at my desk working
[08:19:20] what does this remind me, what does this remind me...
[08:19:40] <_joe_> akosiaris: don't laugh, I just discovered we have 5k Kw of unaccounted power in that flat
[08:19:47] ?
[08:19:50] <_joe_> that I'm not going to pay, but still....
[08:20:11] 5k KWh I assume
[08:20:16] <_joe_> akosiaris: the power company says the counter should be at 3000, it's at 8000
[08:20:24] not 5k KW
[08:20:25] <_joe_> akosiaris: yes, of course
[08:20:34] it is either: (1) neighbor hijacking your power (2) a few secret warez servers hidden behind a secret wall in your apartment
[08:20:57] niah, it is usually marijuana growing places :P
[08:20:59] <_joe_> or 3) they were sloppy and did not update the spreadsheet they use for technicians
[08:21:11] I hear they need too much power due to all the lighting
[08:21:13] <_joe_> akosiaris: I hoped so, but no lamps are there
[08:21:27] <_joe_> :P
[08:21:30] _joe_: shut down the data center you are running at home
[08:22:15] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 4 below the confidence bounds
[08:22:23] <_joe_> matanya: well I am at the point where I have more than 2 computer/animal
[08:22:29] <_joe_> including the cat :)
[08:22:50] :)
[08:25:49] <_joe_> springle: when you need to create a database, do you use puppet or plain old mysql client?
[08:26:00] <_joe_> I mean, the db and all its grants
[08:30:48] _joe_: depends. the misc clusters are manual. MW extensions generally have .sql in a repo. core dbs are only ever cloned with xtrabackup so no initialization of the datadir is done at all by puppet
[08:31:13] labs is automated by coren's scripts in operations/software
[08:39:27] <_joe_> springle: I mean, do we have a puppet library for those type of sql commands?
[08:39:38] <_joe_> like define mysql_user{}
[08:39:48] <_joe_> or something along those lines?
[08:41:50] (PS1) Nuria: Bumping up wikimetrics module [operations/puppet] - https://gerrit.wikimedia.org/r/133430
[08:42:06] (CR) jenkins-bot: [V: -1] Bumping up wikimetrics module [operations/puppet] - https://gerrit.wikimedia.org/r/133430 (owner: Nuria)
[08:47:55] hm, nuria, looks like you committed that on top of a pretty old head
[08:48:03] gerrit can't rebase it
[08:48:05] araghhhh
[08:48:17] let me abandon it and re-do
[08:48:21] you might be able to rebase it locally
[08:48:21] sorry about that
[08:48:22] ok cool
[08:48:24] that's fine
[08:48:25] <_joe_> nuria: git rebase
[08:48:35] yeah, git rebase origin/production
[08:48:37] might be ok
[08:48:47] no need to resolve conflicts though
[08:48:52] but, _joe_, i find that sometimes these things are easier to just abandon and try again
[08:48:56] :)
[08:49:01] _joe_: not that i've found. if you add one, the mariadb sub-repo could use it for misc
[08:49:23] <_joe_> ottomata: that defies the whole purpose of VCS imo
[08:49:34] ottomata: i actually submitted it on top of a local branch, sorry again , re-doing
[08:49:58] <_joe_> springle: so If I do them, we may move it to the mariadb module later?
[08:50:06] ha, _joe_, agreed, but depends on the change, the one that nuria is submitting is just a submodule update
[08:50:09] very easy to reproduce
[08:50:16] <_joe_> oooh ok
[08:50:18] _joe_, are you talking about puppetizing mysql users?
[08:50:24] <_joe_> yes
[08:50:33] (Abandoned) Nuria: Bumping up wikimetrics module [operations/puppet] - https://gerrit.wikimedia.org/r/133430 (owner: Nuria)
[08:50:34] <_joe_> It's not that easy
[08:50:43] <_joe_> but I kinda did it before
[08:50:56] me too!
[08:51:00] puppetizing mysql users ?
[08:51:03] this is puppet I wrote a LONG time ago
[08:51:05] https://github.com/ottomata/cs_puppet_mysql/blob/master/mysql.pp#L205
[08:51:10] a PITA IMHO
[08:51:14] no idea if it is any good anymore (like all old code that we write!)
[08:51:29] and not sure it is really worth it
[08:51:31] <_joe_> akosiaris: the P in PITA is for 'puppet'
[08:51:34] <_joe_> in general
[08:51:37] ahahaha
[08:52:04] that define uses maatkit...which is now something else...percona tools or sumpin?
[08:52:18] percona toolkit
[08:52:22] <_joe_> oh well, simplediff is *not* in ubuntu
[08:53:56] (PS1) Nuria: Bumping up wikimetrics module [operations/puppet] - https://gerrit.wikimedia.org/r/133431
[08:54:23] _joe_: if you're keen, feel free. it would need to be quite flexible to handle all our grants use-cases
[08:55:55] RECOVERY - Disk space on labstore1001 is OK: DISK OK
[08:57:10] <_joe_> springle: executing .sql files is probably easier
[08:57:48] yup :)
[08:58:15] stick a .sql erb in puppet
[08:58:29] i'm sure there is a much better solution than that thing I wrote a long time ago, but that should be 100% flexible, no?
[08:58:45] you pass in the exact permission string you want the user to have
[08:58:48] <_joe_> a style question: I see no one uses resource titles with spaces here
[08:58:49] if the user doesn't have it
[08:59:06] it will remove all previous permissions and add the ones passed in
[08:59:11] <_joe_> I'm used to do: exec { 'Install puppet 3 bundle': }
[08:59:21] <_joe_> so that it shows nicely in logs
[09:00:21] hehe, yeah dunno why we don't do that...i don't like it because then the resource title doesn't count as a single word when I try to select it
[09:00:24] which makes copy/paste slower :p
[09:00:54] <_joe_> so there is no centrally mandated choice on this
[09:01:01] possibly there is!
[09:01:05] <_joe_> ottomata: the error is using the mouse
[09:01:29] shhhhhh, i'm mostly no mouse, but some!
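The behaviour described at 08:58 (pass in the exact permissions; if the user doesn't have them, drop everything and re-add) can be sketched as a dry-run planner that only produces SQL text for a human to review. This is not the linked Puppet define, just a hypothetical Python illustration: the function name, the account name and the host pattern are made up, and the user and host values are interpolated without escaping, so it should not be fed untrusted input.

```python
def plan_grants(user, host, current_privs, desired_privs, scope="*.*"):
    """Return the SQL statements needed to reconcile grants, or [] if they already match."""
    current = {p.strip().upper() for p in current_privs}
    desired = {p.strip().upper() for p in desired_privs}
    if current == desired:
        return []
    account = f"'{user}'@'{host}'"
    # Same approach as described in the chat: remove everything, then add what was passed in.
    statements = [f"REVOKE ALL PRIVILEGES, GRANT OPTION FROM {account};"]
    if desired:
        privs = ", ".join(sorted(desired))
        statements.append(f"GRANT {privs} ON {scope} TO {account};")
    return statements


# Hypothetical account; the statements are only printed, never executed.
for statement in plan_grants("research", "10.%", ["SELECT", "INSERT"], ["SELECT"]):
    print(statement)
```

Keeping the planner separate from execution leaves room for a manual check of the statements before anything runs against a database.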
[09:01:41] <_joe_> I use it for synercy
[09:01:46] <_joe_> *synergy
[09:01:47] i'm sure I agree with you deep down inside
[09:01:49] haa
[09:02:08] i'm reading here for answers to your question
[09:02:08] https://wikitech.wikimedia.org/wiki/Puppet_coding#Style.2C_organization_and_class_conventions
[09:02:18] <_joe_> I read that
[09:02:56] don't see anything there about it
[09:03:07] so ja! do it if you wanna, i guess!
[09:03:13] <_joe_> yeah and btw, that needs a good revamp
[09:03:19] something about it rubs me the wrong way
[09:03:23] but for no really good reason
[09:03:27] <_joe_> now that we enter in the 2012 puppet :)
[09:03:30] haha
[09:03:35] <_joe_> with puppet 3 I mean
[09:03:49] <_joe_> some of the logic there is a tad obsolete
[09:07:27] hashar:
[09:07:27] https://gerrit.wikimedia.org/r/#/c/133432/
[09:07:27] https://gerrit.wikimedia.org/r/#/c/133433/
[09:12:14] awesome
[09:15:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Tue May 13 21:03:26 2014
[09:15:42] i need to have Jenkins jobs to self update once a change is merged
[09:15:55] and to integrate Zuul in git-deploy so folks can deploy layout changes by themselves
[09:16:52] (PS1) Hashar: Jenkins job validation (DO NOT SUBMIT) [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/133435
[09:17:08] ottomata: success!
[09:17:18] (Abandoned) Hashar: Jenkins job validation (DO NOT SUBMIT) [operations/puppet/wikimetrics] - https://gerrit.wikimedia.org/r/133435 (owner: Hashar)
[09:17:54] nice :)
[09:21:22] ottomata: i guess i'd just have to be convinced to let automated grant/revoke run on any production cluster without a human checking the statements first :)
[09:21:35] ya for sure
[09:21:48] but for misc clusters, absolutely
[09:22:05] i think I used that mostly for slaves
[09:22:32] would be useful for the analytics db accounts
[09:22:38] yeah, oh yeah
[09:22:56] especially if that would make it possible to do per user accounts, rather than one research user that everyone uese?
[09:22:58] uses*
[09:23:01] or would that be annoying?
[09:23:19] no that's something dario brought up recently. i agree with the idea
[09:23:34] although i don't really know how many people we're talking about
[09:23:49] hmm, i would guess more than 5, less than 20 :)
[09:23:52] i'm not sure either
[09:24:01] would make it easier to track down authors of silly queries :)
[09:24:06] ja for sure
[09:27:34] now that we are talking about it, how do we handle mysql users in labs ?
[09:27:59] I am facing that problem with postgres and I am wondering how to solve it
[09:31:34] akosiaris: coren has perl scripts in operations/software
[09:31:47] afaik they create dbs, handle users, views, etc
[09:32:14] springle: ok thanks, I 'll take a look
[09:38:19] <_joe_> there is a puppetlabs/mysql module
[09:38:26] <_joe_> don't know how good that is
[09:38:28] please don't
[09:38:40] puppetlabs modules are mostly awful
[09:38:55] could be that it is good, but somehow I doubt it
[09:40:22] <_joe_> akosiaris: well, stdlib is decent
[09:40:31] <_joe_> the only one I ever used
[09:53:47] i wonder what does that tell us about puppet, if the company that creates the software writes bad modules
[10:03:35] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Thu May 15 07:03:07 2014
[10:09:15] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected
[10:13:28] (PS1) Springle: depool db1009 for raid tests [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133439
[10:13:56] (CR) Springle: [C: 2] depool db1009 for raid tests [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133439 (owner: Springle)
[10:14:04] (Merged) jenkins-bot: depool db1009 for raid tests [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133439 (owner: Springle)
[10:15:54] !log springle synchronized wmf-config/db-eqiad.php 'depool db1009 for raid tests'
[10:16:02] Logged the message, Master
[10:27:32] <_joe_> I was reading https://ripe68.ripe.net/presentations/208-The_Decline_and_Fall_of_BIND_10.pdf and one sentence struck me as strange "Administrators really hate Python. Really. HATE."
[10:36:45] (PS1) Phuedx: Enable the anonymous signup invite experiment [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133442
[10:37:57] (CR) Phuedx: [C: -1] "-1 until the rest of the Growth team wakes up." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133442 (owner: Phuedx)
[10:55:53] that's bullshit
[11:03:02] ori chasemp, see log from springle, that's a machine similar to tungsten we can run io benchmarks on, I couldn't find any
[11:03:29] (and thanks springle for lending db1009 for a week!)
[11:08:48] <_joe_> mark: I thought exactly the same about a good part of that slides
[11:09:08] I'm pretty sure it's the whole isc leadership changes
[11:09:21] i.e. slide 12
[11:09:35] <_joe_> paravoid: yes that looked _bad_
[11:09:55] I had heard before about it
[11:09:58] with more details
[11:10:02] it was bad
[11:14:28] is there a known git problem at the moment ? vagrant@mediawiki-vagrant:/vagrant$ git pull error: RPC failed; result=22, HTTP code = 503 fatal: The remote end hung up unexpectedly
[11:16:54] <_joe_> physikerwelt: do you know what is the url it's trying to pull from?
[11:17:48] sure https://gerrit.wikimedia.org/r/mediawiki/vagrant
[11:18:10] and also on another server https://physikerwelt@gerrit.wikimedia.org/r/mediawiki/extensions/MathSearch
[11:18:35] <_joe_> sorry I've been called atm, can't look at it :(
[11:18:55] works via ssh
[11:18:57] (PS1) Ottomata: Adding libwww-perl on stat1003 for Erik Z's wikistats stuff [operations/puppet] - https://gerrit.wikimedia.org/r/133444
[11:19:12] (CR) Ottomata: [C: 2 V: 2] Adding libwww-perl on stat1003 for Erik Z's wikistats stuff [operations/puppet] - https://gerrit.wikimedia.org/r/133444 (owner: Ottomata)
[11:20:55] gerrit does seem slow to me
[11:24:54] indeed cloning mediawiki/vagrant doesn't work for me, taking a look
[11:32:07] for me it works via ssh but not via https
[11:56:10] !log installed openjdk-7-jdk on ytterbium to attempt gerrit thread dump
[11:56:17] Logged the message, Master
[12:16:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Tue May 13 21:03:26 2014
[12:24:13] !log "in place" reindexing group1 wikis after the deployment train updated cirrus yesterday. They'll need a full reindex after that is done which will take some time but is required to fix issues with redirects not showing up off of the main namespace
[12:24:17] Logged the message, Master
[12:31:07] (PS1) Giuseppe Lavagetto: puppet-compiler: module for installation (VERY WiP) [operations/puppet] - https://gerrit.wikimedia.org/r/133449
[12:31:09] (CR) jenkins-bot: [V: -1] puppet-compiler: module for installation (VERY WiP) [operations/puppet] - https://gerrit.wikimedia.org/r/133449 (owner: Giuseppe Lavagetto)
[12:33:05] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Thu May 15 12:32:56 UTC 2014
[12:33:40] (PS2) Giuseppe Lavagetto: puppet-compiler: module for installation (VERY WiP) [operations/puppet] - https://gerrit.wikimedia.org/r/133449
[12:39:07] (CR) jenkins-bot: [V: -1] puppet-compiler: module for installation (VERY WiP) [operations/puppet] - https://gerrit.wikimedia.org/r/133449 (owner: Giuseppe Lavagetto)
[12:56:17] !log restarting gerrit on ytterbium, clones over https seemingly stuck
[12:56:22] Logged the message, Master
[13:02:59] (CR) Cmjohnson: [C: 2] Adding mgmt ip's for new misc servers [operations/dns] - https://gerrit.wikimedia.org/r/133378 (owner: Cmjohnson)
[13:06:09] (PS1) Cmjohnson: adding mac addresss for db1072 [operations/puppet] - https://gerrit.wikimedia.org/r/133456
[13:07:18] (CR) Cmjohnson: [C: 2] adding mac addresss for db1072 [operations/puppet] - https://gerrit.wikimedia.org/r/133456 (owner: Cmjohnson)
[13:11:32] manybubbles|away: hey
[13:11:47] sdehaan: hey!
[13:12:54] manybubbles: sorry, getting the sample data took longer than expected and then I got put on a plane somewhere for a day and other work got in the way
[13:13:10] manybubbles: how happy are you for me to startup a tsung session with random queries?
[13:13:17] sdehaan: ah! now I remember!
[13:13:27] sure, but, maybe start slow?
[13:13:35] manybubbles: will do
[13:13:36] like, 25/sec at first and we see from there?
[13:14:02] (PS3) Giuseppe Lavagetto: puppet-compiler: module for installation (VERY WiP) [operations/puppet] - https://gerrit.wikimedia.org/r/133449
[13:16:00] manybubbles: ok with 1 minute with an arrival rate of 20 per second?
[13:16:06] sure
[13:16:11] !log shutting down sodium to replace sdb
[13:16:15] Logged the message, Master
[13:18:25] PROBLEM - Host sodium is DOWN: CRITICAL - Host Unreachable (208.80.154.61)
[13:19:48] (PS4) Giuseppe Lavagetto: puppet-compiler: module for installation (VERY WiP) [operations/puppet] - https://gerrit.wikimedia.org/r/133449
[13:22:55] RECOVERY - Host sodium is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[13:30:36] manybubbles: running
[13:32:24] sdehaan: http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=cpu_report&s=by+name&c=Elasticsearch+cluster+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&hide-hf=false&sh=1&z=small&hc=4
[13:32:32] I can see the bump - much higher than I'd like, actually
[13:32:53] but not killer. otoh, we can't do 10x
[13:33:37] I wonder if we can make the api call more efficient - can you send me an email with an example?
[13:34:00] but, if you'd like, you can try 60/sec
[13:36:57] (CR) Mark Bergsma: [C: -1] ircd-ratbox and udpmxircecho puppetized (5 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/132495 (owner: Rush)
[13:37:07] manybubbles: https://gist.github.com/22674f455cd41c88c54a is my current config, the URLs in the bottom and the query is grabbed from a CSV file with sample queries
[13:37:43] the api call matches the kind of API call the Wikipedia Text app makes
[13:38:15] manybubbles: bumping the arrival rate from 20 to 60 and running again for 1 minute
[13:38:41] cool
[13:39:22] sdehaan: 200/sec is about what the old search handles. I'm fighting to get the new search able to handle it. It's much more of a struggle than I'd like....
[13:39:35] tsung spewing some errors but letting it finish before I investigate
[13:40:13] sdehaan: my guess is request rejected
[13:40:31] looks like the load isn't being applied very evenly on my side
[13:40:46] which isn't particularly normal
[13:40:49] but not something you did
[13:41:35] (CR) Faidon Liambotis: "So, since I didn't find anything wrong besides the sudo::labs_project/sudo::default thing, I went on a code style rampage and noted a bunc" (13 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/129501 (owner: Rush)
[13:44:28] !log sodium going down again for a different disk replacement
[13:44:31] Logged the message, Master
[13:44:32] (PS1) Springle: deploy db1070 and db1071 to S1 [operations/puppet] - https://gerrit.wikimedia.org/r/133460
[13:46:27] (CR) Springle: [C: 2] deploy db1070 and db1071 to S1 [operations/puppet] - https://gerrit.wikimedia.org/r/133460 (owner: Springle)
[13:46:45] PROBLEM - Host sodium is DOWN: CRITICAL - Host Unreachable (208.80.154.61)
[13:50:55] RECOVERY - Host sodium is UP: PING OK - Packet loss = 0%, RTA = 0.64 ms
[13:53:08] (PS4) Reza: Lots of rights changes for ckbwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133364 (https://bugzilla.wikimedia.org/65348)
[13:56:56] (PS1) Springle: depool db1056 while cloning [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133461
[13:57:26] (CR) Springle: [C: 2] depool db1056 while cloning [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133461 (owner: Springle)
[13:57:38] (Merged) jenkins-bot: depool db1056 while cloning [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133461 (owner: Springle)
[13:58:10] (CR) Calak: [C: 1] Lots of rights changes for ckbwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133364 (https://bugzilla.wikimedia.org/65348) (owner: Reza)
[13:59:43] !log springle synchronized wmf-config/db-eqiad.php 'depool db1056 while cloning'
[13:59:57] Logged the message, Master
[14:03:09] !log xtrabackup clone db1056 to db1070
[14:03:13] Logged the message, Master
[14:07:35] (PS1) Reedy: Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133464
[14:07:37] (PS1) Reedy: testwiki to 1.24wmf5 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133465
[14:07:39] (PS1) Reedy: Wikipedias to 1.24wmf4 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133466
[14:07:41] (PS1) Reedy: group0 to 1.24wmf5 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133467
[14:08:34] (CR) Reedy: [C: 2] Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133464 (owner: Reedy)
[14:08:43] (Merged) jenkins-bot: Add symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/133464 (owner: Reedy)
[14:09:04] meow, I'm curious about effects today
[14:13:22] (PS7) Rush: admin module for user/group/permissions cleanup [operations/puppet] - https://gerrit.wikimedia.org/r/129501
[14:13:24] (PS7) Rush: one-off to convert admins.pp to yaml [operations/puppet] - https://gerrit.wikimedia.org/r/129541
[14:15:05] (CR) jenkins-bot: [V: -1] one-off to convert admins.pp to yaml [operations/puppet] - https://gerrit.wikimedia.org/r/129541 (owner: Rush)
[14:15:31] (CR) Rush: "updated for all comments (I believe except this one which I disagree with -- paste from irc))" [operations/puppet] - https://gerrit.wikimedia.org/r/129501 (owner: Rush)
[14:28:36] manybubbles: does the search API return HTTP 20x responses for when the search backend fails or are those codes used for that?
[14:30:57] manybubbles: mainly asking because all I get back is in the 20x range [14:38:38] (03PS8) 10Rush: admin module for user/group/permissions cleanup [operations/puppet] - 10https://gerrit.wikimedia.org/r/129501 [14:38:40] (03PS8) 10Rush: one-off to convert admins.pp to yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/129541 [14:40:22] (03CR) 10jenkins-bot: [V: 04-1] one-off to convert admins.pp to yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/129541 (owner: 10Rush) [14:40:36] (03CR) 10Faidon Liambotis: [C: 032] ":)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/129501 (owner: 10Rush) [14:43:26] sdehaan: hmmm - should return appropriate responses but I'm not 100% sure. [14:43:49] manybubbles: ok with me running the 60 msg/s again for 1m? [14:44:15] sdehaan: sure [14:44:58] ok running [14:51:01] manybubbles: the errors I'm seeing in tsung look like connect timeouts or refusals [14:51:16] sdehaan: its likely because we rate limit on our side [14:51:38] and not rate limit, really its more like we limit the number of concurrent requests [14:52:01] yeah: Pool error searching Elasticsearch: pool-queuefull [14:52:08] manybubbles: any idea if that returns an HTTP response or if that just kills the connection? [14:52:08] that means we had too many open requests... [14:52:27] manybubbles: ok, that's the kinds of errors we were seeing in the production environment as well [14:52:42] anything we can do about that from our side? [14:52:46] sdehaan: not really sure, actually. certainly should return whatever 400 code is for please slow down [14:53:03] what would you recommend? Issuing a retry seems like a bad idea if the queue's already full. [14:53:04] sdehaan: for now? rate limit to 20/sec. we saw that was safe. [14:53:15] ok thanks, good to know. [14:53:25] sdehaan: I'm constantly working on making it not slow [14:53:28] because its really slow now [14:53:38] :) I know the feeling, good luck. 
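A client-side cap like the 20 requests/second suggested above could be sketched as a token bucket. This is only an illustration under stated assumptions: the class name `TokenBucket` and its parameters are hypothetical, nothing here comes from tsung or the search backend, and the 20/s figure is the only number taken from the discussion.

```python
import time


class TokenBucket:
    """Hypothetical client-side limiter: at most `rate` requests per second."""

    def __init__(self, rate, capacity=None, clock=time.monotonic):
        self.rate = float(rate)
        # Burst size defaults to one second's worth of requests.
        self.capacity = float(capacity if capacity is not None else rate)
        self.clock = clock
        self.tokens = self.capacity
        self.last = clock()

    def try_acquire(self):
        """Return True if a request may be sent now, False otherwise."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


limiter = TokenBucket(rate=20)  # 20/sec was reported as safe in the log
```

When `try_acquire()` returns False, the caller would shed or delay the request rather than send it anyway, which fits the concern above that retrying into an already full queue is a bad idea.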
[14:53:50] rate limiting that in production on our side is going to be hard because it's driven by user traffic [14:54:01] but it's good to know where we can expect errors to start happening [14:54:13] like if we see traffic in excess of ~50 msg/s [14:56:02] Krinkle: has your graphite stuff been solid? the few I was checking seem gtg [14:56:25] Yep [14:56:29] chasemp: http://codepen.io/Krinkle/full/cBGCl/ http://codepen.io/Krinkle/full/zyodJ/ [14:56:31] Looks good again [14:56:36] Too bad about the dip [14:56:50] I assume the database is not affected? [14:57:28] I'm not sure if this sort of thing has happened before, but I recall there being a script to prepopulate or repopulate statd by replaying past data. e.g. if the cron for graphite wasn't set up yet. [14:57:37] Perhaps we can use that to fixup this? [14:57:54] Either way, okay for now, I'll just ignore that part. I'm not using it largely yet. [14:57:57] I genuinely have no idea, that data afaik is transient [14:58:04] but maybe the mediawiki parts are not [14:58:15] cool tho [14:58:23] manybubbles: I'll do the SWAT today, unless you really want to for some reason [14:58:31] anomie: have fun! [14:58:47] marktraceur, gi11es: Ping for SWAT deploy [15:00:30] anomie: Hi! What's going out? (I didn't add it) [15:00:57] marktraceur: https://gerrit.wikimedia.org/r/#/c/133446/ Fix for IE in MultimediaViewer [15:01:06] K [15:01:46] * anomie is starting the SWAT deploy [15:02:26] has anyone heard rumors of uploads not working on commons? I had a monitor test return a 502 a little over 2 hours ago, but it seems OK now. [15:03:34] chrismcmahon: Lots of rumours, mostly they're heisenbugs AFAIK [15:04:34] Nothing in -commons or the VP. [15:04:34] marktraceur: yeah, this rarely goes red: https://wmf.ci.cloudbees.com/job/UploadWizard-api-commons.wikimedia.org/ [15:04:48] * marktraceur looks [15:04:58] Oh [15:04:59] marktraceur: and it looks like a real 502 [15:05:02] chrismcmahon: That's a Gerrit bug! 
[15:05:13] You didn't even get to run the tests [15:05:18] oh duh [15:05:20] :) [15:05:39] marktraceur: lemme get my first cup of coffee now and be quiet. [15:05:46] * marktraceur raises his own [15:06:23] ^demon|away: "stderr: error: RPC failed; result=22, HTTP code = 502" just hit the cloudbees hosts causing a false failure; wouldn't mind some more investigation [15:11:45] chrismcmahon marktraceur gerrit was in trouble earlier today, with clones (I suppose pushes too) not working over https, I've restarted it [15:13:08] matanya: hey, i'm afraid the monitoring check is in multiple places, heh [15:13:46] Ah, kay [15:14:08] godog: That sort of error worries me because I've seen it on labs a lot, thanks for fixing :) [15:14:46] marktraceur: np, I was able to get a thread dump and gerrit show-queue listed many git-upload-pack in "waiting", the jvm itself didn't seem in particular trouble tho [15:16:13] <^demon|away> gerrit was freaky earlier? [15:17:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Tue May 13 21:03:26 2014 [15:18:05] ^d: Super freaky, ow. [15:18:32] <^d> The 90+ retries on replicating that project isn't helping probably. [15:18:39] <^d> *that one project [15:19:13] marktraceur, gi11es: Deploying now [15:20:18] (03PS1) 10Dzahn: use identical check command for all LVS monitors [operations/puppet] - 10https://gerrit.wikimedia.org/r/133480 [15:20:33] !log anomie synchronized php-1.24wmf4/extensions/MultimediaViewer 'SWAT: Deploy change 133475 to fix bug 65225 in MultimediaViewer' [15:20:38] Logged the message, Master [15:20:43] marktraceur, gi11es: ^ Please test [15:20:48] Uhh [15:21:06] ...it was an IE fix, I have no way to test. Crap. [15:21:16] * anomie notices he mentioned the wrong change in the log message... oh well [15:22:08] if you want to fix it, you can edit it on wikitech [15:22:43] I'll do that [15:24:37] <^d> marktraceur, godog: Pretty sure that was it. 
error_log was nothing but failed replications for two repos leading up to the restart. [15:24:48] <^d> Probably exhausted thread pools. [15:26:44] ^d: yeah there were failed replications, however those seem to have been going on forever? [15:26:59] * anomie is done with SWAT, unless someone comes along saying that gi11es's patch broke things [15:27:06] <^d> godog: They can fail...a lot...before it becomes a problem. [15:27:31] <^d> I've seen instability before when you've got several hundred fails + retries [15:28:09] anomie: At least it didn't make anything worse. :) [15:28:28] <^d> godog: Anyway, seems happier now and those are fixed. [15:28:35] ^d: fascinating, thanks! [15:37:56] (03CR) 10Dzahn: [C: 031] timeout submit_check_result, see rt #5311 [operations/puppet] - 10https://gerrit.wikimedia.org/r/126209 (owner: 10ArielGlenn) [15:45:22] Oh, wait, I'm an hour late. [15:45:33] 16:00 in UTC [15:46:30] Nope, never mind. I'm not late. [15:46:34] train hasn't left the station yet [15:46:35] good :) [15:46:42] Reedy just created the branch early [15:57:42] Krinkle: Branch creation time is randomized to thwart last minute merges :) [15:58:04] (03CR) 10Dzahn: [C: 031] keep two weeks of apache logs instead of a year [operations/puppet] - 10https://gerrit.wikimedia.org/r/130296 (owner: 10ArielGlenn) [15:58:23] Yeah, and to allow some time for sanitity checking / local people's install to fail [15:58:26] I get it :) [16:07:03] (03CR) 10Dzahn: [C: 04-1] "shouldn't the include base::firewall be on the node level? at least that's how we did on matanya's changes that i merged" [operations/puppet] - 10https://gerrit.wikimedia.org/r/96980 (owner: 10Alexandros Kosiaris) [16:18:07] (03CR) 10Alexandros Kosiaris: "Absolutely correct. I will amend" [operations/puppet] - 10https://gerrit.wikimedia.org/r/96980 (owner: 10Alexandros Kosiaris) [16:27:38] (03CR) 10Matanya: [C: 031] "yeah, should have done that too." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/133480 (owner: 10Dzahn) [16:31:15] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 4 below the confidence bounds [16:38:13] (03PS1) 10Manybubbles: Elasticsearch plugin juggling [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133495 [16:42:38] (03PS7) 10Alexandros Kosiaris: bacula: allow mysqldumps to be kept locally [operations/puppet] - 10https://gerrit.wikimedia.org/r/132214 [16:48:06] (03PS1) 10Jgreen: cut civicrm.wm.o and fundraising.wm.o from aluminium-->barium [operations/dns] - 10https://gerrit.wikimedia.org/r/133497 [16:49:18] (03CR) 10Jgreen: [C: 032 V: 031] cut civicrm.wm.o and fundraising.wm.o from aluminium-->barium [operations/dns] - 10https://gerrit.wikimedia.org/r/133497 (owner: 10Jgreen) [17:02:38] (03PS8) 10Alexandros Kosiaris: bacula: allow mysqldumps to be kept locally [operations/puppet] - 10https://gerrit.wikimedia.org/r/132214 [17:10:33] (03CR) 10Gergő Tisza: "The core dependency will hit the Wikipedias next Thursday, this should be merged after that." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132112 (owner: 10Gergő Tisza) [17:31:08] (03PS1) 10Giuseppe Lavagetto: compare-puppet-catalogs: minor tweaks [operations/software] - 10https://gerrit.wikimedia.org/r/133505 [17:33:57] (03PS5) 10Giuseppe Lavagetto: puppet-compiler: module for installation (WiP) [operations/puppet] - 10https://gerrit.wikimedia.org/r/133449 [17:52:02] (03PS2) 10Dzahn: use identical check command for all LVS monitors [operations/puppet] - 10https://gerrit.wikimedia.org/r/133480 [17:53:45] !log adjusted exim conf on mchenry to route donate.wm.o mail to barium instead of aluminium [17:53:50] Logged the message, Master [17:56:01] matanya: I have updated quite a bit https://etherpad.wikimedia.org/p/nodes_with_a_public_IP [17:56:07] thanks akosiaris [17:58:43] (03CR) 10Dzahn: [C: 032] use identical check command for all LVS monitors [operations/puppet] - 10https://gerrit.wikimedia.org/r/133480 (owner: 10Dzahn) [18:01:00] akosiaris: regarding rhenium, it has pmacct on it. it has some ports declared on manifests/role/pmacct.pp [18:01:25] (03CR) 10Dzahn: "well, not for all all, i should have said "for all bits static asset checks". and same in icinga vs watchmouse" [operations/puppet] - 10https://gerrit.wikimedia.org/r/133480 (owner: 10Dzahn) [18:03:10] (03PS2) 10Alexandros Kosiaris: Add gerrit ferm rules for production [operations/puppet] - 10https://gerrit.wikimedia.org/r/96980 [18:04:28] matanya: ok thanks. I will add the rules there, I got another task pending on that anyway [18:05:17] oh my... pmacct init script does not check if it is already running ... 
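The missing "is it already running" check can be sketched as a pidfile guard. This is a generic illustration, not the pmacct init script or its fix: the function name `is_running` is hypothetical, and it assumes the daemon writes its PID to a file whose path the caller knows.

```python
import os


def is_running(pidfile):
    """Return True if `pidfile` names a process that currently exists."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # no pidfile, or its content is not a number
    if pid <= 0:
        return False  # 0 and negatives address process groups, not a daemon
    try:
        os.kill(pid, 0)  # signal 0 only checks existence; nothing is delivered
    except ProcessLookupError:
        return False  # stale pidfile left behind by a dead process
    except PermissionError:
        return True  # process exists but is owned by another user
    return True
```

A start action would call this first and exit early when it returns True, so repeated starts do not stack up extra copies of the daemon.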
[18:05:24] it was running 1000 times on that poor host [18:05:42] i can fix it [18:08:30] (03PS3) 10Rush: ircd-ratbox and udpmxircecho puppetized [operations/puppet] - 10https://gerrit.wikimedia.org/r/132495 [18:09:23] (03PS1) 10Ori.livneh: twemproxy: allow /etc/default/twemproxy to be passed in [operations/puppet] - 10https://gerrit.wikimedia.org/r/133512 [18:09:29] Krinkle|detached: "early" [18:09:38] I always do it a few hours before :P [18:10:20] (03CR) 10Dzahn: [C: 032] "node ytterbium has this role. the node doesn't have base::firewall yet. that means this can be merged anytime and nothing should happen at" [operations/puppet] - 10https://gerrit.wikimedia.org/r/96980 (owner: 10Alexandros Kosiaris) [18:10:41] (03CR) 10jenkins-bot: [V: 04-1] twemproxy: allow /etc/default/twemproxy to be passed in [operations/puppet] - 10https://gerrit.wikimedia.org/r/133512 (owner: 10Ori.livneh) [18:11:20] (03CR) 10Reedy: [C: 032] testwiki to 1.24wmf5 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133465 (owner: 10Reedy) [18:12:33] (03Merged) 10jenkins-bot: testwiki to 1.24wmf5 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133465 (owner: 10Reedy) [18:12:34] already deployment time? [18:12:37] yeppers! [18:12:47] (03PS1) 10Matanya: dns recurses: add ferm rules [operations/puppet] - 10https://gerrit.wikimedia.org/r/133513 [18:12:50] * aude confused and jetlagged ;) [18:12:53] yup, no early scap today [18:13:28] are the jquery changes in for today? [18:13:45] (03PS2) 10Ori.livneh: twemproxy: allow /etc/default/twemproxy to be passed in [operations/puppet] - 10https://gerrit.wikimedia.org/r/133512 [18:14:05] jquery.migrate? 
[18:14:33] ori: can i draw your attention to this just because i saw the word twemproxy there [18:14:36] https://gerrit.wikimedia.org/r/#/c/118718/ [18:14:53] yeah [18:15:27] yeah, that is [18:15:30] ok [18:15:32] James_F merged it over the weekend [18:16:12] i'm sure it's fine, but we'll have to poke around on test.wikidata since we use a lot of jquery [18:16:25] and then of course, migrate if we're using any old stuff [18:16:49] heh [18:16:49] (03PS5) 10Alexandros Kosiaris: Backup role::mariadb::dbstore [operations/puppet] - 10https://gerrit.wikimedia.org/r/132215 [18:16:51] (03PS9) 10Alexandros Kosiaris: bacula: allow mysqldumps to be kept locally [operations/puppet] - 10https://gerrit.wikimedia.org/r/132214 [18:17:25] Hmm [18:17:32] It would look like the module is just in place [18:17:32] https://gerrit.wikimedia.org/r/#/c/133235/ [18:17:37] adds tracking [18:17:45] i see [18:18:12] !log reedy Started scap: testwiki to 1.24wmf5 and build l10n cache [18:18:17] Logged the message, Master [18:18:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Tue May 13 21:03:26 2014 [18:19:19] https://www.mediawiki.org/wiki/MediaWiki_1.24/wmf5/Changelog [18:20:08] aude: http://lists.wikimedia.org/pipermail/wikitech-l/2014-May/076340.html [18:20:35] * 12 May 2014 (1.24wmf4 [9]): Phase 1 – Instrumentation and logging starts. This will run for 4 weeks (until June 9). [18:20:36] * 19 May 2014 (1.24wmf5): Phase 2 – "Upgrade and Migrate". This will run for 3 weeks (upto June 9). The instrumentation continues during this period. [18:20:36] * 1 June 2014 (1.24wmf7) Finalise upgrade. [18:20:45] Reedy: I only merged Migrate; siebrand merged jQuery 1.11. :-) [18:20:52] jquery goes to 11! 
[18:21:11] (03PS1) 10Matanya: dns recurses: add firewll [operations/puppet] - 10https://gerrit.wikimedia.org/r/133515 [18:21:31] i think lydia sent that around to our team [18:21:32] I wonder if https://gerrit.wikimedia.org/r/#/c/133235/ should've been merged into core... [18:22:33] where does the logging happen? [18:22:46] anytime migrate gets used? [18:22:51] to use js console I think [18:23:12] ok [18:23:31] I made a patch to update one the education.wikimedia.org a while ago. Could someone check it? https://gerrit.wikimedia.org/r/#/c/122866/ [18:23:40] (03CR) 10Matanya: [C: 04-1] "depends on https://gerrit.wikimedia.org/r/#/c/133513/ being merged." [operations/puppet] - 10https://gerrit.wikimedia.org/r/133515 (owner: 10Matanya) [18:27:18] (03PS1) 10Dzahn: enable firewall on ytterbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/133517 [18:29:25] !log mw1053 is pingable but not ssh-able [18:29:29] Logged the message, Master [18:31:03] !log mw1053 sits at disk partitioning dialog (via mgmt) [18:31:07] Logged the message, Master [18:31:39] (03PS10) 10Alexandros Kosiaris: bacula: allow mysqldumps to be kept locally [operations/puppet] - 10https://gerrit.wikimedia.org/r/132214 [18:31:59] mutante: https://rt.wikimedia.org/Ticket/Display.html?id=7408 [18:32:14] ah sometimes I hate git-review... previous patchset got overriden [18:32:28] !log mw1053 was already disabled in pybal though and RT 7408,7435 [18:32:32] greg-g: thanks, just saw [18:32:34] Logged the message, Master [18:32:47] if it's disabled in pybal, it should be removed from dsh [18:32:50] Reedy: ^ it's disabled and hardware [18:32:52] though I repeat myself [18:33:19] very much agree [18:33:32] alternately, scap should get the list of hosts from pybal [18:33:49] heh [18:34:02] Is there an API for that? Asking pybal what hosts are pooled? 
[18:34:11] that would not be wise [18:34:25] the config is on noc [18:34:25] https://noc.wikimedia.org/pybal/eqiad/api [18:34:26] :D [18:34:31] lol yea [18:34:42] removes it from dsh [18:34:45] since pybal's config isn't in vcs? #troll [18:34:52] nooope [18:35:38] I actually agree with it not being in vcs, but it seems like there should be an internal service API to get the settings. [18:36:40] I especially like that the config on noc is *almost* json [18:36:49] getting the settings is one thing, acting on them is another [18:37:16] the current plan is "they should stay manual, but we can put them in a local git repo" [18:37:54] * bd808 wants to be able to de-pool/pool servers from the deployment tool chain someday [18:38:12] that^^ * 1000 [18:38:12] api = wget | grep ? [18:38:22] akosiaris: what do you mean? [18:38:34] as reedy said, it's already on http.. fetching it gets you the config [18:39:15] (03PS1) 10Dzahn: remove mw1058 from dsh groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/133521 [18:39:20] imagine a server there being set to False (because it was just done by an ops person), then have scap get that, skip the host since it is False and then the ops person shows up and sets it to True [18:39:29] greg-g: ^ [18:39:38] get the race condition ? [18:39:42] akosiaris: we already have that problem [18:39:50] we recently ran a scap just for that [18:39:59] one host was out of sync because of something similar [18:40:06] Yeah that's lame but not new [18:40:16] so we are not solving anything with that approach then [18:40:18] Point being: this should be automatable. [18:40:29] so let's figure out a real solution then :) [18:40:35] akosiaris: what is the approach? [18:40:46] sure, I am up for that! [18:40:46] There is some code to run sync-common when puppet runs so ... 
eventually consitent [18:41:02] if we disable it in one place, we need to disable it in the other place as well [18:41:16] mutante: the scenario i just sketched [18:41:25] mutante: if only there were one canonical place that other places pulled state from, I think is one option? [18:42:17] IMHO the problem is with scap relying on dsh anyway [18:42:31] Agreed [18:42:31] it was just a matter of time until that line above :) [18:42:45] But will salt make it any better? [18:42:57] I never spoke the word salt ;-) [18:43:12] The deploy tool still needs to know which servers should be active [18:43:22] warning, general deployment discussion, possible recursion detected [18:43:33] (03CR) 10Ori.livneh: [C: 032] twemproxy: allow /etc/default/twemproxy to be passed in [operations/puppet] - 10https://gerrit.wikimedia.org/r/133512 (owner: 10Ori.livneh) [18:44:01] Technically scap doesn't use dsh anymore, but it still uses ssh and the /etc/dsh/groups files to find out which hosts it should manage. [18:44:02] well how about the servers knowing in which version of the software they should be at and trying to update themselves at that version ? [18:44:22] (03CR) 10Dzahn: [C: 032] "this server is broken, that's why scap should not try to sync to it. (in salt you would also have to mark it broken somehow)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/133521 (owner: 10Dzahn) [18:44:25] bd808: ah!! nice, good to know :-) [18:44:37] how do you mark servers with broken disks as broken in salt? 
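The idea of having the deploy tool skip hosts that are depooled could be sketched roughly as below. Everything about the file format is an assumption: the log only says the published config is "almost json" and that hosts are set to True or False, so this sketch supposes one Python-literal dict per line with hypothetical `host` and `enabled` keys. The function names are illustrative and are not part of scap or pybal.

```python
import ast


def pooled_hosts(config_text):
    """Return hosts whose entry has enabled=True, in file order.

    Assumed format (not confirmed by the log), one entry per line:
        { 'host': 'mw1058', 'weight': 10, 'enabled': True }
    """
    hosts = []
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        entry = ast.literal_eval(line)  # evaluates literals only, no code
        if entry.get("enabled"):
            hosts.append(entry["host"])
    return hosts


def sync_targets(dsh_group, pool):
    """Hosts listed in the dsh group that are also currently pooled."""
    pooled = set(pool)
    return [h for h in dsh_group if h in pooled]
```

This does not remove the race described above: a host flipped back to True right after the snapshot is taken is still skipped, so a catch-up step such as the sync-common run on puppet mentioned in the discussion would still be needed.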
[18:44:39] mutante: :P [18:44:43] will there be a grain "broken" [18:44:56] (03Abandoned) 10Ori.livneh: twemproxy: generic::upstart_job() now uses boolean values [operations/puppet] - 10https://gerrit.wikimedia.org/r/118718 (owner: 10Hashar) [18:45:17] (03Abandoned) 10Ori.livneh: Lint mediawiki::twemproxy [operations/puppet] - 10https://gerrit.wikimedia.org/r/121400 (owner: 10Hashar) [18:45:20] sync-* still use dsh but I'm planning on changing that "soon" [18:45:44] mutante: i puppet-merged dzahn: remove mw1058 from dsh groups (e171a5b) [18:45:48] didn't mean to, assume it's ok [18:45:53] ori: i was about.. to .. yes, thanks [18:45:58] same thing :) [18:45:59] !log reedy Finished scap: testwiki to 1.24wmf5 and build l10n cache (duration: 27m 47s) [18:46:21] Logged the message, Master [18:46:38] 14:42 < chasemp> why not generate the dsh lists dynamically at the time they are run? just curious [18:46:47] (did it for ya ;) ) [18:46:58] hey I was trying to lurk here :) [18:47:00] see old RT tickets [18:47:07] about generating dsh from puppet [18:47:12] monitoring if hosts are in dsh etc [18:47:16] ook. [18:47:18] mutante: which ones? I copy/pasted his question because it seemed legit to me [18:47:20] but if you want to replace dsh anyways [18:47:25] you might not want to get into them:) [18:47:47] (03CR) 10Ori.livneh: "Not required for this patch, but perhaps you could add to this Puppetization , which is" [operations/puppet] - 10https://gerrit.wikimedia.org/r/132495 (owner: 10Rush) [18:47:51] https://rt.wikimedia.org/Ticket/Display.html?id=377 [18:48:23] I have done it before w/ how ryan suggests 1. Switch to using marionette collective [18:48:23] tries to find [18:48:37] that is one , yep, what chase said [18:48:40] honestly, it's been awkward for me to not have mcollective here [18:48:46] since it's the other half of puppet [18:48:50] Puppet is *terrible* at this kind of thing. It's as if he is in front of me saying it! 
[18:48:53] chasemp: +1 [18:48:57] in many many ways the salt integration is just so whacky and not there [18:49:06] chasemp: yes [18:49:08] * akosiaris quotes ryan [18:49:12] it's a known clusterfuck [18:49:18] salt does not replace mcollective discovery at all [18:49:25] stops taking part in the discussion at this point. cya [18:49:32] and discovery is really all i love it for [18:49:35] mutante: :) [18:49:38] mutante: sorry, i am not criticizing anyone [18:49:48] mutante: i think ryan's architecture is actually right [18:50:00] no worries, it's just about the repition [18:50:00] been on my mind, but I since I haven't the time to address it I wasn't going to drop this tidbit and run [18:50:03] (03CR) 10Reedy: [C: 032] Wikipedias to 1.24wmf4 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133466 (owner: 10Reedy) [18:50:14] but my position would be mcollective solves this problem well [18:50:23] mutante: ok :) sorry for the aggressive phrasing [18:50:52] we are constantly questioning what we use .. 
there is also a cost to that [18:50:59] * matanya has deja vu on this matter [18:51:02] true but if you are going to use puppet [18:51:10] (03Merged) 10jenkins-bot: Wikipedias to 1.24wmf4 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133466 (owner: 10Reedy) [18:51:11] not using mcollective is the more bizarre choice [18:51:17] since it's ...made to go together [18:51:33] (03PS1) 10Ori.livneh: use twemproxy module on mw106* [operations/puppet] - 10https://gerrit.wikimedia.org/r/133522 [18:51:45] s/made to go together/one hacked on top of the other/ :) [18:51:52] yes [18:51:58] chasemp: if you spend a couple of days digging into the stack and write up a recommendation about what we should do, i think it would be appreciated by all [18:52:10] mcollective has it's faults...it is after all ruby :D [18:52:34] oh let's not go into the language stuff as well, we are bikeshedding enough [18:52:35] plus1 to ori [18:52:40] ori: for my own selfish reasons I was going to try to make a case for mcollective if I thought it was there, but I have not had exposure to dsh usage or anything yet [18:52:46] so I have been keeping it in my back pocket [18:52:49] chasemp: no one likes dsh [18:52:55] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf4 [18:52:55] dsh is evil [18:52:59] Logged the message, Master [18:53:04] akosiaris: :) fair enough [18:53:20] the current status quo does not represent anyone's vision of an ideal architecture; it's mostly a case of conflicting requirements, half-completed migrations, and legacy stuff [18:53:22] on the other hand i could name any programming language, web ui or tool i wanted to , and somebody would say it's evil [18:53:26] honestly, it's more weird to use dsh and salt [18:53:30] than salt and mcollective :) [18:53:39] salt + mcollective sounds right [18:53:41] I used fabric before with mcollective together, and it was a good pair [18:54:00] fabric is fine too, but we went with salt 
for better or worse and we should stick to it for now [18:54:23] (03CR) 10Matanya: [C: 04-1] "until backup::client has a ferm rule. looks good otherwise" [operations/puppet] - 10https://gerrit.wikimedia.org/r/133517 (owner: 10Dzahn) [18:54:24] chasemp: re fabric see https://rt.wikimedia.org/Ticket/Display.html?id=6766 [18:54:25] yes, agreed [18:54:40] (03CR) 10Ori.livneh: [C: 032] use twemproxy module on mw106* [operations/puppet] - 10https://gerrit.wikimedia.org/r/133522 (owner: 10Ori.livneh) [18:54:55] fabric has it's own issues to be sure [18:55:04] !log deploying twemproxy module on mw106*, they may complain for a moment [18:55:08] Logged the message, Master [18:55:13] chasemp: its! its! its! :D [18:55:21] I had a giant wrapper around paramiko and ended up just going w/ it as they had bugs which were not resolved upstream [18:55:37] ori: did it just to melt your brain :) [18:55:37] Hello Commons! Why don't you load for me. [18:56:00] twkozlowski: main page or ?? [18:56:01] aaaaanyways, I have no authoritative thoughts here as I'm so new. but I have used mcollective for this time of on demand host discovery for deployments [18:56:05] and it was good [18:56:05] I'm ignoring bots in this channel - are there any problems with bits? [18:56:08] Hello twkozlowski, it's because I am a bug-riddled wiki that packs more bugs than Windows ME [18:56:10] or as good as it can be [18:56:14] 2c [18:56:16] twkozlowski: none that I know of [18:56:43] fortunately, Wikimedia wikis look beatiful without CSS, so that's not a big issue [18:56:48] beautiful* [18:57:05] anyone else in europe experiencing bits not-there-ness? [18:57:13] just slow [18:58:04] mw1105 is the server I just got served by without any CSS [18:58:25] PROBLEM - Apache HTTP on mw1149 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:58:43] ... [18:58:43] bits in esams is reported in icinga! 
[18:58:50] 503 [18:58:51] Failed to load resource: the server responded with a status of 503 (Service Unavailable) https://bits.wikimedia.org/en.wikipedia.org/load.php?debug=false&lang=en-gb…ort.style%7Cschema.NavigationTiming&skin=vector&version=20140515T185255Z&* [18:58:52] greg-g: 733 ms [18:58:54] eh, and came back [18:59:15] RECOVERY - Apache HTTP on mw1149 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 5.052 second response time [18:59:24] i didn't touch it [18:59:25] it's just IPv6 now [18:59:27] https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=bits-lb.esams.wikimedia.org_ipv6&service=LVS+HTTPS+IPv6 [18:59:30] supposedly [18:59:31] mutante: :-) [19:00:03] icinga is reporting 3 bad apaches [19:00:23] twkozlowski: you are better than icinga, you report even before the error happens [19:00:48] 20:50 Steinsplitter: yeah. you'r too fast for beeing human O_O [19:01:03] just got lucky, I suppose [19:01:03] greg-g: already recovered? [19:01:24] most [19:01:48] both in scheduled downtime / hardware replacing [19:01:55] checks if enabled [19:02:01] (03PS1) 10Ori.livneh: twemproxy: invoke w/ --mbuf-size=65536 [operations/puppet] - 10https://gerrit.wikimedia.org/r/133524 [19:02:01] cmjohnson1: [19:02:31] greg-g: they aren't enabled [19:02:40] gotcha [19:03:30] twkozlowski: so, the bits check says it recovered [19:03:41] and it was via v6 [19:04:37] mutante: Thanks [19:04:50] (03CR) 10Ori.livneh: [C: 032] twemproxy: invoke w/ --mbuf-size=65536 [operations/puppet] - 10https://gerrit.wikimedia.org/r/133524 (owner: 10Ori.livneh) [19:05:11] mutante: mw1149? 
no idea [19:06:14] * greg-g lunches [19:07:04] cmjohnson1: mw1163 and mw1058 [19:07:30] 1163 is still being worked on....so far 3 bad system boards have been sent and at least 1 bad DIMM [19:07:35] (03CR) 10Reedy: [C: 032] group0 to 1.24wmf5 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133467 (owner: 10Reedy) [19:07:43] (03Merged) 10jenkins-bot: group0 to 1.24wmf5 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133467 (owner: 10Reedy) [19:08:52] 1058...didn't know there was a problem [19:09:11] cmjohnson1: https://gerrit.wikimedia.org/r/#/c/133521/ [19:09:18] cmjohnson1: https://rt.wikimedia.org/Ticket/Display.html?id=7408 [19:09:19] (03PS1) 10Jgreen: move fundraising.wm.o back to aluminium until we can burn some outbound fw holes [operations/dns] - 10https://gerrit.wikimedia.org/r/133526 [19:09:29] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf5 [19:09:34] Logged the message, Master [19:09:47] wait, there is mw1053 vs. mw1058 ? arg [19:10:11] (03PS1) 10Ori.livneh: remove mediawiki::twemproxy [operations/puppet] - 10https://gerrit.wikimedia.org/r/133527 [19:10:27] ah..well yeah ok..the disk has been replaced I didn't reinstall yet [19:11:15] cmjohnson1: .. and i mixed up 1053 and 1058 ... 
[19:12:36] (03CR) 10Ori.livneh: [C: 032] remove mediawiki::twemproxy [operations/puppet] - 10https://gerrit.wikimedia.org/r/133527 (owner: 10Ori.livneh) [19:13:04] (03CR) 10Jgreen: [C: 032 V: 031] move fundraising.wm.o back to aluminium until we can burn some outbound fw holes [operations/dns] - 10https://gerrit.wikimedia.org/r/133526 (owner: 10Jgreen) [19:17:10] (03PS1) 10Reedy: WIP: Manage mediawiki-config/php symlink automatically [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133528 (https://bugzilla.wikimedia.org/64748) [19:17:42] (03CR) 10Reedy: [C: 04-1] WIP: Manage mediawiki-config/php symlink automatically [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133528 (https://bugzilla.wikimedia.org/64748) (owner: 10Reedy) [19:21:12] (03PS1) 10Ori.livneh: lint mediawiki module's init.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/133529 [19:22:45] (03CR) 10Ori.livneh: [C: 032] lint mediawiki module's init.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/133529 (owner: 10Ori.livneh) [19:23:06] greg-g: What day are you sending your weekly deployment schedule e-mail? Fridays? [19:23:14] What day... do you send* [19:23:46] Do we allow wikis to assign the 'mergehistory' user right? I know it is not commonly used on Wikimedia but I'm not sure if this is just because communities don't want it or a tech reason behind it. [19:25:25] PROBLEM - ElasticSearch health check on elastic1014 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2062: active_shards: 6185: relocating_shards: 2: initializing_shards: 1: unassigned_shards: 0 [19:25:26] PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. 
status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2062: active_shards: 6185: relocating_shards: 2: initializing_shards: 1: unassigned_shards: 0 [19:25:26] PROBLEM - ElasticSearch health check on elastic1004 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2062: active_shards: 6185: relocating_shards: 2: initializing_shards: 1: unassigned_shards: 0 [19:25:26] PROBLEM - ElasticSearch health check on elastic1003 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2062: active_shards: 6185: relocating_shards: 2: initializing_shards: 1: unassigned_shards: 0 [19:26:25] RECOVERY - ElasticSearch health check on elastic1004 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2063: active_shards: 6186: relocating_shards: 1: initializing_shards: 0: unassigned_shards: 0 [19:26:25] RECOVERY - ElasticSearch health check on elastic1011 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2063: active_shards: 6186: relocating_shards: 1: initializing_shards: 0: unassigned_shards: 0 [19:26:26] RECOVERY - ElasticSearch health check on elastic1003 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2063: active_shards: 6186: relocating_shards: 1: initializing_shards: 0: unassigned_shards: 0 [19:26:26] RECOVERY - ElasticSearch health check on elastic1014 is OK: OK - elasticsearch (production-search-eqiad) is running. 
status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2063: active_shards: 6186: relocating_shards: 1: initializing_shards: 0: unassigned_shards: 0 [19:27:50] <^d> What's your problem elastic? [19:28:05] <^d> reallocating. fun times. [19:28:36] it knows nik is talking about it [19:28:41] ^d: Joking-wise; it's probably lose it's ability to stretch :p [19:28:46] *lost [19:29:00] * ^d enables the be_more_elastic setting [19:29:28] If only that worked everywhere. [19:29:51] MediaWiki not displaying images? Set the $wgBeMoreMedia setting to true :p [19:30:13] $wgMuchImageSuchMediaWow = true; [19:30:14] (03PS1) 10Dzahn: dsh groups: remove mw1053, re-add mw1058 [operations/puppet] - 10https://gerrit.wikimedia.org/r/133531 [19:30:27] <^d> Things too slow? Turn on $wgFasterMediaWiki. [19:30:37] ori: eh, so you took mw1053 on purpose because of the hardware fix? [19:31:04] for https://gerrit.wikimedia.org/r/#/c/133414/ [19:31:04] (03PS1) 10Rush: diamond for dataset and ^lvs1* [operations/puppet] - 10https://gerrit.wikimedia.org/r/133532 [19:31:28] and thanks for cleaning up the other twemproxy changes, just saw [19:31:44] Don't want MediaWiki to be publically edited? Enable $wgNotAMediaWiki [19:33:25] (03PS2) 10Rush: diamond for dataset and ^lvs1* [operations/puppet] - 10https://gerrit.wikimedia.org/r/133532 [19:33:43] (03CR) 10Rush: [C: 032] "resource wise these seem gtg. I am going to baby sit this today." [operations/puppet] - 10https://gerrit.wikimedia.org/r/133532 (owner: 10Rush) [19:33:45] (03CR) 10Dzahn: [C: 032] "mixed them up in Ifa46bafcd2" [operations/puppet] - 10https://gerrit.wikimedia.org/r/133531 (owner: 10Dzahn) [19:35:30] (03PS2) 10Reedy: Manage php symlink automatically [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133528 (https://bugzilla.wikimedia.org/64748) [19:35:57] (03CR) 10Mattflaschen: [C: 04-1] "Please add test and test2 as true (convention is, those go at the top). 
Otherwise, looks fine." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133442 (owner: 10Phuedx) [19:36:12] (03CR) 10Dzahn: "pybal: {'host': 'mw1053.eqiad.wmnet', 'weight': 10, 'enabled': False } , {'host': 'mw1058.eqiad.wmnet', 'weight': 10, 'enabled': True } pl" [operations/puppet] - 10https://gerrit.wikimedia.org/r/133531 (owner: 10Dzahn) [19:36:35] (03PS1) 10John F. Lewis: Change AbuseFilter rights on cbkwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133533 (https://bugzilla.wikimedia.org/65346) [19:36:54] urg [19:37:51] (03PS2) 10John F. Lewis: Change AbuseFilter rights on cbkwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133533 (https://bugzilla.wikimedia.org/65346) [19:38:15] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected [19:40:00] Hm. Gerrit not notifying Bugzilla of patches? [19:40:42] JohnLewis: do you know since when? [19:41:21] mutante: https://gerrit.wikimedia.org/r/#/c/133364/ was uploaded yesterday and the Bugzilla bug was not notified. That's as far as I know. [19:42:09] I just patched a bug and that wasn't updated so it's still an on-going issue for reference [19:48:45] JohnLewis: trying to find out more currently.. [19:49:11] i hear qchris made the bugzilla hooks in gerrit [19:49:15] for this feature [19:49:27] http://www.mediawiki.org/wiki/Gerrit/Commit_message_guidelines#Auto-linking_and_cross-referencing [19:49:39] "A bot will automatically notify Bugzilla ... " [19:51:15] (03PS3) 10John F. Lewis: Change AbuseFilter rights on cbkwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133533 (https://bugzilla.wikimedia.org/65346) [19:51:31] ^d: ideas why that could have changed recently? [19:51:43] <^d> godog restarting gerrit? 
[19:52:45] i hear it broke within the last 9 hours [19:52:47] ah [19:53:01] so the plugin in Gerrit is https://gerrit-review.googlesource.com/#/admin/projects/plugins/its-bugzilla [19:53:10] and qchris set it up once upon a time [19:54:08] very last activity in Bugzilla was Gerrit Notification Bot (gerritadmin@wikimedia.org) 2014-05-15 11:41:52 UTC in https://bugzilla.wikimedia.org/show_bug.cgi?id=65225#c3 [19:56:40] twkozlowski: yeah, friday [19:56:57] (03PS1) 10BBlack: Use whole subnets in squid.php list for XFF acceptance [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133535 [20:01:04] the bugzilla plugin is listed as enabled in gerrit [20:01:24] hooks-bugzilla2.8.1 Enabled .. hrmm [20:01:46] https://gerrit.wikimedia.org/r/plugins/hooks-bugzilla/Documentation/index.html [20:02:14] https://gerrit.wikimedia.org/r/plugins/hooks-bugzilla/Documentation/config.html even [20:02:39] "Bugzilla credentials and connectivity details are asked and verified during the Gerrit init." [20:02:42] that? [20:04:18] that. :) [20:04:39] no changes about the admin password, right [20:04:45] on the bz side [20:04:52] at least not by me. let me check the log [20:04:55] not sure yet where the config is on ytterbium.. [20:05:19] oh well, if the problem started ~9h ago I cannot check the log yet, because that script will run again in 3h :) [20:05:26] audit_log [20:05:32] godog: hi? [20:05:49] Reedy: Did you get the media viewer config patch? [20:05:50] godog: did you see anything like the dialog above when restarting gerrit? [20:06:04] where it asks about bugzilla connectivity [20:06:58] (03CR) 10BryanDavis: [C: 031] "I haven't tested it, but the code change looks like what I was thinking about when I wrote the bug." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133528 (https://bugzilla.wikimedia.org/64748) (owner: 10Reedy) [20:07:57] marktraceur: crap, no [20:08:17] (03PS2) 10Reedy: FUTURE: Seventh batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132968 (owner: 10Gergő Tisza) [20:09:01] Well shit [20:09:28] Reedy: Do you want to, should I, or what? [20:10:12] andre__: so.. i have the password it's trying to use [20:10:56] (03CR) 10Reedy: [C: 032] FUTURE: Seventh batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132968 (owner: 10Gergő Tisza) [20:11:02] marktraceur: he's on it [20:11:02] (03Merged) 10jenkins-bot: FUTURE: Seventh batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132968 (owner: 10Gergő Tisza) [20:11:05] he was just rebasing [20:11:14] mutante: I can't get passwords of Bugzilla users displayed in the UI, in case you're after comparing or such [20:11:28] andre__: i'm after the user name currently :p [20:11:50] mutante, gerritadmin@wikimedia.org [20:11:54] OK [20:11:58] andre__: thanks [20:12:00] mutante: user id = email address [20:12:12] greg-g: Psh communication [20:12:14] bblack: You should hit the "Cherry Pick To" button on your IPSet patch and create backports to 1.24wmf4 and 1.24wmf5. Then get them added to a SWAT deploy today or Monday. [20:12:34] <^d> fwiw, gerrit is connecting. 
[20:12:36] <^d> [2014-05-15 12:57:28,709] INFO com.googlesource.gerrit.plugins.hooks.bz.BugzillaModule : Bugzilla is configured as ITS [20:12:36] <^d> [2014-05-15 12:57:30,106] INFO com.googlesource.gerrit.plugins.hooks.bz.BugzillaItsFacade : Connected to https://bugzilla.wikimedia.org as username=gerritadmin@wikimedia.org, userid=16761 [20:12:36] <^d> [2014-05-15 12:57:30,646] INFO com.googlesource.gerrit.plugins.hooks.bz.BugzillaItsFacade : Connected to Bugzilla at https://bugzilla.wikimedia.org/xmlrpc.cgi, reported version is 4.4.4 [20:13:14] <^d> Must log in. [20:13:16] <^d> Silly. [20:13:21] <^d> Why didn't it stay logged in? [20:13:24] "You tried to log in using the gerritadmin@wikimedia.org account, but Bugzilla is unable to trust your request. Make sure your web browser accepts cookies and that you haven't been redirected here from an external web site. Click here if you really want to log in. " [20:13:29] uh [20:13:37] well, that doesn't mean bad:) [20:13:40] where's that from? [20:13:55] it tried to login using my browser [20:14:01] as gerrit [20:14:49] s/it/i [20:15:35] <^d> When gerrit started up it says it logged in. [20:15:40] <^d> Then it started dying and saying it wasn't. [20:15:43] <^d> Hm. [20:15:48] * ^d slaps gerrit with a 2x4 [20:16:01] mutante, the account is not disabled in Bugzilla, so it should be possible to log in with the right password [20:16:08] (just checked the account settings again) [20:16:08] !log reedy synchronized wmf-config/InitialiseSettings.php 'I5da4aa5db7b5d3c1843a6fd68d0a7c62a2bbfb4e' [20:16:12] marktraceur: ^^ [20:16:13] Logged the message, Master [20:17:07] Grazie [20:17:08] andre__: ^d the password in the config file works for me [20:17:15] i can login with that [20:17:19] <^d> Yeah, that hasn't changed. [20:17:21] the message above was a cookie thing [20:17:22] Looks good to me... 
[20:17:26] (03CR) 10Calak: [C: 031] Change AbuseFilter rights on cbkwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133533 (https://bugzilla.wikimedia.org/65346) (owner: 10John F. Lewis) [20:17:28] i really did not accept the cookie [20:17:34] Reedy: Confirmed [20:17:54] Sweet [20:18:02] <^d> I wonder how j2bugzilla stores its cookie. [20:18:07] <^d> Memory? [20:18:33] yeah, good question [20:19:40] JohnLewis: did you follow all that ?:p [20:20:08] <^d> Hmm, I reloaded the plugin and cleared the error log. [20:20:08] mutante: Messing around with a nagios installation so no :p [20:20:12] <^d> Once again, Connected to https://bugzilla.wikimedia.org as username=gerritadmin@wikimedia.org, userid=16761 [20:20:16] <^d> Wonder if it'll last. [20:20:25] JohnLewis: since you asked about the gerrit/bz notifications ,, see backlog:) [20:20:39] I will shortlu [20:20:41] *shortly [20:22:09] !imaginary_bot_trigger_that_dumps_backlog_into_an_actual_ticket [20:22:49] bd808|LUNCH: thanks, will do after I get back from late lunch [20:25:51] (03PS2) 10Dzahn: dns recurses: add firewall [operations/puppet] - 10https://gerrit.wikimedia.org/r/133515 (owner: 10Matanya) [20:29:08] (03PS1) 10Odder: Enable FlaggedRevs on more namespaces on dewiktionary [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133540 (https://bugzilla.wikimedia.org/65316) [20:29:57] Weird caching aside, looks like we're good. 
[20:30:10] mutante: hey, mmhh no I didn't see the dialog, restarted via /etc/init.d/gerrit [20:32:14] (03PS1) 10Odder: Correct a domain in wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133570 (https://bugzilla.wikimedia.org/65344) [20:32:47] godog: ok, was just wondering becaues it says it runs as part for gerrit init, there was that plugin that seemed to misbehave since around the restart [20:36:02] for the records, the quick'n'dirty check if it's running again would be any patch created in Gerrit and correctly refering to a Bugzilla ID, which would create a result shown for https://bugzilla.wikimedia.org/buglist.cgi?chfield=bug_status&chfieldfrom=-1h&chfieldto=Now&chfieldvalue=PATCH_TO_REVIEW&email1=gerritadmin%40wikimedia.org&emaillongdesc1=1&emailtype1=exact&query_format=advanced [20:36:15] (03CR) 10Dzahn: "duplicate of I58090c529 - pick one, abandon other" [operations/dns] - 10https://gerrit.wikimedia.org/r/131915 (owner: 10Dzahn) [20:37:07] andre__: is there some generic "test ticket" or so [20:37:25] mutante, not really, but feel free to create one under Wikimedia>Bugzilla :) [20:41:08] mutante: hah, yeah I was reading the backlog now, curious [20:41:35] (03PS1) 10Dzahn: test gerrit->bz notification [operations/puppet] - 10https://gerrit.wikimedia.org/r/133589 [20:41:51] https://bugzilla.wikimedia.org/show_bug.cgi?id=65370 [20:42:07] https://gerrit.wikimedia.org/r/#/c/133589/ [20:42:54] hmmm.not yet [20:43:14] (03CR) 10Dzahn: [C: 031] test gerrit->bz notification [operations/puppet] - 10https://gerrit.wikimedia.org/r/133589 (owner: 10Dzahn) [20:44:08] (03PS2) 10Aklapper: test gerrit->bz notification [operations/puppet] - 10https://gerrit.wikimedia.org/r/133589 (https://bugzilla.wikimedia.org/65370) (owner: 10Dzahn) [20:44:13] mutante, s/Bug 65370/Bug: 65370 [20:44:19] cf. https://www.mediawiki.org/wiki/Gerrit/Commit_message_guidelines [20:45:15] andre__: arg, yea, and i see you already updated. 
but it did make it a working link just as well [20:45:29] (03PS1) 10Rush: amsvls diamond deployment [operations/puppet] - 10https://gerrit.wikimedia.org/r/133590 [20:48:58] (03PS2) 10Rush: amslvs diamond deployment [operations/puppet] - 10https://gerrit.wikimedia.org/r/133590 [20:49:15] (03PS3) 10Rush: amslvs diamond deployment [operations/puppet] - 10https://gerrit.wikimedia.org/r/133590 [20:49:24] uhm , yea, that wasn't it though [20:50:00] (03CR) 10Rush: [C: 032 V: 032] "amslvs load seems fine, babysitting this" [operations/puppet] - 10https://gerrit.wikimedia.org/r/133590 (owner: 10Rush) [20:50:36] ^d: did you know about openjdk being upgraded btw? [20:50:44] <^d> No. [20:51:01] @ytterbium:/var/log# grep jdk /var/log/apt/history.log [20:51:14] there was an upgrade today [20:52:22] there were some puppet issues elsewhere to [20:52:38] hmmm. [20:53:57] ^d: where did you see it connecting? [20:54:20] <^d> Gerrit's log. [20:54:32] <^d> /var/lib/gerrit2/review_site/log/error_log [20:54:53] (03PS1) 10Ori.livneh: Provision apache user via Puppet [operations/puppet] - 10https://gerrit.wikimedia.org/r/133594 [20:55:01] ah:) looked for logs in /var/log [20:55:51] [2014-05-15 20:19:41,566] INFO com.google.gerrit.server.plugins.PluginLoader : Unloading plugin hooks-bugzilla [20:56:08] Caused by: com.j2bugzilla.base.BugzillaException: An unknown error was encountered; fault code: 410 [20:56:17] Caused by: org.apache.xmlrpc.XmlRpcException: You must log in before using this part of Bugzilla. [20:57:03] can you let it login one more time? [20:57:48] (03CR) 10Ori.livneh: [C: 032] Provision apache user via Puppet [operations/puppet] - 10https://gerrit.wikimedia.org/r/133594 (owner: 10Ori.livneh) [20:57:56] Bugzilla DID change some cookie/login related stuff between 4.4.1 and 4.4.4, but we upgraded to 4.4.4 more than one day ago [20:58:25] <^d> I wonder if it "worked" until the old cookie expired. 
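The footer fix andre__ suggests above (`Bug: 65370` rather than `Bug 65370`) can be checked mechanically. What follows is a minimal sketch, assuming the convention is a footer line of the form `Bug: <number>`. The helper `find_bug_ids`, the regex, and the placeholder `Change-Id` value are illustrative only; none of them come from Gerrit or the Bugzilla hooks plugin. As mutante notes, the malformed footer was not the cause of the missing notifications.

```python
import re

# Hypothetical pattern: a footer line of the form "Bug: 12345".
# Assumed from the "s/Bug 65370/Bug: 65370" correction in the log.
BUG_FOOTER = re.compile(r"^Bug: (\d+)$", re.MULTILINE)

def find_bug_ids(commit_message: str) -> list[int]:
    """Return bug numbers from well-formed 'Bug: N' footer lines."""
    return [int(m) for m in BUG_FOOTER.findall(commit_message)]

# The Change-Id values below are placeholders, not real change IDs.
good = "test gerrit->bz notification\n\nBug: 65370\nChange-Id: I0000000000"
bad = "test gerrit->bz notification\n\nBug 65370\nChange-Id: I0000000000"

print(find_bug_ids(good))  # [65370]
print(find_bug_ids(bad))   # []
```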
[20:58:25] andre__: i suppose: a) not logged in shoult not be "unknown error" [20:59:31] i wonder if we can run that part of gerrit init [20:59:31] without all the others [21:02:20] (03PS1) 10Ori.livneh: Lint mediawiki::cgroup [operations/puppet] - 10https://gerrit.wikimedia.org/r/133597 [21:03:54] i pasted the error on 65370 and renamed it, let's continue on an actual ticket at this point (we should have earlier) [21:07:14] andre__: yea, that was even before the firewall changes [21:10:22] bd808: look correct? (cherrypicks + SWAY add for this evening?) never done this before :) [21:10:25] https://wikitech.wikimedia.org/wiki/Deployments#Week_of_May_12th [21:10:35] s/SWAY/SWAT/ :p [21:11:40] bblack: Looks right to me. [21:11:53] We'll turn you into a php dev yet :) [21:12:12] * bblack runs [21:12:25] (03CR) 10Quiddity: [C: 04-1] "Please also add "'Talk:Design'," under the other new item, per older request from Jared. Thanks." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133267 (owner: 10Spage) [21:19:29] (03PS1) 10Rush: misc diamond hosts who do not use standard [operations/puppet] - 10https://gerrit.wikimedia.org/r/133598 [21:19:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Last successful Puppet run was Tue May 13 21:03:26 2014 [21:21:49] (03PS2) 10Rush: misc diamond hosts who do not use standard [operations/puppet] - 10https://gerrit.wikimedia.org/r/133598 [21:23:00] (03CR) 10Rush: [C: 032] "filling in some low load random hosts." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/133598 (owner: 10Rush) [21:23:36] (03CR) 10Rush: [V: 032] misc diamond hosts who do not use standard [operations/puppet] - 10https://gerrit.wikimedia.org/r/133598 (owner: 10Rush) [21:26:32] (03CR) 10Ori.livneh: [C: 032] Lint mediawiki::cgroup [operations/puppet] - 10https://gerrit.wikimedia.org/r/133597 (owner: 10Ori.livneh) [21:26:46] (03PS1) 10Ori.livneh: tidy up applicationserver::packages [operations/puppet] - 10https://gerrit.wikimedia.org/r/133599 [21:27:15] (03PS2) 10Ori.livneh: tidy up applicationserver::packages [operations/puppet] - 10https://gerrit.wikimedia.org/r/133599 [21:35:35] (03CR) 10Ori.livneh: [C: 032] tidy up applicationserver::packages [operations/puppet] - 10https://gerrit.wikimedia.org/r/133599 (owner: 10Ori.livneh) [21:41:53] (03CR) 10Calak: [C: 031] Correct a domain in wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133570 (https://bugzilla.wikimedia.org/65344) (owner: 10Odder) [21:55:26] (03PS2) 10Phuedx: Enable the anonymous signup invite experiment [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133442 [21:55:46] (03CR) 10Phuedx: "Done." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133442 (owner: 10Phuedx) [21:57:34] mwalker, RoanKattouw_away: I can do swat today [21:57:38] (03PS1) 10Chad: Swapping GeoData backend for enwikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133620 [21:58:17] <^d> MaxSem: ^ :) [21:58:35] wheee [21:59:00] (03CR) 10MaxSem: [C: 031] Swapping GeoData backend for enwikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133620 (owner: 10Chad) [21:59:33] <^d> Can we do it in today's swat? [21:59:36] <^d> Or should we grab a window? 
[22:00:23] I'd say right now but fear I won't survive it:) [22:00:47] <^d> gogogogo [22:01:24] (03CR) 10Chad: [C: 032] Swapping GeoData backend for enwikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133620 (owner: 10Chad) [22:01:24] (03Merged) 10jenkins-bot: Swapping GeoData backend for enwikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133620 (owner: 10Chad) [22:01:25] o_0 [22:02:06] die solr die die [22:03:41] ori, kk [22:04:22] !log demon synchronized wmf-config/InitialiseSettings.php 'GeoData using elasticsearch on enwikivoyage' [22:04:26] Logged the message, Master [22:05:31] <^d> MaxSem: I think it's working. [22:05:42] <^d> I'm getting coords from the api and no errors, at least. [22:07:03] affirmative [22:07:33] * MaxSem looks into Kibana [22:09:10] also, searches look ridiculously fast:P [22:09:16] <^d> Fast is good [22:09:32] the dataset isn't that large though [22:10:10] moar data [22:11:10] enwiki haz moar than you can digest [22:11:23] <^d> i can totally haz your data [22:12:25] <^d> Actually, my rebuild the other day might've picked up enwiki. [22:12:41] <^d> It has the mapping. [22:12:45] the autocomplete is so.. instant .. that? [22:13:29] <^d> huh? prefix searching isn't related. [22:13:35] <^d> to what we're talking about. [22:14:01] <^d> MaxSem: Give me a page_id that's got a bunch of coordinates on enwiki. I wanna test something. [22:14:25] ein minuten... [22:14:37] (03CR) 10Quiddity: "Also [[mw:Talk:Phabricator/Help]]" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133267 (owner: 10Spage) [22:17:31] ^d, Sortable_list_of_islands_of_Western_Australia [22:17:37] largest in mainspace [22:17:52] 1139 friggin points [22:19:18] <^d> page_id 39509681 [22:20:30] eh, page_id [22:20:31] sorry [22:23:06] <^d> Oh, it's not a leaf node so that won't work anyway. 
[22:26:27] !log ori synchronized php-1.24wmf5/extensions/EventLogging 'Update EventLogging to master for I89819bd943' [22:26:31] Logged the message, Master [22:28:32] !log ori synchronized php-1.24wmf4/extensions/EventLogging 'Update EventLogging to master for I89819bd943' [22:28:37] Logged the message, Master [22:45:59] (03CR) 10Ori.livneh: [C: 032] Change AbuseFilter rights on cbkwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133533 (https://bugzilla.wikimedia.org/65346) (owner: 10John F. Lewis) [22:46:00] (03Merged) 10jenkins-bot: Change AbuseFilter rights on cbkwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133533 (https://bugzilla.wikimedia.org/65346) (owner: 10John F. Lewis) [22:46:12] (03CR) 10Ori.livneh: [C: 032] Lots of rights changes for ckbwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133364 (https://bugzilla.wikimedia.org/65348) (owner: 10Reza) [22:46:21] (03Merged) 10jenkins-bot: Lots of rights changes for ckbwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133364 (https://bugzilla.wikimedia.org/65348) (owner: 10Reza) [22:46:51] (03CR) 10Ori.livneh: [C: 032] Correct a domain in wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133570 (https://bugzilla.wikimedia.org/65344) (owner: 10Odder) [22:47:27] (03CR) 10Ori.livneh: [C: 032] Enable FlaggedRevs on more namespaces on dewiktionary [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133540 (https://bugzilla.wikimedia.org/65316) (owner: 10Odder) [22:47:39] (03Merged) 10jenkins-bot: Correct a domain in wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133570 (https://bugzilla.wikimedia.org/65344) (owner: 10Odder) [22:47:42] (03Merged) 10jenkins-bot: Enable FlaggedRevs on more namespaces on dewiktionary [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133540 (https://bugzilla.wikimedia.org/65316) (owner: 10Odder) 
[22:48:03] \o/ [22:48:12] twkozlowski: FlaggedRevs = :( [22:48:22] but w/e [22:48:40] ori: whazzup with FlaggedRevs? [22:49:06] just my subjective dislike, nothing wrong with your change [22:49:12] i deploy, i get to moan :P [22:49:34] dewiktionary village pump is thata way, ori :-P [22:49:40] deployer's privilege [22:49:58] hm, the bot is still dead, I notice [22:50:36] !log ori updated /a/common to {{Gerrit|Ifae836de5}}: Swapping GeoData backend for enwikivoyage [22:50:40] Logged the message, Master [22:51:16] (03CR) 10Calak: "Thank you." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133364 (https://bugzilla.wikimedia.org/65348) (owner: 10Reza) [22:52:04] Something's not quite right about css on mw.o. Compare https://www.mediawiki.org/wiki/MediaWiki-Vagrant and https://www.mediawiki.org/wiki/User:BDavis_(WMF)/Notes/test_css [22:52:14] (03CR) 10Calak: "Thank you." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133533 (https://bugzilla.wikimedia.org/65346) (owner: 10John F. Lewis) [22:52:27] ?action=purge seems to clear things up [22:53:20] <^demon|away> ori: If it'll make you feel better I think we should remove FlaggedRevs from mw.org [22:53:33] more alarming are issues on test.wikidata (which i can sporadically reproduce on my dev wiki) [22:53:35] ^demon|away: +1 [22:53:43] aude: ? [22:53:48] https://test.wikidata.org/wiki/Q22 (toc is supposed to be horizontal) [22:53:51] First URL in that compare should have been https://www.mediawiki.org/wiki/User:BDavis_(WMF)/Notes/Thumb.php_with_Vagrant [22:53:55] turn of js and the layout is totally borked [22:53:57] <^demon|away> ori: jfdi? :p [22:54:26] bd808: thumb with vagrant? are you hitchhiking? :) [22:54:29] !log ori synchronized wmf-config 'I51a55c4e2, Ia6c01a913, I594848ce0, and I594848ce0' [22:54:34] Logged the message, Master [22:54:37] i restarted my memcached and fixed it locally [22:54:56] aude: Hmmm.. resource loader cache? 
[22:54:57] reproduced it again when i changed core version and now can't [22:55:03] yes [22:55:03] (03PS3) 10Dzahn: fix sync-apache for use in eqiad [operations/puppet] - 10https://gerrit.wikimedia.org/r/130610 [22:55:25] ori: Thanks :) [22:55:29] Do we need a "touch all the resource loader things" from Reedy? [22:55:31] JohnLewis: np [22:55:50] Or ori I guess since he said he's swatting [22:55:59] (03PS4) 10Dzahn: fix sync-apache for use in eqiad [operations/puppet] - 10https://gerrit.wikimedia.org/r/130610 [22:56:18] bd808: might work, but would like to understand what changed to break things [22:56:25] * bd808 nods [22:56:32] jquery version update? [22:56:39] yeah, that happened today [22:57:10] https://www.mediawiki.org/wiki/MediaWiki_1.24/wmf5/Changelog [22:57:20] Upgrade jQuery to 1.11.1 -- https://gerrit.wikimedia.org/r/#/c/133477/ [22:57:23] can we touch something in https://gerrit.wikimedia.org/r/#/c/126843/6 ? [22:57:34] fixes that issue, but not test wikidata [22:57:51] i can't imagine the problem is related ot jquery since it occurs w/o js [22:59:00] what's going to replace the rsyncd.conf on nfs1? [22:59:31] ah, i guess tin [22:59:58] * twkozlowski hugs ori [23:00:16] :) aw [23:00:43] the rsyncd config on tin is not puppetized it seems [23:00:55] /etc/rsync.d/frag-common [23:01:24] hmm, but it claims it is.. looking harder [23:01:34] (03CR) 10Ori.livneh: [C: 032] Elasticsearch plugin juggling [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133495 (owner: 10Manybubbles) [23:01:42] mutante: It's hidden... let me find it [23:02:09] mutante: misc::deployment::scap_primary [23:02:14] aude: what should i do? [23:02:33] talking to matmarex [23:02:41] (03Merged) 10jenkins-bot: Elasticsearch plugin juggling [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133495 (owner: 10Manybubbles) [23:02:42] e.g. 
https://www.mediawiki.org/wiki/Extension:Wikibase_Client#Other_projects_sidebar [23:02:46] has double styling or something [23:03:16] bd808: thanks i see (it's not technically just scap i'd have to say) [23:03:44] bd808: i want to add one more config .. hmmm [23:04:04] another rsyncd config that isn't for scap'ing [23:04:27] It should work to add multiple rsync::server::module defines to a host right? [23:04:54] in a another module? yea, true [23:05:05] !log ori synchronized wmf-config/CirrusSearch-production.php 'Iae07852b1: Elasticsearch plugin juggling' [23:05:09] Logged the message, Master [23:05:46] bd808: and it reminded me to als use the $::network::constants .. trying [23:05:55] Exception Driven Development :P [23:06:32] hoo: I was doing that way before it was cool :p [23:08:21] :d [23:12:18] (03PS1) 10Dzahn: add rsyncd config for apache config sync [operations/puppet] - 10https://gerrit.wikimedia.org/r/133633 [23:18:40] (03PS1) 10Dzahn: add /srv/httpdconf to replace /h/w/conf/httpd [operations/puppet] - 10https://gerrit.wikimedia.org/r/133635 [23:19:32] !log ori synchronized php-1.24wmf5/includes 'Ia3b12fb9: Speed up CIDR matching from $wgSquidServersNoPurge' [23:19:36] Logged the message, Master [23:21:17] (03PS2) 10Dzahn: add rsyncd config for apache config sync [operations/puppet] - 10https://gerrit.wikimedia.org/r/133633 [23:23:14] (03PS1) 10Dzahn: retab misc/deployment.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/133636 [23:24:48] (03CR) 10Dzahn: [C: 032] "just adds an empty directory so far" [operations/puppet] - 10https://gerrit.wikimedia.org/r/133635 (owner: 10Dzahn) [23:26:10] (03PS2) 10BBlack: Use whole subnets in squid.php list for XFF acceptance [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133535 [23:35:55] aude, still need touching? 
[23:36:40] no idea [23:37:13] (03CR) 10BryanDavis: [C: 031] retab misc/deployment.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/133636 (owner: 10Dzahn) [23:38:35] !log ori synchronized php-1.24wmf4/includes 'Ia3b12fb9: Speed up CIDR matching from $wgSquidServersNoPurge' [23:38:40] ^ bblack [23:38:40] Logged the message, Master [23:38:46] (03CR) 10Dzahn: "thanks, matanya will add that we should also do other lint fixes, which i don't disagree with either:)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/133636 (owner: 10Dzahn) [23:39:34] staring at graphs and spamming refresh :) [23:39:46] that's my job description [23:39:54] also grepping [23:40:27] MaxSem: css is still messed up for https://www.mediawiki.org/wiki/User:BDavis_(WMF)/Notes/Thumb.php_with_Vagrant (and I'm sure many other pages). action=purge seems to fix but we don't know if the problem is bits or that something else changed. [23:40:27] last time around it was the API servers that were hit hardest [23:40:41] so mostly this should be the canary: http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=API+application+servers+eqiad&m=cpu_report&s=by+name&mc=2&g=cpu_report [23:41:14] (03PS2) 10Withoutaname: Reduce string URLs to defined constant [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/131914 (https://bugzilla.wikimedia.org/48618) [23:41:33] bd808: on that page the styles are missing for me too. i don't see any geshi modules loaded when i run mw.loader.inspect() in the console [23:41:48] Did we put the config change back to use cidrs? Or are we checking baseline first? [23:41:59] bd808: no config change yet, but it's waiting in the wings [23:42:19] ori: Yeah. So does that mean that all pages using geshi need a purge or null edit to fix them now? 
[23:42:23] bd808, meh - cached HTML [23:42:29] Because that seems yucky [23:42:49] bd808: (as noted in the source, with the current non-cidr list IPSet is actually slower than the old solution, but it should be negligible in the big picture, at least that's the theory) [23:43:32] ewwwwwwww ext.geshi.language.$lang [23:43:57] with all the languages supported by geshi, it's gonna bloat the startup module [23:43:59] bblack: I've tested that very theory before, but not as thoroughly as you have recently :) [23:45:34] the next theory on the chopping block is that if CIDR + IP::isInRange() made CPU graphs jump by 40% on the API servers worst-case, and IPSet benchmarks a little better than 100x faster than IP::isInRange(), we shouldn't really notice that ~0.4% graph bump when the config change goes out. [23:47:27] (03PS1) 10Tim Starling: Give roots access to osmium [operations/puppet] - 10https://gerrit.wikimedia.org/r/133641 [23:48:37] (03CR) 10BryanDavis: [C: 031] add rsyncd config for apache config sync [operations/puppet] - 10https://gerrit.wikimedia.org/r/133633 (owner: 10Dzahn) [23:52:46] (03CR) 10Dzahn: [C: 032] Give roots access to osmium [operations/puppet] - 10https://gerrit.wikimedia.org/r/133641 (owner: 10Tim Starling) [23:54:19] (03CR) 10Dzahn: "maybe we could even say including roots belongs into base" [operations/puppet] - 10https://gerrit.wikimedia.org/r/133641 (owner: 10Tim Starling) [23:54:52] (03CR) 10Ori.livneh: [C: 032] Use whole subnets in squid.php list for XFF acceptance [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133535 (owner: 10BBlack) [23:55:00] (03Merged) 10jenkins-bot: Use whole subnets in squid.php list for XFF acceptance [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/133535 (owner: 10BBlack) [23:57:05] !log ori updated /a/common to {{Gerrit|Id188979c1}}: Use whole subnets in squid.php list for XFF acceptance [23:57:10] Logged the message, Master [23:58:03] (03PS1) 10Dzahn: fix quoting, arrows etc in 
deployment.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/133644
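The CIDR discussion above (matching a client address against whole proxy subnets instead of a long list of individual IPs) can be illustrated in a few lines. This is a minimal sketch in Python and makes no claim about how MediaWiki's `IPSet` or `IP::isInRange()` are actually implemented. The subnets, the sample addresses, and the `in_any_subnet` helper are made up for illustration. It also works through bblack's back-of-envelope estimate: a 40% worst-case bump divided by a roughly 100x speedup comes to about 0.4%.

```python
import ipaddress

# Hypothetical trusted-proxy subnets (documentation ranges, not real config).
TRUSTED = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "2001:db8::/32")]

def in_any_subnet(addr: str) -> bool:
    """True if addr falls inside any trusted subnet (simple linear scan)."""
    ip = ipaddress.ip_address(addr)
    # An IPv4 address tested against an IPv6 network is simply False.
    return any(ip in net for net in TRUSTED)

# Estimate from the log: 40% worst-case CPU bump, ~100x faster matching.
worst_case_bump_pct = 40.0
speedup = 100.0
estimated_bump_pct = worst_case_bump_pct / speedup

print(in_any_subnet("192.0.2.17"))    # inside 192.0.2.0/24
print(in_any_subnet("198.51.100.1"))  # outside both subnets
print(estimated_bump_pct)
```

A linear scan is enough to show the idea. A production matcher would more likely precompute a lookup structure, which is where the speedup quoted in the log would come from.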