[07:29:19] serviceops, Operations, Core Platform Team (Needs Cleaning - Services Operations): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (elukey)
[07:41:50] serviceops, RESTBase, CPT Initiatives (RESTBase Split (CDP2)), Epic, User-mobrovac: Split RESTBase in two services: storage service and API router/proxy - https://phabricator.wikimedia.org/T220449 (mobrovac)
[07:44:06] serviceops, Operations, Release Pipeline, CPT Initiatives (RESTBase Split (CDP2)), and 4 others: Deploy the RESTBase front-end service (RESTRouter) to Kubernetes - https://phabricator.wikimedia.org/T223953 (mobrovac) Resolved→Open Reopening as there are two more things we have to do befor...
[07:44:10] serviceops, Operations, Release Pipeline, Goal, Release-Engineering-Team (Pipeline): Self-service Deployment Pipeline - https://phabricator.wikimedia.org/T228676 (mobrovac)
[07:44:14] serviceops, RESTBase, CPT Initiatives (RESTBase Split (CDP2)), Epic, User-mobrovac: Split RESTBase in two services: storage service and API router/proxy - https://phabricator.wikimedia.org/T220449 (mobrovac)
[09:04:10] o/
[09:06:06] hello!
[09:13:49] hey hey
[09:29:08] <_joe_> fsero: heya
[09:29:25] <_joe_> I was thinking of you this morning
[09:29:37] what i left wrong?
[09:29:39] :P
[09:29:47] <_joe_> I just got an email from envoy-announce
[09:29:50] <_joe_> :P
[09:29:58] hmm what did they say?
[09:30:08] <_joe_> 1.11.2 coming out soon
[09:31:33] <_joe_> anyways, we're finally all on php7
[09:31:43] <_joe_> next step is kubernetes!
[09:32:21] \o/!!
[09:32:22] congrats
[09:33:06] remember that 1.16 is released and deprecates extensions/v1beta1 which means patching all deployments, daemonsets etc :P
[09:33:20] 80% of helm charts still uses that
[09:33:25] <_joe_> grrr I know
[09:33:32] <_joe_> I was swearing at that this morning
[09:34:02] <_joe_> in the context of me trying to convince the WMF not to do the same fuckup of having the stability status in the URL
[09:37:14] haha well it has some benefits as well
[09:37:19] users get used to change
[09:37:44] and tons of new certifications come around
[09:37:48] did you ever heard of CKA?
[09:38:02] how can that be sustainable if things doesnt change frequently?
[09:40:00] <_joe_> well
[09:40:14] <_joe_> we don't offer "mediawiki api user" certifications
[09:47:32] you should! in any case i guess the average mediawiki api user expects stability
[09:54:41] serviceops, Operations, observability: Errors managed by wmf-errors (like OOMs) lack normalized_message on logstash - https://phabricator.wikimedia.org/T233828 (Joe)
[09:58:14] hola fsero !
[10:04:23] o/
[10:14:06] serviceops, Operations, Release Pipeline, CPT Initiatives (RESTBase Split (CDP2)), and 4 others: Deploy the RESTBase front-end service (RESTRouter) to Kubernetes - https://phabricator.wikimedia.org/T223953 (akosiaris) > * set up the rate-limiting DHT inside k8s for RESTRouter (this is currently d...
[12:26:30] serviceops, Operations, PHP 7.2 support, Performance-Team (Radar), and 2 others: PHP 7 corruption during deployment (was: PHP 7 fatals on mw1262) - https://phabricator.wikimedia.org/T224491 (Joe) Open→Resolved I will close this bug as resolved. We've not seen a recurrence in a long time,...
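The 09:33 remark about Kubernetes 1.16 is the one concrete migration task in this log: that release stops serving Deployment, DaemonSet and ReplicaSet from extensions/v1beta1 (and from the older apps/v1beta1 and apps/v1beta2 groups), so manifests have to move to apps/v1. Below is a minimal Python sketch of that rewrite, not anything the team ran. It assumes manifests are already parsed into dicts (YAML loading is left out to stay stdlib-only), and the names REMOVED_IN_1_16 and migrate are hypothetical, not part of any WMF or Helm tooling.

```python
"""Sketch: bump manifests off API versions no longer served by Kubernetes 1.16.
Operates on already-parsed manifest dicts (hypothetical helper, not WMF tooling)."""

from copy import deepcopy
from typing import Dict, List, Tuple

# (apiVersion, kind) pairs removed in Kubernetes 1.16 -> replacement apiVersion.
REMOVED_IN_1_16 = {
    ("extensions/v1beta1", "Deployment"): "apps/v1",
    ("apps/v1beta1", "Deployment"): "apps/v1",
    ("apps/v1beta2", "Deployment"): "apps/v1",
    ("extensions/v1beta1", "DaemonSet"): "apps/v1",
    ("apps/v1beta2", "DaemonSet"): "apps/v1",
    ("extensions/v1beta1", "ReplicaSet"): "apps/v1",
    ("apps/v1beta1", "ReplicaSet"): "apps/v1",
    ("apps/v1beta2", "ReplicaSet"): "apps/v1",
    ("apps/v1beta1", "StatefulSet"): "apps/v1",
    ("apps/v1beta2", "StatefulSet"): "apps/v1",
    ("extensions/v1beta1", "NetworkPolicy"): "networking.k8s.io/v1",
    ("extensions/v1beta1", "PodSecurityPolicy"): "policy/v1beta1",
}

# Kinds whose apps/v1 schema makes spec.selector mandatory.
WORKLOAD_KINDS = {"Deployment", "DaemonSet", "ReplicaSet", "StatefulSet"}


def migrate(manifest: Dict) -> Tuple[Dict, List[str]]:
    """Return (migrated copy, notes on what changed). The input is not mutated."""
    out = deepcopy(manifest)
    notes: List[str] = []
    key = (out.get("apiVersion"), out.get("kind"))
    new_version = REMOVED_IN_1_16.get(key)
    if new_version is None:
        return out, notes  # already on a served apiVersion (or not a known kind)
    out["apiVersion"] = new_version
    notes.append(f"{key[1]}: {key[0]} -> {new_version}")
    if key[1] in WORKLOAD_KINDS:
        spec = out.setdefault("spec", {})
        if "selector" not in spec:
            # extensions/v1beta1 defaulted the selector from the pod template
            # labels; apps/v1 requires it explicitly, so copy those labels over.
            labels = spec.get("template", {}).get("metadata", {}).get("labels", {})
            spec["selector"] = {"matchLabels": dict(labels)}
            notes.append("added spec.selector.matchLabels from template labels")
    return out, notes
```

The selector backfill is there because apps/v1 makes spec.selector mandatory and immutable. Some other defaults (for example spec.revisionHistoryLimit) also differ between the old and new groups, so rewritten manifests still deserve a diff review before they are deployed.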
[13:21:49] serviceops, Operations, PHP 7.2 support: Mysterious, coordinated slowdowns every ~ 25 minutes on mw1347,mw1348 (php7 api servers) - https://phabricator.wikimedia.org/T231011 (jijiki) Looks like this is back since yesterday: {F30473970} {F30473972} Logs show many requests that take a long time to fi...
[14:58:33] no ma rk today, according to the calendar invite responses, for today's subteam discussion
[15:04:28] <_joe_> yeah
[15:30:25] * _joe_ running a min late
[15:42:24] WELP another hangout success
[15:42:25] effie: wanna recap what you were saying?
[15:42:36] <_joe_> yes, please
[15:42:42] sure
[15:42:53] <_joe_> also if subbu is around we can invite him
[15:43:00] lets check
[15:43:16] semi here
[15:43:28] we need a chanop
[15:43:34] to irc invite him
[15:43:45] I will msg him
[15:44:16] so the questions were: is the hardware we are ordering dedicated to parsoid or for general use
[15:44:41] yes yes
[15:44:49] so we found that after refreshing
[15:44:55] (but already we do)
[15:45:02] have more machines on codfw
[15:45:05] than eqiad
[15:45:19] so joe and I proposed we move some to eqiad
[15:45:29] and even them out
[15:45:41] ok, cool
[15:45:42] move machines around?
[15:45:43] those extra ones can be additional parsoid
[15:45:44] servers
[15:45:49] between the 2 DCs ?
[15:45:55] are you sure this is a sane plan?
[15:45:57] no, ship some of the new ones
[15:46:01] to eqiad
[15:46:08] ah instead of to codfw?
[15:46:15] ok that makes sense
[15:46:18] I thought you know by now that we are generally sane
[15:46:22] regarding the cloud instance for parsoid/php, it is done except restbase needs to access it publicly and for that some VCL changes are needed
[15:46:28] effie: quite the inverse :P
[15:46:34] ah, hi subbu
[15:46:38] hello
[15:46:41] i was just repeating the questions from yesterday
[15:47:41] subbu: effie just said that we will have more machines in codfw than eqiad so their proposal is to move some of them to eqiad to even them out
[15:47:53] I pasted to subbu what we just said
[15:47:55] about having alex to hire a moving truck
[15:47:57] and move servers from codfw to eqiad
[15:48:00] and then i mentioned the part about the cloud project. how restbase needs to access it publicly and the VCL changes
[15:48:32] i am enjoying the humour :)
[15:49:17] so the first WTP machines we use for this should have both, parsoid/JS and parsoid/PHP and you asked for one in each DC
[15:49:44] do you think it is time we give a sane name to those servers?
[15:49:47] sure, that seems reasonable .. wrt evening them out .. right now, with live traffic to eqiad, and reparse (changeprop) traffic to codfw, eqiad is ~1% load ... and codfw is < 20% loaded.
[15:51:25] parsoid/php so far looks like might be faster, but in terms of cpu load ... it may be about the same or slightly higher since all the work that the api cluster used to do is now moved to the parsoid cluster .. but, based on what I am seeing on scandium, i don't think it is very noticeable.
[15:51:31] <_joe_> effie: he's the second person in this team that thought I wanted to move servers physically around
[15:51:43] :-)
[15:51:56] so i do have a puppet change that applies the profile::mediawiki::php inside profile::parsoid if we set profile::parsoid::use_php on a machine. once i also include role::mediawiki::common and mediawiki::webserver they also need mcrouter certs
[15:52:16] <_joe_> subbu: yeah I think for now we'll install parsoid-php along parsoid-js, but depending on how tests go, I might change the plan a bit
[15:52:26] <_joe_> mutante: yep
[15:52:45] <_joe_> mutante: for now set has_lvs to false probably
[15:53:11] mutante url ?
[15:53:58] _joe_: ok. so i picked a random wtp, wtp2001 to test it in the compiler, it fails now due to the missing cert. sounds like effie suggests to give them new names
[15:54:10] effie: https://gerrit.wikimedia.org/r/c/operations/puppet/+/539181 i'm sure it's not done yet, that's why i didnt link to ticket yet
[15:54:36] _joe_, ok. i'll defer to you all as to the best way of partitioning mirrored traffic between parsoid/php & parsoid/js ... all servers handle both js & php traffic OR partition the cluster .. mobrovac expressed a preference for the former, but you all decide what makes more sense ... this dual traffic scenario is hopefully not going to last more than a few weeks.
[15:54:48] effie: you think they should have new server names?
[15:55:12] <_joe_> subbu: I was thinking that to reduce latency, we could also have the parsoid/js cluster call the api on parsoid/php
[15:55:21] mutante: yeah I think wtp is confusing
[15:55:26] at least as a newer person
[15:55:41] <_joe_> effie: please not now, not for the existing servers I mean
[15:56:04] <_joe_> the big advantage is that we could also see how much drain parsoid/js poses on the api
[15:56:11] <_joe_> mi suspect is *a lot*
[15:56:15] <_joe_> *my
[15:56:27] _joe_, oh ... i don't know about that .. not sure what impact it will have on cpu load on these servers. in any case, codfw traffic is only reparse traffic and not latency-sensitive traffic.
[15:56:43] <_joe_> right, also it's in codfw right now
[15:56:57] <_joe_> anyhow, in a week we will start procurement
[15:57:12] <_joe_> I was hoping it would sneak to this week, but that didn't happen
[15:57:43] <_joe_> we will be able to start out even in the absence of the new servers if it needs be
[15:58:40] ya .. i am a little anxious for it to happen sooner than later so we can have parsoid/php run in production for a few weeks before we go into thanksgiving break .. after thanksgiving, scheduling becomes quite hard with deploy freezes, holidays, end of year, etc.
[15:59:10] so, "even in the absence of new servers" is something I am assuming as the default scenario and if we get new servers, that is a bonus.
[15:59:31] but, ned of october is still ~5 weeks away. so, there is time.
[15:59:33] *end
[16:00:06] <_joe_> yes
[16:00:43] <_joe_> ok everyone, If there is nothing else, I'd just go offline now
[16:01:30] fine for me
[16:01:49] ok, later i will create an mcrouter cert for 2 wtp servers and check what issue is next if any
[16:01:59] getting breakfast for now then and on a train back from the city
[16:03:02] bb
[16:03:23] I was once told where wtp came from
[16:03:26] but I forgot
[16:07:55] wiki-text processor
[16:08:03] but i looked that up on https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions
[16:08:06] mutante, i will probably need to update the wmf-config to enable rest api on these 2 servers once they are ready (just like I did for scandium). And, possibly another temporary patch to logstash puppet to redirect logs away from the production fatals.
[16:08:17] ah 'wiki-text-processor' .. i didn't know that.
[16:08:34] hahah, i thought you were the only one
[16:09:09] subbu: ok, ack
[16:15:56] subbu: LOL
[16:15:58] hahahha
[16:16:07] :)
[16:16:23] i was not the one who named them ... i was just an IC back then. :)
[16:16:50] you were an InterCity train?
[16:17:02] something like that ... :) .. individual contributor.
[16:17:08] :D
[16:17:25] renaming is not going to be easy, for user
[16:17:27] sure*
[16:17:30] damn
[17:20:29] serviceops, Operations, Wikimedia-Logstash, observability: Errors managed by wmf-errors (like OOMs) lack normalized_message on logstash - https://phabricator.wikimedia.org/T233828 (herron)
[17:58:21] serviceops, Operations, PHP 7.2 support: Mysterious, coordinated slowdowns every ~ 25 minutes on mw1347,mw1348 (php7 api servers) - https://phabricator.wikimedia.org/T231011 (Ladsgroup) euwiki heavily uses wikidata, it might be related to wb_terms table. Does it correlate with spikes in https://grafa...
[19:05:34] serviceops, Operations, Patch-For-Review: Make the parsoid cluster support parsoid/PHP - https://phabricator.wikimedia.org/T233654 (Dzahn)
[19:58:22] serviceops, Operations, Patch-For-Review: Make the parsoid cluster support parsoid/PHP - https://phabricator.wikimedia.org/T233654 (ssastry)
[20:02:27] serviceops, Operations: Set up LVS for parsoid/PHP - https://phabricator.wikimedia.org/T233722 (Dzahn)
[20:03:55] will parsoid/PHP on wtp need the "services_proxy" ?
[20:07:32] serviceops, Operations, Patch-For-Review: Make the parsoid cluster support parsoid/PHP - https://phabricator.wikimedia.org/T233654 (ssastry)
[20:12:05] serviceops, Operations, Patch-For-Review: Make the parsoid cluster support parsoid/PHP - https://phabricator.wikimedia.org/T233654 (Dzahn) https://gerrit.wikimedia.org/r/c/labs/private/+/539399/
[22:24:49] serviceops, Arc-Lamp, Performance-Team: Decom the ArcLamp pipeline for HHVM/Xenon - https://phabricator.wikimedia.org/T233884 (Krinkle)
[22:50:03] serviceops, Operations, Patch-For-Review: Make the parsoid cluster support parsoid/PHP - https://phabricator.wikimedia.org/T233654 (Dzahn) @Joe @jijiki This change compiles now. On wtp2001 it adds all these resources: https://puppet-compiler.wmflabs.org/compiler1002/18613/wtp2001.codfw.wmnet/ and...