[04:36:08] 10serviceops, 10Beta-Cluster-Infrastructure, 10Editing-team, 10Release Pipeline, and 3 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (10Joe) >>! In T220235#5171351, @Ottomata wrote: > An example of environmental differences: service-runner uses statsd....
[05:02:16] 10serviceops, 10Beta-Cluster-Infrastructure, 10Editing-team, 10Release Pipeline, and 3 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (10Joe) >>! In T220235#5174255, @Krinkle wrote: > The status quo is that services always run their code in beta before i...
[08:48:44] <_joe_> thinking of a team shared dashboard; which projects should be there?
[08:49:42] <_joe_> I would dare say puppet, conftool, operations/docker-images/*, service-checker
[08:49:45] <_joe_> what else?
[08:57:26] why not start with those? we can add on if something else comes up
[09:01:25] it might be good to add debs/(kubernetes,calico,helm) but only if they don't clutter the dashboard too much
[09:01:40] i don't think there is going to be much activity on those projects but some
[09:04:37] you can put all of those in one section I think
[09:06:15] I agree with apergos
[09:06:22] let's start with the basic ones
[09:21:36] 10serviceops, 10Beta-Cluster-Infrastructure, 10Editing-team, 10Release Pipeline, and 3 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (10Krinkle) >>! In T220235#5175206, @Joe wrote: >>>! In T220235#5174255, @Krinkle wrote: >> The status quo is that servi...
[09:26:38] <_joe_> ok
[09:26:48] <_joe_> for puppet, we want patches from specific modules I guess
[09:27:39] yes, or to exclude certain modules
[09:37:24] <_joe_> ok, we can start with something simple, and iterate
[09:38:06] <_joe_> but first we need someone to create a gerrit group for serviceops
[09:38:13] <_joe_> because you know, we're not admins anymore
[09:38:25] <_joe_> apergos: can you request it?
[09:38:46] we want it called serviceops?
[09:38:47] sure
[09:38:58] <_joe_> wmf-sre-serviceops I think given the nomenclature
[09:39:05] <_joe_> but I'm open to whatever
[09:39:18] <_joe_> wmf-pinkunicorns would be funnier tbh
[09:39:35] boring is good
[09:54:34] <_joe_> apergos: do you remember the syntax to search for changes in specific paths in puppet?
[09:54:51] nope, I'd have to look it up :-D
[09:55:05] <_joe_> ok nevermind I'll RTFM :D
[09:55:07] _joe_: you can also rename the group whenever :) (it’s not a fixed name like repos)
[09:55:25] ah ha gtk
[09:56:12] <_joe_> path: heh
[09:56:14] <_joe_> obvious
[10:00:43] And I’ve learned that you can have emojis in groups now!
[10:02:13] <_joe_> I can't match changes with path:
[10:03:11] <_joe_> ah it needs to fully match...
[10:22:47] emoji in the group name?
[10:22:55] oh please add a unicorn at the end :-P :-P
[10:22:58] joe made me do it
[11:02:06] <_joe_> https://gerrit.wikimedia.org/r/#/c/wikimedia/+/509794/ <= please review
[11:02:28] <_joe_> https://gerrit-review.googlesource.com/Documentation/user-dashboards.html#project-dashboards is the relevant docs
[11:02:41] <_joe_> I'm sure it's missing a ton of things
[11:03:38] <_joe_> now I really need to eat and then I have an interview
[11:11:44] 10serviceops, 10Operations, 10service-runner, 10Services (later): Re-evaluate service-runner's (ab)use of statsd timing metric for nodejs GC stats - https://phabricator.wikimedia.org/T222795 (10Pchelolo) a:03holger.knust Thank you for an impressive level of details :) There's a bunch of other places whe...
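[Editor's note] The dashboard _joe_ links above uses Gerrit's project-dashboard format from the user-dashboards doc he pasted. A minimal sketch of what such a definition could look like follows; the section names and queries are illustrative, not the actual contents of change 509794. Note the point made at 10:03: `path:` must match the full file path exactly, while `file:` accepts a `^`-prefixed regex, which is what you want for "patches touching specific puppet modules".

```
[dashboard]
  title = serviceops
  description = Changes relevant to the serviceops team
[section "puppet (selected modules)"]
  query = status:open project:operations/puppet file:"^modules/(profile|conftool).*"
[section "docker images and debs"]
  query = status:open (projects:operations/docker-images/ OR project:operations/debs/kubernetes)
```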
[11:18:35] mediawiki/our crs and mediawiki/incomingcrs look identical
[11:48:02] _joe_: lgtm (so I +2’d)
[11:48:29] <_joe_> heh there were a couple issues I think
[11:49:17] I didn’t merge it :)
[11:49:31] I only +2’d it as it looked good syntax-wise
[11:55:28] <_joe_> sent a second PS
[12:04:11] we'll miss role-related stuff, do we care?
[12:06:38] 10serviceops, 10Beta-Cluster-Infrastructure, 10Editing-team, 10Release Pipeline, and 3 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (10hashar) On beta, we had Parsoid and some other services deployed via Jenkins whenever a change got merged. Then each...
[12:26:19] 10serviceops, 10Operations, 10service-runner, 10Services (later): Re-evaluate service-runner's (ab)use of statsd timing metric for nodejs GC stats - https://phabricator.wikimedia.org/T222795 (10akosiaris) >>! In T222795#5176181, @Pchelolo wrote: > Thank you for an impressive level of details :) There's a...
[12:33:53] 10serviceops, 10Operations, 10service-runner, 10Services (later): Re-evaluate service-runner's (ab)use of statsd timing metric for nodejs GC stats - https://phabricator.wikimedia.org/T222795 (10Pchelolo) > The caveat would be we would have to update all dashboards for all services residing on the same host...
[13:27:57] 10serviceops, 10Beta-Cluster-Infrastructure, 10Editing-team, 10Release Pipeline, and 3 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (10Ottomata) Could we use image version: latest in beta hiera? And somehow pull down the new latest and restart the ima...
[13:48:13] ottomata: eventgate-analytics GC stats are now meaningful
[13:48:21] I've updated the dashboard already
[13:49:37] awesooOOOOOOME@!
[13:50:42] akosiaris: can we do some lvs stuff today?
[13:52:42] e.g.
https://phabricator.wikimedia.org/T222899 should be mostly ready to go
[13:52:50] and won't affect anything
[13:52:53] in prod
[13:53:10] https://phabricator.wikimedia.org/T222962 is easier, but i have a Q at the end there about how discovery works when LVS changes
[13:56:15] 10serviceops, 10Analytics, 10Analytics-Kanban, 10EventBus, 10Services (watching): Change LVS port for eventlogging-analytics from 31192 to 33192 - https://phabricator.wikimedia.org/T222962 (10akosiaris) >>! In T222962#5172806, @Ottomata wrote: > Hm, question. Currently mediawiki-config ProductionService...
[13:56:51] ottomata: yeah, I answered there already, let's do indeed https://phabricator.wikimedia.org/T222899 first
[13:57:08] k awesome
[13:58:06] if you can afford some downtime you can also delete the old eventgate-analytics deployment
[13:58:13] and deploy the new one in the same port as the old
[13:58:32] i might be wrong but given this processes events, some downtime could be acceptable
[13:58:42] fsero: we could afford it, but I think we'd rather not if we don't have to, which is why we went with the new port
[13:58:55] there aren't any downstream consumers using the new events yet
[13:59:00] but they are almost there
[13:59:03] yeah i know but if moving it is too complex dont see the upsides
[13:59:16] so, not sure I understood alex's comment
[13:59:44] iiuc, eventgate-analytics.discovery.wmnet just does discovery for the LVS domain name
[13:59:57] so eventgate-analytics.svc.$site.wmnet
[14:00:17] then LVS does routing/balancing between the k8s backends
[14:00:32] if the 2 services are exactly the same, just on different ports
[14:00:48] will the hostnames that eventgate-analytics.svc balances be the same for either?
[14:01:14] since the port is on the URL in either case, there's no discovery of the port
[14:01:17] the hostnames? yes
[14:01:23] the port is a different thing
[14:01:31] so the port is mostly for LVS to do monitoring/pooling?
[14:01:50] no it's also the port LVS forwards for
[14:02:05] LVS-DR can not do any kind of port translation
[14:02:19] if something is for port X it will be for port X end-to-end
[14:02:45] which is why I said there will be an outage if you change the port on LVS
[14:02:51] ahhh i see
[14:02:51] ok
[14:02:53] and not change it in mediawiki-config
[14:03:03] hm
[14:03:16] which might be acceptable per fsero's comment above
[14:03:20] so is it even possible to keep both running with the same hostnames?
[14:03:25] although you are the service-owner, you know better
[14:03:40] we can handle an outage if we need to at the moment, just was prepared to do it without one :p
[14:04:18] I am pretty sure we can have both running with the same hostname/IP
[14:04:27] i guess the only way to configure a new LVS service would be to make all the new necessary puppet entries
[14:04:31] IIRC we did it for years for port 80/443 for mediawiki
[14:04:32] but with a new 'service name'
[14:04:36] even though the hostnames would be the same
[14:04:43] the e.g. config keys would be different
[14:04:51] which would be just annoying, more annoying than just changing a port
[14:05:09] let's do it with an outage then, that's easier
[14:05:13] that's just a mw config deploy
[14:05:16] to turn off the event
[14:05:19] s
[14:05:29] i'll prep for that, maybe we can do that tomorrow
[14:05:32] but for today! :)
[14:05:40] let's set up eventgate-main
[14:05:41] ya?
[14:05:46] ok
[14:06:29] dns change merged
[14:06:30] thanks akosiaris, i see you are merging DNS. lemme know if you want me to do anything...
[14:08:55] hmm fsero if we do an outage, we don't even have to change LVS ports....right?
[14:09:01] we can use the same port on the new chart
[14:09:07] yup
[14:09:17] just purge the old one and upgrade the new one.
[14:09:18] change LGTM, but passing it through PCC
[14:09:24] gr8
[14:09:32] ok cool that's even easier.
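[Editor's note] The exchange above boils down to: the `.discovery.wmnet` / `.svc.wmnet` names resolve to hosts, but the port lives only in each client's configured URL, and LVS-DR forwards the same port end-to-end. That is why changing the LVS port needs a coordinated client config change (or an outage). A toy illustration of the naming scheme (`svc_url` is made up for this sketch, not WMF tooling):

```shell
# Toy illustration: the hostname part of an internal service endpoint comes
# from DNS/LVS discovery, but the port is fixed in client configuration --
# there is no port discovery, so a port change must be rolled out to clients.
svc_url() {
  local service=$1 site=$2 port=$3 path=${4:-/_info}
  echo "http://${service}.svc.${site}.wmnet:${port}${path}"
}

# The health-check URL akosiaris curls later in the log:
svc_url eventgate-main eqiad 32192
# http://eventgate-main.svc.eqiad.wmnet:32192/_info
```

Same hosts behind LVS either way; as far as clients are concerned, a different port is simply a different endpoint.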
[14:10:27] hi folks
[14:10:32] o/
[14:10:39] I have a capex-related question for the subteam :)
[14:10:39] paravoid: o/
[14:10:53] paravoid: 42
[14:10:56] hm, fsero if we do it that way, would you all mind if I did that today? in my afternoon after ops meeting?
[14:11:07] we're negotiating a buy-out option for the leased servers
[14:11:07] just add the required suffix and it should do :P
[14:11:16] i can do the mw config change, helm delete --purge, helm upgrade
[14:11:19] then mw config change
[14:11:25] if we go for it, that means that all the leased servers that we've scheduled for refresh this coming year
[14:11:43] will not happen, and will happen at our leisure (typically 2 years later)
[14:12:46] for service ops I believe these are
[14:12:46] mw[1307-1348], mw[2251-2258], kubernetes[1001-1004], thumbor[2003-2004], wtp[1025-1048], snapshot[1005-1007]
[14:13:04] is that ok with you?
[14:13:19] this means that if those refreshes incorporated an upgrade factor, this will not happen
[14:14:01] ottomata: I have the same question for analytics (analytics[1058-1068] & aqs[1007-1009]) but I can ping you elsewhere/separately too :)
[14:14:13] ottomata: why the mw config change?
[14:14:15] _joe_: for the mws ^ jijiki for thumber ^. I think the wtps/kubernetes100x is me and /me fine with it
[14:14:25] s/thumber/thumbor/
[14:14:53] fsero: to disable the events
[14:14:57] IIRC for wtp it may even be more convenient, as we won't have a hard deadline for the replacements
[14:15:04] mw is sending 9K events / second to eventgate-analytics
[14:15:08] <_joe_> sorry I was writing a scorecard, let me read
[14:15:27] perf reviews deadline is the 17th right ?
[14:15:35] I think 18th
[14:15:37] i can only speak about kubernetes and kubernetes is fine for now
[14:15:42] <_joe_> I'm not very happy with this, no, given the plan was to buy uniform servers to then reuse easily in kubernetes
[14:16:13] <_joe_> but I don't have a better answer now
[14:16:18] <_joe_> :)
[14:16:18] ottomata: ok i can be around yup
[14:16:41] <_joe_> and yes, if we don't do those refreshes, we need to expand the mw fleet in eqiad anyways
[14:16:55] _joe_: ah so the mw refreshes would have been with the new common spec?
[14:17:05] <_joe_> yes
[14:17:10] paravoid: ya ping me + luca in analytics, we were just talking about how the hadoop worker refresh would help us avoid requesting new nodes, since they will be replaced with beefier workers instead.
[14:17:17] fsero: great thanks
[14:17:18] <_joe_> that was the idea, kube/parsoid/mw all the same servers
[14:17:28] or in #ops
[14:17:57] _joe_: this applies only to the leased servers, not the other refreshes
[14:18:15] <_joe_> well we need to recalibrate the ask I guess
[14:18:21] <_joe_> at least for wtp for sure
[14:18:21] it's 40 boxes in eqiad+7 in codfw
[14:18:26] <_joe_> oh right no
[14:18:32] wtp? why wtp?
[14:18:36] <_joe_> wtp will remain untouched
[14:18:39] ah ok
[14:18:44] <_joe_> sorry, I'm catching up :D
[14:19:31] <_joe_> akosiaris: thumbor was destined to k8s as well, I guess we'll just add uneven machines?
[14:20:02] I can calibrate the asks, but only to a certain point
[14:20:05] <_joe_> paravoid: so my take is: we'll have to remodulate our refreshes a bit, but overall it's ok
[14:20:07] fsero: got any experience on a k8s cluster with uneven machines?
[14:20:10] the lease refreshes were supposed to only be about refreshes and not expansions :)
[14:20:24] <_joe_> well if you get more for the same amount of money
[14:20:28] my thinking is it should not be a problem, but the devil is always in the details
[14:21:04] yes, it's not a problem since any node will be translated to RAM and millicores, the problem is when diagnosing hardware issues
[14:21:08] then it's going to be fun
[14:21:26] I'd like some specifics on how the budget will need to be adjusted?
[14:21:36] fsero: hmm indeed. Good point
[14:21:46] <_joe_> paravoid: right now? I'm sorry, I don't know
[14:21:51] <_joe_> I will have to do calculations
[14:22:05] akosiaris: what I've done is to keep the group of the same spec as a nodepool
[14:22:06] <_joe_> can this be answered tomorrow morning?
[14:22:11] yes
[14:22:13] <_joe_> also to consider
[14:22:14] a nodepool is no more than a naming scheme
[14:22:18] <_joe_> we'll buy less servers from dell
[14:22:23] <_joe_> so the unit price will go up
[14:22:30] 10serviceops, 10Analytics, 10Analytics-Kanban, 10EventBus, 10Services (watching): Change LVS port for eventlogging-analytics from 31192 to 33192 - https://phabricator.wikimedia.org/T222962 (10Ottomata) Ok, in that case, it will be much less annoying to temporarily disable the events in mediawiki-config,...
[14:22:31] <_joe_> we got a huge huge discount
[14:22:51] <_joe_> based on the numbers we gave them
[14:22:54] fsero: isn't the nodepool concept GKE specific?
[14:23:03] <_joe_> anyways, I'll assume constant unit price
[14:23:11] 10serviceops, 10Analytics, 10Analytics-Kanban, 10EventBus, 10Services (watching): Use new eventgate chart release analytics for eventgate-analytics service.
- https://phabricator.wikimedia.org/T222962 (10Ottomata)
[14:23:17] not at all, it's also used on AWS clusters and bare metal
[14:23:44] set of nodes that share the same spec or purpose, what is tied to GKE is that it's also tied to scaling policies and RBAC
[14:23:47] <_joe_> so one good thing is the 40 servers in eqiad are the most modern and powerful we have
[14:28:33] <_joe_> https://gerrit.wikimedia.org/r/#/projects/wikimedia,dashboards/sre-serviceops:main
[14:28:42] <_joe_> very very first attempt
[14:30:16] <_joe_> you need the old UI
[14:32:04] 10serviceops, 10Operations, 10Prod-Kubernetes, 10Kubernetes, 10User-fsero: placeholder task for migration problems - https://phabricator.wikimedia.org/T222210 (10hashar) On contint1001, using docker-pkg, I have created a new image `docker-registry.wikimedia.org/releng/tox:0.4.0`. On WMCS instance, I am...
[14:37:00] <_joe_> fsero: if replication is this slow, I'd go active/passive for now
[14:37:13] <_joe_> even at the traffic layer
[14:37:20] it's active/passive now for discovery
[14:37:28] the missing part is docker-registry.w.o
[14:37:32] anyway i agree
[14:37:33] :)
[14:37:52] <_joe_> yeah I meant at the traffic layer
[14:38:14] but i needed to introduce eqiad to avoid redirection loops
[14:38:22] i think it should be possible to take down eqiad now
[14:38:39] i'll look into it with our dear traffic team
[14:38:52] i told hashar to comment there :)
[14:39:47] ottomata: akosiaris@deploy1001:~$ curl http://eventgate-main.svc.eqiad.wmnet:32192/_info
[14:39:47] {"name":"eventgate","version":"1.0.13"
[14:39:59] not done yet, but no issues
[14:40:14] awesome
[14:40:19] lemme POST to it...
[14:40:43] HTTP/1.1 201 All 1 out of 1 events were accepted.
[14:40:45] looks good!
[14:41:08] posting to kafka main just fine :)
[14:42:16] codfw still coming up?
[14:42:21] yup
[14:42:24] aye cool
[14:45:03] codfw done as well
[14:45:17] _joe_: mw1261 says WARNING: Puppet is currently disabled, message: tests --joe, last run 3 days ago with 0 failures.
[14:45:29] I'll ring the shame bell :P
[14:45:43] <_joe_> yeah, well, it was to save us from weekend outages
[14:45:52] ?
[14:45:55] <_joe_> the tests
[14:45:59] <_joe_> not puppet disabled :D
[14:46:00] the opcache?
[14:46:04] <_joe_> yep
[14:46:28] 10serviceops, 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, and 4 others: Modern Event Platform: Deploy instance of EventGate service that produces events to kafka main - https://phabricator.wikimedia.org/T218346 (10Ottomata)
[14:46:39] <_joe_> ah no sorry, the french fuckup
[14:47:23] <_joe_> french wikipedia had several templates with a switch statement of 10k+ elements
[14:47:27] <_joe_> which worked well under hhvm
[14:47:32] <_joe_> but exploded under php
[14:47:50] <_joe_> so I had first to find out what it was
[14:48:20] <_joe_> and one of the debugging steps was to raise the php memory limit several times until I could get one of the affected pages to render
[14:48:42] <_joe_> so I had to depool mw1261 and do a few tweaks while measuring things
[14:48:54] <_joe_> reenabled
[14:57:27] 10serviceops, 10Operations, 10RESTBase-API, 10TechCom, and 2 others: Decide whether to keep violating OpenAPI/Swagger specification in our REST services - https://phabricator.wikimedia.org/T217881 (10CCicalese_WMF)
[14:57:31] 10serviceops, 10Operations, 10RESTBase, 10RESTBase-API, and 4 others: Make RESTBase spec standard compliant and switch to OpenAPI 3.0 - https://phabricator.wikimedia.org/T218218 (10CCicalese_WMF) 05Open→03Stalled a:05Clarakosi→03Pchelolo This work is stalled until other RESTBase patches are merged.
[15:02:08] paravoid: iirc we are good on the capacity end for thumbor
[15:02:35] so if we are to keep those, so be it
[15:03:02] alright, thanks :)
[16:12:28] 10serviceops, 10Operations, 10Prod-Kubernetes, 10Kubernetes, 10User-fsero: placeholder task for migration problems - https://phabricator.wikimedia.org/T222210 (10kostajh) I see something slightly different when I try to pull locally: > docker pull docker-registry.wikimedia.org/releng/quibble-stretch-php...
[16:27:39] 10serviceops, 10Operations, 10Prod-Kubernetes, 10Kubernetes, 10User-fsero: placeholder task for migration problems - https://phabricator.wikimedia.org/T222210 (10hashar) @fsero I am afraid we will need some hot fix to make it way faster. Would it be possible to temporarily switch `docker-registry.wikime...
[16:29:33] 10serviceops, 10Operations, 10Prod-Kubernetes, 10Kubernetes, 10User-fsero: placeholder task for migration problems - https://phabricator.wikimedia.org/T222210 (10fsero) @hashar the CR is already there https://gerrit.wikimedia.org/r/c/operations/puppet/+/509879 just need a +1 from Traffic and i'll merge it
[17:01:24] ottomata: you might abandon this CR https://gerrit.wikimedia.org/r/c/operations/puppet/+/508582, create one for the mwconfig and then we will do the helm dance
[17:01:48] also please update the values file under /srv/scap-helm directory
[17:03:15] aye
[17:03:25] fsero: mw config here
[17:03:26] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/509866
[17:03:51] yes but im not mediawiki-wise enough to be able to do a +1 there
[17:04:04] let's deploy that change
[17:04:06] :P
[17:04:19] i can do that
[17:04:26] let's get Pchelolo on board too if we can
[17:04:28] yt Pchelolo ?
[17:04:36] I'm here ottomata
[17:04:45] hi from the other side of the ocean
[17:04:51] so we are going to turn off all eventgate-analytics monolog events
[17:04:55] hellooooooo!
[17:04:58] 'your continent'
[17:05:15] so we can delete the current chart in prod, and upgrade the new one to use the same port
[17:05:16] then turn them back on
[17:05:42] fsero: there are probably other alarms we'll need to silence too ya?
[17:05:50] when we purge the eventgate-analytics chart
[17:06:01] yup yup, I know. +1
[17:06:17] ok cool.
[17:06:19] great
[17:06:22] going to proceed with that part
[17:06:38] if we can do this all within an hour it'll be good to not conflict with SWAT
[17:06:41] if not its ok too
[17:09:31] ok proceeding to disable events
[17:11:30] ok, when you are done i'll delete helm releases on staging and you will do a normal deploy and we check everything is as expected
[17:11:33] then we move forward
[17:11:48] cool.
[17:11:58] i'm going to silence the health check for eventgate-analytics.svc.eqiad.wmnet ya?
[17:12:06] and codfw
[17:12:31] yep
[17:13:26] waiting 5 mins to verify that events have stopped
[17:18:15] should we start?
[17:19:33] i'm still seeing a trickle of events...
[17:19:48] trickle meaning 300-400 / second
[17:19:53] trying to find out why
[17:20:04] Pchelolo: could there be long lived api requests maybe that are logging at the end?
[17:20:05] hmmm
[17:20:09] i see search requests too
[17:23:07] from a pretty wide range of mw hosts
[17:23:12] the one i just checked has the proper config
[17:24:19] the backend times according to mw is low
[17:24:25] hm
[17:24:30] so it's not like some old lingering request...
[17:29:08] is that flag working? because according to grafana there are still lots of events coming in
[17:29:52] yeah
[17:29:59] it should be working... it turned off most of them.
[17:30:52] going to be afk for some time, i'll be back
[17:31:16] ok sorry fsero
[17:31:19] this might take a minute :(
[17:34:18] the only other flag is the one for beta...
[17:43:03] fsero: weird, releng folks just advised me to touch and sync again
[17:43:06] which seems to have worked...
[17:44:27] ok fsero yeah, no more events in kafka for either topic
[17:44:30] ready when you are back.
[17:44:49] brb too
[17:48:11] ottomata: im around
[17:56:51] fsero: let's go!
[17:56:56] k
[18:00:45] 10serviceops, 10Operations, 10Release Pipeline, 10Release-Engineering-Team, and 5 others: Introduce wikidata termbox SSR to kubernetes - https://phabricator.wikimedia.org/T220402 (10Pablo-WMDE) Hi @akosiaris - thanks for getting back to us. > sending a Host: HTTP for the identification of the exact projec...
[18:01:41] looking good from my side
[18:01:44] can you check?
[18:01:59] fsero: looks good.
[18:02:12] post to http://kubestage1001.eqiad.wmnet:31192/v1/events works fine
[18:02:19] ok
[18:02:28] so i will delete the eventgate-analytics-production on codfw
[18:02:40] and you redeploy eventgate-analytics
[18:02:44] k?
[18:02:47] yup
[18:02:49] proceed
[18:03:52] fsero: so you are just deleting the old one this time? I will just do upgrade instead of install?
[18:04:00] yes
[18:04:02] k
[18:04:05] there is a bunch of pods already deployed
[18:04:10] dont see the need
[18:04:12] great
[18:04:15] agree
[18:04:40] ok i don
[18:04:43] 't see the production ones
[18:04:44] so upgrading..
[18:05:35] can you check if its working?
[18:06:05] yup, watching pod status
[18:06:38] rollingupdate has ended
[18:06:44] looks good
[18:06:46] so lets do eqiad
[18:06:51] proceed!
[18:07:08] deleting
[18:07:15] done
[18:07:23] ok upgrading eqiad
[18:08:30] seems ok
[18:08:42] ya looks great
[18:09:23] awesome, ok going to re-enable events...
[18:09:35] yup
[18:09:56] oh swat is happening now
[18:09:58] might need to schedule it
[18:10:27] or at least wait til that's done
[18:11:50] i dont think it's going to cause any issue
[18:11:54] icinga is happy https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=1&host=eventgate-analytics.svc.codfw.wmnet
[18:12:01] pods are running
[18:13:53] awesome
[18:14:31] gimme a little +1 for good measure :) https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/509911
[18:15:57] done
[18:20:34] 10serviceops, 10ORES, 10Scoring-platform-team, 10Patch-For-Review: Ores hosts: mwparserfromhell tokenizer random segfault - https://phabricator.wikimedia.org/T222866 (10Harej) p:05Triage→03Normal
[18:21:09] ottomata: i believe it's working right?
[18:21:42] fsero: yes it looks good!
[18:21:45] events are coming back
[18:22:02] dashboards look as expected
[18:22:10] then i'll leave now, it was faster than the other way, right :)?
[18:22:19] busiest pods in dash are now e.g.
[18:22:20] eventgate-analytics-859954f894-zrwmx
[18:22:24] instead of -production
[18:22:25] so good!
[18:22:33] yes! that was just fine thank you!
[18:22:38] thanks for your help, have a good evening
[18:23:24] 10serviceops, 10Analytics, 10Analytics-Kanban, 10EventBus, and 2 others: Use new eventgate chart release analytics for eventgate-analytics service. - https://phabricator.wikimedia.org/T222962 (10Ottomata)
[18:35:05] 10serviceops, 10Analytics, 10Analytics-Kanban, 10EventBus, and 3 others: Set up LVS for eventgate-main on port 32192 - https://phabricator.wikimedia.org/T222899 (10Ottomata) @akosiaris when you have time tomorrow: https://gerrit.wikimedia.org/r/c/operations/dns/+/509912 Thank you!
[19:11:14] for some reason on Wikitech it says "We still don't enable fpm here". But do we care about that for just the PHP 7.0 -> 7.2 part?
[19:12:20] otherwise it's much simpler than i expected..
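[Editor's note] The maneuver ottomata and fsero just completed can be summarized as the rough runbook below. This is a reconstruction from the log, not the exact commands run: release and chart names and the values path are assumptions, and deploys at the time went through the scap-helm wrapper on the deployment host rather than bare helm.

```shell
# 0. Silence Icinga checks for eventgate-analytics.svc.{eqiad,codfw}.wmnet.

# 1. Stop producers: deploy the mediawiki-config change disabling the events
#    (change 509866), then wait and confirm the kafka topics have gone quiet.

# 2. Per cluster, codfw first: delete the old Helm release so its port
#    (31192) is freed, then upgrade the new release onto the same port.
helm delete --purge eventgate-analytics-production  # old release
helm upgrade eventgate-analytics <chart> -f <values.yaml>  # paths elided

# 3. Verify before moving on to eqiad: pod status plus a direct POST, e.g.
curl http://kubestage1001.eqiad.wmnet:31192/v1/events  # staging check

# 4. Re-enable events with the follow-up mediawiki-config change (509911).
```

Doing it this way trades a short, planned producer outage for not having to duplicate LVS/puppet/DNS entries for a second port, which is exactly the trade-off discussed at 14:04.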
and just Hiera
[19:12:44] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/509916/1/hieradata/role/eqiad/wmcs/openstack/eqiad1/labweb.yaml -> https://puppet-compiler.wmflabs.org/compiler1001/16497/labweb1001.wikimedia.org/ i had expected we'd have to add all the puppet code to add the 7.2 repo etc
[19:13:46] just had to fix compiling on labweb with a previous change because nobody had added fake secrets in labs/private
[19:24:56] 10serviceops, 10Analytics, 10Analytics-Kanban, 10EventBus, and 3 others: Set up LVS for eventgate-main on port 32192 - https://phabricator.wikimedia.org/T222899 (10Ottomata) a:03akosiaris
[19:25:09] 10serviceops, 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, and 4 others: Modern Event Platform: Deploy instance of EventGate service that produces events to kafka main - https://phabricator.wikimedia.org/T218346 (10Ottomata)
[19:26:51] 10serviceops, 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, and 4 others: Modern Event Platform: Deploy instance of EventGate service that produces events to kafka main - https://phabricator.wikimedia.org/T218346 (10Ottomata)
[19:27:11] 10serviceops, 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, and 4 others: Modern Event Platform: Deploy instance of EventGate service that produces events to kafka main - https://phabricator.wikimedia.org/T218346 (10Ottomata)
[19:27:23] 10serviceops, 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, and 4 others: Modern Event Platform: Deploy instance of EventGate service that produces events to kafka main - https://phabricator.wikimedia.org/T218346 (10Ottomata)
[19:30:49] 10serviceops, 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, and 4 others: Modern Event Platform: Deploy instance of EventGate service that produces events to kafka main - https://phabricator.wikimedia.org/T218346 (10Ottomata)
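[Editor's note] The "just Hiera" PHP bump for labweb described above could look something like the fragment below. The exact key name is an assumption for illustration; the linked change (509916) to `hieradata/role/eqiad/wmcs/openstack/eqiad1/labweb.yaml` has the real one, and the PCC run linked next to it is how the result was validated without adding new repo/apt puppet code.

```yaml
# hieradata/role/eqiad/wmcs/openstack/eqiad1/labweb.yaml
# Hypothetical key name -- the profile already knows how to wire up the
# matching packages/repo, so a Hiera override is enough to switch versions.
profile::mediawiki::php::php_version: "7.2"
```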