[08:40:55] serviceops, Operations, SRE-tools, Patch-For-Review: Create a cookbook for depooling one or all services from one kubernetes cluster - https://phabricator.wikimedia.org/T260663 (JMeybohm) Merged the current version as is but the cookbook should be updated in short term with: ` [03.09.20 14:05]...
[09:05:38] serviceops, Operations: Reproduce opcache corruptions in production - https://phabricator.wikimedia.org/T261009 (jijiki)
[09:05:41] serviceops, Performance-Team, Patch-For-Review, Sustainability (Incident Followup): Avoid php-opcache corruption in WMF production - https://phabricator.wikimedia.org/T253673 (jijiki)
[09:21:05] serviceops, Citoid, Operations, Prod-Kubernetes, Services (watching): Citoid automated monitoring times out due to Zotero v2 - https://phabricator.wikimedia.org/T211411 (Mvolz) >>! In T211411#4913374, @mobrovac wrote: > There have been no timeouts recorded by the automatic check scripts since...
[09:39:56] serviceops, MediaWiki-General, Operations, Patch-For-Review, Service-Architecture: Create a service-to-service proxy for handling HTTP calls from services to other entities - https://phabricator.wikimedia.org/T244843 (Joe)
[09:40:35] <_joe_> effie/jayme: can you take care of adding the service proxy to push-notifications?
[09:40:43] <_joe_> given it's still not in production
[09:40:57] <_joe_> if it's even applicable, of course - I'm not sure about that
[09:41:24] _joe_: Aye. I think it's not as it just talks to google / apple afaik
[09:41:58] <_joe_> ok, great. In that case, just mark it as such in the table in https://phabricator.wikimedia.org/T244843
[09:46:29] effie: maybe you can verify better: Does push-not talk to any of our services?
[09:46:38] _joe_: wilco
[09:48:05] _joe_: we have everything in place
[09:48:23] we have a patch for service proxy
[09:48:43] <_joe_> what is that used for in this case?
[09:51:51] <_joe_> hnowlan: ping, I need help with some envoy config fu
[09:52:07] <_joe_> and something tells me you're quite advanced in that regard
[09:52:10] <_joe_> :P
[09:52:28] scarred, advanced, same thing
[09:52:30] what's up?
[09:54:24] <_joe_> so... I find more and more that our software is configured to call https://$somewiki/w/api.php
[09:55:00] <_joe_> now say I want to funnel those calls into envoy
[09:55:15] <_joe_> I see two routes:
[09:55:27] <_joe_> 1 - create a new listener that injects Host: $somewiki
[09:56:14] <_joe_> 2 - find a way to convert a URL like http://localhost:6500/meta.wikimedia.org/w/api.php into
[09:56:35] <_joe_> calling /w/api.php with host header meta.wikimedia.org
[09:56:47] <_joe_> that would make one listener do all the work for such cases
[09:57:11] <_joe_> now I am not sure how to do 2, and I think you might know
[09:57:17] _joe_: this service will talk to apple and google to send notifications and mediawiki will talk to it
[09:57:20] #2 is more or less what the API gateway does already
[09:57:56] <_joe_> hence my question
[09:58:09] <_joe_> is there a logically sound and not-awful way to do it?
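Editorial sketch: the second route described above (a single listener, with the target wiki taken from the first path segment) boils down to splitting the incoming path into a Host header value and an upstream path. The Python below only illustrates that transformation. It is not Envoy configuration and not the Lua filter used by the API gateway; the names `UpstreamRequest` and `split_host_from_path` are hypothetical, and the "host must contain a dot" check is an assumption added for the example.

```python
from typing import NamedTuple


class UpstreamRequest(NamedTuple):
    host: str  # value to send as the Host header
    path: str  # path to request from the upstream


def split_host_from_path(request_path: str) -> UpstreamRequest:
    """Split '/<host>/<rest>' into a Host header value and '/<rest>'.

    Hypothetical helper: mirrors the rewrite discussed in the log,
    e.g. '/meta.wikimedia.org/w/api.php' -> Host 'meta.wikimedia.org'
    and path '/w/api.php'.
    """
    if not request_path.startswith("/"):
        raise ValueError(f"path must start with '/': {request_path!r}")

    # Everything up to the next '/' is treated as the target host.
    host, _sep, rest = request_path[1:].partition("/")

    # Assumption for this sketch: a usable host contains at least one dot.
    if not host or "." not in host:
        raise ValueError(f"no host found in first path segment: {request_path!r}")

    # Any query string stays attached to the remaining path.
    return UpstreamRequest(host=host, path="/" + rest)


if __name__ == "__main__":
    req = split_host_from_path("/meta.wikimedia.org/w/api.php")
    print(req.host, req.path)
```

A caller would send the request to the upstream with `req.host` as the Host header and `req.path` as the request path. The first route (a separate listener per wiki) avoids this parsing entirely, at the cost of one listener per target host.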
[09:58:17] The bad news is there is no not-awful way to do it *now*
[09:58:30] in 1.16 there will be a proper regex-based host setting method
[09:58:37] <_joe_> effie: so your service doesn't need the service-proxy
[09:58:39] We use lua and an ugly config to do it
[09:58:46] https://phabricator.wikimedia.org/source/operations-deployment-charts/browse/master/charts/api-gateway/values.yaml$77
[09:58:49] <_joe_> ok, I'll wait for 1.16 :D
[09:58:57] _joe_: not to talk to someone
[09:59:09] but for someone to talk to it
[09:59:11] https://phabricator.wikimedia.org/source/operations-deployment-charts/browse/master/charts/api-gateway/templates/_config.yaml$303
[09:59:24] <_joe_> effie: oh ok, so in puppet, not in the chart
[09:59:40] <_joe_> hnowlan: ok ok you convinced me
[09:59:44] <_joe_> separate listener it is
[09:59:47] heh
[10:10:51] serviceops, GrowthExperiments-NewcomerTasks, Operations, Product-Infrastructure-Team-Backlog: Service operations setup for Add a Link project - https://phabricator.wikimedia.org/T258978 (MGerlach) @kostajh @Joe some current estimates (@DED please correct/add): - once we have a fully running vers...
[10:38:49] <_joe_> mdholloway: not sure if you're still the right person to ask for mobileapps stuff, but I am planning to move it to use envoy as a proxy for RPC calls https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/625619
[11:22:33] btw are we meeting today?
[11:22:43] it is a US holiday
[11:31:33] serviceops, Operations: Reproduce opcache corruptions in production - https://phabricator.wikimedia.org/T261009 (jijiki)
[12:15:52] serviceops, observability, User-jijiki: Should mwdebug servers contribute to cluster latency? - https://phabricator.wikimedia.org/T262202 (jijiki)
[12:51:01] <_joe_> effie: I assume so, the meeting wasn't cancelled
[12:54:25] <_joe_> we need to check the racks that will be acted upon tomorrow in eqiad
[12:54:46] <_joe_> if they contain kafka servers and/or too many k8s nodes, we should work on it
[13:22:12] <_joe_> jayme, effie you're down with your lvs work?
[13:22:22] <_joe_> s/down/done/
[13:23:08] _joe_: we actually never started because it seemed easier to wait until push-not is deployed to production (and do it all in one then)
[13:23:18] <_joe_> ack
[13:23:38] <_joe_> jayme: can I ask you a quick check of https://gerrit.wikimedia.org/r/c/operations/puppet/+/625584 then?
[13:24:32] _joe_: I'm currently trying to figure out what's going on in the clusters as we seem to have lots of pod/container restarts without deployments. Gimme a sec
[13:24:40] <_joe_> uh
[13:24:47] <_joe_> I just redeployed mobileapps
[13:24:56] ah
[13:24:59] might be it
[13:25:13] see -operations
[13:25:18] <_joe_> yes
[13:25:25] <_joe_> mobileapps is a fuckton of containers
[13:25:29] does it have like a hell of ...ah
[13:25:31] :-)(
[13:25:35] <_joe_> 160 right now because of the two services
[13:25:39] <_joe_> tls and non-tls
[13:25:46] <_joe_> hence I want to remove 80 of them
[13:26:12] Ok. Won't bother to look then and going to check the CR instead
[13:26:32] somehow overlooked the log line of yours
[13:57:18] <_joe_> so regarding the team meeting
[13:57:31] <_joe_> I was thinking we might ask wolfgang to have it later in the week
[13:57:44] <_joe_> and today dedicate it to discussing the service proxy stuff, jayme
[13:58:19] <_joe_> but if you feel you need a full dedicated meeting with a smaller audience, that's ok too
[14:01:46] we are going to be missing 3 people, including the manager, it probably makes sense to postpone
[14:03:44] +1
[14:05:01] i can stand in as the manager if you want ;)
[14:07:00] _joe_: okay for me but I don't know if we need the "full-in-terms-of-EU" team for this - plus I'm not fully prepared, but that's okay for me
[14:07:12] <_joe_> ok
[14:12:01] mark: let us enjoy labour day
[14:12:05] :D
[15:14:12] akosiaris: _joe_: effie: apergos: are you gonna join wkandek and me? :-)
[15:14:28] ugh woops. wait isn't it a us holiday?
[15:14:35] * apergos gets on regardless
[15:14:49] <_joe_> argh :P