[10:51:29] serviceops, Operations, PHP 7.2 support: (euwiki) Mysterious, coordinated slowdowns every ~ 25 minutes on API servers - https://phabricator.wikimedia.org/T231011 (jijiki)
[16:05:59] serviceops, Operations, PHP 7.2 support: (euwiki) Mysterious, coordinated slowdowns every ~ 25 minutes on API servers - https://phabricator.wikimedia.org/T231011 (Theklan) Last days I have been heavily merging some templates, and it may be related.
[17:42:40] serviceops, Operations, PHP 7.2 support: (euwiki) Mysterious, coordinated slowdowns every ~ 25 minutes on API servers - https://phabricator.wikimedia.org/T231011 (jijiki) @Theklan We suspect it is something on the production side since we have noticed this behaviour in the past. Moreover, this is not...
[18:39:17] o/ i'd like to do T225129
[18:39:23] i'm also looking at moving eventstreams to k8s
[18:39:31] so will try to do it properly there first
[18:39:41] what is the proper way for logging in k8s?
[18:39:45] just stdout, right?
[18:40:11] do I need a special sidecar?
[19:15:27] serviceops, Operations, PHP 7.2 support: (euwiki) Mysterious, coordinated slowdowns every ~ 25 minutes on API servers - https://phabricator.wikimedia.org/T231011 (akosiaris) I think it's in the ~35mins "schedule" now, but other than that, it's still present https://grafana.wikimedia.org/d/RIA1lzDZk/a...
[19:15:43] ottomata: no, just log to stdout
[19:15:54] ok gr8
[19:15:59] using bunyan ofc please, but that's what service-runner does
[19:16:03] ya
[19:16:57] hmm, that is what eventgate does...
[19:17:03] do I need to do something different?
[19:17:03] https://phabricator.wikimedia.org/T225129
[19:17:12] I guess i'll ask keith
[19:17:20] also, do have a look that your logs are adequate cause IIRC there is some minor bunyan diff with level or something
[19:17:40] ottomata: you might need to deploy?
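The logging advice above comes down to this: write one bunyan-formatted JSON object per line to stdout and leave collection to the platform, with no special sidecar. The Python sketch below shows what such a line contains. It includes bunyan's core fields `v`, `level` (numeric, from 10 = trace to 60 = fatal), `name`, `hostname`, `pid`, `time` and `msg`, plus any extra fields. It is an illustration only, not service-runner's implementation. The function names `bunyan_record` and `log` and the service name `eventgate-example` are hypothetical, and service-runner's real output may differ in details such as how `level` is rendered.

```python
import json
import os
import socket
import sys
from datetime import datetime, timezone

# Numeric levels used by bunyan.
LEVELS = {"trace": 10, "debug": 20, "info": 30, "warn": 40, "error": 50, "fatal": 60}


def bunyan_record(name, level, msg, now=None, **extra):
    """Build one bunyan-style record as a dict: core fields plus any extras."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    record = dict(extra)
    # Core fields are written last so an extra field cannot clobber them.
    record.update({
        "v": 0,
        "level": LEVELS[level],
        "name": name,
        "hostname": socket.gethostname(),
        "pid": os.getpid(),
        # ISO 8601 UTC timestamp with millisecond precision, e.g. ...T20:00:00.123Z
        "time": now.strftime("%Y-%m-%dT%H:%M:%S.") + f"{now.microsecond // 1000:03d}Z",
        "msg": msg,
    })
    return record


def log(name, level, msg, **extra):
    """Write the record as a single JSON line to stdout (no sidecar, no files)."""
    sys.stdout.write(json.dumps(bunyan_record(name, level, msg, **extra)) + "\n")
    sys.stdout.flush()


if __name__ == "__main__":
    # "eventgate-example" is a placeholder service name.
    log("eventgate-example", "info", "service started")
```

Each call emits exactly one newline-terminated line, so a line-oriented log collector can parse records without any extra framing.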
[19:17:51] I remember the chart was updated
[19:17:52] * akosiaris looking
[19:17:55] oh hm
[19:19:27] akosiaris@deploy1001:/srv/deployment-charts/helmfile.d/services/eqiad/eventgate-main$ helm list
[19:19:27] NAME  REVISION  UPDATED                   STATUS    CHART             APP VERSION  NAMESPACE
[19:19:27] main  4         Thu Sep 26 17:14:53 2019  DEPLOYED  eventgate-0.0.11               eventgate-main
[19:19:32] ottomata: yup, you need to deploy
[19:19:35] ah ok cool
[19:19:41] then that will happen eventually / soon enough
[19:19:44] I see you have published 0.0.12 already
[19:19:46] great! i'll comment on ticket
[19:20:22] ok, thank you for looking!
[19:20:28] yw
[19:20:56] I 've started looking into https://phabricator.wikimedia.org/T233629 btw ottomata
[19:21:12] I 'll work on the namespaces but there is one caveat and it's the calico network policy
[19:21:25] I need to know which "kafkas" that will talk to
[19:25:07] serviceops, Operations, PHP 7.2 support: (euwiki) Mysterious, coordinated slowdowns every ~ 25 minutes on API servers - https://phabricator.wikimedia.org/T231011 (Theklan) Ok! Now there's nothing special happening at euwiki and there are not new merges happening, so it should be another thing.
[19:27:07] akosiaris: it it will talk to kafka jumbo
[19:27:10] which only exists in eqiad
[19:27:34] should be set up the same as eventgate-analytics
[19:28:00] the service configs will be slightly different (different message size limits, etc.) and it will be publicly routed to
[19:28:06] but other than that it is the same
[19:28:12] ok
[19:28:18] easy enough for me then
[19:28:37] ty!
[19:35:31] akosiaris: another q
[19:35:42] ema said they wanted HTTPS everywhere for ATS
[19:35:47] from ATS -> app
[19:36:09] yup. envoyproxy does that
[19:36:10] is auto TLS termination being built into our k8s stuff?
[19:36:14] yup
[19:36:24] blubberoid is the guinea pig right now
[19:36:25] do I need to enable it for eventgate?
[19:36:42] IIRC _joe_ got it working a few days ago, you will have to take a look in that chart
[19:36:50] ok
[19:36:54] the scaffolding has also support for it ofc
[19:37:04] will look, maybe i can add it for eventgate-logging-external now
[19:37:05] since it is new
[19:37:11] (and has cache routing)
[19:37:23] indeed
[19:37:37] you will need a cert/key from cergen ofc
[19:37:49] hm k
[19:38:25] ok for discovery
[19:38:25] cool
[19:39:40] ah cool
[19:39:44] kube_services.certs.yaml
[19:39:46] gr8
[19:43:17] <_joe_> ottomata: see what I did for blubberoid
[19:43:24] <_joe_> I can help you with a CR ofc
[19:44:10] thanks, ya am looking now, will submit patch, surely will miss some things
[19:49:32] hm # To be defined in a private space
[19:49:35] where is that?
[19:49:45] in puppet private?
[19:50:29] <_joe_> yes
[19:50:44] <_joe_> ottomata: oh wait I have a horrible todo list
[19:50:47] <_joe_> lemme find it for you
[19:51:11] <_joe_> https://wikitech.wikimedia.org/wiki/User:Giuseppe_Lavagetto/Add_Tls_On_Kubernetes
[19:52:16] <_joe_> I promise it will be a better engineered process in the future
[19:52:43] ah great stuff
[19:52:44] thank you
[19:52:52] yeah i am pioneer in non serviceops k8s user
[19:53:10] its ok by using it i make you feel bad and you get motivated to make it better :p (right!? )
[19:53:52] <_joe_> actually I will have to create TLS for the existing services and I was already pissed by the process, but yes you will :D
[19:54:11] <_joe_> for now I just used what I found
[20:01:26] hm _joe_ is the .fixtures/tls_enabled.yaml for jenkins CI?
[20:01:35] at <.Values.tls.certs.ce...>: can't evaluate field cert
[20:01:41] <_joe_> yes
[20:01:45] i could set some dummy values in the chart's values.yaml
[20:01:45] oh ok
[20:01:46] hm
[20:01:53] <_joe_> and yes
[20:02:29] hm do I need to set that up to be used by jenkins?
[20:05:02] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/551253
[20:05:24] <_joe_> I'll take a look in a few
[20:06:25] <_joe_> ottomata: it's helm lint that fails
[20:06:28] <_joe_> 20:00:57 [ERROR] templates/: render error in "eventgate/templates/deployment.yaml": template: eventgate/templates/deployment.yaml:29:96: executing "eventgate/templates/deployment.yaml" at <.Values.tls.certs.ce...>: can't evaluate field cert in type interface {}
[20:07:04] AH
[20:07:08] cert vs certs hm
[20:07:23] hm no
[20:07:26] it is right
[20:17:33] <_joe_> it's that you define as a default tls.enabled: true
[20:17:45] ohhhh opsp
[20:17:46] <_joe_> but then have a nil value for tls.certs
[20:18:06] right
[20:19:04] yes better
[20:20:03] <_joe_> ottomata: in your case, also, you probably want only the TLS endpoint in prod to be exposed
[20:20:16] <_joe_> I'll take a better look next week
[20:20:45] endpoint in prod?
[20:21:00] vs ...?
[20:21:46] <_joe_> right now your chart defines a service both exposed via TLS and not exposed via TLS
[20:21:52] <_joe_> you want just one I think
[20:21:58] oh i see
[20:22:02] blubberoid has both?
[20:22:05] <_joe_> so that you don't need two LVS endpoints too
[20:22:11] ah,
[20:22:13] yes for sure.
[20:22:15] hmmm
[20:22:20] <_joe_> because it's a transition and blubberoid is already exposed without TLS right now
[20:22:24] ah ok
[20:22:46] <_joe_> I gtg to follow the conf sorry
[20:22:47] hm can we keep both there and only route to one via lvs?
[20:22:52] ok np thanks _joe_ !
[20:22:52] <_joe_> sure
[20:22:55] just prepping patches for next week
[20:22:56] ok cool.
[20:22:58] <_joe_> sure you can
[20:23:04] <_joe_> I will be back on nov 20th
[20:23:31] ok, hopefully some other serviceopsen can help me next early week. all new stuff here so we can iterate
[20:23:33] thank you!
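The helm lint failure discussed above comes from the chart defaults. `tls.enabled: true` is set while `tls.certs` is nil, so the chained lookup `.Values.tls.certs.cert` hits a null part-way through. There are two plausible ways out. One is to default `tls.enabled` to false and turn it on, with dummy certs, only in the CI fixture `.fixtures/tls_enabled.yaml`. The other is to put dummy values in the chart's `values.yaml`, as suggested in the log. The Python sketch below reproduces that check outside Helm so the cases are easy to compare. It is a hypothetical helper, not part of Helm or the deployment-charts repo. `merge` is a simplified recursive merge, and Helm's real value coalescing has more rules. The `key` field next to `cert` is also an assumption, since only `tls.certs.cert` appears in the log.

```python
def merge(defaults, overrides):
    """Simplified recursive merge: overrides win, nested dicts are merged.

    Helm's own value coalescing has more rules (e.g. null can delete a key);
    this only models the common case.
    """
    out = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out


def lookup(values, path):
    """Follow a dotted path such as 'tls.certs.cert' through nested dicts.

    Like a chained template reference, every intermediate key must hold a
    mapping; hitting None (a YAML null) part-way through is an error.
    """
    node = values
    walked = []
    for key in path.split("."):
        if not isinstance(node, dict):
            where = ".".join(walked) or "<root>"
            raise ValueError(f"can't evaluate {key!r}: {where} is {node!r}, not a mapping")
        node = node.get(key)
        walked.append(key)
    return node


def check_tls(values):
    """Return a list of problems with the tls block; an empty list means OK."""
    tls = values.get("tls") or {}
    if not tls.get("enabled"):
        # Assumes the template only reads tls.certs when TLS is enabled.
        return []
    problems = []
    for field in ("cert", "key"):  # "key" is an assumed field name
        try:
            if not lookup(values, f"tls.certs.{field}"):
                problems.append(f"tls.certs.{field} is empty")
        except ValueError as exc:
            problems.append(str(exc))
    return problems


if __name__ == "__main__":
    buggy_defaults = {"tls": {"enabled": True, "certs": None}}
    fixed_defaults = {"tls": {"enabled": False, "certs": None}}
    ci_fixture = {"tls": {"enabled": True, "certs": {"cert": "dummy", "key": "dummy"}}}
    print(check_tls(buggy_defaults))                     # two problems
    print(check_tls(fixed_defaults))                     # []
    print(check_tls(merge(fixed_defaults, ci_fixture)))  # []
```

The point of the comparison is that the defaults on their own must render cleanly. Only the fixture should switch TLS on, and it should always arrive together with certificate values.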