[01:26:38] 10serviceops, 10Operations, 10Wikimedia-Etherpad, 10Patch-For-Review: Migrate etherpad1001 to Buster - https://phabricator.wikimedia.org/T224580 (10Dzahn) @akosiaris @Muehlenhoff Here it is on buster as "etherpad-new". https://etherpad-new.wikimedia.org/p/aXjrQTK8PD6bjj9TqK4Q
[01:27:40] 10serviceops, 10Operations, 10Wikimedia-Etherpad, 10Patch-For-Review: Migrate etherpad1001 to Buster - https://phabricator.wikimedia.org/T224580 (10Dzahn) a:03Dzahn
[07:49:23] 10serviceops, 10Operations, 10Wikimedia-Etherpad, 10Patch-For-Review: Migrate etherpad1001 to Buster - https://phabricator.wikimedia.org/T224580 (10akosiaris) >>! In T224580#5828562, @Dzahn wrote: > @akosiaris @Muehlenhoff Here it is on buster as "etherpad-new". https://etherpad-new.wikimedia.org/p/aXjr...
[09:31:46] 10serviceops, 10Operations, 10Wikimedia-Etherpad, 10Patch-For-Review: Migrate etherpad1001 to Buster - https://phabricator.wikimedia.org/T224580 (10akosiaris) I've removed the DNS and stopped and masked the service for now on etherpad1002. Since we proved it works, let's just move over to etherpad1002.eqi...
[09:40:44] 10serviceops, 10Operations, 10Wikimedia-Etherpad, 10Patch-For-Review: Migrate etherpad1001 to Buster - https://phabricator.wikimedia.org/T224580 (10akosiaris) Pads that per the logs have been accessed on https://etherpad-new.wikimedia.org ` 90D1o-quuUNWqCrt0CIV WMCS-2019-06-25 WMCS-2020-01-22 WMCS-2020-02-04 W...
[09:58:52] 10serviceops, 10Operations, 10Wikimedia-Etherpad, 10Patch-For-Review: Migrate etherpad1001 to Buster - https://phabricator.wikimedia.org/T224580 (10akosiaris) @Dzahn, I've merged the required remaining changes to get the migration done. Now etherpad.wikimedia.org uses etherpad1002. Checked a couple of pad...
[13:39:13] 10serviceops, 10Release-Engineering-Team, 10MW-1.35-notes (1.35.0-wmf.16; 2020-01-21): Opcache hit ratio dropped after 22/1 train on appeservers - https://phabricator.wikimedia.org/T243601 (10jijiki)
[13:52:54] akosiaris hellLllOoooo :)
[13:53:28] i got eventstreams in staging yesterday, want to benchmark. I'd also like to get the service-runner prometheus parts working first
[13:53:42] but i don't fully understand how that stuff actually gets to prometheus
[13:54:02] i know there is some special k8s node prometheus proxy-ish thing in front, right?
[13:54:10] somehow k8s collects the metrics from the pods?
[13:54:19] or, do I need to explicitly enable that somehow?
[13:55:14] ottomata: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/master/_scaffold/templates/deployment.yaml#25
[13:55:16] those 2 lines
[13:55:27] port and toggle respectively
[13:55:57] and also open up the port in templates/networkpolicy.yaml
[13:56:13] s/9102/whateveryourportis/ in that file
[13:57:45] hmm
[13:57:51] does it have to be 9102?
[13:57:52] hm
[13:58:04] nope, it's configurable
[13:58:29] ok trying and looking
[13:58:31] :)
[13:58:31] that annotation tells prometheus (which talks to the kubernetes API) which pods and ports to scrape
[13:58:47] there is a similar one in the TLS helpers IIRC
[13:58:54] cause envoy gets scraped as well
[13:59:13] AHH ok i think i changed the port for the Service, but not there
[13:59:26] but, hm, i should be able to manually hit the port /metrics
[13:59:27] right?
[13:59:47] nope, due to the networkpolicy
[13:59:58] aah
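For reference, a rough sketch of the two scaffold lines and the matching networkpolicy change discussed above; the values are illustrative (real charts fill them in from values.yaml), and 9102 stands in for whatever port the pods expose /metrics on:

    # templates/deployment.yaml -- pod template metadata (the "port and toggle" lines)
    metadata:
      annotations:
        prometheus.io/scrape: "true"   # toggle: mark this pod for scraping
        prometheus.io/port: "9102"     # which pod port serves /metrics
    ---
    # templates/networkpolicy.yaml -- also allow ingress to the metrics port
    spec:
      ingress:
        - ports:
            - port: 9102
              protocol: TCP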
[14:00:02] so wait...
[14:00:14] that prometheus endpoint is on the same port, right?
[14:00:30] am trying curl -H 'X-Client-IP: 1.2.3.5' -v --resolve 'eventstreams.svc.eqiad.wmnet:9202:10.64.0.247' 'https://eventstreams.svc.eqiad.wmnet:9202/metrics'
[14:00:33] so you should be able to reach on the normal ports and /metrics
[14:00:45] ok so i should use 9102
[14:00:54] i think i didn't because of local dev env conflicts
[14:01:01] but yeah i should default to 9102
[14:01:02] ok
[14:01:03] fixing that
[14:01:18] whatever port your pods listen on for exposing prometheus metrics
[14:01:19] oh i do have 9202 in the network policy
[14:01:45] and then just annotate the pod properly and you should be good to go
[14:02:13] ok not 100% sure why i can't hit /metrics though, but i'll switch it all back to the 9102 default just in case i'm missing something
[14:02:35] lemme try
[14:02:40] oh
[14:02:42] nodeport
[14:02:50] needs to be set explicitly, right?
[14:02:53] nodeport has nothing to do with this
[14:03:17] even for hitting /metrics directly?
[14:03:33] i understand it's not needed for k8s to scrape it
[14:03:48] 10.64.75.100
[14:03:53] that's the IP of the pod, right?
[14:03:57] oh oops
[14:04:02] kubestage yeah sorry my resolve is wrong
[14:04:07] that's a wrong paste...maybe that's it
[14:04:24] hmm oh no
[14:04:25] 10.64.0.247
[14:04:25] curl 10.64.75.100:8092/metrics
[14:04:26] says 404
[14:04:36] should it answer there? /me not sure
[14:05:30] ah not 8092, that's the plaintext port, which is not exposed
[14:05:34] 4892
[14:05:35] https
[14:05:49] it's not exposed, but it is reachable
[14:05:56] oh on the pod ip
[14:06:00] that's the pod ip
[14:06:01] didn't know you could do that
[14:06:05] forget about the service for a while
[14:06:20] curl -H 'Accept: application/json' -H 'X-Client-IP: 1.2.3.5' --resolve 'eventstreams.svc.eqiad.wmnet:4892:10.64.0.247' 'https://eventstreams.svc.eqiad.wmnet:4892/_info'
[14:06:20] works
[14:06:21] you can always talk directly to all pods from anywhere in production
[14:06:28] excluding networkpolicies
[14:06:28] I did not know that!!!
[14:06:41] :)
[14:06:43] COOL
[14:06:44] curl 10.64.75.100:9202/metrics
[14:06:44] works
[14:07:03] ok then, the only thing you need is to s/9102/9202/ in your chart and deploy
[14:07:05] but no service-runner stuff hm
[14:07:11] i have my custom metric the app makes
[14:07:14] awesome.
[14:07:53] ok cool, can figure this out now
[14:07:55] thank you!
[14:07:59] yw
[14:34:52] akosiaris: that prometheus.io/port should be the pod 'port', right
[14:34:53] ?
[14:34:55] not the targetPort
[14:34:57] (container?)
[15:38:42] ottomata: the pod port
[15:41:20] ✔
[15:49:11] hm ok akosiaris stuff looks good, but no service metrics in prometheus yet.
[15:49:18] they are exposed, e.g. curl 10.64.75.51:9102/metrics
[15:49:44] and prometheus.io/port: "9102"
[15:50:14] AH but now there are
[15:50:20] ok good thing i asked you....i guess it takes 5-10 mins???
[15:50:22] sheesh
[15:50:31] nope it shouldn't
[15:50:34] like 1-2 ?
[15:50:37] i waited and now i see them.
[15:50:55] at least, no results were coming up in the prometheus web ui
[15:50:56] even less normally
[15:51:12] but now they are so ¯\_(ツ)_/¯
[15:52:07] cool great stuff, making a dashboard now...:)
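To put the "pod port, not targetPort" answer in context, a sketch of how the pieces line up, with illustrative names and ports rather than the real eventstreams chart. The prometheus.io/port annotation names the port the pod itself listens on (the containerPort); the Service's port/targetPort mapping isn't involved in scraping, since prometheus, like the curl against the pod IP above, talks to pods directly:

    # pod template (deployment.yaml) -- this is what prometheus scrapes
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9102"     # must match the containerPort below
    spec:
      containers:
        - name: eventstreams           # illustrative container name
          ports:
            - containerPort: 9102      # pod port serving /metrics
    ---
    # service.yaml -- routes client traffic only; not used for scraping
    spec:
      ports:
        - port: 4892
          targetPort: 4892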
[15:53:55] hmmm just realized something...there isn't a way to make a certain prometheus label private, is there? i'd assume not. i want to keep track of connections per client ip, we've had abusers in the past. buuuut, i can't really make that public.
[15:53:56] hm
[15:54:06] if it is in prometheus, but not in a grafana dashboard, is that ok?
[15:54:21] iiuc prometheus isn't publicly queryable, and dashboards aren't editable by the public
[15:54:22] ?
[15:55:05] i'd just like to be able to find connections per ip when i need to, don't need it regularly in a dash
[15:55:07] hm
[16:49:59] ottomata: seems ok to me. prometheus itself isn't exposed and editing dashboards is locked down, including grafana explore.
[16:50:19] ok cool