[05:31:32] ottomata: Agreed. As I wrote that down later yesterday I came to that conclusion as well. In a re-deploy scenario we also lack the dynamic configmaps that flink creates so it would not even know where to pick up the latest checkpoint...bummer
[06:13:39] serviceops, Maps, SRE-swift-storage: Swift account to store pre-rendered vector-tiles - https://phabricator.wikimedia.org/T283049 (jijiki)
[06:14:01] serviceops, Maps, SRE-swift-storage: Swift account to store pre-rendered vector-tiles - https://phabricator.wikimedia.org/T283049 (jijiki)
[07:41:32] serviceops, MW-on-K8s, SRE: Create a basic helm chart to test MediaWiki on kubernetes - https://phabricator.wikimedia.org/T265327 (Joe) Open→Resolved
[07:46:12] serviceops, MW-on-K8s, SRE: Create a mwdebug deployment for mediawiki on kubernetes - https://phabricator.wikimedia.org/T283056 (Joe) p:Triage→High
[08:21:29] <_joe_> question: do we want to volunteer moving deployment-charts to gitlab early in the process?
[08:47:52] depends on the definition of early I would say. We'll be down one a.lex and up one to for onboarding the coming weeks so we might be a little short on time
[09:49:50] <_joe_> jayme, akosiaris any chance we can switch to ipvs for kube-proxy?
[09:53:52] _joe_: technically yes, but we've not tested it and puppet is not prepared for it. What do you need that for?
[09:54:10] <_joe_> jayme: mediawiki :P
[09:54:22] <_joe_> somehow I trust ipvs' load balancing more than iptables
[09:54:27] hrhr. Mediawiki works with iptables as well :)
[09:54:30] ah, okay
[09:55:01] <_joe_> but it's also true we're getting good results with stuff like sessionstore or eventgate getting 10k rps already
[09:55:11] <_joe_> so not immediately necessary, sure
[09:56:28] indeed. We can maybe revisit that as part of the "announcing service-ips via bgp" / "better load balancing" project
[10:12:35] <_joe_> yeah
[11:57:51] _joe_: we got like 30krps on iptables, all of mediawiki is like 6k-7k?
[11:58:09] I have doubts we are going to see some benefit performance wise
[11:58:47] <_joe_> more like 20k, but yes
[12:00:51] https://grafana.wikimedia.org/d/000000519/kubernetes-overview?viewPanel=26&orgId=1 has peaks at 33k
[12:01:03] so well over 30k total across multiple services
[14:26:13] <_joe_> I'm merging https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/685721 so we'll have the ability to get diffs for all changes in deployment-charts
[14:47:17] <_joe_> ottomata: sorry I +2'd that change by mistake (fat fingers), but I wanted to show you https://integration.wikimedia.org/ci/job/helm-lint/4134/console
[14:47:36] <_joe_> (there is the diff from the change at the end)
[14:48:24] nice! :)
[14:49:43] <_joe_> ottomata: btw, it's time we bring you back to the mainline - we added canary support to the common helpers 0.3
[14:50:05] <_joe_> we just used a slightly different naming than you did, so it should be just a few lines to change
[14:51:03] great saw that was in progress too sounds good
[14:52:41] morning, was checking on the meeting. next Tuesday is cool, get well soon, joe
[15:04:06] _joe_: where is your preso happening?
[15:06:38] <_joe_> urandom: next week, I sent a message to y'all
[15:06:51] <_joe_> and I moved the calendar invite
[15:07:07] <_joe_> I got the vaccine shot on saturday and I felt very sick until this morning
[15:07:14] <_joe_> so had no time to prepare it :/
[15:07:41] <_joe_> I was a bit too optimistic about my reaction to the shot
[15:07:59] that sucks, but I'm glad you're feeling better now
[15:08:10] <_joe_> sorry, I wrote a message on slack, and moved the calendar invite
[15:08:25] <_joe_> I forgot you were doing the code jam
[15:43:10] serviceops, Prod-Kubernetes, Pybal, SRE, Traffic: Proposal: simplify set up of a new load-balanced service on kubernetes - https://phabricator.wikimedia.org/T238909 (akosiaris) PR is at https://github.com/projectcalico/confd/pull/515, waiting for review now. It's been tested locally in a coup...
[16:16:00] serviceops, Maps, Product-Infrastructure-Team-Backlog, SRE, and 3 others: New Service Request tegola - https://phabricator.wikimedia.org/T274390 (jijiki)
[16:17:16] serviceops, Maps, Product-Infrastructure-Team-Backlog, SRE, and 2 others: New Service Request tegola - https://phabricator.wikimedia.org/T274390 (jijiki)
[17:21:30] serviceops, MW-on-K8s, SRE: Create a mwdebug deployment for mediawiki on kubernetes - https://phabricator.wikimedia.org/T283056 (jijiki)
[17:32:38] serviceops, Maps, Product-Infrastructure-Team-Backlog, SRE, and 2 others: New Service Request maps-vector-server - https://phabricator.wikimedia.org/T274390 (jijiki)
[17:34:51] serviceops, Performance-Team, Release-Engineering-Team (Radar): Create warmup procedure for MediaWiki app servers - https://phabricator.wikimedia.org/T230037 (thcipriani)
[17:50:32] serviceops, SRE, Developer Productivity, Performance-Team (Radar), and 2 others: All debug hosts give (likely spurious) message: PHP Fatal error: The UdpSocket to 127.0.0.1:10514 has been closed (from Monolog/SyslogUdp) - https://phabricator.wikimedia.org/T214734 (thcipriani)
[20:19:23] serviceops, Platform Engineering, Release Pipeline, Release-Engineering-Team, and 5 others: Kask functional testing with Cassandra via the Deployment Pipeline - https://phabricator.wikimedia.org/T224041 (thcipriani)
[21:11:26] shdubsh: is your ECS-fu strong enough that you could map the following log attributes off the top of your head? - https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/kask/+/refs/heads/master/logging.go#46
[21:12:06] time -> @timestamp and msg -> message are straightforward enough
[21:12:49] but skimming through the docs, it's not obvious to me how to map the others
[21:13:09] I'd guess: level => log.level, RequestID => event.id, appname => service.type
[21:14:18] assuming appname reasonably correlates with the service.type definition
[21:17:42] "name of thing that is logging" ?
[21:18:36] oh damn, log.level seems obvious now
[21:18:41] but event.id?
[21:19:14] my guess was trace.id
[21:20:01] hmm, good point
[21:20:17] shdubsh: RequestID is passed in by the caller, that's definitely one that needs to be consistent everywhere
[21:20:58] ah, yes then that would make sense. if it's consistent between services, trace.id makes the most sense.
[21:22:19] cool, and service.type is meant to be the name of the service/application?
[21:22:25] "the thing that logs"?
[21:22:30] yeah, that's how I read it
[21:22:42] oh, yeah
[21:23:11] oooh.
[21:23:31] I see the distinction between service.type and service.name, I think
[21:23:44] different than service.name in the sense that service.name indicates purpose. like jobrunner-mediawiki and api-mediawiki. both are mediawiki, but have different purposes
[21:24:07] another example, logging-elasticsearch vs. search-elasticsearch
[21:24:59] probably have those backwards though :/ -
[21:26:10] yeah, or in this case, kask is the service.type and sessionstore is the service.name?
[21:26:16] (as an example)
[21:29:30] That sounds about right.
[21:51:14] shdubsh: thanks!
[22:41:22] serviceops, SRE, MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), Patch-For-Review, Platform Engineering (Icebox): Undeploy graphoid - https://phabricator.wikimedia.org/T242855 (Jdforrester-WMF)
[22:57:04] serviceops, SRE, MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), Patch-For-Review, Platform Engineering (Icebox): Undeploy graphoid - https://phabricator.wikimedia.org/T242855 (Iniquity)
[22:58:17] serviceops, SRE, MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), Patch-For-Review, Platform Engineering (Icebox): Undeploy graphoid - https://phabricator.wikimedia.org/T242855 (Jdforrester-WMF)
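The ECS mapping settled on in the 21:11 to 21:29 discussion (time -> @timestamp, msg -> message, level -> log.level, RequestID -> trace.id, appname -> service.type, with service.name carrying the deployment's purpose) can be sketched in Go, the language kask is written in. This is a hypothetical illustration, not kask's actual logging.go: the attribute names are the ones quoted in the conversation and may not match the real keys, `ecsFieldFor` and `toECS` are invented names, and the record values are made-up examples. Setting service.name to "sessionstore" follows the example given at 21:26.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ecsFieldFor maps the log attribute names mentioned in the chat to the
// ECS fields agreed on there. Assumption: these names may differ from the
// real keys in kask's logging.go.
var ecsFieldFor = map[string]string{
	"time":      "@timestamp",
	"msg":       "message",
	"level":     "log.level",
	"RequestID": "trace.id",     // caller-supplied, consistent across services
	"appname":   "service.type", // "the thing that logs", e.g. kask
}

// toECS returns a copy of record with known attribute names renamed to
// their ECS equivalents; unknown keys are passed through unchanged.
func toECS(record map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(record))
	for k, v := range record {
		if ecs, ok := ecsFieldFor[k]; ok {
			out[ecs] = v
		} else {
			out[k] = v
		}
	}
	return out
}

func main() {
	// Example record with made-up values.
	record := map[string]interface{}{
		"time":      "2021-01-01T00:00:00Z",
		"msg":       "request served",
		"level":     "INFO",
		"RequestID": "abc123",
		"appname":   "kask",
	}
	ecs := toECS(record)
	// service.name indicates purpose (hypothetical value from the chat example).
	ecs["service.name"] = "sessionstore"

	b, err := json.Marshal(ecs) // map keys are emitted in sorted order
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
	// prints: {"@timestamp":"2021-01-01T00:00:00Z","log.level":"INFO","message":"request served","service.name":"sessionstore","service.type":"kask","trace.id":"abc123"}
}
```

Unknown keys pass through untouched, so extra attributes are not lost during the rename. The sketch emits flat dotted keys such as `log.level`; nesting them as `{"log": {"level": ...}}` is an equally valid ECS JSON layout.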