[08:39:09] serviceops: Create PHP 7.2.26 Wikimedia package - https://phabricator.wikimedia.org/T241224 (MoritzMuehlenhoff) a: MoritzMuehlenhoff This is already WIP, claiming the task
[09:54:49] serviceops, MediaWiki-JobQueue, WMF-JobQueue, observability, Core Platform Team Workboards (Clinic Duty Team): job queue insert rate metrics gone from Grafana - https://phabricator.wikimedia.org/T238296 (jcrespo) As an addendum, could something be improved related to T238296#5662905 ? It seem...
[12:26:29] serviceops, Operations: Upgrade and improve our application object caching service (memcached) - https://phabricator.wikimedia.org/T240684 (jijiki)
[12:26:45] serviceops, Operations, ops-codfw: (Need By: Jan 15) rack/setup/install mc-gp200[123].codfw.wmnet - https://phabricator.wikimedia.org/T241796 (jijiki)
[12:26:47] serviceops, Operations: Upgrade and improve our application object caching service (memcached) - https://phabricator.wikimedia.org/T240684 (jijiki)
[12:31:39] serviceops, Operations, ops-eqiad: (Need By Dec 20) rack/setup/install mw13[49-84].eqiad.wmnet - https://phabricator.wikimedia.org/T236437 (jijiki) Hey @Jclark-ctr ! Do we have an update about when we will have those servers ready? Thank you!
[14:46:56] serviceops, Operations, ops-eqiad: (No Need By Date Provided) rack/setup/install kubernetes10[08-15].eqiad.wmnet - https://phabricator.wikimedia.org/T241850 (jijiki)
[14:48:55] serviceops, Operations, ops-eqiad: (No Need By Date Provided) rack/setup/install kubernetes10[08-15].eqiad.wmnet - https://phabricator.wikimedia.org/T241850 (jijiki)
[15:13:05] akosiaris: q for you if you are there
[15:13:17] i'm about to do a deployment for eventgate-analytics and main
[15:13:31] it will make them use new schema repositories
[15:13:34] the schemas should be the same
[15:13:44] i tested as much as I could all the events they use in staging
[15:13:54] and everything looks good
[15:14:19] but, if something is wrong (some schema is different in a way I don't expect, or some schema I missed can't be loaded), stuff will break
[15:14:28] i can deploy to codfw first and try to produce some dummy events
[15:14:32] but i still maybe could miss something
[15:14:48] is there some way to do an active canary type deployment in eqiad?
[15:15:03] e.g. only deploy to a single new pod and include it in the cluster?
[15:15:18] it would then get real traffic and I could watch logs for a bit to make sure it looked ok
[15:15:52] serviceops, Operations, ops-eqiad: (No Need By Date Provided) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (jijiki)
[15:16:08] serviceops, Operations, ops-eqiad: (No Need By Date Provided) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (jijiki)
[15:47:17] serviceops, Operations, ops-eqiad: (No Need By Date Provided) rack/setup/install kubernetes10[07-14].eqiad.wmnet - https://phabricator.wikimedia.org/T241850 (jijiki)
[15:53:34] serviceops, Operations, ops-eqiad: (No Need By Date Provided) rack/setup/install kubernetes10[07-14].eqiad.wmnet - https://phabricator.wikimedia.org/T241850 (jijiki)
[16:10:40] serviceops, Arc-Lamp, Performance-Team: Backups for arclamp application data - https://phabricator.wikimedia.org/T235481 (jcrespo) helium is no longer the backup server, you could go to https://grafana.wikimedia.org/d/413r2vbWk/bacula?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-job=webperf1002.eqiad....
[16:29:16] <_joe_> I'm going to be a bit late to the meeting
[16:29:32] <_joe_> like 2-3 minutes, sorry
[16:48:15] ottomata: live canary? Kind of, but not in a quick way. It requires changes to the chart of course and it will have to be tested and all. So if you want it fast and easy, the answer is no. If you have a couple of days to spare and meddle with the charts, yes
[16:50:45] i have days to spare
[16:50:55] akosiaris: is there an example?
[16:52:22] ottomata: not a working one really.
There is some work in https://gerrit.wikimedia.org/r/#/c/operations/deployment-charts/+/469662/
[16:53:22] cool looking
[16:53:48] it's really a WIP and untested, I wouldn't rely on it
[16:54:19] well, we've got a yet unused eventgate instance out there (logging-external) i could experiment there
[16:54:27] also i'll be doing a new one this Q: eventgate-analytics-external
[16:54:47] serviceops: Update Wikimedia production to PHP 7.2.26 - https://phabricator.wikimedia.org/T241222 (Jdforrester-WMF)
[16:55:00] huh akosiaris how would this work?
[16:55:05] in practice?
[16:55:32] you do a separate release deployment for canary, and the pod is not given a service ip/port?
[16:56:21] yeah, essentially 2 deployments, both addressed by the same service
[16:57:05] and then just rely on the ratio of pods between versionX and versionY for the rate of reqs going to canary pods
[16:57:34] how does this make the canary pod get addressed by the normal service?
[16:58:42] the pods are selected by the service based on the labels
[16:59:02] so we just need to have the same labels on both deployments
[16:59:57] mainly
[16:59:58] app: {{ template "wmf.releasename" . }}
[16:59:58] ?
[17:00:29] but, if .Release.Name is ...-canary
[17:00:32] it won't match?
[17:00:58] IIRC my approach in https://gerrit.wikimedia.org/r/#/c/operations/deployment-charts/+/469662/3/_scaffold/templates/service.yaml was mildly wrong
[17:01:06] lemme revisit it for a few
[17:01:07] ok, but something like that
[17:01:27] we just need to make sure that the canary release somehow gets tagged with the exact same labels that the main one does
[17:01:34] and then the main one's service will include that pod in it
[17:01:35] ?
[17:01:38] yes
[17:01:42] ok cool
[17:02:40] ottomata: so UX question
[17:02:47] how would you prefer for this to be
[17:03:00] a helm release with 2 version variables?
[17:03:18] or 2 different helm releases (e.g. "production", "canary") ?
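The label-based canary mechanism discussed above (two Deployments, one Service, traffic split by pod ratio) can be illustrated with plain Kubernetes manifests. This is a minimal sketch only: the names, port, image tags and replica counts are hypothetical, not taken from the actual eventgate chart.

```yaml
# Hypothetical sketch of a label-selected canary. A Service selector matches
# any pod that carries at least the listed labels, so pods from both
# Deployments below end up behind the same Service.
apiVersion: v1
kind: Service
metadata:
  name: eventgate
spec:
  selector:
    app: eventgate            # matches pods from BOTH Deployments below
  ports:
    - port: 8192
      targetPort: 8192
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: eventgate-production
spec:
  replicas: 9
  selector:
    matchLabels: { app: eventgate, release: production }
  template:
    metadata:
      labels: { app: eventgate, release: production }
    spec:
      containers:
        - name: eventgate
          image: eventgate:stable        # illustrative tag
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: eventgate-canary
spec:
  replicas: 1                  # ~10% of traffic, by pod-count ratio
  selector:
    matchLabels: { app: eventgate, release: canary }
  template:
    metadata:
      labels: { app: eventgate, release: canary }
    spec:
      containers:
        - name: eventgate
          image: eventgate:candidate     # illustrative tag
```

Because kube-proxy balances across all endpoints of the Service, with 9 production pods and 1 canary pod roughly 10% of requests would reach the canary; the extra `release` label distinguishes the two Deployments without affecting the Service's selection.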
[17:03:37] my approach in that patch went the latter way and then I hit some road bumps
[17:04:03] then lost the incentive to do it as the staging project was being scrapped
[17:04:27] I am thinking that an approach of having a variable in values.yaml like
[17:04:51] canary_version: null for not having canary functionality
[17:05:03] or canary_version: blahblah for having it might be more doable
[17:05:14] 'staging project was being scrapped' ?
[17:05:37] the big staging one, not the kubernetes cluster one
[17:05:37] canary_version being the image version?
[17:05:46] ah like mw-staging?
[17:05:53] yup
[17:06:01] i think canary_version with image version wouldn't be enough
[17:06:03] that long-winded project
[17:06:06] config settings are different
[17:06:11] in my case now, i mostly am varying configs, not the image
[17:06:15] in values.yaml
[17:06:17] aha
[17:06:28] in which case, back to the 2 different helm releases then
[17:06:48] hmm I've been sleeping on this for some months now, I think I have a clearer idea of how to do it now
[17:06:54] oOo cool :)
[17:07:01] what's your ETA though?
[17:07:43] serviceops, Operations, ops-eqiad: (No Need By Date Provided) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (wiki_willy) a: Joe→Jclark-ctr
[17:08:59] well i was going to do that stuff soon
[17:09:04] but there isn't any pressing need really
[17:09:07] we aren't blocking anything
[17:09:08] serviceops, Operations, ops-eqiad: (No Need By Date Provided) rack/setup/install kubernetes10[07-14].eqiad.wmnet - https://phabricator.wikimedia.org/T241850 (wiki_willy) a: Joe→Jclark-ctr
[17:09:20] the stuff we want to do this quarter is the new instances anyway
[17:09:45] for the running ones i was just syncing them up so everything would be the same
[17:09:47] with the new ones
[17:09:57] but i can do the new ones first and sync up the running ones later.
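The `canary_version: null` idea floated in the conversation could take the shape of a conditionally rendered second Deployment in the chart. This is a speculative sketch, not the patch under discussion: the template fragment, image reference and label scheme are assumptions (only the `wmf.releasename` helper appears in the log), and as noted above this approach would not cover canaries that differ in config rather than image version.

```yaml
# values.yaml (hypothetical): null disables the canary entirely
canary_version: null

---
# templates/deployment.yaml fragment (hypothetical): render the extra
# Deployment only when canary_version is set. Within a single release,
# "wmf.releasename" expands to the same value for both Deployments, so the
# canary pods carry the same app label the Service already selects on.
{{- if .Values.canary_version }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ template "wmf.releasename" . }}-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: {{ template "wmf.releasename" . }}
  template:
    metadata:
      labels:
        app: {{ template "wmf.releasename" . }}   # same label as main release
    spec:
      containers:
        - name: {{ template "wmf.releasename" . }}
          image: "docker-registry.example/app:{{ .Values.canary_version }}"
{{- end }}
```

Keeping everything in one release avoids the label-mismatch road bump described above for a separate `-canary` release, at the cost of only being able to vary the image version, not arbitrary values.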
[17:10:24] ok, I'll take that as having a couple of weeks to get this tested
[17:10:32] does that sound reasonable?
[17:25:36] sure sounds fine
[17:25:41] i'll stall on this then and do other stuff
[17:25:42] ty!
[20:04:23] serviceops, Proton, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): Profile proton memory usage for Helm chart - https://phabricator.wikimedia.org/T238830 (MSantos) The first tests indicate that the draft deployment chart doesn't have the proper configuration to run the service....
[20:13:55] serviceops, Proton, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): Profile proton memory usage for Helm chart - https://phabricator.wikimedia.org/T238830 (MSantos) Highlighting service configuration in the gerrit patch just to make things easier: https://gerrit.wikimedia.org/r/c/...
[21:22:14] serviceops, Arc-Lamp, Performance-Team: Backups for arclamp application data - https://phabricator.wikimedia.org/T235481 (Dzahn) Nice, thank you for confirming that @jcrespo. To test it and create a howto in the form of the pastebin below, I have started the restore of a single random logfile ("2020...
[21:25:41] serviceops, Arc-Lamp, Performance-Team: Backups for arclamp application data - https://phabricator.wikimedia.org/T235481 (Dzahn) Here we go, the example file has been restored. ` [webperf1002:/var/tmp/bacula-restores/srv/xenon/logs] $ sudo file daily/2020-01-09.excimer.thumb.log daily/2020-01-09.e...
[22:32:19] serviceops, Arc-Lamp, Performance-Team: Backups for arclamp application data - https://phabricator.wikimedia.org/T235481 (Krinkle) Open→Resolved Yes.