[06:46:02] morning people
[06:49:17] trying to schedule the work for https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/492948/
[06:49:24] but I'd need a +1 from somebody first :)
[06:49:52] (mcrouter async for set/delete to codfw)
[08:14:42] <_joe_> elukey: I am about to give -2 to that patch. I don't like its all-or-nothing approach
[08:14:51] <_joe_> also, did you have time to do proper testing?
[08:15:32] <_joe_> I would've gone with defining both pools
[08:15:50] <_joe_> (one with poolroute, one with asyncroute) and let the app layer decide which to use
[08:20:15] <_joe_> I can help doing that
[08:22:53] _joe_ my idea was to first ensure that there was an agreement, then test the config on one host properly. I didn't check the diff of the config in dept, my plan was to do it before applying to the first host via pcc (to avoid missing bits). Fine with the -2, I just wanted to make progress :)
[08:23:21] <_joe_> nah it's ok, but that's the way to go long-terms
[08:24:26] maybe let's comment in the task about which way we prefer etc.., so Aaron and Timo can comment
[08:29:22] hmm just found out we have moved yesterday's evening to today ?
[08:29:41] I thought that the Thursday one would be enough
[08:31:07] yep and i've just realized i cannot make it today because i have a doctor's appointment
[08:31:30] so maybe we should skip meeting today and just do the one on thursday
[08:31:37] that sounds sensible
[08:33:46] <_joe_> ok!
[08:34:03] <_joe_> jijiki: you should deploy the kafka1018 removal :)
[08:34:13] oh I saw that yes
[08:34:45] :D:D
[08:38:36] <_joe_> jijiki: if by end of week you manage to finish the work on the cron, and switch a few more jobs, that would be ok for this week
[08:38:56] <_joe_> then I hope next week we merge wmerrors and we can move the needle again
[08:39:13] <_joe_> I would prioritize working on the job
[08:39:43] I am off tomorrow, I can cont with more async jobs, I am less optimistic about doing crons this weel
[08:39:46] week*
[08:39:56] should we also focus on the restarts?
[08:40:06] <_joe_> what do you mean?
[08:40:50] php-fpm restarts due to low free opcache mem
[08:41:10] <_joe_> that's what the cron is about, hence my confusion
[08:41:21] ah that cron
[08:41:36] ok, I was refering to mw cron and maint jobs :)
[08:41:53] <_joe_> I can work on the cron if you don't think you can make it, it's a bit time sensitive
[08:42:09] let me see how clinic looks like for this week
[08:42:16] I will know in a few hours
[08:42:31] ok ?
[08:42:42] <_joe_> who's substituting you on clinic duty tomorrow?
[08:43:07] I will do some things in the morning and check a bit during the day
[09:01:02] <_joe_> does anyone remembers who is supposed to write down the SLIs/SLOs for mcrouter/memcached ?
[09:02:05] i thought it was a join effort between jijiki and elukey but if they dont want to have no time we can do it
[09:03:11] We discussed about general upgrading plans, but it is something that we can do, sure
[09:06:30] fsero: nope, the SLI/SLO should be something that your sub-team will do as goal or similar during the next quarters
[09:06:49] <_joe_> yeah, don't anyone worry, I'll look at the notes
[09:07:04] <_joe_> but I think someone wrote down what we came up with?
[09:07:10] elukey: agree i was just sharing what i remember
[09:07:28] yep yep no hard feelings, I was only stating my memories as well :)
[09:20:44] so is today's meeting on or off? I still see it on the calendar
[09:21:07] <_joe_> lol poolcounter has a pretty brutal approach at choosing its running port
[09:21:11] <_joe_> client_data.h:#define PORT 7531
[09:26:42] can anyone help with what https://phabricator.wikimedia.org/T223391 is about?
[09:29:44] <_joe_> it's about deploying a new version of wikidiff2
[09:30:07] <_joe_> ask moritz for what was the situation atm
[09:30:15] <_joe_> but that looks like something we need to do
[09:31:02] <_joe_> quite urgently, even, given one month has passed by
[09:31:47] <_joe_> looking at it, it seems that it will need a refresh of the package
[09:31:56] <_joe_> and then deploying it
[09:32:53] <_joe_> if we don't all get better at looking at tasks tagged serviceops, we're going to do board groomings
[09:32:59] <_joe_> on our monday meeting
[09:33:04] <_joe_> .
[09:33:26] maybe we should, it wouldnt help 2-3 people looking at a task at the same time
[09:33:32] or not looking at it
[09:34:18] <_joe_> ok, in the meantime, I'm triaging it to "high" and assigning it to you. This needs to be worked on starting this week.
[09:34:28] sure
[09:34:54] serviceops, Operations, WMDE-QWERTY-Team, wikidiff2, and 3 others: Deploy Wikidiff2 version 1.8.2 with the timeout issue fixed - https://phabricator.wikimedia.org/T223391 (Joe) p:Triage→High a:jijiki
[09:35:01] <_joe_> thanks :)
[09:40:04] serviceops, Operations, WMDE-QWERTY-Team, wikidiff2, and 4 others: Deploy Wikidiff2 version 1.8.2 with the timeout issue fixed - https://phabricator.wikimedia.org/T223391 (jijiki)
[09:44:04] brb
[09:53:46] jijiki: let me know if you need any further information wrt the wikidiff update, I handled the previous updates for it
[09:54:09] I will nag you next week about it yes, tx !
[09:55:58] ack
[12:07:19] serviceops, Gerrit, Operations, cloud-services-team: Change /r/p/ to /r/ on all hosts (where https://gerrit.wikimedia.org/r/p/ exists) - https://phabricator.wikimedia.org/T222093 (Paladox) p:High→Normal
[14:32:45] serviceops, Operations, Release Pipeline, Release-Engineering-Team, and 2 others: TEC3:O3:O3.1:Q4 Goal - Move cpjobqueue, Wikidata Termbox SSR (new service), Kask (session storage service) and ORES (partially) through the production CD Pipeline - https://phabricator.wikimedia.org/T220398 (akosiari...
[14:32:55] serviceops, Operations, Release Pipeline, Release-Engineering-Team, and 5 others: Introduce wikidata termbox SSR to kubernetes - https://phabricator.wikimedia.org/T220402 (akosiaris) Open→Resolved a:akosiaris ` curl -s -I -X GET 'http://termbox.discovery.wmnet:3030/termbox?editLink=%2...
[14:38:00] <_joe_> akosiaris: \o/
[15:16:15] <_joe_> gcal died on me
[15:16:28] <_joe_> wer're not having our meeting, right?
[15:21:29] no
[15:40:54] yeah it died on us all
[20:53:02] serviceops, Operations, ops-eqiad: Heating alerts / memory errors on mw1254 - https://phabricator.wikimedia.org/T204491 (jijiki) Resolved→Open p:Triage→Normal
[22:24:38] serviceops, CX-cxserver, Citoid, Graphoid, and 10 others: Make services swagger specs standard compliant - https://phabricator.wikimedia.org/T218217 (MSantos)