[09:44:57] serviceops, Release-Engineering-Team-TODO, Scap, PHP 7.2 support, and 3 others: Enhance MediaWiki deployments for support of php7.x - https://phabricator.wikimedia.org/T224857 (Joe) I had a meeting with @thcipriani yesterday and we came up with the following contingency plan: - We will keep opca...
[09:47:20] serviceops, Release-Engineering-Team-TODO, Scap, PHP 7.2 support, and 3 others: Enhance MediaWiki deployments for support of php7.x - https://phabricator.wikimedia.org/T224857 (Joe)
[11:05:18] serviceops, Operations, media-storage, Patch-For-Review: Swift object servers become briefly unresponsive on a regular basis - https://phabricator.wikimedia.org/T226373 (Gilles) Error rate hasn't gone down at all, now we're just getting errors that time out at 1s instead of 0.5s... ` Jun 27 11:0...
[11:06:26] so I don't know if everyone has seen it
[11:06:32] i moved today's meeting to 30 mins later
[11:06:45] because it overlaps with my own perf review
[11:06:58] you can also start earlier if that's better for some
[11:07:08] but i assumed maybe there's things you want to discuss with me, e.g.
goals, cto search, etc
[11:08:14] we need to do goals
[11:08:48] we have the CPT meeting with joe before ours anyway
[11:08:52] yes I have it marked as starting at 19.00 my time
[11:28:38] <_joe_> we also have to discuss some tickets
[11:29:37] serviceops, Operations, media-storage, Patch-For-Review, User-jijiki: Swift object servers become briefly unresponsive on a regular basis - https://phabricator.wikimedia.org/T226373 (jijiki)
[11:38:10] serviceops, Operations, WMDE-QWERTY-Team, wikidiff2, and 3 others: Deploy Wikidiff2 version 1.8.2 with the timeout issue fixed - https://phabricator.wikimedia.org/T223391 (jijiki) @awight I will rollout the new version to production today
[12:00:57] serviceops, Operations, media-storage, Patch-For-Review, User-jijiki: Swift object servers become briefly unresponsive on a regular basis - https://phabricator.wikimedia.org/T226373 (Joe) Do we have metrics on the swift backends open connections / connections queues? without such information,...
[12:04:02] serviceops, Operations, media-storage, Patch-For-Review, User-jijiki: Swift object servers become briefly unresponsive on a regular basis - https://phabricator.wikimedia.org/T226373 (jijiki) @Joe I will start a more thorough investigation the following days, we'll see what will come up
[12:04:55] serviceops, Release-Engineering-Team-TODO, Scap, PHP 7.2 support, and 3 others: Enhance MediaWiki deployments for support of php7.x - https://phabricator.wikimedia.org/T224857 (jijiki) >>! In T224857#5288470, @Joe wrote: > - Whenver a rolling restart is needed, that will happen only on the serve...
[12:17:24] serviceops, Operations, PHP 7.2 support, Performance-Team (Radar), and 2 others: PHP 7 corruption during deployment (was: PHP 7 fatals on mw1262) - https://phabricator.wikimedia.org/T224491 (Joe) Open→Resolved a:Joe The immediate problem seems to be resolved given we've not see corrup...
[12:19:37] serviceops, Release-Engineering-Team-TODO, Scap, PHP 7.2 support, and 3 others: Enhance MediaWiki deployments for support of php7.x - https://phabricator.wikimedia.org/T224857 (Joe) >>! In T224857#5288802, @jijiki wrote: >>>! In T224857#5288470, @Joe wrote: > >> - Whenver a rolling restart is ne...
[12:19:47] <_joe_> fsero: still need CRs?
[12:20:05] <_joe_> I have a spare 30 minutes before an interview I can spend on that or merging my changes
[12:21:01] akosiaris: said +1 if you can do a quick review that would be nice in any case I'm good I think
[12:29:49] <_joe_> fsero: I have just one doubt - how is helmfile_log_sal.sh called?
[12:30:26] <_joe_> I mean we will configure helmfile to use it as a plugin?
[12:30:43] <_joe_> I'm just still ignorant, you can tell me "go read the tasks" :P
[12:34:49] Helmfile accepts hooks
[12:35:16] After helmfile is called, helmfile_log_sal is called through the helmfile hook
[12:35:27] And it also populates the log
[12:35:35] With the release name etc
[12:47:40] <_joe_> ok, great
[12:47:58] <_joe_> I guess helm-diff is run before helmfile actually deploys things, right?
[12:48:44] <_joe_> anyways, need to prepare for the interview
[12:54:21] Yup, it's part of the recommended workflow
[12:54:31] Run helmfile diff before apply
[13:55:29] Hey! I'd like to start trying our new Termbox service against test.wikidata.org. For that I think we will need to make another release. Is it ok to just duplicate and tweak my staging values? Or should I be asking someone first?
[13:57:43] hi tarrow, we are working towards a solution to keep helm values in code. https://phabricator.wikimedia.org/T212130 i guess you have access to deployment servers and know how to use scap-helm now, right?
[13:58:27] fsero: I have access and akosiaris walked it through with us.
I'm yet to do my first scap-helm personally though
[13:59:09] so i'll ask you to keep your tweaked values updated under the /srv/scap-helm/ directory
[13:59:43] sure; just like production and staging deploys
[13:59:54] that way i can update my CR to include it when we switch to helmfile, which for staging is going to happen very soon, like today or tomorrow. But instead of holding you up, please proceed
[14:00:38] great! I'll probably start doing it in the next 30mins or so
[14:01:00] sure, i'm around if you need more help
[14:01:21] Thanks! :) Always good to have people around while I'm learning
[14:21:48] fsero: So something like: `scap-helm termbox install production-test stable/termbox -f termbox-test-values.yaml` where my values yaml now has the right env variables and I changed service.deployment to "production-test"
[14:22:31] you are missing the CLUSTER=staging env variable
[14:23:13] If this will be used to serve test.wikidata.org should I keep it in the staging cluster or should it be in both?
[14:24:18] (both = both of the real clusters)
[14:28:49] serviceops, Operations, media-storage, Patch-For-Review, User-jijiki: Swift object servers become briefly unresponsive on a regular basis - https://phabricator.wikimedia.org/T226373 (fgiunchedi) >>! In T226373#5288615, @Gilles wrote: > Error rate hasn't gone down at all, now we're just gettin...
[14:29:03] err tarrow hang on looking at your test values
[14:29:16] thanks :)
[14:29:46] I guess I may want to up the replicas to also be 4?
[14:31:54] it seems you want to create two deployments using the same port 3030, one querying wikidata.org and another deployment for test.wikidata.org
[14:32:17] so it means that however uses termbox service sometimes will query wikidata.org and sometimes test.wikidata.org
[14:32:33] s/however/whoever/g
[14:34:00] fsero: ah! that is not what I want then. I guess I want to tweak either the port?
[14:34:22] Or hostname? But I guess that isn't possible
[14:38:35] so..
there are two parts to the deployment from your point of view, one is the values you are changing, which configure the deployment in the cluster
[14:38:57] one thing it doesn't do is LVS/DNS configuration
[14:39:33] so if you want another release to point to test.wikidata.org you indeed need to change the port, but with just the port changed and without the LVS part it would not get any traffic
[14:39:56] so an LVS configuration and a new DNS record should be created just for this
[14:47:34] Right; that makes sense
[15:04:54] fsero: so am I basically asking for a whole new service? With a new host of termbox-test. However I'll also need to tweak the port to (e.g.) 3031. Then add its own entries in dns and LVS as well as its own liveness checks in puppet?
[15:05:37] I think I'd better go and make a ticket; I'm happy to try and put together the patches but I think there is a little to keep track of :)
[15:08:15] tarrow: essentially yes, IMHO this shouldn't be this way but it is this way right now, and we should consider introducing an "ingress"; that way we don't need to meddle with LVS and DNS servers every time we want to deploy something new. Thanks for creating the ticket :)
[15:08:37] Or would I be better to just change the port and still use termbox.svc.eqiad.wmnet etc..?
[15:09:19] since we can totally control where the traffic goes and no-one outside the cluster is hitting it
[15:12:58] if you keep using termbox.svc.. that means you will not have a termbox deployment hitting wikidata.org
[15:13:16] the question would be what do you want to achieve, then :-) ?
[15:15:04] fsero: can't I keep termbox.svc:3030 hitting wikidata.org and termbox.svc:3031 hitting test.wikidata.org?
[15:34:10] sure but that means introducing a new LVS block configuration
[16:20:00] akosiaris: qq for you.
[16:20:08] i'm going to separate out wmf specific stuff from the eventgate code repo
[16:20:25] going to make a gerrit repo for it, and will use it for deployment pipeline image builds
[16:20:31] name pref?
[16:20:37] was working with eventgate-wikimedia
[16:20:42] but maybe eventgate/deploy ?
[16:20:55] not sure if it is a 'deploy' repo, it def has code in it
[16:21:42] ottomata: are you asking me to naming bikeshed?
[16:21:47] haha i am!
[16:21:57] but only a short one!
[16:22:04] the docker image will be named after this
[16:22:10] so thought you might have a pref
[16:22:20] it was pointed out to me btw, that -gate is reminiscent of scandals and a president of the US
[16:22:30] yeah but that's dumb.
[16:22:32] not sure if you had thought of it :P
[16:22:38] never crossed my mind
[16:22:39] so is logicgate
[16:22:41] haha i had.
[16:22:53] s/eventgate/waterevent/
[16:22:54] gate is an older word than stupid watergate and other -gates
[16:22:58] hahaha
[16:23:25] akosiaris: Clearly you should have called it EventStile. ;-)
[16:23:30] i did consider it!
[16:23:37] not sure what is wmf specific about eventgate, but don't do the eventgate/deploy stuff
[16:23:59] James_F: see line 355 and below in https://etherpad.wikimedia.org/p/event-platform
[16:24:18] * James_F grins.
[16:24:24] otherwise, I honestly don't have naming prefs
[16:24:25] https://github.com/wikimedia/eventgate/blob/master/lib/factories/wikimedia-eventgate.js
[16:24:29] ok, good enough for me
[16:24:35] was looking for guidance on to use -deploy or not
[16:24:36] i won't.
[16:24:37] "(EventStile? naw)"
[16:24:39] thank you!
[16:27:44] yw
[16:55:10] serviceops, Operations, Wikidata, Wikidata-Termbox-Hike, and 4 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (akosiaris) @WMDE-leszek, @Tarrow. Any feedback on the comment above?
[16:58:37] serviceops, Operations, Wikidata, Wikidata-Termbox-Hike, and 4 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (Tarrow) @akosiaris Yep; we've interpreted it as something we really need before exposing it to real traffic. We've got a ticket open about...
[17:06:49] serviceops, Operations, Wikidata, Wikidata-Termbox-Hike, and 4 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (akosiaris) >>! In T212189#5289724, @Tarrow wrote: > @akosiaris Yep; we've interpreted it as something we really need before exposing it to...
[19:25:09] serviceops, RESTBase, Core Platform Team (RESTBase Split (CDP2)), Core Platform Team Kanban (Doing), and 3 others: Split RESTBase in two services: storage service and API router/proxy - https://phabricator.wikimedia.org/T220449 (mobrovac)
[22:06:41] serviceops, Operations, Release Pipeline, Core Platform Team (RESTBase Split (CDP2)), and 5 others: Deploy the RESTBase front-end service (RESTRouter) to Kubernetes - https://phabricator.wikimedia.org/T223953 (mobrovac) Regarding the deployment plan, the main pain point is that we will need to ha...
[22:11:46] serviceops, Operations, media-storage, Patch-For-Review, User-jijiki: Swift object servers become briefly unresponsive on a regular basis - https://phabricator.wikimedia.org/T226373 (Gilles) At a glance on a given proxy the same object doesn't occur multiple times in a row. But the same desti...