[00:01:38] serviceops, Operations, decommission-hardware, ops-codfw: decommission mc2028.codfw.wmnet - https://phabricator.wikimedia.org/T261168 (Papaul)
[00:06:53] added more info and structure on https://wikitech.wikimedia.org/wiki/Mirrors for T179856 as maintainer i just put "SRE" for now
[04:37:38] serviceops, Operations, Platform Team Workboards (Clinic Duty Team): PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (tstarling) Has anyone got an idea for giving the HMAC key to the server without allowing the command to have access to it? Otherwise an...
[07:11:16] serviceops, Prod-Kubernetes, Release Pipeline, Release-Engineering-Team: Blocker: helm fails to update the kubernetes-charts-incubator.storage.googleapis.com when running in CI - https://phabricator.wikimedia.org/T261182 (Joe)
[07:11:47] serviceops, Prod-Kubernetes, Release Pipeline, Release-Engineering-Team: Blocker: helm fails to update the kubernetes-charts-incubator.storage.googleapis.com when running in CI - https://phabricator.wikimedia.org/T261182 (Joe) p:Triage→High Setting priority to high as it's a blocker for m...
[07:34:01] serviceops, Prod-Kubernetes, Release Pipeline, Release-Engineering-Team: Blocker: helm fails to update the kubernetes-charts-incubator.storage.googleapis.com when running in CI - https://phabricator.wikimedia.org/T261182 (Joe) Apparently the problem is not a firewall: when firewalling the google...
[07:34:07] serviceops, Prod-Kubernetes, Release Pipeline, Release-Engineering-Team: Blocker: helm fails to update the kubernetes-charts-incubator.storage.googleapis.com when running in CI - https://phabricator.wikimedia.org/T261182 (Joe) a:Joe→None
[08:01:42] serviceops, Operations, Platform Team Workboards (Clinic Duty Team): PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (daniel) >>!
In T260330#6408193, @tstarling wrote: > Has anyone got an idea for giving the HMAC key to the server without allowing the co...
[08:10:39] serviceops, Prod-Kubernetes, Release Pipeline, Release-Engineering-Team: Blocker: helm fails to update the kubernetes-charts-incubator.storage.googleapis.com when running in CI - https://phabricator.wikimedia.org/T261182 (Joe) a:Joe The issue was found, thanks to @Legoktm for finding it: th...
[08:16:36] when we have some people online, we can discuss T258978 "Service operations setup for Add a Link project" and how it relates to T260330
[08:17:18] <_joe_> effie: yeah I think we need to meet with them, tbh
[08:17:31] shouldn't we discuss it here first
[08:17:40] <_joe_> they have one main issue
[08:17:47] <_joe_> how to distribute the ML models
[08:18:03] <_joe_> I don't know if they can wait for that project to be done tbh
[08:18:27] <_joe_> let'
[08:18:31] <_joe_> s ask on the task
[08:18:40] <_joe_> also I assume jayme is here :)
[08:18:43] ok I will reply to them
[08:19:47] * jayme peeks around the corner...
[08:20:01] <_joe_> lemme invite kosta here
[08:20:11] <_joe_> hello!
[08:20:14] \o
[08:20:21] hi :)
[08:20:28] o/
[08:20:47] <_joe_> kostajh: so, if we had to choose ourselves, we think T260330 would be the way to go to encapsulate your binary
[08:21:15] <_joe_> it would have the added advantage that it would mean it could be used by third-party mediawiki installations as a shellout
[08:21:24] OK
[08:21:31] <_joe_> but, that project will take some time
[08:21:49] <_joe_> both on the PET side and on ours
[08:21:50] I'm unclear on the steps of how we get to the point of https://github.com/dedcode/mwaddlink being accessible via T260330, like, what would need to be done exactly?
[08:22:32] <_joe_> kostajh: mostly wait for that to be ready
[08:22:53] <_joe_> now, I was thinking we could go the following way, but you'll need to check with platform
[08:23:28] we are aiming to use this tool in production sometime next quarter, but yeah not sure on the timeline for T260330
[08:23:50] <_joe_> if the dependencies are not crazy, we could deploy the binary as-is and shellout from mediawiki, and move it once that service is ready
[08:24:28] <_joe_> kostajh: we also have the unsolved problem of how to distribute the models
[08:24:39] <_joe_> how large are those?
[08:25:41] _joe_: I'm not sure, but I believe they are in the range of a few MB up to 50 MB.
[08:25:55] <_joe_> ok
[08:26:35] _joe_: so we would compile a binary from the python source, include the model, and that would all go in the GrowthExperiments extension? Or would it be in a separate repo?
[08:27:06] <_joe_> kostajh: usually we try to deploy programs we shellout to as debian packages, but I'm not sure how easy that would be in this case
[08:27:41] <_joe_> but, before we get to solutions, I think we need some info, I'm adding my questions on the task
[08:27:59] <_joe_> sorry if we lagged behind, but we had a huge outage to deal with and we have the dc switchover next week
[08:28:05] <_joe_> so our schedule is very very tight
[08:28:22] it's ok
[08:29:28] <_joe_> tbh I'm not sure what's the best solution here
[08:31:22] hi mgerlach :) _joe_ / effie, mgerlach is working on mwaddlink
[08:31:39] serviceops, GrowthExperiments-NewcomerTasks, Operations, Product-Infrastructure-Team-Backlog: Service operations setup for Add a Link project - https://phabricator.wikimedia.org/T258978 (Joe) I have a few questions for you, before giving a refined recommendation: - do you think you'll need to de...
[08:31:47] and could probably help with questions you might have specific to the tool
[08:31:52] <_joe_> kostajh, mgerlach see the questions there
[08:32:08] <_joe_> I was looking at the repo and it's a python notebook, not a python package
[08:32:23] <_joe_> so there is no requirements.txt or any setup.py listing the dependencies
[08:32:48] _joe_: ok, cool. Those questions look like ones for mgerlach
[08:32:56] _joe_: indeed, it is still in a research-state
[08:33:10] <_joe_> now, to explain my line of reasoning
[08:34:20] <_joe_> if this can be easily packed as a binary with small footprint (including all dependencies), I would probably suggest to go with shellout and later move to the php microservice described in T260330
[08:34:29] would it be reasonable to plan on a toolforge based solution until T260330 is ready? especially since the tool would be called in the context of a job execution, and there would be a graceful fallback in the UX if there is a problem with the tool execution?
[08:34:31] <_joe_> but from what I see from the repo, this has fat dependencies
[08:35:04] <_joe_> kostajh: frankly, I think this could and should live as a separate service in production, if it has a lot of python dependencies
[08:36:51] <_joe_> for doing so, you'd need to basically follow https://wikitech.wikimedia.org/wiki/Deployment_pipeline/Migration/Tutorial, in particular I think you should look at https://wikitech.wikimedia.org/wiki/Deployment_pipeline/Migration/Tutorial#Creating_a_Docker_Image
[08:37:02] <_joe_> but we can get there if needed
[08:37:14] <_joe_> the process is a lot less scary than you might think :)
[08:37:43] btw, in terms of timeline, we are aiming to have the feature which relies on this tool in production by December. So, that would mean we'd probably want this service live by the end of September or early October to give us time to iron out any issues. I realize that's very soon.
[08:38:19] <_joe_> that's not really very soon if you can self-serve most stages. Let's start by ironing out those questions first
[08:43:35] serviceops, Prod-Kubernetes, Release Pipeline, Patch-For-Review: Refactor our helmfile.d dir structure for services - https://phabricator.wikimedia.org/T258572 (Joe)
[08:44:00] serviceops, Prod-Kubernetes, Release Pipeline, Release-Engineering-Team, Patch-For-Review: Blocker: helm fails to update the kubernetes-charts-incubator.storage.googleapis.com when running in CI - https://phabricator.wikimedia.org/T261182 (Joe) Open→Resolved Thanks @Legoktm for the help!
[08:44:39] _joe_: ok. so as I understand it, by following https://wikitech.wikimedia.org/wiki/Deployment_pipeline/Migration/Tutorial, you'd end up with a container that, if you shell into it, lets you run `python /path/to/some-script.py`. So we would still need the web API to interact with this python script, and that's where T260330 comes in, is that right?
[08:45:24] <_joe_> kostajh: not exactly, my suggestion is this is complex and developed enough it should be its own webservice
[08:45:57] <_joe_> which could basically be a small flask/django/whatever layer around the current script
[08:46:15] right
[08:46:17] <_joe_> but let's reason about it once I have all the data
[08:46:49] <_joe_> to be clear, the model for the stuff that will go via T260330 is: you have a base image, and you just add a binary
[08:47:02] <_joe_> and that isn't designed to be actively developed
[08:49:17] <_joe_> I am completely unsure about support for python webapps in blubber, but we can check about it later.
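One way to picture the "small flask/django/whatever layer around the current script" suggested above is a plain WSGI app built only on the Python standard library. This is a sketch under stated assumptions, not the actual mwaddlink code: `recommend_links` is a hypothetical stand-in for whatever entry point the script would expose, and the `text` query parameter, the JSON response shape, and the `serve` helper are invented for illustration.

```python
import json
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


def recommend_links(text):
    """Hypothetical stand-in for the model call; NOT the real mwaddlink API."""
    # Placeholder logic: treat capitalised words as link candidates.
    words = (word.strip(".,;:!?") for word in text.split())
    return sorted({word for word in words if word[:1].isupper()})


def _json_response(start_response, status, payload):
    """Serialise payload as JSON and send it with the given HTTP status."""
    body = json.dumps(payload).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def app(environ, start_response):
    """Minimal WSGI layer: GET /?text=... returns link candidates as JSON."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    text = query.get("text", [""])[0]
    if not text:
        return _json_response(
            start_response, "400 Bad Request", {"error": "missing 'text' parameter"}
        )
    return _json_response(start_response, "200 OK", {"links": recommend_links(text)})


def serve(host="127.0.0.1", port=8000):
    """Run a local development server (assumed host/port, not for production)."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling `serve()` would start a local development server; inside a container the same `app` callable could presumably be handed to any WSGI server, and a Flask or Django layer would play the same role with more features.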
[08:50:08] <_joe_> the reason why I think shelling out, even temporarily, is not optimal, is that I guess this would come with a flock of python dependencies, and it might not work on debian stretch (where mediawiki runs)
[08:50:25] _joe_ dependencies are a bunch of python packages (all installable via pip)
[08:50:49] <_joe_> mgerlach: we have a strict ban on running pip in production
[08:51:21] <_joe_> so we'd need to create debian packages for everything missing, or go via scap3, which is a mildly horrible hack
[08:53:45] _joe_: I'm guessing this tool would be actively developed, assuming that the user-facing feature is moderately successful. we'd probably be seeing updates to the algorithm, additional features added, etc. But I don't know on what cadence or timeline
[08:54:23] <_joe_> that's already "often" for the purposes of my question above
[08:55:49] serviceops, Operations, Traffic, conftool: confd's watch functionality appears to be partially broken when interacting with etcd 3.x - https://phabricator.wikimedia.org/T260889 (Joe) For the record, the problem is more general, and also affects servers connecting to etcd 2.x - the watch functiona...
[09:58:23] serviceops, DBA, Operations, Goal, Patch-For-Review: Strengthen backup infrastructure and support - https://phabricator.wikimedia.org/T229209 (jcrespo)
[11:21:46] <_joe_> tarrow: I have a tiny patch to termbox here https://gerrit.wikimedia.org/r/c/wikibase/termbox/+/622329
[11:22:21] <_joe_> I know nothing about typescript, so I mostly cargo-culted it. It's simple enough that my copypasta wasn't that harmful, hopefully :)
[11:43:15] Cargo as in https://en.wikipedia.org/wiki/Cargo_cult or https://www.mediawiki.org/wiki/Extension:Cargo ?
:-D
[11:47:41] <_joe_> the former ofc
[12:20:39] serviceops, Operations, Product-Infrastructure-Team-Backlog, Push-Notification-Service: Deploy push-notifications service to Kubernetes - https://phabricator.wikimedia.org/T256973 (jijiki)
[12:26:51] serviceops, Operations, Product-Infrastructure-Team-Backlog, Push-Notification-Service: Deploy push-notifications service to Kubernetes - https://phabricator.wikimedia.org/T256973 (jijiki) Push-notifications is up and running in staging. Our next step is to perform the LVS steps and expose the ap...
[13:51:52] serviceops, Operations: assess and re-evaluate 'weight' settings of appservers in codfw - https://phabricator.wikimedia.org/T261159 (Joe) I would rather try to elaborate starting from what eqiad does with similar hardware. The api cluster has, excluding servers to decom 65 servers, distributed as follo...
[14:27:43] <_joe_> jaybwe already had two patchsets fail because of random failures in fetching charts
[14:50:52] _joe_: is that some weird reference to my name? :-)
[14:51:08] <_joe_> yes
[14:51:10] <_joe_> ahahaha
[14:51:12] <_joe_> typo :P
[14:51:23] <_joe_> I think chartmuseum is failing to respond at times
[14:51:37] <_joe_> also, we should probably cache the charts in the CDN?
[14:55:53] hmm, thought this had happened only once. Will check the logs now that you had issues as well. Do you have a timestamp/jenkins link?
[14:57:13] re caching: Yes. We should do that. I guess that the backend needs to provide appropriate headers in the response for stuff to get cached...and unfortunately this seems to not be the case for chartmuseum
[15:05:19] _joe_: interestingly there have not been any responses != 20{0,1} from chartmuseum today
[15:34:40] <_joe_> uhm
[15:48:27] but that does not mean that it's not broken ofc ;-) what we probably do now (with your patch to the linter) is a lot more GETs of the repo index.
helm code suggests that it gets unexpected content when that "no API version specified" error is raised. Will try to reproduce later (and talk to traffic if we can do something about caching even if the backend does not send headers)
[16:21:18] jayme: yeah we can do something in the VCL for sure
[17:44:18] serviceops, Scap, Release-Engineering-Team-TODO (2020-07-01 to 2020-09-30 (Q1)): Deploy Scap version 3.15.0-1 - https://phabricator.wikimedia.org/T261234 (LarsWirzenius)
[20:19:48] serviceops, DC-Ops, Operations, ops-eqiad: (Need By: TBD) rack/setup/install kubernetes1017.eqiad.wmnet - https://phabricator.wikimedia.org/T258747 (Jclark-ctr) a:Jclark-ctr→Cmjohnson
[20:19:54] serviceops, DC-Ops, Operations, ops-eqiad: (Need By: TBD) rack/setup/install kubernetes1017.eqiad.wmnet - https://phabricator.wikimedia.org/T258747 (Jclark-ctr) Racked and cabled host kubernetes1017 A5. U31. Port 31
[22:59:59] serviceops, MediaWiki-General, MediaWiki-Stakeholders-Group, Release-Engineering-Team, and 4 others: Drop official PHP 7.2 support in MediaWiki 1.35 - https://phabricator.wikimedia.org/T257879 (Reedy) {T261044} is somewhat amusing... The code in parsoid isn't actually compatible with PHP 7.2.22....
[23:08:33] serviceops, MediaWiki-General, MediaWiki-Stakeholders-Group, Release-Engineering-Team, and 4 others: Drop official PHP 7.2 support in MediaWiki 1.35 - https://phabricator.wikimedia.org/T257879 (matmarex) The array_key_first() method is provided in the Composer module 'symfony/polyfill-php73', whi...
[23:22:04] serviceops, MediaWiki-General, MediaWiki-Stakeholders-Group, Release-Engineering-Team, and 4 others: Drop official PHP 7.2 support in MediaWiki 1.35 - https://phabricator.wikimedia.org/T257879 (Reedy) ` $ composer depends symfony/polyfill-php73 __root__ dev-master requires symfony/polyf...
[23:22:53] serviceops, MediaWiki-General, MediaWiki-Stakeholders-Group, Release-Engineering-Team, and 4 others: Drop official PHP 7.2 support in MediaWiki 1.35 - https://phabricator.wikimedia.org/T257879 (Reedy) >>! In T257879#6410791, @matmarex wrote: > The array_key_first() method is provided in the Compo...