[00:00:23] !log tools Joining tools-k8s-worker-59 to the k8s worker pool
[00:00:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[00:01:21] !log tools Joining tools-k8s-worker-60 to the k8s worker pool
[00:01:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[02:46:27] !log tools.openstack-browser Updated cache purge crontab to use https://openstack-browser.toolforge.org/ url.
[02:46:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.openstack-browser/SAL
[02:56:09] bd808: has any progress been made on bring-your-own-container workflows since last year?
[02:57:18] hare: some experiments with build packs to create containers, but honestly I don't think Toolforge will ever support true bring-your-own containers
[02:57:54] they would break most assumptions that Toolforge workflows expect
[02:58:11] NFS wouldn't work, user lookups wouldn't work, etc
[02:58:51] Did anything ever come of the "general-k8s" project?
[02:59:18] not that I have heard of. I think addshore found other things to work on
[02:59:31] he sure did!
[03:01:04] We are in planning stages for a second k8s deployment using the Puppet roles and tooling that were built for the 2020 Kubernetes cluster in Toolforge (for PAWS). if that goes well we may at least end up with reusable Puppet code and processes for spinning up more k8s clusters
[03:01:45] but everything is going slow right now because of various roadblocks with hardware and staff time
[03:02:05] so I guess, don't hold your breath but also never give up hope :)
[03:03:26] What would be used for a multi-tenant Kubernetes cluster anyway? Is there software that already manages this?
[03:05:02] hare: there are PaaS things like OpenShift and Rancher that build a soft multi-tenant environment on top of a Kubernetes cluster
[03:05:35] more common is "managed Kubernetes" where a separate cluster is built for each tenant. That's basically what GKE is
[03:06:32] in lieu of building our own infrastructure, we could give out grants to use GKE :)
[03:06:58] I would not be opposed to that at all honestly
[03:07:39] I wouldn't want to manage the grant review and fulfillment, but I don't see WMCS as the "only" place for Wikimedia code to run at all
[03:08:34] A long-term goal of mine is to create the first Wikimedia-certified partner cloud
[03:10:22] ambitious, but not a bad idea at all
[08:45:52] !log tools.indic-wscontest Add edit button (T251907)
[08:45:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.indic-wscontest/SAL
[08:45:56] T251907: Indic-wscontest: Add edit button - https://phabricator.wikimedia.org/T251907
[09:58:57] !log toolsbeta livehacking toolsbeta-puppetmaster-03 with https://gerrit.wikimedia.org/r/c/operations/puppet/+/594471 (T251297)
[09:59:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[09:59:00] T251297: Refactor the toolforge::k8s::kubeadm* modules - https://phabricator.wikimedia.org/T251297
[10:30:27] !help
[10:30:27] If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-kanban
[10:30:49] hi there nokib
[10:30:57] may I help you?
[10:31:00] Hi
[10:32:09] I have the directory "public_html" in /data/project/mytool
[10:33:09] And there is a script named "gipull.php"
[10:34:04] But when I go to the url "https://tools......./gipull.php"
[10:34:19] I find status code 500
[10:34:55] !help
[10:34:55] If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-kanban
[10:35:19] nokib: can you provide the full URL?
[10:40:14] Ok, the problem is solved. Thanks a lot
[13:33:21] !log tools.wikiloves Deploy latest from Git master: 1a7a6c8 (T251986), b13167a
[13:33:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikiloves/SAL
[15:35:31] !log git - instance gerrit-test7 was reachable and suddenly it wasn't, paladox suggested it is ferm but it's unclear what changed anything, rebooting it
[15:35:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Git/SAL
[17:56:29] !log tools Cordoned tools-k8s-worker-[16-20] in preparation for decomm (T248702)
[17:56:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:56:34] T248702: Reconfigure the Toolforge k8s workers to map their unused disk to /var/lib/docker - https://phabricator.wikimedia.org/T248702
[18:04:10] !log tools Draining tools-k8s-worker-[16-20] in preparation for decomm (T248702)
[18:04:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:04:15] T248702: Reconfigure the Toolforge k8s workers to map their unused disk to /var/lib/docker - https://phabricator.wikimedia.org/T248702
[18:14:23] !log tools Shutdown tools-k8s-worker-[16-20] instances (T248702)
[18:14:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:14:27] T248702: Reconfigure the Toolforge k8s workers to map their unused disk to /var/lib/docker - https://phabricator.wikimedia.org/T248702
[18:24:23] !log tools Updated "profile::toolforge::k8s::worker_nodes" list in "tools-k8s-haproxy" prefix puppet (T248702)
[18:24:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:24:28] T248702: Reconfigure the Toolforge k8s workers to map their unused disk to /var/lib/docker - https://phabricator.wikimedia.org/T248702
[19:05:17] exploring some ideas for an upcoming project, is it still disallowed to manipulate wmcs VMs programmatically via APIs? I'm thinking something like an instance that boots, loads a fresh dump of data, and can be replaced programmatically by a new instance with a new dump
[19:06:04] but would require being able to programmatically boot instances, change web proxies, probably more
[19:08:02] ebernhardson: we don't have a good way to expose that kind of automated power to Cloud VPS tenants right now. The things you are thinking about are actually possible with OpenStack APIs. And we actually have some tooling for WMCS admins that can do some of them already.
[19:08:31] The hard part today is securing the authn aspect of exposing these things
[19:09:29] ebernhardson: It would be great to talk about this more though. Even if we can't make it a simple process I think we could make it a possible process
[19:09:36] yea makes sense, certainly openstack allows it (how else could rackspace exist), but making it available with our public nature certainly adds complications
[19:10:17] chances are we will let the system have a little downtime and do a reload on the instance, but thought there might be some interesting ways to avoid the downtime
[19:10:26] this is essentially a beta wdqs for commons
[19:11:06] !log toolsbeta updated toollabs-webservice to 0.69 for toolsbeta
[19:11:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[19:11:17] ebernhardson: to get some idea of things we have made possible for the WMCS team itself, take a look at https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Deploying#worker_nodes and expand the "Shell script for quickly creating N new worker node instances" box
[19:11:47] that's a script that I hacked up to automate provisioning many Kubernetes worker nodes
[19:12:52] that doesn't look too bad, seems the APIs are all behind the various shell commands being invoked.
[19:13:23] yeah, the same things could be done with python or whatever against the raw api. But there are also cli tools that can do most things
[19:14:04] I don't think it's our public nature that makes it difficult
[19:14:12] "wmcs-openstack" is just a thin wrapper around the upstream `openstack` cli tool that does a bit of the authn dance for us
[19:14:55] The hurdles are less that we are open/public and more that we didn't build things to allow raw api access at the start
[19:15:15] so we have to carefully move towards opening that up
[19:15:34] I have this repo of random things that rely on access to these APIs: https://github.com/Krenair/wmcs-misc-scripts/
[19:15:39] it's all read-only, novaobserver creds, etc.
[19:15:44] i suppose i was thinking that often internal-only services can be more soft and gooey, not that they should be but often enough are. With something more publicly facing you can't run for 18 months in a sorta-ok "because we trust people" state
[19:15:56] but they have to be run from inside labs because of this current constraint, can't be run from my laptop
[19:16:51] ebernhardson: the squishy thing that we could do is code that we collaborate on that ends up running from the trusted hosts that can talk to the control plane apis directly.
[19:17:21] basically putting scripts/services into Puppet that do the work
[19:17:27] bd808: hmm, that might be a possibility. Indeed the orchestration should be minimal
[19:17:28] (novaobserver creds being the only thing allowed to authenticate to those APIs from within labs IIRC)
[19:18:11] Krenair: yeah, there is a special authn patch that lets novaobserver work from the VPS IP space
[19:18:12] ((with the exception of my dns-manager stuff for handling dns-01 cert challenges))
[19:18:32] oh yeah, there is that route too
[19:18:50] so there's
[19:19:00] 1) IP restrictions only allowing traffic from labs
[19:19:21] 2) authentication problems
[19:18:57] but opening that up to automated instance control is a tiny bit scary :)
[19:19:58] I think we had something custom to enforce MFA?
[19:20:06] did that work in horizon but not the raw keystone API or something?
[19:20:23] yeah, the 2fa stuff is just hacked into horizon
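
As a rough illustration of bd808's remark above that "the same things could be done with python or whatever against the raw api", the sketch below uses openstacksdk to boot a single instance, in the spirit of the worker-provisioning shell script he links. This is not the WMCS tooling itself; the clouds.yaml profile name and the flavor/image/network names are hypothetical placeholders.

    # Minimal sketch: create one instance via the raw OpenStack APIs with
    # openstacksdk. Profile and resource names below are assumptions.
    import openstack

    # Credentials come from a clouds.yaml entry named "wmcs-example" (assumed).
    conn = openstack.connect(cloud="wmcs-example")

    # Look up the building blocks by name.
    flavor = conn.compute.find_flavor("g2.cores4.ram8.disk80")
    image = conn.compute.find_image("debian-10.0-buster")
    network = conn.network.find_network("lan-flat-cloudinstances2b")

    # Boot the instance and wait until it reaches ACTIVE.
    server = conn.compute.create_server(
        name="example-k8s-worker-61",
        flavor_id=flavor.id,
        image_id=image.id,
        networks=[{"uuid": network.id}],
    )
    server = conn.compute.wait_for_server(server)
    print(server.name, server.status)

In practice a script like this would only work from a host allowed to authenticate to the control-plane APIs with more than novaobserver's read-only rights, which is exactly the authn gap being discussed here.
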
[19:21:28] Getting to the point where common tools like terraform could be used to automate instances in Cloud VPS is a long-term goal
[19:21:55] just not something that we have prioritized working on at this point
[19:22:15] in part because there have been bigger fires to put out :)
[19:22:56] most of the last 3 years of work by the WMCS team has been paying down really big piles of technical debt
[19:23:49] but we are in a pretty decent place now modulo a few snags in our instance storage plans
[19:24:21] those snags are just temporary slowdowns too, not really huge hurdles
[19:25:22] anyway, ebernhardson if you want to send along some perfect world ideas I would be glad to look at them and talk with the team about what approximation we could make of them in reality :)
[19:25:24] having terraform or something in that same vein would certainly open up possibilities, there are lots of things we don't even consider because they are hard with the way something is setup
[19:26:27] will ponder it some more, see what the team thinks. thanks for the background!
[19:26:40] If giovanni hadn't left us I think we would have terraform support by now :)
[19:26:55] heh
[19:27:55] but there would be other things we have now that probably wouldn't have been built in exchange (like a lot of the control plane HA work that jeh has led)
[21:20:47] !log tools Kubectl delete node tools-k8s-worker-[16-20] (T248702)
[21:20:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:20:52] T248702: Reconfigure the Toolforge k8s workers to map their unused disk to /var/lib/docker - https://phabricator.wikimedia.org/T248702
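
The tools-k8s-worker-[16-20] SAL entries through the day trace the worker decommission sequence: cordon, drain, shut down the instances, update the haproxy node list in prefix puppet, and finally delete the node objects. As a sketch only of the Kubernetes-side steps, the following uses the official Python kubernetes client; it skips the DaemonSet, mirror-pod, and PodDisruptionBudget handling that a real `kubectl drain` performs, and the node name is a placeholder.

    # Rough sketch of cordon -> evict pods -> delete node, mirroring the SAL
    # entries above. Illustration only; a real drain handles more edge cases.
    from kubernetes import client, config

    NODE = "tools-k8s-worker-16"  # placeholder

    config.load_kube_config()
    v1 = client.CoreV1Api()

    # Cordon: mark the node unschedulable.
    v1.patch_node(NODE, {"spec": {"unschedulable": True}})

    # Drain (simplified): evict every pod currently running on the node.
    pods = v1.list_pod_for_all_namespaces(field_selector=f"spec.nodeName={NODE}")
    for pod in pods.items:
        eviction = client.V1Eviction(
            metadata=client.V1ObjectMeta(
                name=pod.metadata.name, namespace=pod.metadata.namespace
            )
        )
        v1.create_namespaced_pod_eviction(
            name=pod.metadata.name,
            namespace=pod.metadata.namespace,
            body=eviction,
        )

    # Once the instance is shut down and removed from the haproxy list,
    # remove the node object itself (the "Kubectl delete node" step).
    v1.delete_node(NODE)

The OpenStack shutdown and the "profile::toolforge::k8s::worker_nodes" prefix-puppet update sit outside Kubernetes and are handled separately, as the SAL entries show.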