[01:56:37] !log codesearch retargetting codesearch.wmflabs.org to codesearch6 (T242319)
[01:56:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Codesearch/SAL
[01:56:40] T242319: Puppetize codesearch - https://phabricator.wikimedia.org/T242319
[01:57:05] :)
[01:57:45] !log codesearch deleting temporary codesearch6.wmflabs.org domain
[01:57:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Codesearch/SAL
[01:59:11] !log codesearch shutting down old codesearch4 instance (but not deleting yet)
[01:59:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Codesearch/SAL
[02:00:10] !log codesearch shutting down and deleting temporary codesearch5 instance
[02:00:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Codesearch/SAL
[02:02:49] mutante: thank you! :)))
[02:53:00] !log toolsbeta Added legoktm as project admin
[02:53:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[03:06:56] !log toolsbeta Added legoktm to "roots" sudoer policy
[03:06:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[03:07:04] legoktm: ^ try now :)
[03:09:41] bd808: works, ty :)
[03:11:05] * bd808 wonders why we don't just set that sudoers group to "all project admins"
[03:11:42] ah. because that is not currently an option
[03:11:56] easy to do "all project members", but not admins
[03:14:21] !log toolsbeta Demoted projectadmins not listed in the "roots" sudoer policy to project members just to avoid random confusion
[03:14:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[04:36:47] !log tools wmcs-openstack quota set --cores 768 --ram 1536000
[04:36:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[04:39:01] !log tools wmcs-openstack quota set --instances 192
[04:39:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[05:15:41] !log tools Building tools-elastic-04
[05:15:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:37:04] CloudVPS hypervisor maintenance will begin in ~30 minutes. For more information and a list of virtual machines that will be rebooted please see https://lists.wikimedia.org/pipermail/cloud-announce/2020-January/000251.html
[15:31:42] !log deployment-prep deploying ores 283f627a
[15:31:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[19:21:00] James_F: the 1.23wmf style branch naming is no longer used, right? It's all wmf/1.23-wmf.1 now
[19:22:05] valhallasw`cloud: Yes. I've kept adding the aliases to keep RTB working. It'd be nice to drop that.
[19:22:22] valhallasw`cloud: See e.g. https://phabricator.wikimedia.org/project/view/4468/
[19:24:13] James_F: https://github.com/wikimedia/labs-tools-forrestbot/blob/master/utils.py#L106
[19:26:04] James_F: so REL1_23 still goes to #mw1.23, 1.23wmf6 is no longer used, wmf/1.26wmf9 is no longer used (?), wmf/1.27.0-wmf.1 goes to #1.27.0-wmf.1, the last three get ignored
[19:26:08] So, we don't use the mw1\.\d\d\.0-wmf\d\d slug any more, just mw1\.\d\d\.0-wmf\.\d\d (aka with a .)?
[19:26:25] OK, so I can drop the non-dot versions and not break stuff. Great.
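(Aside: the branch-style check being discussed boils down to something like the Python sketch below. The function name and exact patterns are illustrative assumptions, not the actual forrestbot utils.py code.)

    import re

    # Branch styles still in use per the discussion above; older styles
    # such as "1.23wmf6" or "wmf/1.26wmf9" are no longer mapped.
    SUPPORTED = [
        re.compile(r'^REL\d+_\d+$'),                  # e.g. REL1_23
        re.compile(r'^wmf/\d+\.\d+\.\d+-wmf\.\d+$'),  # e.g. wmf/1.27.0-wmf.1 (dotted form)
    ]

    def is_supported_branch(branch):
        return any(p.match(branch) for p in SUPPORTED)

    assert is_supported_branch('REL1_23')
    assert is_supported_branch('wmf/1.27.0-wmf.1')
    assert not is_supported_branch('wmf/1.26wmf9')    # old, non-dot style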
[19:26:45] I still would need to write the code to make it do that :-p
[19:27:02] But given I've created them out to https://phabricator.wikimedia.org/project/view/4480/
[19:27:15] So I'll not be dropping them until 1.36.0-wmf.*
[19:27:20] to clarify, the code I linked maps branches to tags
[19:27:39] so the question is also whether the branches have now been standardized on the wmf/1.27.0-wmf.1 style
[19:27:46] plus REL1_23
[19:27:47] They have.
[19:27:55] Since 1.27 or so I think.
[19:27:59] Ok, then I'm going to remove support for the other ones
[19:28:11] Cool.
[19:40:17] !log tools.forrestbot Deployed https://gerrit.wikimedia.org/r/566844 and https://gerrit.wikimedia.org/r/566845
[19:40:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.forrestbot/SAL
[19:42:03] James_F: Exception: No PHID found for slug #1.35.0-wmf.16!
[19:42:22] oh, wait. The name starts with that but that's not actually the hashtag
[19:42:51] so that should still be #wm1.35.0-wmf.16 just not #wm1.35.0-wmf16
[19:43:30] Yeah.
[19:44:39] !log tools.forrestbot deployed https://gerrit.wikimedia.org/r/#/c/labs/tools/forrestbot/+/566852/ (revert 566845)
[19:44:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.forrestbot/SAL
[19:45:26] Ok, everything is alive again.
[19:45:53] Related Gerrit Patches:
[19:45:58] WHAT IS THIS BLACK MAGIC
[19:46:02] :D
[20:16:49] !log openstack cloudvirt1013 set icinga downtime and powering down for hardware maintenance
[20:16:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Openstack/SAL
[20:17:52] !log openstack cloudvirt1013 set icinga downtime and powering down for hardware maintenance T241313
[20:17:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Openstack/SAL
[20:17:59] T241313: cloudvirt1013: server down for no reason (power issue?) - https://phabricator.wikimedia.org/T241313
[20:27:07] how does the memory limit on the grid work? does it limit virtual address space usage?
[20:30:47] bennofs: we use the "h_vmem" limit. The man page says "They impose a limit on the amount of combined virtual memory consumed by all the processes in the job. If h_vmem is exceeded by a job running in the queue, it is aborted via a SIGKILL signal"
[20:31:31] hmm ulimit says my virtual memory is limited. I think that's the reason that java requires huge -mem values :/
[20:33:23] that might be, yes. Java certainly is not very happy running on the grid without a very high h_vmem limit. I haven't done any testing to see if java is less resource greedy on the Kubernetes cluster
[20:34:19] I'll check kubernetes, would be nice to run a small webservice in java (but don't really want to consume 6g just for a small web API service)
[20:35:59] bennofs: if you are starting fresh on testing, I would recommend following the instructions at https://wikitech.wikimedia.org/wiki/News/2020_Kubernetes_cluster_migration to do that testing on the new 2020 cluster
[20:37:42] bd808: cool, didn't know about that. I can do everything with kubectl, and webservice CLI isn't necessary, right?
[20:39:32] It is possible, yes. There are some needed parts in your pod description to make NSS (user/group lookups) and directory mounts work that webservice takes care of for you, but they can also be set manually.
[20:40:08] ok, thank you! webservice does too much magic for me personally :)
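(Aside on the h_vmem question above: the grid's h_vmem cap shows up inside a job as a limit on the process's virtual address space, which is what ulimit -v reports. A minimal Python check, assuming the limit is exposed as RLIMIT_AS:)

    import resource

    # Soft/hard limit on virtual address space, in bytes;
    # resource.RLIM_INFINITY means no limit is set.
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    print("RLIMIT_AS soft:", soft, "hard:", hard)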
[20:40:44] that's pretty much the point of it ;)
[20:40:57] not to do too much, but to hide a lot of tedious details
[20:41:48] bennofs: putting a "toolforge: tool" label on your container is the triggering magic for both NSS and $HOME mounts into the pod
[20:42:17] we have an admission controller that looks for that label and adds a lot of other stuff when matched
[20:42:18] i think I'll just create a pod with webservice and then export the config as template
[20:44:00] Webservice creates a deployment which includes a replicaset and a pod template. It also manages a Service and an Ingress. The Ingress is new in the 2020 cluster.
[20:44:32] The old cluster only made a service and then relied on things outside of Kubernetes to route traffic
[20:45:12] nice. I remember that I found that a bit confusing, how traffic was routed to my pod
[20:46:33] The new cluster is double proxied, but once the first proxy figures out that the webservice is not running on grid engine it just passes off to the nginx-ingress on the new cluster
[20:58:40] The CloudVPS maintenance for today is complete, there are no more virtual machine reboots or migrations scheduled for today.
[21:02:57] jeh: thanks and sorry things did not go as planned :/
[21:05:39] Murphy's law was in full effect today. Even the replacement hardware turned out to be the incorrect part
[21:06:04] but we were able to fix a couple other hosts and some software bugs
[21:09:32] !log openstack cloudvirt1024 set icinga downtime and powering down for hardware maintenance T241884
[21:09:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Openstack/SAL
[21:09:35] T241884: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241884
[21:11:05] Hey folks. I want to get a simple HTML file online in toolforge. What's the easiest way to do that?
[21:11:24] I already have a tool in mind (ores-demos)
[21:11:31] I don't need any server-side stuff.
[21:12:59] halfak: the php7.3 kubernetes webservice is the "best" right now. `webservice --backend=kubernetes php7.3 start`. That will serve html + php from $HOME/public_html
[21:13:26] legoktm and I are keen on making a lighter "static" type but have not settled on the details yet
[21:13:31] Can I just run that from the login host after "become ores-demos"?
[21:13:36] yup
[21:13:40] Cool thank you!
[21:14:05] halfak: oh! for a new tool, do these things too -- https://wikitech.wikimedia.org/wiki/News/2020_Kubernetes_cluster_migration#Manually_migrate_a_webservice_to_the_new_cluster
[21:14:29] kubectl config use-context toolforge; alias kubectl=/usr/bin/kubectl; echo "alias kubectl=/usr/bin/kubectl" >> $HOME/.profile
[21:14:48] that will all be the default in ~3 weeks
[21:15:56] Should I find what's in public_html here: https://tools.wmflabs.org/ores-admin ?
[21:16:03] Woops. Wrong URL.
[21:16:10] https://tools.wmflabs.org/ores-demos/
[21:16:12] And I see it!
[21:16:23] Excellent. mwahahaha
[21:16:26] Thanks bd808
[21:16:29] np
[21:50:35] !log deployment-prep deploying ores 039251f (reverting to last good state)
[21:50:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[21:52:03] halfak: thanks; last entry for that error I see is from 2020-01-23T21:13:51
[21:52:41] Gotcha. I just brought it back now. It should be back to normal.
[21:53:06] Sorry for the trouble and thanks for creating the task.
[21:53:06] okie
[21:53:28] No troubles!
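(Aside on the 2020-cluster discussion above: a rough sketch, using the kubernetes Python client, of where the "toolforge: tool" label goes if the pieces webservice normally generates are set by hand. The tool name, image, and namespace are made-up placeholders; webservice also creates a Service and an Ingress, which this sketch omits.)

    from kubernetes import client, config

    config.load_kube_config()
    # The "toolforge: tool" label on the pod template is what the admission
    # controller matches to inject NSS and $HOME mounts, per the chat above.
    labels = {"toolforge": "tool", "name": "example-tool"}

    deployment = client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=client.V1ObjectMeta(name="example-tool", labels=labels),
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels=labels),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels=labels),
                spec=client.V1PodSpec(containers=[
                    client.V1Container(name="webservice", image="example/image:latest"),
                ]),
            ),
        ),
    )
    client.AppsV1Api().create_namespaced_deployment(
        namespace="tool-example-tool", body=deployment)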
[21:53:47] better on beta than in prod ;-)
[21:56:02] Now that's a t-shirt phrase :)
[21:56:54] I'll make sure to claim any royalties
[21:57:35] * bd808 sends hauskatze his internet money
[22:13:25] I'm trying to use pyexiv2 but getting this: OSError: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.27' not found
[22:14:35] DatGuy: is that something that is installed globally, or something you installed in a virtual env?
[22:15:40] I installed the module locally, but I'm rewriting code now and it doesn't seem to work. Also saw this https://wikitech.wikimedia.org/wiki/Nova_Resource:Bots/SAL#October_2 so I suppose it at least worked for him
[22:16:41] Did you install and run from the same environment? The versions of Python on the bastions+grid are different than the versions inside Kubernetes containers
[22:20:12] !log tools.my-first-flask-oauth-tool Updated to 6eb70b916dceb9fac4ed48a9789dd8d388f1f47d
[22:20:12] Yes. The error is being thrown from the venv site-packages
[22:20:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.my-first-flask-oauth-tool/SAL
[22:22:05] DatGuy: the error sounds to me like the wrong binary wheel was installed. One way that has happened to me before was running the pip from the wrong host (bastion vs webservice shell), but maybe there are other ways it can get messed up?
[22:23:29] Also, the tool isn't using webservice
[22:23:50] https://pastebin.com/MDyrCUC1 full traceback
[22:28:16] the error looks a lot like this one (different binary python package) -- https://github.com/aksnzhy/xlearn/issues/256
[22:29:16] Our stretch hosts (bastion + grid) have GLIBC 2.24-11+deb9u3
[22:29:34] so I would expect something linked against glibc 2.27 to blow up
[22:31:36] ah wait
[22:31:44] it's because I'm using python3 isn't it
[22:37:34] Would it be possible to support a python3 version of pyexiv? Something like https://pypi.org/project/py3exiv2/ or https://github.com/LeoHsiao1/pyexiv2
[22:41:05] DatGuy: you can install it in a virtual environment, yes. libexiv2-dev is installed so you can compile things against it
[22:41:42] It does not look to me like a py3 version was packaged for Stretch
[22:45:00] Welp. Building py3exiv2 throws '/usr/bin/ld: cannot find -lboost_python3'
[23:16:30] !log tools Building 6 new tools-k8s-worker instances for the 2020 Kubernetes cluster
[23:16:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[23:38:39] !log tools Halted tools-k8s-worker build script after first instance (tools-k8s-worker-10) stuck in "scheduling" state for 20 minutes
[23:38:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
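(Aside on the GLIBC_2.27 error above: a quick diagnostic that can be run both where pip installed the wheel and where the code actually runs, to confirm the two environments match. Sketch only; as noted in the chat, the Stretch bastions and grid carry glibc 2.24, so a wheel built against 2.27 will fail to load there.)

    import platform
    import sys

    # Print the interpreter version and the C library this Python sees, so the
    # install environment and the runtime environment can be compared.
    print("python:", sys.version.split()[0])
    print("libc:", "-".join(platform.libc_ver()))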