[10:13:53] <arturo>	 !log admin moved cloudvirt1023 to 'maintenance' host aggregate. Drain it with `wmcs-drain-hypervisor` to reboot it for T275753
[10:13:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:43:02] <arturo>	 !log admin moved cloudvirt1013 cloudvirt1032 cloudvirt1037 back into the 'ceph' host aggregate
[10:43:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:59:07] <arturo>	 !log admin cloudvirt1023 is affected by T276208 and cannot be rebooted. Put it back into the ceph hos aggregate
[11:59:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:59:12] <stashbot>	 T276208: cloud: libvirt doesn't support live migration when using nested KVM - https://phabricator.wikimedia.org/T276208
[11:59:18] <arturo>	 !log admin moved cloudvirt1012 to 'maintenance' host aggregate. Drain it with `wmcs-drain-hypervisor` to reboot it for T275753
[11:59:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:26:25] <arturo>	 !log integration shutdown integration-agent-docker-1009 because it was stuck in nova MIGRATING status, trying to fix that by hand
[12:26:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Integration/SAL
[15:22:47] <bstorm>	 !log tools cleared queue error states...will need to keep a better eye on what's causing those
[15:22:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:23:59] <bstorm>	 !log tools depooling tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs for reboot. It isn't communicating right
[15:24:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:04:04] <suriname0>	 Question about toolforge web services: can I run two webservices for the same tool?  Specifically, I have a React frontend and a Flask backend. I could use the WSGI server running the Flask backend to serve the static index.html file, but the internet seems to say that serving static files via a WSGI server is terrible practice. If I were
[16:04:05] <suriname0>	 configuring this myself, I might have nginx both (a) function as a proxy server for the WSGI/Flask server and (b) serve the frontend index.html file. Not a web developer, but it seems like I have three options here: (1) if possible, run lighttpd and the python wsgi server under the same tool-name/URL, (2) serve the static React files from the wsgi
[16:04:05] <suriname0>	 server anyway, or (3) create a separate tool (named something like `<tool-name>-api`) to run the backend independently from the tool serving the front-end. This is probably a very beginner question, so any insight is appreciated.
[16:10:58] <arturo>	 !log admin [codfw1dev] restart nova-compute on cloudvirt2002-dev
[16:11:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:11:20] <Lucas_WMDE>	 suriname0: you could use tools-static.wmflabs.org/toolname to serve your JS/CSS
[16:11:43] <Lucas_WMDE>	 probably not the best fit for index.html – I’d probably serve that through Flask
[16:11:59] <Lucas_WMDE>	 (see https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#Serving_static_files )
[16:19:14] <suriname0>	 Hmm. It does seem a little awkward to serve the "main page" of the app on a tools-static.wmflabs.org link. Is there anything _wrong_ with serving the HTML/CSS/JS via the Flask server?
[16:19:45] <suriname0>	 e.g. Should I be worried about load times, or will it be a pretty small increase?
[16:23:49] <Zppix>	 suriname0:  I wouldn't worry about load times, in my personal opinion
[16:27:36] <bd808>	 suriname0: serve the compiled frontend code from the uwsgi container. Chances are incredibly high that your tool will never see enough traffic for that to be a problem.
[16:29:05] <bd808>	 a dedicated webserver would have slightly better latency for a cold path and much better latency for a hot path, but in practice such things seldom matter
[16:29:43] <bd808>	 make it work first and worry about performance when it is an actual, rather than theoretical, problem
[16:31:58] <Zppix>	 Is it possible to upgrade an instance on cloud vps to buster without deleting the instance and recreating?
[16:34:04] <bd808>	 Zppix: technically, yes. You would do that like you would do a bare meta in-place Debian upgrade. In practice it really creates a mess that will haunt you and the Cloud VPS admins until you kill the instance. We loose track of the base OS and will alert on the image used being old at some point.
[16:35:10] <suriname0>	 Thanks for the advice Lucas_WMDE, Zppix, and bd808.  I'll move ahead with serving from the WSGI container for now, and if it ever becomes a problem I'll rearchitect.
[16:35:49] <Zppix>	 ugh, i guess just recreating it and having to play puppet cert roulette is easier
[16:36:23] <arturo>	 Zppix: depending on your use case, you may be interested in cinder volumes as a way to decouple VM image from data https://lists.wikimedia.org/pipermail/cloud-announce/2021-February/000366.html
[16:36:48] * Zppix looks
[16:36:51] <bd808>	 ^ yeah, that's a better idea that in-place upgrades
[16:38:04] <Zppix>	 is this similar to NFS?
[16:41:03] <arturo>	 Zppix: in this post https://techblog.wikimedia.org/2021/02/05/cinder-on-cloud-vps/ andrew compared it with NFS
[16:41:56] <dcaro>	 to some extent yes, but instead of having a 'filesystem tree' to attach to, you have a whole block device. Note that it's not meant to be writable by more than on VM, but for now used by only one VM at a time.
[16:42:39] <Zppix>	 arturo:  dcaro  bd808  awesome thanks guys
[16:44:47] <bd808>	 It's probably better to think of Cinder as being a pile of USB drives than as NFS. A drive that can be attached to one instance at a time, but that you can remove from instance A and mount on instance B
[16:46:41] <Zppix>	 I get an error when trying to run sudo prepare_cinder_volume:
[16:46:44] <Zppix>	 https://www.irccloud.com/pastebin/kfYVEskQ/
[16:56:29] <arturo>	 Zppix: that sounds like a bug. Please open a phab task
[16:56:44] <Zppix>	 arturo:  just under cloud vps project?
[16:57:01] <arturo>	 yes, you can also tag `cloud-services-team (kanban)`
[16:57:06] <Zppix>	 ok
[17:01:09] <Zppix>	 {{Done}} T276241
[17:01:10] <stashbot>	 T276241: Cinder: running prepare_cinder_volume results in error - https://phabricator.wikimedia.org/T276241
[17:16:09] <andrewbogott>	 !log admin rebooting cloudvirt1039 to see if I can trigger T276208
[17:16:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:16:13] <stashbot>	 T276208: cloud: libvirt doesn't support live migration when using nested KVM - https://phabricator.wikimedia.org/T276208
[17:33:07] <Zppix>	 arturo:  i found the issue and fixed it by doing a local hack of the script
[17:33:26] <Zppix>	 i can do a patch, wheres the script source located?
[17:35:22] <Majavah>	 Zppix: operations/puppet/modules/profile/files/wmcs/instance/prepare_cinder_volume.py 
[17:36:20] <Zppix>	 thanks
[17:45:06] <legoktm>	 !log tools.forrestbot resubscribed to mediawiki-commits list
[17:45:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.forrestbot/SAL
[17:49:30] <legoktm>	 !log tools.gerrit-reviewer-bot resubscribed to mediawiki-commits list
[17:49:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.gerrit-reviewer-bot/SAL
[18:40:03] <wm-bot>	 !log tools.pbbot <peterbowman> Deploy 4e128e4: adopt new multi-instance replica arch
[18:40:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.pbbot/SAL
[21:14:41] <duesen>	 Uh, how do I get files from my home directory into the tool account? https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Tool_Accounts#Manage_files_in_Toolforge doesn't say
[21:24:53] <bd808>	 duesen: commonly you would "push" the files to the tool's $HOME with `cp` and then `become $tool; take $HOME/files-that-were-copied`
[21:25:34] <bd808>	 `take` is a little utility program in Toolforge that does chmod to the calling user (after some safety checks)
[21:26:29] <duesen>	 my problem is actually the cp
[21:26:52] <duesen>	 i can't fidn the home dir to write to :)
[21:27:05] <bd808>	 ~tools.{toolname}
[21:28:23] <duesen>	 ah, i missed the "tool." prefix! that explains it. silly me
[21:28:31] <duesen>	 bd808: thanks!
[21:30:58] <Zppix>	 I gave up trying to do standalone puppet on buster... 6  instances later... I just went back to using stretch for it.
[21:35:05] <duesen>	 for recurring jobs, do I just use cron?
[21:44:54] <legoktm>	 duesen: the simple answer is yes
[21:45:07] <duesen>	 legoktm: i'll take that :)
[21:46:28] <legoktm>	 https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid explains how to run stuff on the grid, cron will automatically run your stuff with jsub using mostly-sane defaults
[21:47:28] <duesen>	 yea, i don't need anything fancy. just a daily git pull + a schell script to collect stats
[21:53:29] <suriname0>	 Question about toolforge user databases (https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#User_databases): is there guidance on the amount of storage that is considered acceptable?  I'm currently scoping about 10-20GB for database tables (with relevant indices). What kind of storage reqs would require special permission?
[21:55:29] <bd808>	 suriname0: we do not have quota control on that shared service, so there is no hard limit. The biggest tool db is ~400GiB, but "normal" size is more like ~100MiB.
[21:55:57] <bd808>	 20GiB would put your tool in the top 20 largest dbs
[21:56:05] <bd808>	 https://tool-db-usage.toolforge.org/
[21:57:15] <bd808>	 someday™ we will have a replacement system based on OpenStack Trove or something similar that will give quota control. :)
[21:58:52] <suriname0>	 Thanks bd808!  In that case, I'll watch my usage and if I think I need to jump beyond a dozen GB I'll discuss with WMF first... Data is mostly processed revision data, with a lot of duplication with what exists in the elasticsearch and replica dbs.
[22:02:14] <RhinosF1>	 I should drop our old DBs
[22:03:32] <bd808>	 +1 for cleaning up any old junk that can be cleaned up
[22:04:33] <RhinosF1>	 I'll login
[22:04:43] <RhinosF1>	 I need to generate a new ssh key first
[22:06:42] <RhinosF1>	 bd808: would DROP DATABASE "s<id>_*"; work?
[22:06:51] <RhinosF1>	 With the right tool id
[22:08:29] <bd808>	 probably not. * is not what you think it is in SQL. It might work with %, but I don't think I've ever been bold enough to do a wildcard db drop. :)
[22:08:43] <RhinosF1>	 I'll just write the full name then
[22:11:10] <Zppix>	 worse case, you drop all of toolforge right </s>
[22:16:48] <wm-bot>	 !log tools.zppixbot <rhinosf1> MariaDB [(none)]> DROP DATABASE s53093__quotes;
[22:16:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot/SAL
[22:16:58] <wm-bot>	 !log tools.zppixbot <rhinosf1> MariaDB [(none)]> DROP DATABASE s53093__wiki;
[22:17:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot/SAL
[22:17:30] <RhinosF1>	 It worked
[22:17:42] <RhinosF1>	 I think
[22:18:59] <wm-bot>	 !log tools.zppixbot-test <rhinosf1> MariaDB [(none)]> DROP DATABASE s54287__quotes;
[22:19:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot-test/SAL
[22:19:13] <RhinosF1>	 Tools db usage
[22:19:15] <RhinosF1>	 Looks fine
[22:19:19] <RhinosF1>	 So I guess it worked
[22:19:33] <RhinosF1>	 Which is surprising for me & sql
[22:23:48] <wm-bot>	 !log tools.phabsearchemail <rhinosf1> Removed *.(out|log|err) full of useless messages saying it worked/general start stop stuff or empty and empty logs dir
[22:23:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.phabsearchemail/SAL
[22:23:59] <RhinosF1>	 I cleaned the world up now
[22:25:55] <RhinosF1>	 That tools works too well
[22:26:10] <RhinosF1>	 It's been running perfectly since just after last Hackathon
[22:46:17] <SPF|Cloud>	 I suppose https://phabricator.wikimedia.org/T127717 requires me to request a "Special" Cloud VPS project? per https://wikitech.wikimedia.org/wiki/Help:Access_policies#Application_Process
[22:48:03] <bd808>	 SPF|Cloud: it will certainly need a dedicated project, yes.
[22:48:34] <bd808>	 https://phabricator.wikimedia.org/project/view/2875/ is the place to go :)
[22:49:09] <SPF|Cloud>	 that seems plausible to me, given the sensitivity of the data
[22:49:35] <SPF|Cloud>	 I'll create a task using the form (you have given me) and will attach as a subtask, thanks :)
[23:04:49] <andrewbogott>	 !log cyberbot rebooting cyberbot-exec-iabot-02 for T276208
[23:04:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Cyberbot/SAL
[23:04:53] <stashbot>	 T276208: cloud: libvirt doesn't support live migration when using nested KVM - https://phabricator.wikimedia.org/T276208
[23:20:17] <ariutta>	 I just created an instance today, and 'sudo fdisk -l' shows:
[23:20:18] <ariutta>	 But the docs seem to say that's deprecated? https://wikitech.wikimedia.org/wiki/Help:Adding_Disk_Space_to_Cloud_VPS_instances#With_LVM_(deprecated_as_of_February,_2021)
[23:20:19] <ariutta>	 Should I not use the space in '/srv'?
[23:21:15] <Wurgl>	 ERROR 2003 (HY000): Can't connect to MySQL server on 'tools.db.svc.eqiad.wmflabs' (111 "Connection refused")
[23:21:20] <Wurgl>	 ?
[23:21:21] <ariutta>	 This IRC did like the fdisk output. It was: forwardslash dev forwardslash sda3  39845888 167770111 127924224     61G Linux LVM
[23:21:29] <andrewbogott>	 Wurgl: give it a minute
[23:21:34] <Wurgl>	 I do
[23:24:30] <andrewbogott>	 ariutta: Whatever you tried to paste didn't paste I think?  But in any case, yes, you probably want to use a Cinder volume for extra storage.
[23:25:03] <andrewbogott>	 Wurgl: sorry, I meant 'give it a minute while we fix it'
[23:25:52] <Wurgl>	 I did understand your message, no problem.
[23:27:00] <ariutta>	 Yeah, it didn't come through. It just showed that an instance created today still seems to be using LVM. Should I expect this to change soon?
[23:29:22] <bstorm>	 !log bringing toolsdb back up 😟
[23:29:23] <stashbot>	 bstorm: Unknown project "bringing"
[23:29:35] <bstorm>	 !log clouddb-services bringing toolsdb back up 😟
[23:29:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Clouddb-services/SAL
[23:30:23] <ariutta>	 Just today, I made an instance w/ flavor "g2.cores4.ram8.disk80", and it had /dev/sda3 as a "61G Linux LVM" partition. I'm guessing if I do that in a month or so, it'll be something else?
[23:31:04] <andrewbogott>	 ariutta: I wouldn't expect that to happen by default, can you tell me the fqdn of the VM?
[23:31:36] <andrewbogott>	 Soon we'll remove all flavors with anything other than disk.20
[23:32:44] <ariutta>	 andrewbogott: ok, good to know. I created an instance from the web GUI and chose flavor "g2.cores4.ram8.disk80"
[23:32:45] <ariutta>	 https://horizon.wikimedia.org/project/instances/
[23:33:09] <andrewbogott>	 ariutta: what is the name and project of the VM?
[23:33:46] <ariutta>	 andrewbogott: name is wpdata, project is wikipathways
[23:34:47] <andrewbogott>	 thanks
[23:36:43] <andrewbogott>	 ok, I see what you're seeing now; it's all good
[23:37:20] <andrewbogott>	 If you need more than the default allocation for cinder you can open a quota ticket which we'll review tomorrow.  
[23:41:58] <ariutta>	 andrewbogott: done :-)  https://phabricator.wikimedia.org/T276184
[23:41:58] <ariutta>	 When estimating how much storage we need, we can assume the OS files will not be on Cinder, right?
[23:42:54] <SPF|Cloud>	 requesting a Cloud VPS project: the first time I don't have to 'pay' for virtual machines :P
[23:43:19] <SPF|Cloud>	 although someone's contributions are worth a lot too..
[23:52:44] <ariutta>	 I read the docs on storage, and it sounds like the OS and other software will go onto the disk storage that's built into the instance, Cinder is for my data, and Shared NFS directories are for sharing data between instances. Does that sound right?