[04:04:26] Noticing some suspiciously outdated responses at https://global-search.toolforge.org/?q=parseJSON&namespaces=2%2C4%2C8&title=%28Gadgets-definition%7C.*%5C.%28js%7Ccss%7Cjson%29%29
[04:04:33] > bug.wikipedia MediaWiki:Gadget-Twinkle.js |;\n*$)/g, "" ); } try { var options = $.parseJSON( optionsText ); // Assuming that our options
[04:04:43] https://bug.wikipedia.org/w/index.php?title=MediaWiki:Gadget-Twinkle.js&action=history
[04:04:54] It was last modified over 6 months ago
[04:04:59] and there has been an edit after that as well
[04:05:08] the second-to-last edit removed this phrase
[04:05:51] [#mediawiki] 05:04 maybe the same cause as https://phabricator.wikimedia.org/T278721 - something being cached and not updating?
[04:05:58] Yeah, I guess.
[04:08:09] OK, posted on there :)
[08:17:34] !log admin testing the upgrade_mons cookbook on codfw1 ceph cluster (T280641)
[08:17:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:17:40] T280641: ceph: Upgrade to latest Nautilus/Octopus to fix CVE-2021-20288 - https://phabricator.wikimedia.org/T280641
[08:33:44] * Majavah wonders why my mail server is fine with mail from .eqiad.wmflabs domains
[09:17:44] !log admin testing the upgrade_osds cookbook on codfw1 ceph cluster (T280641)
[09:17:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:17:48] T280641: ceph: Upgrade to latest Nautilus/Octopus to fix CVE-2021-20288 - https://phabricator.wikimedia.org/T280641
[09:32:30] !log admin finished upgrade of ceph cluster on codfw1 using exclusively cookbooks (T280641)
[09:32:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:32:34] T280641: ceph: Upgrade to latest Nautilus/Octopus to fix CVE-2021-20288 - https://phabricator.wikimedia.org/T280641
[10:03:36] hi! sort-of random question, I'm trying to estimate the cloud vps resources needed for an observability "full stack". ATM we have 25 vCPUs allocated; in your experience, how does that translate into "real" cpu cores? what's the typical oversubscription from vCPUs to cores?
[10:09:48] godog: we try not to reach 2:1 in CPU oversubscription, currently we have 1896 physical CPUs and 2371 vCPUs
[10:11:11] there are apps running in VMs that are way more sensitive to CPU things than others
[10:11:47] the CI stack is an example of something traditionally over-sensitive to CPU virtualization quality
[10:11:56] arturo: makes sense, thank you. to clarify, are "physical CPUs" cores without hyperthreading, cores with hyperthreading, or actual cpu packages?
[10:12:04] nice https://grafana-rw.wikimedia.org/d/000000579/wmcs-openstack-eqiad1?orgId=1&refresh=1m
[10:12:13] ^ you can see there the current subscription level
[10:12:35] nice
[10:12:52] godog: I believe we use hyper-threading in all our hypervisors, but I may be wrong.
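A back-of-the-envelope sketch, in Python, of the oversubscription arithmetic discussed above. The 1896 physical CPU / 2371 vCPU figures and the "try not to reach 2:1" policy are quoted from the log; treating the "physical CPUs" as schedulable threads and converting godog's 25 vCPUs at today's average ratio are assumptions for illustration, not an official sizing rule.

```python
# Oversubscription figures quoted in the log above.
PHYSICAL_CPUS = 1896    # "physical CPUs" on the hypervisors (likely hyper-threads)
ALLOCATED_VCPUS = 2371  # vCPUs currently handed out to VMs
POLICY_CEILING = 2.0    # "we try not to reach 2:1"

ratio = ALLOCATED_VCPUS / PHYSICAL_CPUS
print(f"current oversubscription: {ratio:.2f}:1")                 # ~1.25:1
print(f"headroom before the 2:1 ceiling: {POLICY_CEILING - ratio:.2f}")

# Rough translation of godog's 25 vCPUs into "real" CPU threads at today's
# average ratio; an estimate only, since per-VM contention varies a lot.
print(f"25 vCPUs ~= {25 / ratio:.1f} physical CPU threads")
```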
[10:13:56] ack, yeah that'd be my expectation too
[10:14:22] dcaro: yeah, that's the dashboard I was checking
[10:19:18] super useful, thank you
[10:20:17] * dcaro is starting to play around with grafana, so discovering little jewels
[10:20:46] (though that was already shared by arturo during onboarding, just realizing the info they have xd)
[10:21:45] we may want to create a similar table plane for the ceph cluster, not sure how we are dashboard-wise with the ceph cluster
[10:22:00] s/plane/panel/
[10:23:02] yep, I got that as an OKR this quarter (three dashboards at least: capacity planning, health and usage)
[10:23:57] there's some nice dashboards already around, but I'll try to consolidate and remove non-working widgets/add more meaningful ones (leave some of the nitty-gritty detailed ones for debugging)
[10:27:43] 👍
[10:27:44] cool
[10:28:23] as a side note, all the metrics I added surrounding "ceph network usage" may need a revisit
[10:31:26] ack, you were already my "go-to" for those anyhow ;)
[10:49:27] 🤩
[11:12:20] !log admin testing the drain_cloudvirt cookbook on codfw1 openstack cluster (T280641)
[11:12:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:12:24] T280641: ceph: Upgrade to latest Nautilus/Octopus to fix CVE-2021-20288 - https://phabricator.wikimedia.org/T280641
[13:49:25] !log admin testing the drain_cloudvirt cookbook on codfw1 openstack cluster, draining cloudvirt2001 (T280641)
[13:49:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:49:30] T280641: ceph: Upgrade to latest Nautilus/Octopus to fix CVE-2021-20288 - https://phabricator.wikimedia.org/T280641
[14:55:32] hi, I have just created a volume and attached it to a VPS running in Wikimedia Cloud
[14:55:48] From the instance details it tells me that the volume (called "data") is attached
[14:56:18] to /dev/sdb
[14:56:30] but if I log in to the VPS there is no /dev/sdb
[14:56:44] should I reboot the machine or am I missing something?
[14:57:03] it should work without rebooting
[14:57:19] does `lsblk` show the drive?
[14:59:56] Majavah: # lsblk
[14:59:56] NAME   MAJ:MIN RM    SIZE RO TYPE MOUNTPOINT
[14:59:56] vda    254:0    0    160G  0 disk
[14:59:56] ├─vda1 254:1    0 1007.5K  0 part
[14:59:56] ├─vda2 254:2    0     19G  0 part /
[14:59:56] └─vda3 254:3    0    141G  0 part
[14:59:56] vdb    254:16   0     80G  0 disk
[15:00:12] it's /dev/vdbm not /dev/sdb ... lol
[15:00:16] */dev/vdb
[15:00:56] ok, how can I mount it? I want to mount it to the dir `/srv` on the server
[15:01:49] CristianCantoro: see https://wikitech.wikimedia.org/wiki/Help:Adding_Disk_Space_to_Cloud_VPS_instances
[15:02:13] hmm, interesting, on my VM it is on /dev/sdb
[15:03:45] Majavah: on most Unices I have used it's /dev/sd*, and IIRC it depends on the type of disk
[15:05:00] https://tldp.org/HOWTO/Partition-Mass-Storage-Definitions-Naming-HOWTO/x99.html
[15:05:02] is that a new VM or an old one?
[15:05:42] I created it ~2.5 months ago
[15:05:46] (we changed the driver on the qemu side at some point, newer VMs get a faster one ;), iirc virtio-scsi, that's probably /dev/sdX)
[15:05:51] Majavah: thanks for the link
[15:06:06] and the volume today
[15:08:02] volume mounted, awesome :-)... thanks Majavah
[17:17:11] hey! would anyone be around to answer some questions around cloud vps usage?
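For anyone who hits the same /dev/sdb-vs-/dev/vdb confusion: a small Python sketch (not from the linked help page, just an illustration) that asks lsblk for whole disks with no partitions and no mountpoint, which is usually the freshly attached Cinder volume regardless of whether the VM got the virtio-blk (/dev/vdX) or virtio-scsi (/dev/sdX) driver.

```python
#!/usr/bin/env python3
# Rough sketch: find block devices that look like a freshly attached Cinder
# volume (a whole disk with no partitions and nothing mounted). Device names
# differ by virtualization driver, so don't hard-code /dev/sdb. Run on the VM.
import json
import subprocess

def unused_disks():
    out = subprocess.run(
        ["lsblk", "-J", "-o", "NAME,TYPE,MOUNTPOINT"],
        check=True, capture_output=True, text=True,
    ).stdout
    for dev in json.loads(out)["blockdevices"]:
        if dev["type"] != "disk":
            continue
        if not dev.get("children") and not dev.get("mountpoint"):
            yield "/dev/" + dev["name"]

if __name__ == "__main__":
    for path in unused_disks():
        # Candidate for mkfs + mount (e.g. onto /srv), per the wikitech help page.
        print(path)
```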
[17:21:33] nikkinikk: it's better to just ask your question
[17:24:45] !log toolsbeta rebooting toolsbeta-test-k8s-control-6 because it was "notready" for some reason
[17:24:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[17:25:13] RhinosF1: haha fair enough. I was using toolforge to host a node api, but was running into performance issues. To see whether it was a code problem or a hardware limitation I spun up a cloudvps instance under the "services" project and it's performing much better. The documentation under cloudvps says to make sure it can't run on toolforge before putting it in cloud vps, which it seems I have.
[17:25:31] I wanted to make sure that this was an appropriate use of cloudVPS and whether I was openly able to just spin up an instance like I had.
[17:26:38] nikkinikk: I believe Toolforge has some resource limits. It depends how you were running it on Toolforge.
[17:27:17] I was starting it up with the max amount of cpu and memory that was available without requesting more
[17:27:21] webservice --backend=kubernetes --mem=4Gi --cpu=1 node10 start
[17:27:39] oh wait maybe not the max amount, I think --cpu=2 is the max.
[17:27:57] cpu=1 is the max for a container
[17:28:03] nikkinikk: yes, that's a fine thing to do. We try to suggest Toolforge over running a dedicated VPS because in the longer term that is typically less work for a volunteer. Toolforge gets rid of needing to take care of the underlying virtual machine. But since you are a paid contributor, do what works for you. :)
[17:28:06] cpu=2 is the max for an entire namespace
[17:28:14] So you are right with that
[17:28:15] 👍
[17:28:17] There you go
[17:28:25] Like magic the lovely folks have an answer
[17:28:28] Also, what bd808 said :)
[17:28:43] so in that case, am I ok to maintain the cloud vps instance that I've created? it's performing much better :)
[17:29:02] whoops, didn't see bd808's message!
[17:29:07] Now that's a toolforge command. Are you in toolforge or cloud-vps?
[17:29:49] With cloud-vps you have your own VM, but it sounds like you are running in Toolforge using the webservice command.
[17:30:00] Was originally toolforge but then moved to cloud vps bstorm
[17:30:12] I was in toolforge using that command. Now I'm in cloud vps, starting up my node application inside the vm via the cli and keeping it running with the `screen` utility
[17:30:29] bd808: thanks for the input!
[17:30:35] nikkinikk: which cloud vps instance/project are you running it on?
[17:30:50] under the `services` project
[17:31:16] I just named it `nikkiv2` to test it out
[17:31:24] Majavah:
[17:31:31] I think I've got it, rereading the backscroll!
[17:31:39] :)
[17:32:16] I was just trying to get it all straight.
[17:32:41] BTW: the CloudVPS interface is beautiful, it was so easy to spin up and the documentation was so clear. :)
[17:33:03] That's great to hear!
[17:33:20] nikkinikk: if this is intended to be a permanent thing, I'd recommend requesting a dedicated cloud vps project for this; we're trying to discourage team-specific/shared projects per "Umbrella projects" in https://wikitech.wikimedia.org/wiki/Help:Cloud_VPS_project#Reviews_of_Cloud_VPS_Project_requests
[17:33:54] Majavah: Gotcha, will request a project in that case
[17:35:16] andrewbogott: good feedback on horizon and the docs ^^ :)
[17:43:23] Thanks nikkinikk
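To see the per-container and per-namespace caps that bd808 and bstorm mention above (cpu=1 per container, cpu=2 per namespace), something along these lines can be run from inside a tool account with the official `kubernetes` Python client. This is a sketch, not Toolforge tooling: the tool name is hypothetical, and it assumes the usual ~/.kube/config plus the tool-<name> namespace convention.

```python
# Sketch: inspect the ResourceQuota (namespace-wide caps) and LimitRange
# (per-container caps) in a Toolforge tool namespace. Assumes a working
# kube config for the tool user; "example-tool" is a hypothetical name.
from kubernetes import client, config

TOOL = "example-tool"
NAMESPACE = f"tool-{TOOL}"

config.load_kube_config()
v1 = client.CoreV1Api()

# Namespace-wide caps (the "cpu=2 is the max for an entire namespace" part).
for quota in v1.list_namespaced_resource_quota(NAMESPACE).items:
    print(f"ResourceQuota {quota.metadata.name}:")
    for key, hard in (quota.status.hard or {}).items():
        used = (quota.status.used or {}).get(key, "?")
        print(f"  {key}: {used} used of {hard}")

# Per-container caps (the "cpu=1 is the max for a container" part).
for lr in v1.list_namespaced_limit_range(NAMESPACE).items:
    for item in lr.spec.limits:
        print(f"LimitRange {lr.metadata.name} ({item.type}): max={item.max}")
```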
[18:35:56] Majavah: Thought more about it and it is technically intended to be a proof of concept API, for a few internal teams to experiment with for a small subset of users. If that's the case, should I still create an individual project?
[18:39:09] HTTP/1.1 301 Moved Permanently\n\nLocation: actual WMCS staff
[18:39:33] since you seemed to be around, ping bstorm bd808 ^
[18:39:34] nikkinikk: It helps us sort out quotas and things when people do that, yes. I don't recall the purpose of the services project that well...
[18:39:38] * bstorm checks...
[18:41:03] We definitely have projects that have a relatively short lifespan
[18:44:29] testing and proof of concept stuff has ended up in projects quite often
[18:51:12] !log quarry ran apt updates without issues on all 4 servers. T266386 looks fixed.
[18:51:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL
[18:51:17] T266386: APT sources are broken on Quarry hosts: "GPG error, the following signatures were invalid" - https://phabricator.wikimedia.org/T266386
[18:53:45] andrewbogott: Do you know what the `services` cloudvps project is for? It looks like a kind of random testbed. Apparently my computer needs a reboot so I cannot get on a shell to look for more info.
[18:54:53] I don't know offhand but it looks pretty old
[18:55:03] nikkinikk: I'd say that it doesn't matter much if you are only going to have it around for a few days/weeks anyway. If it needs to be messed with and hacked on for more time, it would be good to request a project. We turn those around pretty quickly. We use that as a way to remind ourselves to clean things up that are left around and forgotten by owners.
[18:55:37] If in doubt, request a project :)
[18:56:33] I can't tell whose it was, maybe it was Gabriel's thing?
[18:56:56] In the last round of deletions, it had a lot of test VMs and was not deleted
[18:57:12] 🤷🏻‍♀️
[18:57:40] nikkinikk: https://phabricator.wikimedia.org/project/view/2875/ we'll have it set up pretty quickly. Just helps us keep track.
[18:58:24] here's an old hint, https://phabricator.wikimedia.org/T148788
[19:18:44] where do I upload a CSV file that I want to be public and linked from wikitech? I've tried phabricator's paste but it throws an exception back at me. the file is 26MB
[19:19:27] hm, just remembered this https://paste.toolforge.org/ going to try
[19:26:26] bstorm, andrewbogott: yeah, the "services" project is an ancient holdover from the days where we had lots of WMF team projects.
[20:28:57] In that case, definitely please make a project nikkinikk.
[20:48:41] I'm following an old url to a tool that is currently down (webservice not running, when approaching the $tool.toolforge.org domain), but the old url is instead redirecting to www.toolforge -> Wikitech portal page.
[20:48:43] https://tools.wmflabs.org/simple/feedback/?action=results&wiki=simple.wiktionary
[20:48:45] as an example
[20:48:48] is that expected/intentional?
[20:50:45] Krinkle: hmmm.... no I don't think that is expected.
[20:50:51] * bd808 pokes a bit
[20:51:15] I think it might be confusing the JS code into thinking it's a success; not a big deal as the tool is down now anyway, but yeah, I would expect to get the 50x error instead
[20:53:12] I'm not 100% sure which layer of the front proxy is redirecting to https://www.toolforge.org/ but my guess is the very front layer.
[20:53:12] I would expect the visitor to end up at a page saying that the tool exists but has no active webservice
[20:53:41] which would be served up by the fourohfour tool
[20:54:41] Krinkle: this is what I would expect -- https://fourohfour.toolforge.org/?url=https://simple.toolforge.org/
[20:55:03] although the url it was served from would be https://simple.toolforge.org/
[20:55:20] right, yeah
[20:55:23] very much worth a bug report :)
[20:55:29] ok
[20:59:03] > https://phabricator.wikimedia.org/T281003
[20:59:53] !log tools.simple Restarting webservice which seems to have died due to grid engine instability
[20:59:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.simple/SAL
[21:01:06] weird, even with the webservice restarted things are still bouncing to that unexpected page
[21:02:33] !log tools.simple Hard restart in an attempt to reset state information at the Toolforge front proxy
[21:02:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.simple/SAL
[21:07:08] did the lua + nginx proxy get replaced with something else yet?
[21:07:22] yuvipanda: it's... complicated :)
[21:08:06] :D
[21:08:11] Krinkle: I at least got https://simple.toolforge.org/ up for you. I have a vague memory of there being some config file that keeps new tools from using the old tools.wmflabs.org URL naming. Trying to find that now.
[21:09:48] yuvipanda: For Toolforge, we still have urlproxy at the front to handle grid engine webservices. It falls through to the Kubernetes ingress, and then that has a bit of magic to hand off to the "fourohfour" tool if no ingress is found.
[21:10:24] (and there is some other crap in there like an haproxy between dynamicproxy and k8s ingress)
[21:11:01] bd808: also noting how the wikitech portal just flipped to the mobilefrontend variant all of a sudden
[21:11:08] thanks for the cache poisoning and the known issue there with wikitech
[21:11:28] yeah, that bug is going to live as long as mobilefe :/
[21:11:51] yuvipanda: we have diagrams! https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Networking_and_ingress
[21:12:50] oooh, interesting re: haproxy but separate from nginx-ingress!
[21:12:54] the dream is direct k8s ingress, but the grid must die first or we have to make a grid ingress inside k8s
[21:13:32] yeah, grid ingress doesn't seem super bad
[21:13:52] since you can create a service that just points to an IP + port and have ingress target that, without a deployment / replicaset / pods
[21:14:15] ah, haproxy gets traffic into nodeports. makes sense
[21:15:02] yeah, because we are in OpenStack so there is no other platform routing trick to bundle the nodeports
[21:15:27] makes sense. I also see it's being used to HA the k8s masters
[21:17:45] oh, and it's calico inside - not flannel! That's pretty cool
[21:19:06] yuvipanda: b.storm and a.rturo did a lot of neat stuff when they rebuilt it all from scratch a couple of years ago.
[21:19:16] \o/
[21:19:19] v cool
[21:19:59] there are all kinds of PSP and admission controller magic to do the things that you had to hack directly into the source in the long ago
[21:22:16] oh yeah, I totally forgot about the custom k8s build :|
[21:22:28] PSP wasn't a thing yet!
[21:22:28] I think it was alpha or beta just as I was leaving
[21:23:19] right, it was "soon" then and we piddled around for like another 18 months before tricking b.storm into working it all out :)
[21:24:25] :D
[21:25:05] in between we tried to go all in on following the production cluster, but then ripped that all out and started from scratch again using kubeadm because debs for k8s is a hard and ugly road
[21:26:21] Krinkle: I figured out what is causing that redirect. Now I guess the question is should I fix it or not. https://phabricator.wikimedia.org/T281003#7030523
[21:26:24] yessssssssssss
[21:27:17] bd808: yeah, it appears to have been down for a while
[21:27:21] I don't know much about it
[21:27:35] the data report that I am now able to see suggests that maybe it has been down since Nov 2019 (!)
[21:27:42] maybe intentionally so?
[21:28:00] kubeadm is the way to go, I think I did that for the PAWS cluster because the deb way was too hard
[21:28:13] https://simple.toolforge.org/feedback/?action=results&wiki=simple.wiktionary
[21:28:42] Krinkle: when I looked at the state of the tool's $HOME I think it was unintentionally down because of a grid crash. But this is strong evidence that the maintainer does not care for the tool anymore
[21:29:51] yuvipanda: yeah, your PAWS 2.0 cluster was a source of inspiration. And now I think PAWS is in its 4.0 cluster that follows the basic template from the Toolforge 2.0 cluster.
[21:33:25] bd808: aye, ok. well, given it is still broken then I suppose leaving it up to show the aggregate data is nice, but I'll probably disable the script for now until someone can decide what they want with it
[21:34:30] any webservice still running on the grid engine backend is suspect :)
[21:40:39] bd808: glad it wasn't a general issue, np.
[21:40:50] I've disabled the gadget in the meantime
[21:41:09] me too! And it took me a while to remember about the selective redirecting from the old domain name
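A small diagnostic sketch in Python, along the lines of the by-hand checks in this log: follow the old tools.wmflabs.org URL one redirect hop at a time and print where the front proxy sends it (the tool itself, the fourohfour page, or the www.toolforge.org portal behavior reported in T281003). It assumes the third-party `requests` package is available; the URL is the example from the log.

```python
# Trace redirects for an old tools.wmflabs.org URL, one hop at a time, to see
# which page the Toolforge front proxy ultimately sends visitors to.
import requests

OLD_URL = "https://tools.wmflabs.org/simple/feedback/?action=results&wiki=simple.wiktionary"

def trace(url, max_hops=5):
    for _ in range(max_hops):
        resp = requests.get(url, allow_redirects=False, timeout=10)
        print(resp.status_code, url)
        location = resp.headers.get("Location")
        if not location:
            break
        # Resolve relative Location headers against the current URL.
        url = requests.compat.urljoin(url, location)

if __name__ == "__main__":
    trace(OLD_URL)
```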