[05:53:33] !log admin Hard reboot of clouddb1001 via Horizon. Console unresponsive.
[05:53:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[05:58:36] !log admin `systemctl start mariadb` on clouddb1001 following reboot
[05:58:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[06:03:19] !log admin `systemctl start mariadb` on clouddb1001 following reboot (take 2)
[06:03:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[07:07:42] <[1997kB]> labs down?
[08:23:13] [1997kB]: can you be more precise?
[08:23:49] <[1997kB]> I was getting 503, but it's working now.
[10:58:08] !log toolsbeta running `aborrero@toolsbeta-test-k8s-control-1:~ $ sudo -i kubeadm upgrade apply v1.16.10` and this time it works! (T246122)
[10:58:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:58:11] T246122: Upgrade the Toolforge Kubernetes cluster to v1.16 - https://phabricator.wikimedia.org/T246122
[10:58:52] !log toolsbeta running `aborrero@toolsbeta-test-k8s-control-1:~ $ sudo apt-get install kubelet -y` in the 1.16 version from the component repo (T246122)
[10:58:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:02:58] !log toolsbeta upgraded the rest of the k8s control plane nodes to 1.16.10 (T246122)
[11:03:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:05:13] !log toolsbeta trying `modules/kubeadm/files/wmcs-k8s-node-upgrade.py --control toolsbeta-test-k8s-control-1 --project toolsbeta --domain eqiad.wmflabs --src-version 1.15 --dst-version 1.16.10 -n toolsbeta-test-k8s-worker-1 -n toolsbeta-test-k8s-worker-2 -n toolsbeta-test-k8s-worker-3` (T246122)
[11:05:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:05:16] T246122: Upgrade the Toolforge Kubernetes cluster to v1.16 - https://phabricator.wikimedia.org/T246122
[11:45:40] !log tools.zppixbot-test git pull && `kubectl delete pods --all` to sync pip3
[11:45:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot-test/SAL
[11:47:58] !log tools.zppixbot git pull && `kubectl delete pods --all` to sync pip3
[11:47:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot/SAL
[12:02:37] !log toolsbeta the k8s cluster is now running v1.16.10 (T246122)
[12:02:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:02:40] T246122: Upgrade the Toolforge Kubernetes cluster to v1.16 - https://phabricator.wikimedia.org/T246122
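The entries above follow the standard kubeadm control-plane upgrade flow. A minimal sketch of that sequence, assuming the 1.16 packages are already available via the component repo mentioned at 10:58:52; the per-worker steps are inferred from the flags of wmcs-k8s-node-upgrade.py, not from its source:

```bash
# Sketch of the kubeadm upgrade flow logged above (T246122).
# Assumes kubeadm/kubelet 1.16 packages are installable via apt.

# 1. First control-plane node: preview, then apply the upgrade.
sudo -i kubeadm upgrade plan
sudo -i kubeadm upgrade apply v1.16.10

# 2. Upgrade the kubelet on that node to match, then restart it.
sudo apt-get install -y kubelet
sudo systemctl restart kubelet

# 3. Remaining control-plane nodes pick up the new configuration.
sudo -i kubeadm upgrade node

# 4. Per worker (roughly what wmcs-k8s-node-upgrade.py automates,
#    judging by its flags): drain, upgrade the kubelet, uncordon.
kubectl drain toolsbeta-test-k8s-worker-1 --ignore-daemonsets
ssh toolsbeta-test-k8s-worker-1 'sudo apt-get install -y kubelet && sudo systemctl restart kubelet'
kubectl uncordon toolsbeta-test-k8s-worker-1
```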
[12:14:54] hi, FYI I'll be restarting prometheus on cloudmetrics hosts shortly, no impact expected
[12:15:13] as in, other than a dip in the metrics
[12:22:02] {{done}}, it is coming back up now
[12:41:13] thanks godog !
[12:42:36] np arturo, roll-restarted all prometheis in production for a puppet change
[14:38:27] should https://wikitech.wikimedia.org/w/index.php?title=Template:ToolsGranted&action=edit use login.toolforge.org?
[15:28:39] yeah, let me change that RhinosF1
[15:28:49] Ty bd808
[15:29:12] * RhinosF1 finally fixed his ssh config the other day
[17:23:53] !log tools deleting "tools-k8s-worker-20", "tools-k8s-worker-19", "tools-k8s-worker-18", "tools-k8s-worker-17", "tools-k8s-worker-16"
[17:23:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:28:25] hi:
[19:28:25] maybe someone knows how I can cancel a Wdumper task? https://tools.wmflabs.org/wdumps/dump/348
[19:28:59] I don't want to consume the computing resources because I think the dump configuration I made is wrong.
[19:31:00] Olea: bennofs is the only maintainer -- https://tools.wmflabs.org/admin/tool/wdumps
[19:32:00] bennofs: ping
[19:33:21] Olea: I'm not sure if they are on irc at all or what their nick would be if they are
[19:33:38] and sadly their wikitech user page is empty
[19:34:20] * bd808 is not sure that tool is a good idea
[19:34:53] > * <@freenode_bd808:matrix.org> is not sure that tool is a good idea
[19:34:53] why? :-m
[19:35:09] it eats up a huge amount of disk space
[19:35:18] T247449
[19:35:18] T247449: wdumps custom generated dumps storage space - https://phabricator.wikimedia.org/T247449
[19:36:14] Understood. I feel guilty because maybe my queries are not well tuned and I'm consuming too many resources :-|
[20:41:17] bd808: part of me wonders if we could just set it up on IA's servers
[20:42:39] I remember when I studied this whole thing the IA export story was a pretty unique creature on Cloud VPS
[20:43:34] wdumps isn't for IA, is it? My understanding is that it is trying to provide one-off filtered exports of wikidata dumps for end users.
[20:43:50] which is noble, but also a giant resource hog
[20:46:01] Ah, so now there are 2 services providing dumps.
[20:50:18] hare: 2 that I know of, so probably 9+ ;)
[20:51:32] Toolforge is the land of TIMTOWTDI (there is more than one way to do it) even if we have kind of sketchy Perl support
[20:52:21] ls -alS /home/tools | head
[20:52:28] that should find you the dump providers pretty quickly :D
[20:54:23] The current top 5 are: zoomviewer, wikidata-exports, templatetiger, paws, and oar
[20:54:29] * bd808 now wonders what oar does
[20:55:41] oh, templatetiger got cleaned up since I made that list. I forgot that. So wikidata-analysis slides into the top 5
[20:56:07] wdumps ranks 8th
[20:56:26] which is pretty impressive for a tool that is only a couple of months old :)
[20:59:41] There are examples in nature of things suddenly growing very large, very fast. We tend to call that "cancer", but that's beside my point. :)
[20:59:49] oar is owned by a single maintainer and they seem to have been inactive since 2016...
[21:00:48] why did people think my proposal to have a "check for life" for a tool every 6 months was a bad idea?
[21:01:00] I myself am no stranger to questionable engineering decisions. I built an entire application around Redis without considering what would happen once there was more data than RAM.
[21:03:24] hare: heh. that's a super common issue in tools and programming in general
[21:04:58] I was looking at toolsdb today because it crashed last night my time and interrupted my sleep. The number of tools that have huge databases there is a good indication that many folks don't think about what happens when you have 500M rows to deal with
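For the disk-hog hunt above, a sketch of a variant that measures directories recursively (`ls -S` only ranks the entries' own sizes, not the trees beneath them); the /data/project root for Toolforge tool homes is an assumption here:

```bash
# Rank tool directories by total (recursive) size, largest first.
# /data/project is assumed to be the root of Toolforge tool homes.
sudo du -sh /data/project/* 2>/dev/null | sort -rh | head -n 10
```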
[21:59:53] hello WMCS! I'm working on migrating the pageviews tools to toolforge.org and am running into a bizarre problem
[22:00:26] The webservice for pageviews-test is currently stopped. https://pageviews-test.toolforge.org/ shows "No webservice", which is good
[22:01:10] but https://pageviews-test.toolforge.org/redirectviews/ is still up and running! It appears to be loading https://tools.wmflabs.org/redirectviews
[22:02:31] as far as I can tell it shouldn't be mapping to the redirectviews tool (I don't even have a .lighttpd.conf), and even if it did, I would expect `webservice stop` to stop anything and everything that's under https://pageviews-test.toolforge.org
[22:03:02] right?
[22:08:21] musikanimal: looking, but so far all I can say is "weird"
[22:08:46] it's also happening for a few other subpaths like https://pageviews-test.toolforge.org/userviews/
[22:09:31] my first guess was that there was a bug somehow that made an unhandled *.toolforge.org url act like tools.wmflabs.org, but that does not seem to be the case
[22:12:59] one thing I can say for sure is that the code that is running at https://pageviews-test.toolforge.org/redirectviews/ is not the code under /data/project/pageviews. If you view the source it's loading resources from https://tools.wmflabs.org/redirectviews, but I changed the code to use relative paths like /redirectviews (hence it should look under pageviews-test.toolforge.org)
[22:15:59] musikanimal: was this running on the job grid at some point? I'm wondering if I should be hunting in the grid or the k8s ingress layers
[22:17:14] pageviews-test has been on k8s for a while I believe, but redirectviews is currently on the grid
[22:18:32] I think you're on to something though, because langviews is on k8s and that subpath correctly isn't loading https://pageviews-test.toolforge.org/langviews
[22:18:48] let me try moving redirectviews to k8s
[22:21:05] that indeed fixes it! I'm still confused why it would map to a different tool like that
[22:21:08] musikanimal: yeah, you found a bug!
[22:22:33] here's an example -- https://totally-random-tool-name-1234567.toolforge.org/dspull/
[22:22:42] haha wow
[22:23:26] if the tool name is not alive, but the URL includes a path that matches a grid webservice, then it all works like the hostname was tools.wmflabs.org
[22:23:55] ahh, I see
[22:25:07] I'll write it up and then stare at lua code until I give up and leave it for a.rturo :)
[22:25:19] well it's easy to fix on my end at least. I'll move all the tools to k8s. Many thanks for the quick assistance!
[22:28:47] bd808: oh, it's a problem for live tools, too! https://sql-optimizer.toolforge.org/siteviews
[22:29:20] ugh. that's worse
[22:29:48] so if the path matches a grid tool, that's the magic
[22:30:12] yeah, I guess coincidentally there aren't many conflicts given this hasn't come up yet
[22:40:21] T253816
[22:40:23] T253816: *.toolforge.org hostnames unexpectly treated as tools.wmflabs.org when the URL's path starts with a match to a grid engine webservice - https://phabricator.wikimedia.org/T253816
[22:42:23] musikanimal: yeah, the path matching must not have bitten anyone else yet. I bet if I moved the 'static' tool to the job grid it would cause more chaos
[22:43:01] that does sound scary!
[22:44:22] Now I just need to figure out a good way to fix the lua.
[23:29:13] !log admin disabling the backup job on cloudbackup2001 (just like last week) so the backup doesn't start while Brooke is rebuilding labstore1004 tomorrow.
[23:29:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
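The T253816 routing bug discussed above is easy to probe from outside, using the throwaway hostname bd808 posted at 22:22:33. A quick sketch of the check; the exact status codes are assumptions, since the chat only confirms that the grid tool's content was served:

```bash
# Probe T253816: a nonexistent *.toolforge.org hostname whose path
# matches a grid-engine webservice was routed as if the host were
# tools.wmflabs.org, instead of returning a "no webservice" error.
curl -sI https://totally-random-tool-name-1234567.toolforge.org/dspull/ | head -n 1
# A dead hostname should fail; at the time of the conversation it
# answered with the grid tool's content instead.
```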
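And a sketch of musikanimal's workaround, moving an affected tool off the grid so its path no longer matches the buggy rule; the php7.3 runtime type is a placeholder assumption, substitute whatever the tool actually runs:

```bash
# Run inside the tool account (become <toolname>): stop the grid
# webservice and start it on the Kubernetes backend instead.
webservice --backend=gridengine stop
webservice --backend=kubernetes php7.3 start
webservice status   # confirm which backend is now serving
```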