[05:53:33] <bd808>	 !log admin Hard reboot of clouddb1001 via Horizon. Console unresponsive.
[05:53:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[05:58:36] <bd808>	 !log admin `systemctl start mariadb` on clouddb1001 following reboot
[05:58:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[06:03:19] <bd808>	 !log admin `systemctl start mariadb` on clouddb1001 following reboot (take 2)
[06:03:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[07:07:42] <[1997kB]>	 labs down?
[08:23:13] <arturo>	 [1997kB]: can you be more precise?
[08:23:49] <[1997kB]>	 I was getting 503, but it's working now.
[10:58:08] <arturo>	 !log toolsbeta running `aborrero@toolsbeta-test-k8s-control-1:~ $ sudo -i kubeadm upgrade apply v1.16.10` and this time it works! (T246122)
[10:58:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:58:11] <stashbot>	 T246122: Upgrade the Toolforge Kubernetes cluster to v1.16 - https://phabricator.wikimedia.org/T246122
[10:58:52] <arturo>	 !log toolsbeta running `aborrero@toolsbeta-test-k8s-control-1:~ $ sudo apt-get install kubelet -y` in the 1.16 version from the component repo (T246122)
[10:58:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:02:58] <arturo>	 !log toolsbeta upgraded the rest of the k8s control plane nodes to 1.16.10 (T246122)
[11:03:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:05:13] <arturo>	 !log toolsbeta trying `modules/kubeadm/files/wmcs-k8s-node-upgrade.py --control toolsbeta-test-k8s-control-1 --project toolsbeta --domain eqiad.wmflabs --src-version 1.15 --dst-version 1.16.10 -n toolsbeta-test-k8s-worker-1 -n toolsbeta-test-k8s-worker-2 -n toolsbeta-test-k8s-worker-3` (T246122)
[11:05:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:05:16] <stashbot>	 T246122: Upgrade the Toolforge Kubernetes cluster to v1.16 - https://phabricator.wikimedia.org/T246122
[11:45:40] <RhinosF1>	 !log tools.zppixbot-test git pull & kubectl pods delete --all to sync pip3
[11:45:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot-test/SAL
[11:47:58] <RhinosF1>	 !log tools.zppixbot git pull & kubectl pods delete --all to sync pip3
[11:47:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot/SAL
[12:02:37] <arturo>	 !log toolsbeta the k8s cluster is now running v1.16.10 (T246122)
[12:02:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:02:40] <stashbot>	 T246122: Upgrade the Toolforge Kubernetes cluster to v1.16 - https://phabricator.wikimedia.org/T246122
[12:14:54] <godog>	 hi, FYI I'll be restarting prometheus on cloudmetrics hosts shortly, no impact expected
[12:15:13] <godog>	 as in, other than a dip in the metrics
[12:22:02] <godog>	 {{done}}, it is coming back up now
[12:41:13] <arturo>	 thanks godog !
[12:42:36] <godog>	 np arturo, roll-restarted all prometheis in production for a puppet change
[14:38:27] <RhinosF1>	 should https://wikitech.wikimedia.org/w/index.php?title=Template:ToolsGranted&action=edit use login.toolforge.org ?
[15:28:39] <bd808>	 yeah, let me change that RhinosF1 
[15:28:49] <RhinosF1>	 Ty bd808
[15:29:12] * RhinosF1 finally fixed his ssh config the other day
[17:23:53] <bstorm_>	 !log tools deleting "tools-k8s-worker-20", "tools-k8s-worker-19", "tools-k8s-worker-18", "tools-k8s-worker-17", "tools-k8s-worker-16"
[17:23:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:28:25] <Olea>	 hi: 
[19:28:25] <Olea>	 maybe someone knows how can I cancel a Wdumper task? https://tools.wmflabs.org/wdumps/dump/348 
[19:28:59] <Olea>	 I don't want to consume the computing resources because I think the dump configuration I made is wrong.
[19:31:00] <bd808>	 Olea: bennofs is the only maintainer -- https://tools.wmflabs.org/admin/tool/wdumps
[19:32:00] <Olea>	 bennofs: ping
[19:33:21] <bd808>	 Olea: I'm not sure if they are on irc at all or what their nick would be if they are
[19:33:38] <bd808>	 and sadly their wikitech user page is empy
[19:34:20] * bd808 is not sure that tool is a good idea
[19:34:53] <Olea>	 > * <@freenode_bd808:matrix.org> is not sure that tool is a good idea
[19:34:53] <Olea>	 why? :-m 
[19:35:09] <bd808>	 it eats up a huge amount of disk space
[19:35:18] <bd808>	 T247449
[19:35:18] <stashbot>	 T247449: wdumps custom generated dumps storage space - https://phabricator.wikimedia.org/T247449
[19:36:14] <Olea>	 Understand. I feel guilty because maybe my queries are not well tunned and I'm consuming too much resources :-| 
[20:41:17] <hare>	 bd808: part of me wonders if we could just set it up on IA's servers
[20:42:39] <hare>	 I remember when I studied this whole thing the IA export story was a pretty unique creature on Cloud VPS
[20:43:34] <bd808>	 wdumps isn't for IA is it? My understanding is that it is trying to provide one-off filtered exports of wikidata dumps for end users.
[20:43:50] <bd808>	 which is noble, but also a giant resource hog
[20:46:01] <hare>	 Ah, so now there are 2 services providing dumps.
[20:50:18] <bd808>	 hare: 2 that I know of, so probably 9+ ;)
[20:51:32] <bd808>	 Toolforge is the land of TIMTOWTDI (there is more than one way to do it) even if we have kind of sketchy Perl support
[20:52:21] <hare>	 ls -alS /home/tools | top
[20:52:28] <hare>	 that should find you the dump providers pretty quickly :D
[20:54:23] <bd808>	 The current top 5 are: zoomviewer, wikidata-exports, templatetiger, paws, and oar
[20:54:29] * bd808 now wonders what oar does
[20:55:41] <bd808>	 oh, templatetiger got cleaned up since I made that list. I forgot that. So wikidata-analysis slides into the top 5
[20:56:07] <bd808>	 wdumps ranks 8th
[20:56:26] <bd808>	 which is pretty impressive for a tool that is only a couple of months old :)
[20:59:41] <hare>	 There are examples in nature of things suddenly growing very large, very fast. We tend to call that "cancer" but that's beside my point. :)
[20:59:49] <bd808>	 oar is owned by a single maintainer and they seem to have been inactive since 2016...
[21:00:48] <bd808>	 why did people think my proposal to have a "check for life" for a tool every 6 months was a bad idea?
[21:01:00] <hare>	 I myself am no stranger to questionable engineering decisions. I built an entire application around Redis without considering what would happen once there was more data than RAM.
[21:03:24] <bd808>	 hare: heh. that's a super common issue in tools and programming in general
[21:04:58] <bd808>	 I was looking at toolsdb today because it crashed last night my time and interrupted my sleep. The number of tools that have huge databases there is a good indication that many folks don't think about what happens when you have 500M rows to deal with
[21:59:53] <musikanimal>	 hello WMCS! I'm working on migrating the pageviews tools to toolforge.org and am running into a bizarre problem
[22:00:26] <musikanimal>	 The webservice for pageviews-test is currently stopped. https://pageviews-test.toolforge.org/ shows "No webservice", which is good
[22:01:10] <musikanimal>	 but https://pageviews-test.toolforge.org/redirectviews/ is still up and running! It appears to be loading https://tools.wmflabs.org/redirectviews 
[22:02:31] <musikanimal>	 as far as I can tell it shouldn't be mapping to the redirectviews tool (I don't even have a .lighttpd.conf), and even if it did, I would expect `webservice stop` to stop anything and everything that's under https://pageviews-test.toolforge.org
[22:03:02] <musikanimal>	 right?
[22:08:21] <bd808>	 musikanimal: looking, but so far all I can say is "weird"
[22:08:46] <musikanimal>	 it's also happening for a few other subpaths like https://pageviews-test.toolforge.org/userviews/
[22:09:31] <bd808>	 my first guess was that there was a bug somehow that made an unhandled *.toolforge.org url act like tools.wmflabs.org, but that does not seem to be the case
[22:12:59] <musikanimal>	 one thing I can say for sure is that the code that is running at https://pageviews-test.toolforge.org/redirectviews/ is not the code under /data/project/pageviews. If you view the source it's loading resources from https://tools.wmflabs.org/redirectviews, but I changed the code to use relative paths like /redirectviews (hence it should look under pageviews-test.toolforge.org)
[22:15:59] <bd808>	 musikanimal: was this running on the job grid at some point? I'm wondering if I should be hunting in the grid or the k8s ingress layers
[22:17:14] <musikanimal>	 pageviews-test has been on k8s for a while I believe, but redirectviews is currently on the grid
[22:18:32] <musikanimal>	 I think you're on to something though, because langviews is on k8s and that subpath correctly isn't loading https://pageviews-test.toolforge.org/langviews
[22:18:48] <musikanimal>	 let me try moving redirectviews to k8s
[22:21:05] <musikanimal>	 that indeed fixes it! I'm still confused why it would map to a different tool like that
[22:21:08] <bd808>	 musikanimal: yeah, you found a bug!
[22:22:33] <bd808>	 here's an example -- https://totally-random-tool-name-1234567.toolforge.org/dspull/
[22:22:42] <musikanimal>	 haha wow
[22:23:26] <bd808>	 if the toolname is not alive, but includes a path that matches a grid webservice then it all  owrks like the hostname was tools.wmflabs.org
[22:23:55] <musikanimal>	 ahh, I see
[22:25:07] <bd808>	 I'll write it up and then stare at lua code until I give up and leave it for a.rturo :)
[22:25:19] <musikanimal>	 well it's easy to fix on my end at least. I'll move all the tools to k8s. Many thanks for the quick assistance!
[22:28:47] <musikanimal>	 bd808: oh, it's a problem for live tools, too! https://sql-optimizer.toolforge.org/siteviews
[22:29:20] <bd808>	 ugh. that's worse
[22:29:48] <bd808>	 so if the path matches a grid tool that;s the magic
[22:30:12] <musikanimal>	 yeah, I guess coincidentally there aren't many conflicts given this hasn't come up yet
[22:40:21] <bd808>	 T253816
[22:40:23] <stashbot>	 T253816: *.toolforge.org hostnames unexpectly treated as tools.wmflabs.org when the URL's path starts with a match to a grid engine webservice - https://phabricator.wikimedia.org/T253816
[22:42:23] <bd808>	 musikanimal: yeah, the path matching must not have bitten anyone else yet. I bet if I moved the 'static' tool to the job grid it would cause more chaos
[22:43:01] <musikanimal>	 that does sound scary!
[22:44:22] <bd808>	 Now I just need to figure out a good way to fix the lua.
[23:29:13] <andrewbogott>	 !log admin disabling the backup job on cloudbackup2001 (just like last week) so the backup doesn't start while Brooke is rebuilding labstore1004 tomorrow.
[23:29:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL