[00:02:09] Has anyone seen either Phe or Tpt active lately?
[00:05:24] !help
[00:05:24] Coren: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[00:05:31] Mah.
[00:06:12] Not entirely sure if I would remember if I have, unfortunately.
[00:06:45] Well, to be fair, I know they are mostly unreachable; I was hoping someone might know if there's a hackathon or something like that.
[00:07:14] I've a wikisource begging for my help, one of Phe's critical tools that's also managed by Tpt is down and nobody can get a hold of either.
[00:07:29] I'm added to most of Tpt's tools but not that one so I can't help directly right now.
[00:08:25] And @andrewbogott is, like, doing something less useful like eating, and such. :-)
[00:08:34] How dare he. :-)
[00:14:06] :)
[00:14:28] !log toolsbeta Fully enabled encryption at rest for toolsbeta kubernetes
[00:14:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[00:16:50] Oh, ffs, and I've been away so long I don't even know how to log into https://toolsadmin.wikimedia.org/
[00:17:06] Wait. Ldap username? Do I even have a password on that?
[00:18:32] Oh, right, should be the same as phab right?
[00:20:26] I was in the shower :)
[00:20:32] I don't know that I can do anything that brooke can't do though
[00:20:43] That's fair!
[00:20:45] Coren, toolsadmin login is the same as wikitech
[00:20:54] Yeah, I'm in.
[00:20:58] :)
[00:21:02] Hi Andrew!
[00:21:06] hello!
[00:21:17] Can I ask you to bend the rules a bit and add me to phetools?
[00:21:24] If you let me know what you need, I can help...or Andrew can. I'm rushing through some tasks to find time to log out.
[00:21:48] I have hordes beating down my door to help them and I can't reach either Tpt or Phe.
[00:22:13] Unfortunately (and/or fortunately) there's a policy for that https://wikitech.wikimedia.org/wiki/Help:Toolforge/Abandoned_tool_policy
[00:22:22] I can poke at it though if you think it just needs a restart?
[00:22:53] It's not abandoned; I'm on the other wikisource tools they maintain as a backup hand, we just seem to have forgotten this one.
[00:23:10] how will I know if it's working or not?
[00:23:29] andrewbogott: Yeah, one of the tools in there needs a restart - the match and split - but it'd need poking around because there are several of them. I was planning on inspecting the shell history.
[00:23:44] I can tell you if it starts because there's like 3 days of jobs queued up for it. :-)
[00:23:44] ok, looking...
[00:24:11] this looks promising: "~/phe/run_service.sh restart match_and_split"
[00:24:23] that does! :)
[00:24:25] That's very promising indeed!
[00:24:38] curious...
[00:24:41] https://www.irccloud.com/pastebin/N8oPf3cX/
[00:24:45] kind of mixed signals there
[00:25:18] qstat shows it running though...
[00:25:24] Ah, no, that seems sane - it looks like it's trying to kill by name unconditionally first just in case.
[00:25:33] I'm guessing he's not doing -once.
[00:25:59] It's running now.
[00:26:05] great!
[00:26:08] Thanks, Andrew old buddy. :-)
[00:26:13] you bet!
[00:26:18] Are you at a hackathon or something?
[00:26:20] Also, I need to corner one of them so they add me to that tool as well.
[00:27:02] No, I'm home - but I have the lead contributor to fr.wikisource sharing my bed, remember? So if something is broken there I will hear about it *really* soon.
[00:27:14] He says "hi" by the way.
[00:27:20] hi back
[00:27:27] 👋🏻
[00:27:30] so in a sense you're /always/ at a hackathon :)
[00:27:37] I suppose I am!
[00:27:39] * Coren laughs.
[00:28:42] In theory, all tools related to wikisource should have Tpt, Phe and me listed; but there's a couple that have only two or one of us we need to finish.
[00:29:01] * Coren groans.
[00:29:09] And it'd have to be one of /those/ that broke.
[00:29:25] Well, thanks for the hand, Andrew. How goes the... toolsforge now? :-)
[00:30:02] (For the record, I approve of fixing that labs/labs/labs mess) :-)
[00:30:40] Mostly good lately! The big thing happening (which is mostly brooke and arturo's story) is rebuilding the k8s cluster so (among other things) we can get off of Debian Jessie.
[00:30:58] But I seem to have broken Debian Stretch over the weekend which isn't going to help… bstorm_ is the new k8s stuff stretch or buster?
[00:31:10] FYI, I definitely can jump in and restart tools and things as well. I was happy to let Andrew do it, though, since I'm doing relatively dangerous things to the new kubernetes cluster :)
[00:31:21] buster! :)
[00:31:29] So it's immune to that.
[00:31:38] The grid is stretch
[00:31:40] oh good
[00:31:44] Eew. I just finished the upgrading project to 1.16 at $dayjob.
[00:32:31] Fun :) All the new /v1's
[00:34:19] Yeah, well, fun in theory. Right now I'm just happy we're done.
[00:36:10] At least they didn't break the networking; we have an unusual setup with a dedicated L2 fabric underlying the pods and just dumb kubenet on top; but I keep hearing rumors that they want to deprecate this.
[00:36:55] Anyways; thanks again and have fun fam!
[00:37:13] so long - take care!
[00:40:43] 👋🏻
[00:45:12] Le sigh.
[00:45:31] !log tools enabled encryption at rest on the new k8s cluster
[00:45:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[00:45:50] It went boom again. andrewbogott think you can pastebin me the error log? See if I can figure out why and circumvent?
[00:45:59] yep, stay tuned
[00:47:28] Did you restart it or did it restart itself?
[00:48:02] I didn't restart it
[00:48:11] but I did notice that the .out log file is newer than the .err logfile
[00:48:15] there's nothing useful in the logs so far
[00:48:16] Okay, at least /that/ part worked.
[00:48:57] .out looks like this:
[00:48:59] https://www.irccloud.com/pastebin/ItQR5AIo/
[00:49:23] Well, there really isn't anything more I or you can do that would be useful in the current circumstances. Ima try to corner Phe or Tpt so they can add me to that one too.
[00:49:30] and .err is pretty much just
[00:49:32] Yeah, that looks like the normal verbose chatter.
[00:49:32] https://www.irccloud.com/pastebin/d7G8SmFs/
[00:49:58] well, wait I also have 'wsircdaemon.err' is that related?
[00:50:04] It says ImportError: No module named irclib
[00:50:26] I don't think so; that would be another of phe's tools and one I don't know of -- it may be outdated.
[00:50:38] I think Phe has been sorta fading away lately.
[00:51:07] Thanks for all the help, Andrew. If nothing else, you tided things over for a bit. :-)
[00:51:09] up a directory there's a plain old 'error.log' that's more useful
[00:51:13] https://www.irccloud.com/pastebin/9SKFyKb3/
[00:51:53] I'll try to find a longer-term solution.
[00:52:10] Yeah, that's a symptom of someone trying to poke the UI while it was down I think.
[00:52:16] oh, ok
[00:52:33] Thanks a bundle man.
[00:52:49] np
[01:05:25] !log beginning the first run of the new maintain-kubeusers in gentle-mode -- but it was just killed by some files setting the immutable bit T214513
[01:05:27] bstorm_: Unknown project "beginning"
[01:05:27] T214513: Upgrade Toolforge Kubernetes - https://phabricator.wikimedia.org/T214513
[01:05:34] !log tools beginning the first run of the new maintain-kubeusers in gentle-mode -- but it was just killed by some files setting the immutable bit T214513
[01:05:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[01:25:13] !log tools unset the immutable bit from 1704 tool kubeconfigs T214513
[01:25:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[01:25:16] T214513: Upgrade Toolforge Kubernetes - https://phabricator.wikimedia.org/T214513
[01:26:01] !log tools running the first run of maintain-kubeusers 2.0 for the new cluster T214513 (more successfully this time)
[01:26:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[04:48:40] !log tools completed first run of maintain-kubeusers 2 in the new cluster T214513
[04:48:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[04:48:44] T214513: Upgrade Toolforge Kubernetes - https://phabricator.wikimedia.org/T214513
[05:04:33] wow, an actual sighting of Coren
[07:21:13] !log admin deploying horizon/train to labweb1001/1002
[07:21:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:38:45] @seen matanya
[09:38:45] zhuyifei1999_: Last time I saw matanya they were quitting the network with reason: *.net *.split N/A at 12/16/2019 4:09:00 AM (1d5h29m44s ago)
[09:38:56] I will let you know when I see matanya around here
[09:38:56] @notify matanya
[16:50:18] !log tools updated the maintain-kubeusers docker image for beta and tools
[16:50:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:53:33] !log tools maintain-kubeusers app deployed fully in tools for new kubernetes cluster T214513 T228499
[16:53:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:53:37] T214513: Upgrade Toolforge Kubernetes - https://phabricator.wikimedia.org/T214513
[16:53:37] T228499: Toolforge: changes to maintain-kubeusers - https://phabricator.wikimedia.org/T228499
[19:20:54] !log tools deployed the changes to the live proxy to enable the new kubernetes cluster T234037
[19:21:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:21:08] T234037: Toolforge ingress: decide on final layout of north-south proxy setup - https://phabricator.wikimedia.org/T234037
[19:59:58] Hello folks! Need your help in approving this request: https://toolsadmin.wikimedia.org/tools/membership/status/675 (it is related to a Google Code-in task) Thanks!
[20:08:25] srish_aka_tux: done
[20:08:44] @bstorm_ thank you!
[20:25:24] !log tools Fixed https://tools.wmflabs.org/ to redirect to https://tools.wmflabs.org/admin/
[20:25:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:21:32] musikanimal: For random reasons I'm looking at 404 requests in Toolforge right now and I'm seeing a relatively large number of POSTs to https://tools.wmflabs.org/metaviews/api.php. There is no webservice running there. Is that expected?
[21:44:24] Hi - I have a php server running on toolforge that's been freezing after a few minutes today. It was fine yesterday and before. Is there something going on? I've restarted it about 6 times now, and it works for a few minutes each time, then freezes up...
[21:45:13] I can get a 404 response instantaneously so the server is running. But even phpinfo either takes minutes or doesn't respond at all.
[21:47:42] Uh - !help ?
[21:47:45] !help
[21:47:45] apsmith: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[21:49:53] apsmith: is your tool running on the job gird or on Kubernetes?
[21:49:58] *job grid
[21:50:07] kubernetes
[21:50:24] author-disambiguator (wikidata service)
[21:51:59] +bd808: I can connect via kubectl into the container and look around, but didn't see anything obvious. One odd thing was that /var/log/lighttpd was no longer readable.
[21:52:28] I'm pretty sure I've checked those log files before.
[21:52:59] I don't think that logs normally go to /var/log/lighttpd. We direct them to $HOME/{access,error}.log instead
[21:55:16] How do you get the access.log restored? I remember a message about it going away but I looked around just now and couldn't figure out how to get it back...
[21:55:44] At least that way I could see if something external is hitting the app and causing the problem...
[21:56:19] apsmith: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web/Lighttpd#Web_logs
[21:56:30] Thanks!
[22:00:56] Ok, I guess this might help a little with debugging. If anybody has any other ideas let me know!
[22:30:52] !log tools.bd808-test Moved webservice to new k8s cluster
[22:30:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bd808-test/SAL
[22:53:16] (is it frozen again?)
[22:55:07] bd808: thanks for the ping! I started it back up. It's not important, just records usage of the Pageviews apps. I might retire it actually
[22:56:08] (yeah it is, debugging)
[22:57:32] !log tools.bd808-test Manually updated ingress rules (T241008)
[22:57:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bd808-test/SAL
[22:57:35] T241008: New k8s cluster routing behaving strangely for bd808-test tool - https://phabricator.wikimedia.org/T241008
[23:13:44] I will let you know when I see apsmith and I will deliver that message to them
[23:13:44] @notify apsmith From the strace output of author-disambiguator my guess is that the tool is simply slooded
[23:13:55] You've already asked me to watch this user
[23:13:55] @notify apsmith From the strace output of author-disambiguator my guess is that the tool is simply Flooded
[23:14:02] argh