[10:54:24] Are there any known errors?
[10:54:30] https://www.irccloud.com/pastebin/9VVxDxmy/
[10:54:36] bot has just spouted that out
[10:55:38] I can investigate!
[10:55:54] arturo: pm'ing
[10:57:44] use the public channel please, so others can see the context
[10:58:06] arturo: per pm, hitting only one tool (zppixbot), zppixbot-test is fine
[10:58:25] can't see anything standing out as to why, full trace caused by it is at sftp://login.tools.wmflabs.org/mnt/nfs/labstore-secondary-tools-project/zppixbot/.sopel/logs/default.error.log
[10:58:56] default.sopel.log seems to have the same error in it
[10:59:10] ignore after [2020-03-25 10:55:42,121]
[11:05:41] RhinosF1: this is running in the grid, right?
[11:05:54] arturo: it's on a kube pod
[11:06:00] so yes I think
[11:06:20] ok, k8s, so it is a webservice?
[11:06:33] it's an irc bot
[11:07:02] nothing crazy has happened nor has it alerted again so whatever happened seems like a weird blip
[11:09:17] the kubernetes cluster is working just fine RhinosF1
[11:09:46] Strange, weird blip I’ll call it then.
[11:10:38] using IRC without a bouncer is so disorienting
[11:10:55] you always feel like people are talking behind your back, which they are.
[11:11:02] I’ll hang round in case anything fires off again or the Sopel devs come up with an idea.
[11:11:22] yuvipanda: true. But not really :-) nothing exciting in the backscroll
[11:11:34] ya, but FOMO
[11:11:49] RhinosF1: also https://grafana-labs.wikimedia.org/d/toolforge-k8s-namespace-resources/kubernetes-namespace-resources?orgId=1&var-namespace=tool-zppixbot&from=now-2d&to=now doesn't contain anything weird
[11:12:12] RhinosF1: I mean, you don't seem to be hitting resource limits or anything
[11:17:44] arturo: https://grafana-labs.wikimedia.org/d/toolforge-k8s-namespace-resources/kubernetes-namespace-resources?orgId=1&var-namespace=tool-zppixbot&from=1585133400000&to=1585134600000&fullscreen&panelId=13 seems to drop for input as it happened then peak after
[11:19:12] the scale is ~200 bps ... I don't think that's anything meaningful
[11:19:13] That could be a coincidence though
[11:19:22] Probably
[11:20:58] toolforge seems to be working fine anyway
[11:21:07] arturo: backend devs say shit happens so I’m considering it resolved.
[11:38:15] !log tools.zppixbot turn on log_raw
[11:38:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot/SAL
[11:38:22] !log tools.zppixbot-test turn on log_raw
[11:38:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot-test/SAL
[11:48:50] !log tools.zppixbot rebooted bot, config change
[11:48:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot/SAL
[11:50:26] !log tools.zppixbot-test restarted bot, config change
[11:50:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot-test/SAL
[12:18:46] !log project-proxy disable puppet in the 2 VMs to try refactoring the puppet role https://gerrit.wikimedia.org/r/c/operations/puppet/+/583316 (T135046)
[12:18:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[12:18:49] T135046: Whitelist labs instances that need XFF header passed through the web proxy - https://phabricator.wikimedia.org/T135046
[12:34:53] !log project-proxy enable puppet in the 2 VMs. Role is now role::wmcs::novaproxy. Hiera was updated accordingly too (T135046)
[12:34:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[12:34:56] T135046: Whitelist labs instances that need XFF header passed through the web proxy - https://phabricator.wikimedia.org/T135046
[12:51:57] !log project-proxy disable puppet in the 2 VMs again for testing XFF changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/583098 (T135046)
[12:51:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[12:52:00] T135046: Whitelist labs instances that need XFF header passed through the web proxy - https://phabricator.wikimedia.org/T135046
[13:09:47] !log tools.stewardbots Restart StewardBot
[13:09:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[13:35:59] !log project-proxy re-enable puppet in the 2 VMs again
[13:36:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[14:14:31] !log tools.stewardbots Restart StewardBot
[14:14:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[14:20:05] !log tools.stewardbots Deploy 03d88e3
[14:20:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[17:09:20] !log admin depool labweb1002 for horizon testing T240852
[17:09:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:09:23] T240852: CloudVPS: horizon giving http/500 intermitently - https://phabricator.wikimedia.org/T240852
[17:56:17] !log admin add labweb1002 back into the pool - completed horizon testing T240852
[17:56:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:56:20] T240852: CloudVPS: horizon giving http/500 intermitently - https://phabricator.wikimedia.org/T240852
[18:42:00] Hi everyone! I have been trying to connect to my ToolForge account via ssh with this command ssh suecarmol@login.tools.wmflabs.org and am getting the following error: Connection closed by 185.15.56.48 port 22 . I have verified that my ssh command works by trying it on other clients. I have uploaded my key to ToolForge, to Gerrit and WikiTech and have verified that it is the same key I have in ~/.ssh/id_rsa.pub . My
[18:42:00] OpenSSH version is 7.9. I have also set up my ~/.ssh/config according to the documentation (https://wikitech.wikimedia.org/wiki/Help:Accessing_Cloud_VPS_instances#ProxyCommand_(older_ssh_clients)). Does anyone know what might be causing this?
[18:47:10] is there still an approval process? I believe susana_ basically went this route in the beginning https://wikitech.wikimedia.org/wiki/Help:Create_a_Wikimedia_developer_account#VPS_and_General_Users
[18:47:32] now we're just trying to get her into Toolforge
[18:47:53] there is, steps are https://wikitech.wikimedia.org/wiki/Help:Create_a_Wikimedia_developer_account#Toolforge_users
[18:48:16] yeah, so she was able to log in to the admin console
[18:48:47] What does https://toolsadmin.wikimedia.org/tools/membership/apply look like?
[18:48:56] I should let her talk :) Just sharing what I helped with thus far. My guess is her account still needs to be approved, though the docs don't mention this
[18:51:43] That help page is only for creating a Wikimedia developer (LDAP) account, not about access to the Toolforge project. The docs for that are at https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Quickstart
[18:52:29] oh, there we go
[18:54:50] Thanks for the quick answer. I am requesting a ToolForge membership now
[18:55:52] susana_, musikanimal: {{done}} -- https://toolsadmin.wikimedia.org/tools/membership/status/743
[18:56:23] wee!
[18:56:37] Awesome! I'll try running the ssh command again and I'll let you know if that works
[18:59:09] I'm in! Thanks for the help!
[18:59:30] bd808: approved that while I was still hunting for my 2fa token
[18:59:48] I had the tab open :)
[19:00:14] musikanimal: y'all should probably update your onboarding docs ;)
[19:00:51] haha yes I'm taking notes
[19:17:01] does anyone know what would be causing https://tools.wmflabs.org/signatures/ to be timing out right now?
[19:17:20] all my other tools are fine, just that one is not responding
[19:18:07] and now it's up again.
[19:19:40] AntiComposite: what does the tool do to serve its content? Database access? API access? I don't know of any widespread issues at the moment.
[19:20:04] * bd808 got a 502 after a long wait
[19:20:52] the /signatures/ page has no external dependencies, it's just a python/flask page
[19:22:46] from `kubectl get event` as that tool it looks like it was restarted pretty recently
[19:23:11] which may account for the 502 I got
[19:23:12] yeah, I was going to deploy code, but when I went to load the page it timed out
[19:23:30] I complained, then it started working again, so I continued what I was doing
[19:24:00] It seems like every other load is working for me. That might mean something is not quite right in the shared ingress layer
[19:24:08] * bd808 pokes some other tools
[19:26:52] AntiComposite: I can't find any other tools that are acting strangely. I'm not wondering about the `git rev-parse` call in the tool and if the node it is running on is pushing up against NFS rate limits
[19:27:00] *I'm now wondering
[19:27:56] Unless I've misunderstood something about how uwsgi/flask work, that /should/ only be called once on startup
[19:28:03] definitely could be wrong though
[19:29:08] !log admin dumping a bunch of VMs on cloudvirt1015 to see if it still crashes
[19:29:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[19:29:17] yeah I think you are correct that it should only run as the UWSGI container builds new workers
[20:00:39] Krenair: Hi! Would you be a dear and help restart https://tools.wmflabs.org/oabot/get-random-edit? which has been "dead" for a few days now...
[20:00:54] (restart the webservice)
[20:00:54] did you ask the maintainer?
[20:02:12] hey folks, i keep getting nfs errors trying to set up instances with mediawiki-vagrant even with running vagrant destroy -f
[20:03:02] i got this before and i thought it was the os i launched it with but now i'm using the os (debian-10.0-buster) that i have a working install with and still getting nfs errors
[20:03:09] any suggestions appreciated!
[20:03:16] hm, that might be a smart way to do it as well (duh)...will go that way of course
[20:07:53] mepps: that's a persistent bug we have with Vagrant + LXC. My best advice if `vagrant reload` isn't working is a full restart of the instance you are running it on.
[20:08:48] Josve05a, that's the preferred first step before asking toolforge admins to step in :)
[20:09:51] seems like both I (and the maintainer) have problems with the page generating a 502-error page and needing to clear cookies due to something up with OAuth...after clearing the OAuth cookies, the page worked again
[20:10:17] thanks bd808! i'll try another restart
[21:37:43] !log tools.totoazero deployed 482ee9a hotarticles.py
[21:37:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.totoazero/SAL
[21:57:58] !log tools.totoazero deployed 5c07be5 hotarticles.py _errorhandler.py
[21:57:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.totoazero/SAL
[22:32:40] Josve05a, glad you got it fixed :)