[12:27:21] !log wikilink removing user/projectadmin `jsn` from the project and add it again (T250365)
[12:27:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilink/SAL
[12:27:24] T250365: Unable to SSH into instances project 'wikilink' - https://phabricator.wikimedia.org/T250365
[15:02:11] AntiComposite: great response to that sql question on the cloud list! If you have any ideas about how we could make some better docs for the views I would love to hear them.
[15:06:34] The biggest problem is that they're spread over 3+ pages
[15:07:00] if you want to know about the actor views, you have to look at the actor page
[15:07:16] if you want to know about the userindex views, that's mostly on the toolforge page
[15:07:52] but the mention of logging_logindex is on Help:MySQL queries, and it doesn't actually explain what it does
[15:10:07] AntiComposite: great feedback. thanks!
[16:27:28] Is cloud having hiccups
[16:29:08] RhinosF1: read topic
[16:29:32] Zppix: it shouldn’t impact us though per email
[16:37:47] andrewbogott: bots unresponsive
[16:38:01] Is network having an issue?
[16:38:14] RhinosF1: we failed over between network nodes; I'd expect anything that broke to be back by now
[16:38:23] (and I'm surprised that anything broke in the first place)
[16:38:48] andrewbogott: tools.zppixbot-test is still unresponsive (but connected)
[16:39:02] RhinosF1: ok.
I can have a look after we have things settled on our end
[16:39:20] I'm not sure I understand what 'connected but unresponsive' means
[16:39:22] Will let you know if it wakes up before then
[16:39:31] thanks
[16:39:35] On irc but not responding to any messages
[16:41:28] Bot has now disconnected, it should keep trying to come back but will update
[16:43:47] They’re bacm
[16:43:48] Back
[18:19:28] !help
[18:19:28] Examknow: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[18:19:42] hey Examknow
[18:28:06] arturo: am I allowed to run a node.js webservice on toolforge?
[18:35:46] Examknow: definitely allowed, although I don't know exactly what the process would be.
[18:35:49] Examknow, There is a node10 container for kubernetes webservices
[18:36:05] ah, there it is :)
[18:36:40] https://wikitech.wikimedia.org/wiki/Help:Toolforge/My_first_NodeJS_OAuth_tool has instructions on setting up an example nodejs webservice
[18:38:48] "am I allowed to" is an easy question: yes, as long as it's in the rules (https://wikitech.wikimedia.org/wiki/Help:Toolforge/Rules). "Can I make it work" is the one that'll get ya.
[18:48:20] thanks
[18:50:42] Can someone explain to me how to run a git pull origin master on the crontab on Tools without the crontab rewriting it to a jsub command and causing it to not work?
[18:54:31] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid#Scheduling_jobs_at_regular_intervals_with_cron
[19:07:44] Cyberpower678: why does jsub not work?
[19:08:04] Dunno. It just doesn't
[19:08:17] are you using git over SSH or HTTP(S)
[19:08:22] HTTPS
[19:08:29] yeah dunno
[19:08:44] 'It just doesn't' isn't helpful. without knowing the problem nobody can debug it
[19:08:54] ^
[19:09:42] zhuyifei1999_: Well I run it the modified crontab command, it submits, but it doesn't pull.
[19:09:48] I don't see any errors.
[19:09:54] So that's all I got.
[19:10:07] no output no error?
[19:10:10] which tool?
[19:10:21] Your job was successfully submitted is all that comes out.
[19:10:25] iabot
[19:11:18] cd $HOME/master && /usr/bin/jsub -N cron-5 -once -quiet git pull origin master <= this one?
[19:11:35] and the one below it for test
[19:11:40] https://www.irccloud.com/pastebin/23h97A0E/
[19:11:57] cd doesn't affect jsub
[19:12:00] use git -C /path/to/git/dir instead of cd
[19:12:37] or add -cwd to jsub
[19:14:01] git -C /data/project/iabot/master pull origin master
[19:16:32] But it seems to work for the on demand workers in the entries below?
[19:19:01] because the stuffs below have -cwd
[19:31:31] zhuyifei1999_: so I'll try that.
[19:32:23] zhuyifei1999_: I wonder if we should add something to crontabe-sge that looks for people doing things like `cd` and rejects the entry?
[19:32:46] because that's almost never going to do what they think it does
[19:33:04] or we can add -cwd automatically when it has a cd
[19:33:15] that would be fnacy
[19:33:21] *fancy
[19:46:55] the difficult part is how to make the code look elegant
[19:49:14] I don't love adding more magic in that wrapper honestly
[19:54:01] https://gerrit.wikimedia.org/r/#/c/labs/toollabs/+/589411/1/misctools/oge-crontab so that would be a patch
[20:03:08] zhuyifei1999_: looks to be working now. Thanks.
[20:03:13] np
[20:06:23] zhuyifei1999_: ha. I didn't even remember that we had `cd` related code in there already. I'll review and try to remember how to build that package "soon"
[20:06:40] ok
[21:31:51] andrewbogott: im still seeing connect instability with tools.zppixbot(-test)
[21:33:05] We reset some network things a few minutes ago to fix a problem. Probably you lost connectivity for a second during that.
[21:33:34] andrewbogott: this was around 4:15PM UTC-5
[21:34:06] other bots are also disconnecting today at random times.
wm-bridgebot stashbot shinken-wm
[21:36:46] andrewbogott: our bot has been stuttery since 19:20 - afaics, all server timeouts so we're not getting a response from freenode
[21:37:12] we assume disconnected and then FN times the bot out
[21:38:34] irc bots are always super sensitive to network interruptions. Partially the fault of the irc libs in use, partially the fault of irc as a protocol.
[21:39:09] Are things actually failing to connect /now/ or are we talking about interruptions during the upgrade? (Which completed about 20 minutes ago)
[21:39:30] andrewbogott: last was 23 mins ago
[21:40:03] which pretty nicely lines up with the last flip of the egress router
[21:40:31] Yep
[21:40:40] bd808: I know it's not the most stable
[21:40:57] lots of jargon in that-- "flip" is failing over from one to the other in the HA pair of routers
[21:41:38] andrewbogott: that sounds good then. I'll hang round and hope.
[21:43:12] andrewbogott: what happened with network?
[21:43:23] * RhinosF1 read shouldn't affect toolforge on the email
[21:43:56] We upgraded the OpenStack software today. This included things in the software defined networking layer
[21:44:31] there was more fighting with the new version of Neutron (the software defined networking layer) than hoped for
[21:45:31] short network interruptions (<1m) are not something that we consider 'broken'
[21:45:58] they are not ideal, but they are also not catastrophic
[21:47:18] I see ~2-3 minutes each time which being IRC is enough to kill it.
[21:49:56] huh — we should probably test that more. I'd be very surprised if a failover takes that long but other things were certainly taking longer than before with this version.
[21:53:08] RhinosF1: I doubt that TCP/IP networking was interrupted for anything near 2-3 minutes.
What you are seeing is the amount of time it took your sopel bot & Freenode to miss enough ping/pong messages to decide to die
[21:54:06] which goes back to my statement of "[21:38] < bd808> irc bots are always super sensitive to network interruptions. Partially the fault of the irc libs in use, partially the fault of irc as a protocol."
[21:54:06] From what I can see, it was 2m 50s on one between the first fail and it successfully reconnecting. The last successful to first when it was back was 4m 50s. No way to see what part is actual issues though. It died after 240s.
[22:08:52] If anyone is bored, https://lists.wikimedia.org/pipermail/wikitech-ambassadors/2020-April/002284.html has a couple of linter fixes that could use a well tested PWB script to share with the world.
[22:09:49] * RhinosF1 might just sleep :)
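
Editor's note on the 19:11 to 19:19 exchange: a minimal sketch of the `cd` versus `-cwd` point, built only from the commands quoted in the log. The cron schedule fields are left out, and the exact position of `-cwd` among the other jsub options is an assumption, since the log only says to "add -cwd to jsub".

```shell
# As originally written: the cd runs on the submit host, but the log notes
# "cd doesn't affect jsub", so the pull does not happen in $HOME/master.
cd $HOME/master && /usr/bin/jsub -N cron-5 -once -quiet git pull origin master

# Fix 1 (from the log): give git the repository path with -C instead of using cd.
/usr/bin/jsub -N cron-5 -once -quiet git -C /data/project/iabot/master pull origin master

# Fix 2 (from the log): keep the cd and add -cwd to jsub.
# Option placement here is assumed, not confirmed by the log.
cd $HOME/master && /usr/bin/jsub -N cron-5 -once -quiet -cwd git pull origin master
```

Per the log, the on-demand worker entries already worked because they carried `-cwd`.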