[00:30:23] Niharika: {{done}}!
[00:48:19] !log tools.threed2commons Killed redis and celery instances running on grid engine exec nodes outside the control of an active grid job (ppid 1)
[00:48:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.threed2commons/SAL
[00:52:55] !log tools.jembot Killed orphan php-cgi processes
[00:52:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.jembot/SAL
[00:54:25] !log tools.iabot Killed orphan php-cgi processes
[00:54:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.iabot/SAL
[00:54:36] !log tools.croptool Killed orphan php-cgi processes
[00:54:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.croptool/SAL
[00:54:47] !log tools.wsexport Killed orphan php-cgi processes
[00:54:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wsexport/SAL
[00:54:59] !log tools.dupdet Killed orphan php-cgi processes
[00:54:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.dupdet/SAL
[00:55:09] this is a very annoying grid engine bug :/
[01:09:50] !log tools Deleting /tmp files owned by tools.wsexport with -mtime +2 across grid (T190185)
[01:09:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[01:09:52] T190185: tmp file leak from tools.wsexport - https://phabricator.wikimedia.org/T190185
[01:53:08] bd808: Thank you! I owe you a unicorn!
[12:31:15] What's the policy on CC-BY-NC on Toolforge? I'm looking at a font with a similar license (free to redistribute, no selling nor derivatives)
[14:28:12] Dispenser: I'm not sure, tbh.
[14:34:38] bd808: thanks for updating that Wikitech page. However, ssh -a labweb100{1,2}.wikimedia.org gives me a timeout error from a non-bastion host, and via primary.bastion it says the host is unknown; maybe the page should clarify that login has to go through restricted.bastion?
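The `-mtime +2` cleanup depends on find's rounding rule: a file's age is truncated to whole 24-hour periods before the comparison, so `-mtime +2` only matches files at least three days old. A minimal Python sketch of the same selection for a dry run, assuming a local directory tree you can walk; `stale_files` is a hypothetical helper, not part of any Toolforge tooling, and it only lists files rather than deleting them:

```python
import os
import stat
import time

SECONDS_PER_DAY = 86400


def stale_files(root, uid, older_than_days, now=None):
    """Yield regular files under root owned by uid whose age, truncated to
    whole days, is greater than older_than_days (mimics find -mtime +N)."""
    now = time.time() if now is None else now
    # os.walk does not follow directory symlinks by default, like find -P
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # file vanished between listing and stat
            if not stat.S_ISREG(st.st_mode):
                continue  # -type f: skip symlinks, sockets, FIFOs
            if st.st_uid != uid:
                continue  # -user: owner must match
            # find ignores the fractional part of the day count
            age_days = int((now - st.st_mtime) // SECONDS_PER_DAY)
            if age_days > older_than_days:
                yield path
```

For example, `stale_files("/tmp", pwd.getpwnam("tools.wsexport").pw_uid, 2)` would list candidates on a single host; the actual deletion (`os.remove`) and fanning out across exec nodes, as `clush` does, are deliberately left out.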
[14:34:57] ssh: connect to host labweb1001.wikimedia.org port 22: Connection timed out
[14:48:46] Hauskatze: you have to access them via a core network ssh bastion. The public hostname is a bit misleading for that.
[14:49:59] bd808: so not primary.bastion.wmflabs.org but another bastion?
[14:50:37] Hauskatze: yes. these hosts are not in the Cloud VPS world. they are "production" hosts
[14:51:08] bd808: makes sense, as those are "live" (non-replica) servers if I understand rightly
[14:54:34] Dispenser: it is not allowed, no
[14:57:04] see https://wikitech.wikimedia.org/wiki/Help:Toolforge/FAQ#Do_I_explicitly_have_to_specify_the_license_of_my_tools.3F
[15:04:31] CC-BY-SA (or GFDL) isn't listed on https://opensource.org/licenses
[15:05:01] that's mainly because it is not a source code license
[15:05:19] hello, i need access to toolforge. i requested it yesterday, but my access hasn't been approved yet. Please approve my request
[15:05:52] https://toolsadmin.wikimedia.org/tools/membership/status/274
[15:06:00] that's the request, for those who can approve
[15:06:22] yes
[15:06:34] it's my request
[15:06:46] Los_: you have not answered what you want to do with the bot, nor do you have authorization to use the bot there
[15:06:48] you are blocked indefinitely on it.wp
[15:06:57] there is that as well ^
[15:07:08] I also have a problem: each time I run git pull to sync my Toolforge code with GitHub, it changes the permissions of my files so they can't be executed by cron
[15:07:24] do we know any 'hack' to avoid that?
[15:07:49] No, i answered
[15:07:55] Hauskatze: it changes the permissions or the owners?
[15:08:05] my bot is for creating benvebot on it:v
[15:08:12] git should respect permissions iirc
[15:08:16] chicocvenancio: hi, just the permissions, owner and group remain the same
[15:08:30] Los_: you do not need a Toolforge account to apply for your bot flag on it:v
[15:08:58] ohh, that is weird, give me a second to research, Hauskatze
[15:09:03] Los_: and we are reluctant to give you a toolforge account without approval by a wiki community due to your itwiki ban
[15:09:17] chicocvenancio: thanks
[15:09:20] Hauskatze: what permissions? the exec bit on a script?
[15:09:28] I need toolforge to host my bot
[15:09:38] Los_: can't you run it locally first?
[15:09:43] for the test phase?
[15:10:07] bd808: so after git pull it changes to -rw-r--r-- 1 tools.mabot tools.mabot 1512 Mar 21 15:06 archivebot.sh
[15:10:20] and I need it rwxrwx---
[15:10:28] https://it.wikiversity.org/wiki/Wikiversit%C3%A0:Bot/Autorizzazioni
[15:10:33] Hauskatze: windows?
[15:10:43] thedj: nope :(
[15:10:48] thedj: yep, but this is on toolforge
[15:11:00] and I make the modifications on GitHub directly
[15:11:02] Hauskatze: yeah, but windows git doesn't preserve the x flag.
[15:11:12] ah
[15:11:38] well, I usually remember to chmod 770 *.sh later
[15:11:48] Hauskatze: have you set the exec bit in the git repo? this may be helpful -- https://stackoverflow.com/questions/21691202/how-to-create-file-execute-mode-permissions-in-git-on-windows
[15:11:54] Dispenser: we also have to abide by the board resolution https://wikimediafoundation.org/wiki/Resolution:Licensing_policy and CC-BY-NC violates that
[15:12:01] bd808: looking at that
[15:12:27] i have a ban on itwiki. But user:Ruthven wants to unblock me, but first I must prove reliable on it:v
[15:12:34] I don't know if there is an easy way to set that flag via github's UI. I would guess not
[15:12:57] Los_: get the approval for the bot there first (ideally the code as well)
[15:13:45] the code is welcome.py
[15:13:54] you can search for it on mediawiki
[15:14:00] sure
[15:14:28] and has it.wikiversity approved that?
[15:14:37] where is that discussion?
[15:16:05] bd808: is there a chance to get T190296 created quickly? sorry about the suddenness, this came up at the EMWCon hackathon
[15:16:05] T190296: Request creation of showcase VPS project - https://phabricator.wikimedia.org/T190296
[15:17:26] tgr: nobody has a project where they could host the vm for this already? Like the mwstake project?
[15:17:26] i have requested it on the telegram group
[15:17:53] for the bot
[15:18:01] 1 user approved
[15:18:08] oh, there is an mwstake project?
[15:18:22] tgr: I'm pretty sure they have one...
[15:18:32] indeed: https://tools.wmflabs.org/openstack-browser/project/mwstake
[15:18:34] i requested it on the project page
[15:18:53] Los_: wait for the community to approve, get a record of it on the wiki and then we'll revisit the account creation
[15:19:03] would you prefer if we used that?
[15:19:04] https://it.wikiversity.org/w/index.php?title=Wikiversit%C3%A0:Bot/Autorizzazioni
[15:19:38] tgr: if you want it fast, that's easiest. Our published policy is "Requests are processed by the Cloud Services team during the Cloud Services team meeting every Tuesday (8:20 AM PST) that the meeting is held."
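On the exec-bit question: git records only whether a regular file is executable (mode `100755`) or not (`100644`), so a commit made through a tool that drops the bit brings the script back as `-rw-r--r--` on the next pull. Committing the bit once (`git update-index --chmod=+x archivebot.sh`, then commit) fixes it in the repository. As a fallback "hack", a `post-merge` hook, which git runs after a merge-based `git pull` (fast-forwards included), can re-apply `chmod 770 *.sh` automatically. A minimal Python sketch, assuming the scripts sit at the top of the working tree and that 770 (`rwxrwx---`) is the mode cron needs; the hook and `fix_script_modes` are hypothetical:

```python
#!/usr/bin/env python3
"""Hypothetical .git/hooks/post-merge hook: re-apply mode 770 to *.sh files."""
import pathlib
import stat
import sys

WANTED_MODE = 0o770  # rwxrwx---, the same mode `chmod 770 *.sh` sets


def fix_script_modes(root="."):
    """chmod every *.sh directly under root to WANTED_MODE.

    Returns the names of the files whose mode actually changed."""
    changed = []
    for path in sorted(pathlib.Path(root).glob("*.sh")):
        current = stat.S_IMODE(path.stat().st_mode)  # permission bits only
        if current != WANTED_MODE:
            path.chmod(WANTED_MODE)
            changed.append(path.name)
    return changed


if __name__ == "__main__":
    # git runs hooks from the top of a non-bare working tree, so "." is the repo root
    for name in fix_script_modes():
        print(f"post-merge: chmod 770 {name}", file=sys.stderr)
```

The hook file itself must be executable, and it does not replace fixing the mode in the repository: it only repairs the working copy after each pull.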
[15:19:58] fair enough, thanks
[15:20:02] Los_: it would also be a good idea to respond on the request that you will use pywikibot welcome.py to set up automatic welcoming of users (or whatever it is you want to do with it)
[15:21:11] ok
[15:21:18] i'll update the request
[15:26:37] Technical Advice IRC meeting starting in 30 minutes in channel #wikimedia-tech, hosts: @addshore & @Christoph_Jauera_(WMDE) - all questions welcome, more info: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[15:36:58] bd808: mwstake is out of quota, apparently
[15:37:44] tgr: :/ one xl is full quota? I'll take a look
[15:38:00] thx
[15:39:18] yup. 8 CPU + 16G is default quota
[15:42:26] do you think it's possible to raise that / create a new project? I have some projects with free quota but those have an unrelated topic, so any work done there would eventually have to be redone elsewhere
[15:42:55] which is not the end of the world, the plan is to figure out how to use meza for automated installs anyway
[15:44:58] * bd808 is biting tongue about the overkill of an xl for a wiki
[15:46:17] peak load avg over the last 90 days on that VM is 0.4
[15:47:14] yeah, not sure what that box does, but there is probably no easy way to change that without having to reinstall everything
[15:47:30] so basically I'm not excited about giving them more resources to waste/hoard
[15:49:32] ok, so what would be better? request a new project, or use something unrelated like qna?
[15:50:37] or I guess we can go back to the original idea of using AWS/whatever, but that seems like a missed opportunity to me
[15:52:30] tgr: for a quick proof of concept, using qna would be easiest. I really don't like making special cases for project grants because literally everyone claims that their work is time sensitive
[15:53:05] we made the "review on Tuesday" policy 2 years ago because it was out of control
[15:56:12] well, I'm sitting at a hackathon and the other people who are interested in working on it can't start until there is an instance, so I feel I have a better than average claim for time sensitivity :)
[16:01:35] I'll go with qna then
[17:23:40] !log tools.wsexport clush -w @exec -w @webgrid -b 'sudo find /tmp -type f -user tools.wsexport -mtime +1 -delete' (T190185)
[17:23:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wsexport/SAL
[17:23:42] T190185: tmp file leak from tools.wsexport - https://phabricator.wikimedia.org/T190185
[17:50:51] !log tools Cleaned up stale /project/.system/bigbrother.scoreboard.* files from labstore1004
[17:50:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:26:06] andrewbogott: in the wikidata-dev project we have an instance (redis-dispatching-client) that’s been “deleting” for five days now… any idea what we can do to actually get that instance deleted properly?
[18:26:21] (Amir1 said I could ping you, sorry if you’re the wrong person to ask)
[18:28:33] the sibling instance redis-dispatching-repo was deleted around the same time and didn’t have any issues
[18:30:51] Lucas_WMDE: I can try to clean it up
[18:30:59] ok thanks
[18:35:16] huh, it seems to be gone now
[18:35:44] I went to the instance’s own page, clicked “delete instance” and confirmed the popup
[18:35:57] andrewbogott: did you do something simultaneously or did my deletion work for some reason?
[18:36:08] I did a bunch of things, no idea which of them worked
[18:36:15] okay :D
[21:17:09] !log rcm Neon: Security patch
[21:17:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL
[22:12:53] !help can someone help me investigate a performance issue? A service (ifttt) I have running on tools has gotten slow since Sunday
[22:12:53] slaporte: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[22:13:17] slaporte: sure
[22:13:24] what is the tool?
[22:13:47] chicocvenancio: it’s ifttt (tools.wmflabs.org/ifttt)
[22:17:25] slaporte: is there a specific thing that is now slower? What have you tried to check
[22:17:27] ?
[22:22:44] chicocvenancio: let me know what you can find. It’s been running fine, but just recently ~50% of requests started timing out.
[22:23:15] slaporte: what have you already checked?
[22:30:46] chicocvenancio: I can tell it’s still up and running, and slower than before. I’m not super familiar with kubernetes, so I don’t know how to see much more about what could be affecting performance.
[22:31:16] why do you say it is slower?
[22:33:14] slaporte: you're not giving me much information to help you
[22:33:22] chicocvenancio: the service that relies on it is timing out, starting on Sunday
[22:33:38] what is that service? slaporte
[22:33:56] chicocvenancio: this is the Wikipedia channel in IFTTT
[22:34:11] at https://ifttt.com/wikipedia
[22:34:58] ok, so that is the only consumer of that endpoint?
[22:35:02] yup
[22:35:25] I’m wondering if you can see anything about the stats for the ifttt tool (or point me in the right direction)
[22:35:26] I see nothing in the access.log, I'm assuming that is expected
[22:35:52] the traffic stats are in app/ifttt.log
[22:36:46] I see errors there
[22:37:25] https://www.irccloud.com/pastebin/UdYO2U7s/
[22:37:29] there are some known errors, but the overall performance has gone down too
[22:38:58] well, I'd usually time the requests, but tools.wmflabs.org/ifttt gives me a 401
[22:39:50] I can share the key with you privately
[22:41:29] will the timing help us? how long does a standard request take atm?
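Timing the same request many times, the approach suggested here, shows whether latency is uniformly slow or widely spread (as with the 150 ms to 5000 ms range reported later for `word_of_the_day`), and gives a baseline to compare against after a fix. A minimal Python sketch that times any zero-argument callable and reports the spread plus the share of calls over a timeout budget; `time_calls` and its 1000 ms default budget are assumptions for illustration, not anything the tool or IFTTT defines:

```python
import statistics
import time


def time_calls(fn, runs=20, budget_ms=1000.0):
    """Call fn() `runs` times and summarise wall-clock latency in milliseconds.

    over_budget is the fraction of calls slower than budget_ms, a rough
    stand-in for "how many of these would the consumer have timed out on"."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "min_ms": samples[0],
        "median_ms": statistics.median(samples),
        "max_ms": samples[-1],
        "over_budget": sum(s > budget_ms for s in samples) / runs,
    }
```

Against the real endpoint, `fn` would wrap a single `urllib.request.urlopen(...)` call carrying whatever credentials the endpoint requires, since an unauthenticated request gets the 401 mentioned here.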
[22:42:11] I usually go that route to check performance, so I can tell what did or did not solve the problem
[22:42:56] as just one example test, https://tools.wmflabs.org/ifttt/v1/triggers/word_of_the_day is taking anywhere from 150ms to 5000ms
[22:43:19] not sure why there is that big of a difference
[22:48:19] slaporte: what does the tool do on each request?
[22:49:38] gets some data from WP APIs and transforms it for IFTTT, a few endpoints hit the db replica, but the performance issue is affecting both. There is a filesystem cache in there too.
[22:49:39] I'll attach strace to it
[22:49:47] code here: https://github.com/wikimedia/ifttt
[22:50:19] filesystem cache could cause random variance if it ends up writing to NFS
[22:50:35] hmm, any reason this would start drastically changing last sunday?
[22:51:56] is the tool constantly getting requests?
[22:52:10] I'm also seeing a lot of getdents(2) syscalls
[22:54:37] https://www.irccloud.com/pastebin/AZ0WbUJ3/
[22:54:42] yes, the traffic is much constant. It has a pretty big userbase (through IFTTT.com)
[22:54:53] pretty much*
[22:55:20] https://www.irccloud.com/pastebin/GKUL7ZRu/
[22:55:34] do you have any monitoring of the traffic? i.e., could it have increased a lot this week?
[22:56:09] I'd suggest optimizing that cache^
[22:56:38] I don’t have great traffic monitoring. From what I can see from ifttt.com, the users stayed the same (actually a slight decrease) over the last week
[22:57:04] zhuyifei1999_: sounds good. I’ll investigate more.
[22:57:23] Any way to confirm it’s not a system-wide thing?
[22:59:29] you mean affecting the entire toolforge? curl https://tools.wmflabs.org/ finishes in 47ms for me
[23:05:20] I have a problem. I tried to add a security group to my instance in Horizon, it said something like "successfully edited unknown instance", and now my instance has NO security groups and cannot be reached via SSH anymore
[23:05:25] (This is deployment-maps01)
[23:05:51] can you edit the security groups now? RoanKattouw
[23:05:55] Trying
[23:06:19] No, edits don't stick
[23:07:04] what is the project?
[23:07:17] deployment-prep
[23:07:21] Yes
[23:08:36] My Kafkaesque experience so far: I tried to create a new security group for my service, but couldn't because the quota was reached (there's a quota?!). So I edited an existing security group ("sca") to add my port to it, so I could come back and ask for more security group quota later. Then when I tried to add the sca group to my instance (deployment-maps01), not only did that not work, it removed the default security group so I can't SSH
[23:08:37] into my instance anymore
[23:11:10] RoanKattouw: "fun" let me take a peek in horizon
[23:11:24] slaporte: for the traffic, see uwsgi.log; it logs the time consumed by each request
[23:11:24] I've had this happen to me when setting up paws-beta, but I managed to edit the security groups
[23:11:48] I should probably mention that I deleted and recreated that instance under the same name a couple of times last week. Didn't seem to cause any issues at the time, but it did worry me slightly
[23:13:13] deployment-prep is a bit messy I'd say
[23:13:22] RoanKattouw: ugh. It's not giving me an error message, but when I try to add the "default" group back to it nothing happens
[23:14:07] andrewbogott: are you around to debug a horizon + security group issue?
[23:15:47] * bd808 wonders where horizon hides its log files
[23:16:40] 2018-03-21 23:00:00.122973 Recoverable error: Quota exceeded, too many security groups. (HTTP 403) (Request-ID: req-80154ac2-14ba-4878-9bf1-6022bb84d8fd)
[23:16:56] I will let you know when I see slaporte and I will deliver that message to them
[23:16:56] @notify slaporte for the traffic, see uwsgi.log; it logs the time consumed by each request. I must say that the tool is really busy (probably saturated)
[23:17:12] RoanKattouw: had you just made a new security group?
[23:17:27] bd808: I had just tried to make one, but couldn't because that project is at its quota
[23:17:34] So instead I edited the "sca" security group to add my port
[23:17:49] ok.
[23:17:52] Then immediately after that I attempted to add "sca" to my instance, and that's when everything went sideways
[23:18:24] first thing we can do is bump that group limit. it's arbitrary and beta cluster is a complicated project
[23:21:37] Cool, lemme know when it's bumped and I'll create a new group
[23:22:03] !log deployment-prep Raised security group quota from 20 to 40
[23:22:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[23:25:03] !log deployment-prep Created maps security group for port 6533; removed port 6533 from sca security group
[23:25:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[23:29:26] OK so now I have the group I want, do you want me to try to add that to my instance?
[23:29:47] you can try, but it's failing for me so far
[23:30:17] and nothing in error logs
[23:30:27] I'm digging for cli commands to list groups and attach them to an instance
[23:30:37] RoanKattouw: could you start a phab task about this?
[23:31:29] Will do
[23:32:08] bd808: I'm here! What's the issue?
[23:32:41] hey andrewbogott. RoanKattouw is trying to change security groups on an instance and horizon seems to not actually be doing anything
[23:32:53] even worse, it seems to have removed all groups from the instance
[23:32:54] deployment-maps01.deployment-prep can't be assigned security groups
[23:33:03] ok, looking
[23:34:34] I'm trying to figure out how to show/change the groups attached to a server using the `openstack` command
[23:34:58] T190367
[23:34:58] T190367: deployment-maps01 has no security groups, none can be added - https://phabricator.wikimedia.org/T190367
[23:35:09] thanks RoanKattouw
[23:36:51] bd808: the output from 'openstack server show' shows security groups
[23:37:19] and there's 'server add security group' and 'server remove security group'
[23:39:12] andrewbogott: cool. I did `OS_TENANT_NAME=deployment-prep openstack server show 32545986-1556-4a07-87e7-79c7a607acf8` and didn't see any groups so I wasn't sure
[23:39:25] it wasn't showing for that particular instance
[23:39:32] probably a further result of its weird state
[23:39:38] I added one, trying to see if that makes horizon shape up
[23:39:59] zhuyifei1999_: Thanks again. The cache had grown quite a bit, and some of it was unused, so I cleared it out. We're back to low/no timeouts. Appreciate your help here!
[23:40:16] np
[23:40:37] I’ll look into fixing our caching so I won’t have to bug you again :)
[23:40:48] try somehow to separate cache-pruning from web requests
[23:41:08] RoanKattouw: is that an existing/otherwise valuable VM? I see at least one other thing messed up about its state in nova
[23:41:10] or make it async, running it after the request is served
[23:41:13] moving the cache to redis if it's not too huge might be useful too
[23:41:14] Was it recently created?
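zhuyifei1999_'s advice is to take cache pruning off the request path: separate it from web requests, run it asynchronously, or run it less often. The many `getdents(2)` calls seen in strace are consistent with directory scans, which get expensive as a cache directory grows (especially on NFS). A minimal Python sketch combining two of those ideas, assuming a flat cache directory where each entry is one file and its mtime marks its age; `ThrottledPruner` is hypothetical and not taken from the ifttt code:

```python
import os
import threading
import time


class ThrottledPruner:
    """Prune a flat cache directory at most once per `interval` seconds,
    in a background thread, so web requests never wait on the scan."""

    def __init__(self, cache_dir, max_age, interval, clock=time.monotonic):
        self.cache_dir = cache_dir
        self.max_age = max_age      # seconds an entry may live
        self.interval = interval    # minimum seconds between prune runs
        self._clock = clock         # injectable for testing
        self._last_run = None
        self._lock = threading.Lock()

    def maybe_prune(self):
        """Cheap enough to call on every request; returns the started
        thread, or None if a prune ran too recently."""
        with self._lock:
            now = self._clock()
            if self._last_run is not None and now - self._last_run < self.interval:
                return None
            self._last_run = now
        worker = threading.Thread(target=self._prune, daemon=True)
        worker.start()
        return worker

    def _prune(self):
        cutoff = time.time() - self.max_age
        with os.scandir(self.cache_dir) as entries:
            for entry in entries:
                try:
                    if entry.is_file() and entry.stat().st_mtime < cutoff:
                        os.remove(entry.path)
                except FileNotFoundError:
                    pass  # another worker removed it first
```

The throttle is per process, so with several uWSGI workers each one prunes on its own schedule; a cron job or a cache with built-in expiry (such as Redis keys with a TTL) avoids that entirely.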
[23:41:31] or somehow make it not run too frequently
[23:41:38] It was created a few days ago I think
[23:41:55] I did delete and recreate it a few times under the same name, not sure if that breaks things (everything seemed fine until now)
[23:41:56] yeah redis is awesome for a cache
[23:42:03] andrewbogott: it looks like 8 days ago in horizon's metadata
[23:42:08] There's nothing terribly valuable on it, but rebuilding it would take me another day
[23:42:23] RoanKattouw: ok. What security group did you want to add? 'maps'?
[23:42:28] Yes
[23:42:43] (originally "sca", but "maps" now that it exists)
[23:42:55] I'd also like "default", for obvious reasons :)
[23:47:34] RoanKattouw: if you look at that instance in horizon now (by just clicking on the instance name) it shows 'default' and 'maps' as applied
[23:47:43] The 'edit security groups' menu still shows madness
[23:47:54] but, can you confirm that it actually allows the access that you need now?
[23:48:03] I see them
[23:48:04] Lemme check
[23:48:23] Yay SSH works again
[23:49:35] 6533 doesn't work but I think I screwed up the security group itself, lemme fix that
[23:50:52] Or at least, I changed ALLOW 6533:6533/tcp from 0.0.0.0/0 to 10.0.0.0/8
[23:51:23] That does not seem to work but let me check if the service is configured to actually listen for outside connections
[23:51:53] Hmm yes it is, that's weird
[23:52:31] andrewbogott: So SSH is now working again, which was the most pressing thing, but port 6533 isn't working yet. Neither curl http://10.68.16.73:6533 on deployment-tin nor https://maps-beta.wmflabs.org works
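When a port stays closed after a rule change like `ALLOW 6533:6533/tcp from 10.0.0.0/8`, it helps to check the rule by hand. A security group is an allow-list: a connection is admitted if any attached rule matches its protocol, destination port, and *source* address (the client, e.g. deployment-tin, not the instance itself). A minimal Python sketch of that matching using the standard `ipaddress` module; `IngressRule`, `allowed`, and the client address in the example are hypothetical, and the sketch ignores egress rules, connection tracking, and proxies in front of the instance:

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    """One ingress rule, e.g. ALLOW 6533:6533/tcp from 10.0.0.0/8."""
    port_min: int
    port_max: int
    protocol: str
    remote_cidr: str

    def allows(self, source_ip, port, protocol="tcp"):
        """True if traffic from source_ip to port/protocol matches this rule."""
        return (
            protocol == self.protocol
            and self.port_min <= port <= self.port_max
            and ipaddress.ip_address(source_ip) in ipaddress.ip_network(self.remote_cidr)
        )


def allowed(rules, source_ip, port, protocol="tcp"):
    """Allow-list semantics: traffic passes if any rule matches."""
    return any(rule.allows(source_ip, port, protocol) for rule in rules)


# The "maps" group after the change, as described in the log
MAPS_RULES = [IngressRule(6533, 6533, "tcp", "10.0.0.0/8")]
```

For example, `allowed(MAPS_RULES, "10.68.0.10", 6533)` (a made-up internal client address) is true, so if such a client still cannot connect, the rule itself is probably not the cause; switching the CIDR back to `0.0.0.0/0` would admit any IPv4 source.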