[13:28:18] [[Tech]]; Nenntmichruhigip; /* Template:LangSelect doesn't work with all language settings */ new section; https://meta.wikimedia.org/w/index.php?diff=21176357&oldid=21165067&rcid=17579465
[14:30:19] Hi, does anyone here know Lua and might be interested in co-mentoring an Outreachy project with me this summer, working on the Commons Wikidata Infobox? See https://phabricator.wikimedia.org/T273109 for full info.
[23:16:29] there's a potential issue with the URL shortener ( https://meta.wikimedia.org/wiki/Special:UrlShortener )
[23:16:59] since it's basically walking through every possible combination of characters, it'll eventually hit strings that could be problematic
[23:17:29] which, since it's used on #-en-help, could make new users very upset
[23:19:10] if someone's in the chat when they see their contributions referred to as https://w.wiki/fUCk or similar, there could be problems
[23:30:39] Dragonfly6-7: see https://phabricator.wikimedia.org/T230685
[23:33:56] Dragonfly6-7: lots of letters are not there, and if there's a case of abuse, it can simply be deleted by stewards
[23:34:42] and if you by bad luck hit a bad word, you can simply use the alternative one provided
[23:34:52] if both are bad, then your luck has run out
[23:35:06] if I'm reading that correctly, two URLs are generated, and it's up to the user to pick the non-rude one to use?
[23:35:27] yeah
[23:35:34] one starts with "_"
[23:35:52] That's more problematic for API consumers of the short-url service where human input isn't really available, but I guess maintaining a list of bad words is also... problematic.
[23:38:06] and as I said, lots of letters are missing, which reduces the chance of being able to make one with bad words
[23:39:35] Just for confirmation, is this the allowed set of letters?
https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/UrlShortener/+/refs/heads/master/extension.json#185
[23:39:41] not what I meant
[23:39:55] on chat, the bot uses the URL shortener for links to people's contribs
[23:40:01] and specific diffs
[23:40:29] if all you'd need was a giant list of offensive words to compare to... there is https://en.wiktionary.org/wiki/Category:Vulgarities_by_language but then you'd have to start calculating Levenshtein distance to each word or something to detect similar ones... and it would probably turn into an entire AI scoring service
[23:40:51] the current method seems WAY easier
[23:41:39] yeah, it's possible, but the list can easily grow really quickly; I did something for vandalism detection using AI a couple of years ago in https://gist.github.com/Ladsgroup/cc22515f55ae3d868f47#file-enwiki
[23:41:56] but there are so many variations, etc.
[23:42:15] 'zanaflex' ?
[23:42:37] mutante: the issue is that short urls are bot-generated via the API, and often presented directly to newbies in chat by the bot, so a vague filter is probably needed somewhere. The extension's current approach here seems sane
[23:43:20] WP:NOTCENSORED and all the practical problems with trying to identify "bad words" :)
[23:44:03] it's bad enough having to explain to people that telling them to use the 'sandbox' isn't an insult
[23:44:16] I think a filter embedded in the bot for the really bad ones is probably sufficient, and if some of the more minor ones come up, then shrug it off as an automated system issue.
[23:44:57] (yes, I've really had people who thought 'sandbox' was offensive, and I'm confident that at least one of them was sincere.)
[23:46:28] stwalkerster: ...or just a disclaimer that says ~ "please note these are autogenerated strings; any similarity with fictitious events or characters is purely coincidental..."
[23:47:00] might as well not shorten URLs if we're including that :P
[23:47:11] heh, right
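The generation scheme discussed above — sequential codes walking through every combination of a reduced character set, with a second candidate offered alongside — can be sketched roughly as follows. The alphabet below is illustrative only (the real set is configured in the extension's extension.json linked above), and the "_"-prefixed alternative is an assumption based on the remark that one of the two URLs starts with "_".

```python
# Rough sketch of sequential shortcode generation over a reduced
# alphabet. ALPHABET is an illustrative assumption, not the actual
# configured set (which lives in the extension's extension.json).
ALPHABET = "23456789bcdfghjkmnpqrstvwxz-"  # assumption: vowels dropped to limit accidental words

def encode(n: int) -> str:
    """Convert a sequential integer ID to a base-len(ALPHABET) code."""
    if n == 0:
        return ALPHABET[0]
    base = len(ALPHABET)
    digits = []
    while n > 0:
        n, rem = divmod(n, base)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def candidates(n: int) -> tuple[str, str]:
    """Offer two codes per ID; per the discussion above, one of the
    alternatives starts with "_" (the exact scheme is an assumption)."""
    code = encode(n)
    return code, "_" + code
```

Because the codes simply count upward through the alphabet, every combination is eventually reached — which is why unfortunate strings turn up by chance rather than by design.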
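A minimal sketch of the Levenshtein-based filtering idea floated above: compare each generated code against a blocklist with a small edit-distance tolerance so near-misses are also caught. The blocklist entry and threshold here are placeholders, not anything the bot or extension actually uses.

```python
# Hypothetical bot-side filter: reject a generated code if it is
# within a small edit distance of any blocklisted word. Blocklist
# contents and threshold are placeholder assumptions.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

BLOCKLIST = ["badword"]  # placeholder entry, not a real list

def looks_offensive(code: str, max_distance: int = 1) -> bool:
    """Flag codes within max_distance edits of any blocklisted word."""
    lowered = code.lower()
    return any(levenshtein(lowered, w) <= max_distance for w in BLOCKLIST)
```

As the discussion notes, this scales poorly in practice: the list grows quickly, obfuscated variants multiply, and tuning the distance threshold against false positives soon turns into a scoring problem of its own.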