[08:59:59] hey - is there a recommended way to add hidden search terms to help findability? (on my internal documentation wiki)
[09:02:15] hmm, perhaps I could make a `Template:search-terms` which expands to nothing?
[14:40:45] kjetilho: alternate names are sometimes created as redirects
[14:42:22] kjetilho: what kind of hidden terms are you thinking of adding? How close are they to words in the article? If very close, you might want to use a more advanced search engine instead (eg Cirrus/Elastic) which normalises words to improve matching.
[14:43:16] If the words aren't close to existing words, I'd recommend actually mentioning the words. Seems like that'd be helpful to a reader to know the relationship isn't an accident.
[14:43:45] the specific case was that it didn't match "catalogue" when I searched for "catalog"
[14:43:50] The template would work fine. You could perhaps have it output the words in a small box at the end of the article so that it isn't actually hidden.
[14:44:11] Interesting
[14:45:25] it shows up in the search snippet. so it became "service catalogue Search terms service catalog tjenestekatalog" (I threw in Norwegian there for good measure. the wiki is supposed to be English only, but there are some renegades ;)
[14:56:29] https://test.wikipedia.org/w/index.php?search=Catalog&title=Special%3ASearch&profile=advanced&fulltext=1&ns4=1
[14:56:43] https://test.wikipedia.org/w/index.php?search=Catalogue&title=Special:Search&profile=advanced&fulltext=1&ns4=1
[14:57:07] Looks like Cirrus/Elastic don't stem or otherwise normalise between these two either
[14:58:14] yeah, it's not really stemming, what's needed is synonym support. sidewalk/pavement etc. etc.
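A hand-maintained search-terms template only helps if someone remembers to fill it in. The following is a minimal sketch, in Python rather than MediaWiki's PHP, of how the synonym support mentioned above could be approximated. It starts from a word map, where `SYNONYM_GROUPS`, `search_terms_for` and `with_search_terms` are hypothetical names and not an existing MediaWiki or extension API. The code finds which alternate spellings are missing from a page and appends them as a visible "Search terms" line, following the suggestion not to actually hide them. Hooking this into page saving or indexing is not shown.

```python
import re

# Hypothetical synonym groups: every spelling in a group should find the others.
# Extend per wiki (e.g. program/programme), keeping in mind it gets tricky fast.
SYNONYM_GROUPS = [
    {"catalog", "catalogue"},
    {"sidewalk", "pavement"},
]

WORD_RE = re.compile(r"\w+")


def search_terms_for(text: str) -> list[str]:
    """Return synonyms that are absent from `text` but share a group with a
    word that is present, i.e. what someone would otherwise type into a
    search-terms template by hand."""
    words = {w.lower() for w in WORD_RE.findall(text)}
    extra: set[str] = set()
    for group in SYNONYM_GROUPS:
        if words & group:           # the page uses at least one spelling
            extra |= group - words  # add the spellings it does not use
    return sorted(extra)


def with_search_terms(text: str) -> str:
    """Append a visible 'Search terms' line instead of hiding the words."""
    extra = search_terms_for(text)
    if not extra:
        return text
    return text + "\n\nSearch terms: " + ", ".join(extra)


if __name__ == "__main__":
    print(search_terms_for("The service catalogue lists all services"))
```

A page that already uses both spellings gets nothing appended, so the extra line only appears where it can actually change search results.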
[15:00:26] https://gerrit.wikimedia.org/g/mediawiki/core/+/b5a1f97c2eee8dc47c5f20d63eaf6c263083e5af/includes/content/ContentHandler.php#1323
[15:00:29] https://gerrit.wikimedia.org/g/mediawiki/core/+/b5a1f97c2eee8dc47c5f20d63eaf6c263083e5af/includes/content/ContentHandler.php#1394
[15:01:24] Might be able to use that hook from LocalSettings to automatically stuff extra keywords. Eg if `str_contains` catalogue, then append "Synonyms: catalog"
[15:03:43] hmm. but I would like it to find the synonyms in context. like if I did a phrase search for "service catalog".
[15:05:15] https://gerrit.wikimedia.org/g/mediawiki/core/+/b5a1f97c2eee8dc47c5f20d63eaf6c263083e5af/includes/deferred/SearchUpdate.php#103
[15:05:32] yeah it would work but only without quotes. Words don't have to be next to each other
[15:07:58] If you change both text and query you can get that to work
[15:08:14] ah, thanks for the link :) I did try an first
[15:08:56] https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/b5a1f97c2eee8dc47c5f20d63eaf6c263083e5af/includes/search/SearchMySQL.php
[15:09:26] I don't know offhand where to do this, but I believe we normalise both text and query in different ways, but they have common elements.
[15:09:47] So if you normalised catalogue to catalog, you'd match it either way
[15:10:02] With the snippet in context
[15:10:30] yeah, but I don't know when we get to a level of magic it's just confusing? catalog/catalogue should be fine, though
[15:10:33] I'm surprised there isn't an extension for this, but if your LocalSettings hooks work out, maybe that's a decent start for an extension that takes word maps by config array
[15:11:19] program/programme, disk/disc/disque ... uh oh, soon getting into hot water :)
[15:12:29] You mean it'd be nice if the snippet always showed the original?
Yeah
[15:12:54] Then you'd need to expand the search only, ie turn it into an OR or regex of sorts
[15:13:40] yeah, that sounds like a better idea, actually
[15:13:55] There are a few things like that, but it gets complicated pretty fast.
[15:14:38] I expect Elastic to support synonyms, but its default may be limited; you could configure more
[15:14:58] That'd handle all that transparently. MySQL isn't really built to be a search engine
[15:16:33] https://www.mediawiki.org/wiki/Topic:V78v2eafr8idzfzh
[15:16:43] https://forums.huddersfield.exposed/topic/316/
[15:17:08] Looks like it does indeed, but the Cirrus extension doesn't enable that by default
[15:18:46] we're using SphinxSearch I think
[15:21:15] and it has http://sphinxsearch.com/docs/current.html#conf-wordforms
[15:21:48] (which we don't use)
[15:23:15] might be worth future study. the downside to this is that it's not available for end users
[15:24:00] perhaps I could make a plugin to register alternate wordforms :)
[15:25:11] realistically I won't have time for it, unfortunately.
[15:29:09] thanks a lot for your input!
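The options discussed above for making a phrase search for "service catalog" also find "service catalogue" can be sketched in Python over a hypothetical word map (`WORD_MAP`). All function names are illustrative. The tokenizer and regex are toys and do not reflect how SearchMySQL, Elastic or Sphinx match internally.

- **Normalise both sides.** Map both the indexed text and the query to one spelling. This is roughly what Sphinx's `wordforms` option provides at the engine level.
- **Expand only the query.** Turn the query into an alternation (the "OR or regex of sorts"), so the stored text and its snippet keep the original wording.
- **Export to Sphinx.** Write the same map out as a wordforms file. This assumes the one-mapping-per-line `source > destination` syntax of recent Sphinx versions; older versions used `=>`.

```python
import re

# Hypothetical word map: variant spelling -> canonical spelling.
WORD_MAP = {"catalogue": "catalog", "programme": "program"}

TOKEN_RE = re.compile(r"\w+")


def tokens(text: str) -> list[str]:
    return [t.lower() for t in TOKEN_RE.findall(text)]


def normalise(text: str) -> list[str]:
    """Option 1: map variants to one form. Must be applied to BOTH the indexed
    text and the query, otherwise the two stop matching."""
    return [WORD_MAP.get(t, t) for t in tokens(text)]


def phrase_in(haystack: list[str], phrase: list[str]) -> bool:
    """True if `phrase` occurs as a contiguous run of tokens in `haystack`."""
    n = len(phrase)
    return any(haystack[i:i + n] == phrase for i in range(len(haystack) - n + 1))


def expand_query(phrase: str) -> re.Pattern:
    """Option 2: leave the stored text (and so the snippet) untouched and
    expand only the query: each term becomes an alternation of its variants."""
    variants: dict[str, set[str]] = {}
    for variant, canonical in WORD_MAP.items():
        variants.setdefault(canonical, {canonical}).add(variant)
    parts = []
    for term in tokens(phrase):
        group = variants.get(WORD_MAP.get(term, term), {term})
        parts.append("(?:" + "|".join(sorted(map(re.escape, group))) + ")")
    # Terms must be adjacent (whitespace only), which keeps phrase semantics.
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def to_sphinx_wordforms(word_map: dict[str, str]) -> str:
    """Option 3: emit the map as wordforms lines, assuming 'source > destination'."""
    return "".join(f"{src} > {dst}\n" for src, dst in sorted(word_map.items()))


if __name__ == "__main__":
    print(phrase_in(normalise("Our service catalogue page"), normalise("service catalog")))
    match = expand_query("service catalog").search("See the Service Catalogue page")
    print(match.group(0) if match else None)
    print(to_sphinx_wordforms(WORD_MAP), end="")
```

Normalising is simpler, but whatever ends up in the index is what a snippet can show. Query expansion keeps the original wording in results at the cost of a more complex query, which is the trade-off raised in the chat.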