[14:02:40] hello hello, testing 1 2 3. be right back, must give cat her yummies.
[14:03:04] hello, apergos
[14:04:38] yo yo
[14:04:57] sorry about that, my kitty gets 1/3 of a packet of soft food sometime in the mid afternoon
[14:05:15] it's not really even for nutritional value, mostly she eats dry food, but it's a kind of treat :-D
[14:05:35] so hnowlan clarakosi CindyCicaleseWMF here's my thoughts about the watchlist:
[14:05:53] we can absolutely commit the maintenance script regardless of whether the job itself gets enabled in production
[14:05:54] that reminds me - I have to give my daughter's sick rabbit her medicine - brb
[14:05:58] ah go go go
[14:07:13] and we could add it to the production dumps but skip over it in the actual run; there is a facility for skipping certain jobs, which is how we run partial dumps on the 20th of the month
[14:07:24] (by skipping the page content history jobs)
[14:08:10] so in that sense it would not go to waste in that it wouldn't be code that is written and then tossed
[14:09:38] * clarakosi nods
[14:09:40] the other thing we can do is consider whether to implement the less leaky format of it (page ns + title + occurrences) which might be more likely to pass even if the grouped watchlists were rejected
[14:09:45] imo
[14:11:03] I'll wait for CindyCicaleseWMF to get back and then collect comments/suggestions/gripes on this
[14:20:01] that's one ornery rabbit it seems
[14:21:39] CindyCicaleseWMF: please ping me when you are rabbitted up and ready to proceed
[14:24:12] lol - sorry, she is feisty. I'm back now
[14:25:00] apergos: that sounds like a good plan
[14:25:00] ok!
[14:25:18] I am now collecting comments/suggestions/gripes/objections
[14:25:26] please don't be shy, step right up!
[14:25:56] No objections from me. As long as we get to see most of the process with the watchlist task then I'm ok w/ it
[14:26:39] hnowlan: ?
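(Editor's note: the "less leaky format" mentioned above, page namespace + title + occurrence count, can be sketched as a simple aggregation over watchlist rows. The column names `wl_user`, `wl_namespace`, `wl_title` follow the MediaWiki watchlist table schema; the function name and the sample rows here are hypothetical illustrations, not the actual maintenance script.)

```python
from collections import Counter

def aggregate_watchlist(rows):
    """Collapse per-user watchlist rows into (namespace, title, occurrences),
    dropping the user column so no individual's watchlist is reconstructable.
    `rows` is an iterable of (wl_user, wl_namespace, wl_title) tuples."""
    counts = Counter((ns, title) for _user, ns, title in rows)
    return sorted((ns, title, n) for (ns, title), n in counts.items())

# hypothetical sample rows: (wl_user, wl_namespace, wl_title)
rows = [
    (1, 0, "Luna"),
    (2, 0, "Luna"),
    (2, 4, "Sandbox"),
]
print(aggregate_watchlist(rows))  # [(0, 'Luna', 2), (4, 'Sandbox', 1)]
```

Grouping before output is what makes this format "less leaky": only the counts survive, not which user watches which page.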
[14:27:07] I like the idea of adding the watchlist dump, whether it be aggregated with counts or per watchlist item, so that we can go through the full lifecycle of adding it to the jobs.
[14:27:13] sounds good to me
[14:27:54] this is an easy room! thanks a lot folks; in that case I will give you new homework: please look closely at the dumps/xmldatadumps/dumps/apijobs.py file and its associated config file
[14:28:11] I link to it in the google doc from our presentation last week
[14:28:28] sounds good - thanks for the homework!
[14:28:30] please reread that part of the doc and have a look at the methods there
[14:28:43] Could be way out of my league on this but is there any kind of canarying that could be done to be sure that we're not introducing anything undesired?
[14:29:04] in what sense?
[14:29:39] (that's my polite way of saying I didn't understand the question well ;-))
[14:30:09] I don't think I do either - I'll have a look and see if I can understand the problem enough to answer the question :P
[14:30:19] but for now ignore
[14:30:30] for scale, we will get some estimates of how big the watchlist tables are across all the wikis
[14:30:51] and do some small testing of Title::exists() running on some number of those
[14:31:45] as far as bad data, if pages exist then that info is already in our current dumps; the page table includes the namespace and title of every existing page
[14:32:42] data leakage (being able to guess who owns a watchlist) is something for the privacy-security folks.... so I'm thinking about this from the wrong angle for your question. lemme know the right vantage point when you get there and I'll try to answer better!
[14:33:13] do folks have questions/comments in general for me? about the task or anything else?
[14:33:17] my cat's name? :-D
[14:34:04] haha what is your cat's name?
[14:34:31] And it all sounds good to me. Will probably have more questions after looking at the hw assignment. And I assume we should get this done before Thursday?
[14:35:24] her name is Luna!
[14:35:28] apergos: ah! I was thinking about it from the data leakage perspective but I also realise I don't understand the problem space enough so I'll read into it more :)
[14:35:54] ah for data leakage, look again at the task for a few of my thoughts, but basically we have to have privacy-security folks go through that
[14:36:06] and thinking about it in the context of other data that I may or may not know about too
[14:36:33] clarakosi: yes, having a look at it by our next meeting would be good
[14:36:48] as always this is not a high stakes venture, if you can't get the time no one will make you write lines
[14:37:00] but if you do get the time, that would be awesome :-)
[14:37:14] 👍
[14:37:16] I'm here always for questions/comments, feel free to ping me
[14:37:38] if I don't answer, I might be asleep/afk or coding, I'll answer when possible
[14:38:37] thank you apergos!
[14:39:42] thanks apergos!
[14:40:16] thanks for showing up and participating! see folks thursday in the hangout, or earlier on irc!
[14:41:05] thanks! 👋
[14:47:09] Hello. I am a regular user of Wiktionary. I wanted to note that, when trying to use the advanced search function to search only words in the category "Category:English lemmas", I suddenly get the error message "A warning has occurred while searching: Deep category query returned too many categories".
[14:47:18] I have used the advanced search feature to search for words that are in this category many times before, but suddenly it no longer works, spitting out that error message. Another Wiktionary user has confirmed this bizarre-seeming bug.
[14:49:34] Does anyone here know what might be going on? o_O
[14:53:47] mark?
[14:54:01] apergos?
[14:54:10] yes?
[14:54:22] Do you have any idea as to what might be going on?
[14:54:32] just a moment (in another conversation, I'll be right there)
[15:00:55] Tharthan: I don't know if there might now be too many categories when previously there weren't (because new categories have been added) or if this is a regression of some kind
[15:01:29] What do you mean by "a regression", in this instance?
[15:03:03] I mean a bug introduced that complains about too many categories when in reality the number of categories is not the issue
[15:03:13] this message is produced by CirrusSearch:
[15:03:23] ./includes/Query/DeepcatFeature.php
[15:05:03] https://phabricator.wikimedia.org/T188350 this discusses the introduction of the feature and it's where you might start looking
[15:05:28] The idea that new categories may have been added, which meant that the category in question now had too many subcategories, was brought up by the other user who tested this (who is also a moderator). However, he said that this was doubtful, given that nothing in Recent Changes indicates that any new categories were added.
[15:08:59] the max value for the setting in question is 256
[15:09:11] https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/master/docs/settings.txt#L1546
[15:10:12] Yeah, Category:English lemmas only has fifteen subcategories.
[15:10:19] https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/master/includes/Query/DeepcatFeature.php#L218
[15:10:24] that's where the error is emitted
[15:10:34] yeah but I don't know what cirrussearch sees when it looks
[15:10:57] does it only see those 15 subcategories or is its way of finding related categories different?
[15:11:09] this is a case where you need someone with expertise in that area
[15:11:32] I see.
[15:12:17] I'm sorry I can't be of more help; search is way outside my knowledge
[15:12:54] No, that's alright. I appreciate your assistance anyway.
[15:13:32] are you able to use this feature for other categories that you know have subcategories in them?
[15:13:41] i.e. we know it works at least some of the time?
[15:15:02] Yes, Category:English adverbs seems to work with the feature.
[15:15:12] note we are using some sparql service to get this list and I've no idea how that works whatsoever
[15:15:20] but I guess that's the next place someone needs to look
[15:15:44] ok, if it's not generally broken that makes me think it's really on the sparql service and what it finds in the particular case
[15:16:34] would you be willing to file a task in phabricator and tag it with discovery-search ?
[15:16:47] https://phabricator.wikimedia.org/maniphest/task/edit/form/1/
[15:17:09] I can try. I've never done that before.
[15:17:52] Why is it requesting permission to see my e-mail address and stuff?
[15:18:04] you'll want to create yourself an account, but then it's just a matter of filling in the form. you want to describe which wiktionary you were on, what you clicked to do the search, what terms you entered, etc, so we can reproduce it
[15:18:30] if you let it have your email address then you can subscribe to bugs and get email when they are updated
[15:19:12] you don't have to do that, I think
[15:19:25] I do it because I really really need the email notifications
[15:19:33] It has "required" next to the box, so I do.
[15:19:36] But Phabricator is an official Wikimedia service, right?
[15:19:52] @Tharthan having an email account also lets you recover it should you disconnect OAuth or change accounts, etc.
[15:20:41] https://www.mediawiki.org/wiki/Phabricator/Help#Using_email it says the email is required for verification but not shown to other users
[15:20:51] yes, this is an official wikimedia service
[15:21:51] we occasionally get spammers signing up, and having a valid email address that they must verify is better than no hurdles at all
[15:22:22] I see.
[15:22:33] What do you think would be a helpful subject line for this report?
[15:22:43] Err... task.
[15:23:48] CirrusSearch deep category search hits limit ?
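(Editor's note: the check being discussed, a deep category expansion bailing out past a maximum of 256 categories, can be modeled roughly as below. This is a hedged sketch only: the real implementation lives in CirrusSearch's `DeepcatFeature.php` and fetches the subcategory list from a SPARQL categories service, while this toy version walks an in-memory dict. All names here are hypothetical.)

```python
from collections import deque

class TooManyCategoriesError(Exception):
    """Stand-in for the "Deep category query returned too many categories" warning."""

def expand_deepcat(subcats, root, limit=256):
    """Breadth-first expansion of a category tree under `root`, failing once
    the total number of collected categories exceeds `limit`.
    `subcats` maps a category name to its list of direct subcategories."""
    seen = {root}
    queue = deque([root])
    while queue:
        cat = queue.popleft()
        for child in subcats.get(cat, []):
            if child not in seen:
                seen.add(child)
                if len(seen) > limit:
                    raise TooManyCategoriesError("deep category query returned too many categories")
                queue.append(child)
    return seen

# toy graph: a root with fifteen direct subcategories, like Category:English lemmas,
# expands to 16 categories total -- comfortably under the 256 limit
toy = {"English lemmas": [f"Sub{i}" for i in range(15)]}
print(len(expand_deepcat(toy, "English lemmas")))  # 16
```

In this model, fifteen subcategories could never trip the limit, which is why the question of what the SPARQL service actually returns for this category is the right next thing to investigate.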
[15:23:58] someone can fix it if they don't think it's descriptive enough
[15:24:37] How about this: "CirrusSearch deep category search appears to hit limit arbitrarily"?
[15:24:42] Or is that too presumptuous?
[15:24:44] great
[15:24:46] no, not at all
[15:25:20] if it turns out to be something else it will get retitled, no worries
[15:25:32] Do I tag this with "Discovery-Search" or "Discovery-Search (current work)"?
[15:25:49] I would try Discovery-Search and let someone else categorize it into the right column
[15:25:55] on the workboard
[15:26:08] All right, thanks. I'll type up the report.
[15:26:21] awesome! if you link it here I'll add any details that seem to be missing
[15:26:36] Sure.
[15:36:42] apergos: Done. https://phabricator.wikimedia.org/T260152
[15:37:21] this is en.wiktionary.org ?
[15:37:33] Yes.
[15:37:49] and you said that Category:English lemmas only has fifteen subcategories ?
[15:38:11] Yes.
[15:38:13] you might add both of those to the task description.
[15:38:27] I'll add a comment about the code etc below.
[15:40:55] All right. I've done that.
[15:41:33] I've commented. Now we wait :-)
[15:41:58] Alright. Sounds good. Thanks for your help.
[15:42:04] thanks for your report!