[00:42:11] is there a reason the property proposal stuff doesn't use a subpage for each proposal?
[00:42:55] * nikki ended up digging through the history of multiple pages to try and find what a proposal looked like before it was archived :/
[00:43:27] would've been a lot easier if I could've just looked at the history for the specific proposal
[03:21:29] https://phabricator.wikimedia.org/T98471 hasn't been released yet, has it? (just checking I understand the page right)
[06:20:08] nikki: correct :)
[06:59:14] Lydia_WMDE: oh, good (sort of), 'cause it happened another four times to me this morning (that I've noticed) and I would hate to think the fix hasn't actually fixed it... do you know when it's going to be released yet?
[06:59:44] i'll have to check when the next deploy is scheduled
[07:00:34] will do in the office
[07:00:55] k
[08:59:27] the Wikidata logo on https://www.wikimedia.org/ is broken o.O
[09:04:34] and wikiversity :o
[09:04:49] https://www.wikimedia.org/static/images/project-logos/wikidatawiki.png :(
[09:09:34] CFisch_WMDE: I made https://phabricator.wikimedia.org/T103296
[09:18:34] Does anybody know how to get wikipedia page titles in different languages from a Wikidata item?
[09:18:47] For instance Waldorfschule (Q14551995), I tried: https://www.wikidata.org/w/api.php?action=query&pageids=14551995&prop=extlinks but the API responds with "missing"
[09:18:59] I can see the wikipages linked at wikidata, so it should be possible somehow... https://www.wikidata.org/wiki/Q14551995 Can you please give me a hint?
[09:25:30] thx addshore
[09:26:01] JoH_: www.wikidata.org/w/api.php?action=wbgetentities&ids=Q14551995
[09:26:34] search for 'sitelinks' in that output, and that gives you all of the sitelinks / wikipedia pages etc in different languages
[09:27:18] also, for that previous request: the Q number (item ID) is not the same as the page ID stored by MediaWiki
[09:27:52] See https://www.wikidata.org/w/index.php?title=Q14551995&action=info where the pageid for Q14551995 is 16223554 :)
[09:32:26] addshore: thx a lot! That's exactly what I need
[09:32:32] no worries :)
[10:22:21] Thiemo_WMDE, DanielK_WMDE_, Lydia_WMDE: http://dc.wikia.com/wiki/Julian_Day_%28New_Earth%29
[10:29:10] jzerebecki: hehe
[10:33:40] CFisch_WMDE: Lydia_WMDE just poked someone in ops to look at https://phabricator.wikimedia.org/T103296 ;)
[10:33:50] addshore: thanks! :)
[10:35:23] * nikki wonders if Lydia_WMDE is now in the office
[10:39:13] nikki: yep :) just checked, and if nothing goes wrong it should go out on wednesday
[10:39:22] thanks :D
[10:44:25] oei
[10:50:04] Thiemo_WMDE: https://github.com/wmde/WikidataBuildResources/pull/30
[10:53:41] nikki, Lydia_WMDE: sorry, i managed to look it up incorrectly, it should go out next week wednesday, not this week
[10:54:20] we branched wmf/1.26wmf9, this week is wmf/1.26wmf10 and we always skip one
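(A minimal sketch of the wbgetentities lookup addshore suggested to JoH_ above, assuming Python with the requests library; the action and ids parameters come straight from the quoted URL, and props=sitelinks / format=json are standard API options added here to trim the response.)

```python
import requests

# Fetch the item and read its sitelinks, as suggested above.
resp = requests.get(
    "https://www.wikidata.org/w/api.php",
    params={
        "action": "wbgetentities",
        "ids": "Q14551995",
        "props": "sitelinks",  # only the sitelinks section is needed here
        "format": "json",
    },
    timeout=10,
)
sitelinks = resp.json()["entities"]["Q14551995"]["sitelinks"]

# Each sitelink maps a site id (e.g. "dewiki") to the page title on that wiki.
for site, link in sorted(sitelinks.items()):
    print(site, "->", link["title"])
```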
[10:55:46] DanielK_WMDE_: will look at factoring TermSqlIndex::getWeight out of that class (as said in the TODO)...
[11:00:37] or maybe just do the weighting in getMatchingTerms also
[11:10:01] addshore: or just don't use getMatchingTerms, and use getMatchingIds instead?
[11:10:20] hmm, there was a reason I didn't do that *looks*
[11:10:24] addshore: that makes it hard to answer the "why does this match" question, though
[11:10:36] that might be the reason :)
[11:10:37] but it could be changed to return TermIndexEntries
[11:10:49] adding the ranking to getMatchingTerms should be easy
[11:10:53] getMatchingIds is only used by SearchEntities and ItemDisambig
[11:10:57] you can do with it what you want
[11:11:33] addshore: since we do ranking in a sick and twisted way, it's sadly not easy at all... or at least, I couldn't think of a good way
[11:12:03] honestly, don't try and be too smart, use or copy existing code. we really want to move all this to elastic.
[11:12:03] Well, it would be done in the same interesting way that it is done in getMatchingIds :/
[11:12:13] yea :/
[11:12:23] yeh, I should be able to do it copying that code ;P but it's.... yeh...
[11:12:58] Or just return the ranking in the TermIndexEntry objects if it's there, and then we can sort in the interactor
[11:13:02] it would be nice if the interactor didn't have to deal with sql itself.
[11:13:10] mhhhm, no, that wouldn't work actually....
[11:13:16] but then, if the abstraction is really in the way, just do it
[11:13:35] I'll throw a patch up in a bit and see what you think
[11:17:15] \o/
[11:26:35] addshore: btw: please make feature/topic branches
[11:27:06] otherwise, git review keeps messing with master, which is annoying
[11:27:16] okay, I have to do it for each change in gerrit though as I don't use git-review ;)
[11:27:33] maybe use git review ;)
[11:27:57] but even with git review, you have to manually create the branch for each patch
[11:27:57] mhhhm, if this is going to be the only feature it brings that I need, then perhaps not ;p (it slows me down) ;)
[11:42:34] DanielK_WMDE_: added you to a quick draft https://gerrit.wikimedia.org/r/#/c/219808/2 the method is only used by TermPropertyLabelResolver::loadProperties and TermSqlIndex::getLabelConflicts (you probably have a better idea if the internal limit of 5000 would affect anything in a bad way or not)
[11:48:24] addshore: the limit would not be bad, but the fact that we'd *always* load 5000 entries would be a performance problem
[11:49:12] how much of a problem it actually is depends on how they use getMatchingTerms, that is, whether there would ever be "too many" matches.
[11:49:31] I could also create a new method as implemented in that patch, called getMatchingTermsWithRanking or something, only used by searchEntities / the disambig page and thus the interactor
[11:50:26] addshore: which would be the same as modifying getMatchingIDs
[11:50:42] ...to return entity ID + matching term entry
[11:50:42] but it would still return terms rather than entityIds
[11:50:48] yea
[11:50:59] perhaps with entity IDs as keys?
[11:51:23] but because it's all being done in stages I would have to add the method, then the other 3 patches, then I could remove the getMatchingIds method
[11:51:58] could add a "useSillyOrdering" option to $options of getMatchingTerms in the interface
[11:52:20] hehe, sure :)
[11:52:43] cool, will do that.. :) food first
[13:09:49] Lydia_WMDE: while I remember, when do you go on holiday?
[13:11:02] I remember something like the first weeks of June.
[13:11:20] sjoerddebruin: July not June then maybe ;)
[13:11:34] Yeah, July.
[13:11:54] Red Bull is still finding its way through my body.
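(To make the ordering/grouping idea discussed above, and DanielK's limit question further below, concrete: a small self-contained sketch of "rank term matches by weight, then keep only the best match per entity, so a limit counts entities rather than raw term rows". The dict-based row format and the weight field are assumptions for the example, not Wikibase's actual TermIndex types.)

```python
def best_match_per_entity(matches, limit):
    """Sort term matches by descending weight, then keep only the
    highest-ranked match for each entity (illustrative sketch only)."""
    ordered = sorted(matches, key=lambda m: m["weight"], reverse=True)
    picked, seen = [], set()
    for match in ordered:
        if match["entity_id"] in seen:
            continue
        seen.add(match["entity_id"])
        picked.append(match)
        if len(picked) >= limit:
            break
    return picked

matches = [
    {"entity_id": "Q1", "term": "Berlin", "weight": 0.9},
    {"entity_id": "Q1", "term": "Berlin, Germany", "weight": 0.4},
    {"entity_id": "Q2", "term": "Berlin (band)", "weight": 0.6},
]
# Two entities come back, each represented by its best-ranked term.
print(best_match_per_entity(matches, limit=2))
```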
[13:34:19] hello, complete noob here so excuse my ignorance... I got the latest RDF dumps and (while still importing) I think I'm not understanding something
[13:34:24] on the wikidata-statements.nt I see a lot of
[13:34:29] .
[13:34:35] while I would expect
[13:34:59] .
[13:34:59] notice the S* suffix on the subject and the extra `v` on the predicate
[13:38:35] diaolos: I don't know anything about those dumps, but I agree with you that the random characters are weird.
[13:40:14] The main developer of this is not on IRC, if I remember correctly; the best thing you can do now is create a bug on https://phabricator.wikimedia.org/ Be sure to include the Wikidata project.
[13:45:11] JohnFLewis: :P flying early on july 4th
[13:45:23] okay :p
[13:45:38] * JohnFLewis tags it in his calendar
[13:46:01] thx sjoerddebruin
[13:47:08] Lydia_WMDE: how will the dev update happen? still doing those in the summary itself or the google doc or? :)
[13:47:33] JohnFLewis: will discuss with lucie to see if she can help make that happen
[13:47:42] okay - awesome :)
[14:15:32] SMalyshev: around?
[14:46:13] addshore: re ranking and grouping by entity id: if ranking is done in TermSqlIndex, and grouping in the TermSearchInteractor, how does this work with limits? If I want 20 matching items, how does this work? As far as I understand the code, it would currently ask for 20 matching terms, and then group, often resulting in fewer than 20 items, possibly even just one.
[14:46:26] benestar: it's still early in the US
[14:59:23] DanielK_WMDE_: so yes, currently the limit restricts the number of term matches rather than the number of entities that are returned.
[15:00:39] also, it turns out a bunch of tests use TermIndex::getMatchingIDs for stuff :/
[15:02:28] *goes to remove the method and usages*...
[15:14:54] addshore: getMatchingIDs could be kept and implemented directly on top of getMatchingTerms with the new option. It's not very efficient, but fine for tests.
[15:15:36] managed to remove all the usages with about 3 lines of extra code in one place :) all good
[15:15:37] addshore: it seems to me like getRowsOrderedByWeight should also group per entity - or rather, after sorting, keep only the first occurrence of each entity.
[15:16:03] I don't think there is any code path that uses getRowsOrderedByWeight and doesn't need grouping by entity.
[15:16:13] If getRowsOrderedByWeight did that though, you wouldn't get all matches
[15:16:45] and also you wouldn't be guaranteed to have the number of results you would probably expect by setting limit
[15:16:46] addshore: afaik, there is no use case for getting all matching terms ordered by rank.
[15:17:21] the number of terms in the result should be correct, unless 5000 entries are not enough to satisfy it.
[15:17:43] ahh true, because of the internal limit
[15:17:47] yea
[15:18:10] addshore: use cases: uniqueness (no ranking), lookup for a given entity id (no ranking), finding entities (unique, ranking)
[15:18:20] that should about cover it
[15:18:50] well, if there is never any need for all matched terms to be used / displayed, it should be easy enough to refactor the chain to always only expect a single matched term
[15:19:16] the TermIndex interface is overly generic, catering to potential use cases we never had, resulting in an amorphous interface with bad performance.
[15:19:26] i'd favor one narrow interface for each use case
[15:20:04] indeed, in fact I couldn't do that in getMatchingTerms, it would have to be in another method or I would just break the interface / definition
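(One way to picture the "one narrow interface per use case" idea DanielK lists above, as a sketch only; the interface and method names below are hypothetical, not Wikibase's actual classes.)

```python
from abc import ABC, abstractmethod

class LabelUniquenessChecker(ABC):
    """Hypothetical narrow interface for the uniqueness use case (no ranking)."""
    @abstractmethod
    def get_label_conflicts(self, entity_type: str, labels: dict) -> list: ...

class TermsOfEntityLookup(ABC):
    """Hypothetical narrow interface for 'terms of a given entity id' (no ranking)."""
    @abstractmethod
    def get_terms_of_entity(self, entity_id: str) -> list: ...

class MatchingEntityFinder(ABC):
    """Hypothetical narrow interface for finding entities: at most one
    (best-ranked) match per entity, ordered by rank."""
    @abstractmethod
    def find_entities(self, text: str, language: str, limit: int) -> list: ...
```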
[15:21:09] addshore: that's why i suggested to modify getMatchingIDs. But actually - the new option could be defined as "only include the term with the highest rank for each entity". that would not break the interface contract
[15:21:23] the grouping/filtering would only happen if the rank option is set
[15:21:55] That probably sounds best right now!
[15:22:19] addshore: but perhaps it's nicer to implement getMatchingEntities based on getMatchingTerms, by calling getMatchingTerms with limit 5000 and then running getRowsOrderedByWeight on the result, for sorting and grouping/filtering
[15:22:24] i think that would be cleaner
[15:23:00] it's essentially the same, but in one case you have options and if/then/else, in the other you have two methods and delegation
[15:25:38] right! :)
[16:54:39] Thiemo_WMDE: re "cosmetic": to me, something that only changes names and documentation is cosmetic. but go ahead and change the summary again, I don't care much about the wording, as long as it's not misleading. The patch changes a few lines of code around, but I don't see any actual refactoring. The classes and methods involved are still the same, parameters are still the same... or am i missing something?
[17:28:37] DanielK_WMDE_: around?
[18:03:45] SMalyshev: yea, but i'm a bit distracted by the kids :)
[18:03:47] what's up?
[18:04:06] DanielK_WMDE_: wanted to figure out the https question
[18:04:47] DanielK_WMDE_: the problem there is that we use the canonical URL for the data URI (not the concept URI!) and that one is now https
[18:04:47] * hoo still thinks we should use https (as I said during the daily on Thursday(?))
[18:05:00] hoo: why?
[18:05:26] SMalyshev: yea. i'm a bit torn on that. for the concept uri, i definitely think it should be http. for the document... well, the canonical *url* should be https.
[18:05:50] ...but the canonical uri should probably stay http, for consistency, and by convention
[18:05:52] hoo: around?
[18:05:57] Somewhat
[18:05:59] the concept URI is not a problem if we set it (though when we generate it automatically it can still be https)
[18:06:22] yea
[18:06:38] DanielK_WMDE_: so when we generate the data URI, how do we ensure it's consistent?
[18:06:44] SMalyshev: i'm asking myself whether we really need three things instead of two: the concept uri, the document uri, and the canonical document url.
[18:06:59] consistent with what?
[18:07:22] DanielK_WMDE_: well, the canonical document URI doesn't come from us, it's a mediawiki concept for the page as I understand it... or even for the server
[18:07:37] we just use the canonical server URI to build the wdata: URI
[18:08:15] DanielK_WMDE_: consistent between different ways of requesting the page from the server. We can use different protocols, maybe different URIs, but inside the data the wdata: URIs should probably be the same, right?
[18:08:24] otherwise it becomes messy
[18:10:43] hoo: why does SiteLinkTable::insertLinksInternal make a new db call for every sitelink instead of passing an array of rows to insert?
[18:10:52] SMalyshev: i think the idea of canonical page URLs is only now really being thought about and implemented. but our canonical data document uris go via the EntityData special page, so they are a bit different anyway
[18:11:32] and yes, the data uri should always be the same, no matter how the page was requested
[18:11:51] i think core has a problem with that, saw the discussion wrt link=canonical
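(For orientation while following this thread, a sketch of the identifiers under discussion. The http://www.wikidata.org/entity/... concept URI and the Special:EntityData document URL are the forms mentioned in the conversation; the helper names and the protocol choices shown here are just the convention DanielK argues for above, not a settled decision, which is what the ticket is about.)

```python
def concept_uri(entity_id):
    # URI of the thing itself; by convention kept on plain http.
    return f"http://www.wikidata.org/entity/{entity_id}"

def data_document_url(entity_id, fmt="ttl"):
    # Canonical URL of the document describing the thing, served via the
    # Special:EntityData page mentioned above; canonical page URLs are https.
    return f"https://www.wikidata.org/wiki/Special:EntityData/{entity_id}.{fmt}"

print(concept_uri("Q42"))        # http://www.wikidata.org/entity/Q42
print(data_document_url("Q42"))  # https://www.wikidata.org/wiki/Special:EntityData/Q42.ttl
```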
[18:12:05] DanielK_WMDE_: ok, so then we have 2 questions: 1. what we base it on and 2. how we ensure consistency
[18:12:07] DanielK_WMDE_: Fix for that has been +2ed literally just now
[18:12:28] benestar: I guess whoever wrote that didn't know that that function can take an array
[18:12:40] DatabaseBase::insert, I mean
[18:12:42] right now we have 2 separate places where we produce the data url - the special page and the dump
[18:13:05] pretty much the only common thing between them is rdfbuilder and rdfvocabulary
[18:14:40] benestar: That reminds me of https://phabricator.wikimedia.org/T99459 which really, really needs to be fixed
[18:14:42] hoo: can't find it any more, can you give me a link?
[18:14:49] Can't even run our maint. scripts right now :/
[18:15:06] DanielK_WMDE_: https://gerrit.wikimedia.org/r/#/c/219782
[18:15:19] SMalyshev: hm... $this->getPageTitle()->getCanonicalURL() . '/'... that *should* be consistent, no matter how it is accessed. If getCanonicalURL actually works right
[18:15:33] but wrt the protocol, this is the canonical *url*, which should be https.
[18:15:50] we *may* want the canonical uri to be different, but I don't really like that idea too much
[18:16:03] hoo: thanks
[18:16:55] hoo: hmm, ok. I'm a bit stuck in that class, so maybe you can help me a bit?
[18:17:10] I finally want to put the badges thing somehow in there
[18:17:46] SMalyshev: i'll write down my thoughts about it on the ticket, and ping lydia. in the end, it's a product level decision.
[18:17:51] DanielK_WMDE_: it returns https now, because canonical urls moved to https
[18:18:14] ok, let's hear what Lydia thinks on it
[18:18:14] yes, which makes sense for document urls. and maybe for document uris. trying to find best practice
[18:18:17] well, I stumbled on that already :S
[18:18:43] AFAIK for rdf, best practice is using http unless there's a very good reason not to
[18:19:02] like the http url not actually existing while the https one does
[18:19:08] hoo: when the badges have changed, it doesn't make sense to remove the complete sitelink and re-add it, we should only touch the badges table
[18:19:11] but maybe I'm wrong
[18:20:13] the question is, should SiteLinkTable know about badges and handle them (because that would blow up the class a lot :S)
[18:20:16] we can also ping Markus and hear his take on this
[18:26:12] hoo DanielK_WMDE_: should SiteLinkTable also handle badges stuff or should we create another class for that purpose?
[18:26:12] -> related: should rebuildItemsPerSite also rebuild the badges table?
[18:27:03] mh... there's two ways to look at that
[18:27:11] benestar: since SiteLink knows about badges, SiteLinkTable needs to handle that too, i think
[18:27:19] badges are part of the concept we call SiteLink
[18:27:19] on the other hand, those are two tables
[18:27:37] You could have two classes (one per table) and something that takes an instance of both and delegates work
[18:28:00] SMalyshev: i think you are right, but I find it somewhat annoying that the canonical url and the canonical uri would be different
[18:28:10] i'll try to summarize
[18:28:41] DanielK_WMDE_: I agree, it's annoying, but I don't see a better way to do it so far. Let's see if somebody proposes one
[18:37:32] hoo: what about putting a BadgeTable instance into SiteLinkTable to split the code?
[18:37:53] Could also do that
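(A sketch of the split hoo and benestar converge on above: one store per table plus a thin facade that delegates, so a badge-only change touches just the badges table instead of removing and re-adding the whole sitelink. All class and method names here are hypothetical, not the actual Wikibase classes.)

```python
class SiteLinkRowStore:
    """Hypothetical store for the sitelinks table."""
    def save(self, item_id, sitelinks): ...
    def delete(self, item_id): ...

class BadgeRowStore:
    """Hypothetical store for a separate badges table."""
    def save(self, item_id, sitelinks): ...
    def delete(self, item_id): ...

class SiteLinkStore:
    """Facade that delegates to one store per table."""

    def __init__(self, links: SiteLinkRowStore, badges: BadgeRowStore):
        self._links = links
        self._badges = badges

    def save(self, item_id, sitelinks):
        self._links.save(item_id, sitelinks)
        self._badges.save(item_id, sitelinks)

    def update_badges(self, item_id, sitelinks):
        # Badge-only change: only the badges table is touched.
        self._badges.save(item_id, sitelinks)
```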
[18:47:29] SMalyshev: commented on https://phabricator.wikimedia.org/T102717
[18:47:45] i currently tend towards merging your patch, but we need a good solution for hoo's patch, too
[18:47:57] hoo: this also impacts the urls for rel=alternate
[18:48:01] Yeah... I wasn't too sure about that
[18:48:14] We use the concept base uri for the link we have in the sidbar
[18:48:16] * sidebar
[18:48:23] so I decided to do that as well
[19:35:16] On Wikidata:Arbitrary access there are just a few future wikis on the list. What is the schedule for the rest? (Or to make it lolcats: When can svwiki has it?)
[19:46:40] svwiki can haz wiki?
[20:07:10] hoo|away: btw -2 on my lua/capiunto patch :O
[20:11:52] Ainali: we'll schedule more after wikimania
[20:12:07] there is a deployment stop over wikimania as most of us are traveling and so on
[20:12:38] DanielK_WMDE_: SMalyshev: will look at the ticket tomorrow. my brain is too fried for this tonight :P
[20:12:40] sorry
[20:12:57] Lydia_WMDE: no problem, thanks
[20:19:37] benestar: Yes, sorry :(
[20:19:51] It's just that I'm not really sure where we want to go with Lua and how
[20:20:02] I guess I'll be able to tell you more by Wednesday or Thursday
[20:20:06] Sorry :(
[20:20:28] would be great if that work wasn't wasted...
[20:20:34] but nvm
[20:37:05] sjoerddebruin: updated
[20:37:11] Okay, thanks.