[01:30:49] what Wikidata query would tell me how many completely empty items there are on Wikidata?
[01:31:58] https://www.wikidata.org/wiki/Special:ShortPages apparently
[01:35:23] hm, ?item a wikibase:Item doesn’t seem to work
[01:36:32] oh, that’s #2 on https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#WDQS_data_differences, nevermind
[01:40:01] harej: it's hacky but can query the database for page_len
[01:40:39] looks like page_len = 159 is empty (also 158)
[01:41:18] depending on how many digits the Q identifier is :D
[01:41:35] seems so
[01:42:27] looks like there are only 239 empty items, if this is correct
[01:42:39] * aude excluding redirects
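A minimal sketch of the page_len approach above, for reference, using MediaWiki's wfGetDB() helper; the namespace and the byte threshold are assumptions taken from the figures in the chat, not a verified cut-off:

    <?php
    // Hypothetical, untested sketch: count (near-)empty items by raw page length.
    // Assumes items live in namespace 0 and that an empty item serializes to
    // roughly 158-160 bytes, depending on the digit count of the Q identifier.
    $dbr = wfGetDB( DB_REPLICA );
    $count = $dbr->selectRowCount(
        'page',
        '*',
        [
            'page_namespace' => 0,    // item namespace on Wikidata
            'page_is_redirect' => 0,  // excluding redirects, as noted above
            'page_len <= 160',        // heuristic threshold for "empty"
        ],
        __METHOD__
    );
    echo "$count (near-)empty items\n";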
[11:33:33] Lydia_WMDE: https://www.mediawiki.org/w/index.php?title=Wikibase%2FDataModel%2FJSON&type=revision&diff=2213317&oldid=2107919
[12:45:27] hoo: Merging all the things! \o/
[13:04:09] Thiemo_WMDE: Yes! Going through the review queues
[13:07:55] Thiemo_WMDE: https://gerrit.wikimedia.org/r/#/c/303600/3
[13:09:07] Thiemo_WMDE: https://gerrit.wikimedia.org/r/#/c/303844/3 <--- jenkins runs into a segfault here, reproducible, but only with zend, not with hhvm. also, it doesn't segfault in the follow-up patch...
[13:10:09] Jonas_WMDE: http://stackoverflow.com/a/16119722
[13:14:31] DanielK_WMDE_: Thiemo_WMDE: RE segfaults https://phabricator.wikimedia.org/T142158
[13:24:05] hoo: huh. narf.
[13:34:57] Our parser limit report data key is the only one using camel case, hm :/
[13:35:17] https://de.wikipedia.org/w/api.php?action=parse&text=Berlin&prop=limitreportdata
[13:36:56] also the "0": 0 is ugly
[13:37:02] but that's how the API works with our stuff :/
[13:37:28] We could possibly make it wikibase: { entityaccesscount: 123 }
[13:37:32] DanielK_WMDE_: ^ opinions?
[13:45:24] {
[13:45:25] "name": "wikibase",
[13:45:25] "entityaccesscount": 2
[13:45:26] hm
[13:48:05] I'll open a ticket
[13:51:24] hoo: looks like character soup. if not camel case, use - or _
[13:51:45] DanielK_WMDE_: I'll create a ticket with the options we have
[13:51:49] sure
[13:51:57] we don't have that much control over the output, sadly
[13:52:00] - is not even allowed
[13:59:45] DanielK_WMDE_: https://phabricator.wikimedia.org/T142713
[14:00:12] I'm leaning towards either prefix with wikibase- or limitreport-
[14:02:01] wikibase- would be good
[14:02:19] if specific to wikibase
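For reference, ParserOutput::setLimitReportData() is the core entry point where such a key is set; the prefixed key below is just one of the options floated above and in T142713, not necessarily what was finally chosen:

    <?php
    // Hypothetical: emit the entity access count under a lower-case,
    // "wikibase-"-prefixed limit report key instead of the camel-case one.
    // $parserOutput and $entityAccessCount are placeholders for the real values.
    $parserOutput->setLimitReportData(
        'wikibase-entityaccesscount',
        $entityAccessCount
    );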
[15:28:30] Jonas_WMDE: http://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time http://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time
[16:33:34] Lydia_WMDE: I hate posting on mailing lists so I'm not replying there (sorry), but regarding the deprecated rank, I think it's a complete mess and is completely misused by lots of people, and help:ranking is really ambiguous (mostly because of the use of "outdated", which just means no longer current)
[16:36:20] whenever I've looked at deprecated statements, I've found lots which look incorrect and very few which look like they would actually be correct
[16:38:17] it seems like people really want to mark things as being historical, and while there's no way to do that, people will keep using deprecated for it
[16:38:49] hm
[16:41:07] nikki: Do you think labeling Preferred "Preferred (latest, most up to date)", … would help?
[16:41:16] or something along these lines
[16:41:31] Hard to explain that in just a few words
[16:42:28] I can't imagine it helping :/
[16:44:06] I get the impression that people want to edit the old statements to mark them as old and mostly irrelevant unless you want old data; marking something as preferred doesn't help distinguish the old statement from future normal rank statements which aren't old
[16:45:54] "marking something as preferred doesn't help distinguish the old statement from future normal rank statements which aren't old" << Well it does… you then need to make that statement normal and the new ones preferred
[16:46:06] but I see that that's not intuitive
[16:48:15] it doesn't clearly mark the *current* statement as old independently of what happens in the future; you would have to constantly monitor the items so that any future statements get fixed, instead of being able to do something to the existing statement
[16:48:19] "normal" is overloaded anyway
[16:49:04] since it's used for new statements which haven't been checked, alternative current but not preferred statements, *and* known to be out of date statements
[16:49:06] So you want Preferred > Current > Normal > Deprecated
[16:49:34] I would have argued for preferred, normal, historical, deprecated based on what I've seen people do with the deprecated statements
[16:51:18] the only thing I can think of other than a new rank would be to detect any of the qualifiers which can be used as a way of marking the date it was valid and displaying those statements separately, but that seems awkward since we would have to keep track of which properties can be used that way
[16:53:39] and it would have to compare statements with other statements for the same property too, e.g. for population which uses "point in time", they're all going to use the same qualifier; it could only work out which ones are definitely superseded by newer data by checking all the dates
[16:58:37] we also have a problem with identifiers which now redirect, I've seen multiple people recommend marking them as deprecated
[16:59:57] hm
[17:00:10] one of the things people seem to want there is a way to ignore those statements for the purpose of the single value constraints
[17:00:18] I guess you will need an RfC about this
[17:00:33] that's anything but a trivial change
[17:02:01] * nikki nods
[17:45:02] Jonas_WMDE: are you around?
[17:45:12] yep
[17:45:21] >> Error: Cannot find module 'load-grunt-tasks'
[17:45:33] that's what I get trying to do grunt deploy
[17:45:38] what am I missing?
[17:45:51] npm install
[17:46:05] ahh ok
[17:46:46] Jonas_WMDE: Thiemo added a number of refactoring patches, did you review them?
[17:47:09] yes, merged a lot of them
[17:48:13] ah ok, b/c I wanted you to review them since it's your code. Some of them are still in review, I'll take a look
[17:48:47] Jonas_WMDE: slight problem with the grunt version - it adds .gitreview to the commit
[17:48:54] Jonas_WMDE: see https://gerrit.wikimedia.org/r/#/c/304274/
[17:49:30] also for some reason it minimized maint.html. Not that I care too much, but it didn't do it before
[17:50:18] but the .gitreview thing I'd like to fix
[17:52:27] sure, with something like gitignore?
[17:53:26] Jonas_WMDE: ah, I know what's up
[17:53:40] my script first commits, then adds gitreview, then does review
[17:53:58] and yours first adds gitreview, and then commits and reviews in one step
[17:54:25] Jonas_WMDE: so what needs to be done is to move commit before gitreview is added, and move the "git review" call after
[17:54:49] you don't actually need the .gitreview file when committing, only when doing git review
[17:54:55] should be easy to fix
[17:58:40] yes, should be easy, will fix tomorrow
[18:04:46] I can make a patch probably
[18:04:58] I need to learn to deal with Grunt anyway :)
[18:12:56] nikki: meh. really? that's bad. Do you think the language on the help page needs to be improved then?
[18:14:28] Jonas_WMDE: https://gerrit.wikimedia.org/r/#/c/304277/
[18:26:06] SMalyshev " -> ' ^^
[18:26:47] oh, ok
[20:59:01] aude: are you around? Wanted to ask you about T142670
[21:11:58] yes?
[21:12:10] Hello <3
[21:12:56] the defaults could be simple text fields for text and source_text?
[21:13:31] and integer for bytes? (or number)?
[21:18:45] makeSearchFieldMapping gets the name of the field, so maybe cirrus can still do special stuff in these cases
[21:19:31] only 1/4 is done so far
[21:26:10] Oh wait...
[21:31:22] aude: sorry, got distracted :) so why do you need these fields - is there something you need to do with them?
[21:33:35] SMalyshev: it's just odd that core provides data for the fields, but not the mapping
[21:33:52] somehow it adds a dependency on cirrus
[21:34:11] aude: well, not specifically on cirrus, since any engine can use these fields
[21:34:19] but there is no mapping definition
[21:34:20] (or ignore them for that matter)
[21:34:40] well, that's because mapping is kind of tricky for those.
[21:34:49] there can be a simple, dumb default
[21:34:53] though for text_bytes it's probably not very hard, I could do that
[21:34:55] then cirrus can do special things
[21:35:11] text is just a blob / text field
[21:35:21] it's harder then, as I'd have to make specific exceptions to override the defaults
[21:35:30] we already have getTextForSearchIndex anyway
[21:35:44] text is not just a text field in cirrus... it has a bunch of additional stuff
[21:35:50] * aude knows
[21:36:05] this is just the default
[21:36:08] in core
[21:36:31] correct, but to override the default, I'd need to build specific code that knows that the text field is special...
[21:36:39] * aude just thinks there should be symmetry in the mappings and the data for indexing
[21:37:10] we probably need that
[21:37:10] it's not really mandatory... an engine can add or ignore data fields
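A rough sketch of the "simple, dumb default" idea, as it might look in a ContentHandler::getFieldsForSearchIndex() implementation in core; the field names come from the chat, the exact API details are illustrative, and an engine like Cirrus can still special-case fields by name inside its makeSearchFieldMapping():

    <?php
    // Sketch: core hands out plain default mappings for the fields it already
    // fills with data; a specific engine can override them by field name.
    public function getFieldsForSearchIndex( SearchEngine $engine ) {
        $fields = [];
        $fields['text'] = $engine->makeSearchFieldMapping(
            'text', SearchIndexField::INDEX_TYPE_TEXT
        );
        $fields['source_text'] = $engine->makeSearchFieldMapping(
            'source_text', SearchIndexField::INDEX_TYPE_TEXT
        );
        $fields['text_bytes'] = $engine->makeSearchFieldMapping(
            'text_bytes', SearchIndexField::INDEX_TYPE_INTEGER
        );
        return $fields;
    }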
[21:37:26] * aude thinks it's odd not to have stuff like title and namespace as fields
[21:37:35] (those are in cirrus, and I know they are special there)
[21:37:41] that's because those are service fields for Cirrus
[21:38:47] at least now, we could maybe generalize them
[21:38:49] this is probably not hugely urgent
[21:39:22] but i think we should think about this more and have a way to handle this
[21:39:23] yeah, so I wondered if any of this is needed for wikidata work or not
[21:39:35] it's just something i noticed
[21:39:44] if yes, I'd put it on top of the stack, otherwise I'd get to it on the next round :)
[21:39:53] * aude was trying to find where the mapping was provided in core
[21:40:36] yeah, it's not fully generalized yet because it is in all kinds of places in Cirrus and we need to extract it carefully
[21:40:49] some of these fields might need some special handling for wikidata, though not sure yet
[21:40:52] so I started with easy cases, but harder ones probably can be addressed now
[21:40:57] yeah
[21:41:11] aude: ok, so tell me if something doesn't work for wikidata, I'll prioritize that
[21:41:16] ok
[21:41:20] * aude can poke at it also
[21:41:35] b/c I want wikidata ES to be good, I want to also hook it up to the query engine then
[21:41:38] priority is getting the parser output fields indexed again
[21:41:56] then maybe migrate away from the old hooks and of course, labels etc.
[21:42:19] yeah, I've seen the patch, I'll review it soon, immediately after I'm done with the units patch update. Either today or tomorrow for sure
[21:42:24] ok
[21:42:31] I think the approach looks good, just want to go through the code
[21:42:39] i'm trying to compromise with something practical + the way you like and the way daniel likes
[21:42:55] and something that can be tested well enough
[21:43:29] yeah, I think it looks good. is it still WIP?
[21:43:37] tests
[21:43:48] ah, ok
[21:43:54] i'd like to know if the approach is ok
[21:44:49] aude: one note: ParserOutputSearchDataExtractor is a class which doesn't seem to have any data values
[21:45:10] what do you mean?
[21:45:50] it gets the data from ParserOutput
[21:46:39] i want to pass ParserOutput to each method, so that at some point there can be one instance of ParserOutputSearchDataExtractor in the handler instead of constructing it for each content
[21:46:44] right, as an argument. But when you construct it, it doesn't have any context
[21:46:53] that's what i want
[21:47:09] if we have a job indexing stuff, then we can have one instance of this thing
[21:47:14] for all the pages
[21:47:34] not yet done in this patch, because it's hard to inject services into the content handler
[21:47:43] it's kind of weird to have a non-static class which is essentially static
[21:47:56] but maybe ok
[21:47:59] it's easier to mock in tests, etc.
[21:48:05] what it needs though is moar phpdoc
[21:48:11] ok
[21:48:19] especially describing return formats
[21:48:21] ok
[21:48:44] i.e. ParserOutput has getCategories and this one has getCategories - what's the diff? why do we use one and not the other?
[21:49:16] ok
[21:50:52] I'll probably review it more in detail sometime tonight - I like the general approach, but I see a number of nitpicks I'd like to pick on :)
[21:51:00] please do
[21:51:27] * aude needs to go out for a bit (back in a few hours)
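To close the thread out, a minimal sketch of the stateless extractor shape aude describes: ParserOutput is passed to each method rather than held in the constructor, so one instance can serve an indexing job across many pages. The method body is illustrative; its exact return format is the kind of thing the requested phpdoc would pin down:

    <?php
    // Sketch: no constructor state; ParserOutput is an argument to every
    // method, so a single (easily mocked) instance can be reused for all pages.
    class ParserOutputSearchDataExtractor {

        /**
         * Unlike ParserOutput::getCategories(), which maps category DB keys
         * to sort keys, this returns just readable category names - the kind
         * of difference the phpdoc is asked to spell out above.
         *
         * @param ParserOutput $parserOutput
         * @return string[]
         */
        public function getCategories( ParserOutput $parserOutput ) {
            $categories = [];
            foreach ( array_keys( $parserOutput->getCategories() ) as $dbKey ) {
                $categories[] = str_replace( '_', ' ', $dbKey );
            }
            return $categories;
        }
    }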