[02:24:11] anyone want to comment on [[Wikidata:Property proposal/Sefaria ID]], pretty please? this property would be useful for several little projects I'm planning.
[02:24:12] [1] https://www.wikidata.org/wiki/Wikidata:Property_proposal/Sefaria_ID
[02:25:18] also, it's good in general for items to link to helpful free-culture projects that have lots of useful content for people wanting to learn about associated topics.
[09:10:12] sjoerddebruin2: was it you who didn't like people adding "imported from" references to stuff which wasn't?
[09:10:25] because it seems like krbot is doing that now
[09:11:21] like https://www.wikidata.org/w/index.php?title=Q21130143&diff=545635381&oldid=541036041 which I added almost two years earlier, long before the page it was supposedly imported from even existed
[09:12:29] nikki: I suspect that the bot is adding Geonames IDs and, when an ID is already defined, it combines the statement and adds that reference
[09:12:49] nikki: bots are okay for me, that's what the property is designed for
[09:12:53] but it wasn't imported from that page
[09:12:59] it was there before the page ever existed :/
[09:13:23] and was added manually...
[09:14:04] nikki: ^ I think the bot is adding IDs indiscriminately, no matter if they existed or not ^
[09:14:34] If they existed, it only combines the already defined statement with their own
[09:14:57] That's also how QuickStatements works
[09:16:05] anyway, if we want to add wikipedia "references" for all statements, even ones which weren't actually imported from wikipedia, we shouldn't be using imported from
[09:16:39] I don't like my edits being turned into ones which look like bot imports
[09:18:29] The ID that KrBot wanted to import was indeed imported from that Wikipedia; it would be better if KrBot checked which IDs are already defined before running, but the current behaviour doesn't seem such a bad thing IMHO
[09:19:52] it wasn't though, I added it myself. the fact that if someone removed it, someone else could reimport it from the cebuano wikipedia is irrelevant
[09:22:26] meh
[09:22:43] I think I need a break, totally burnt out by wikidata right now
[09:23:03] all I seem to do is clean up after bots and badly designed tools
[09:23:20] No no, you're right, nikki :)
[09:25:23] But that's not something that actually degrades the entities, I think, so that's not so relevant...
[09:25:50] In any case, you have all the merit of having included those IDs :)
[09:28:50] Would it be possible to create a filter to prevent adding "imported from"s when the ID is already there and was defined by a different user?
[09:29:07] In other words, when the last edit wasn't made by the current user
[09:29:19] should be easy to check whether the statement exists before adding it
[09:30:07] Yeah, but we can't ask every single user to do it :(
[09:30:24] nikki: :(
[09:31:02] well, it's a bot, not a user, users aren't supposed to add imported from statements at all
[09:31:47] (well, not while doing manual edits)
[09:32:11] But with QuickStatements we all become cyborgs ;)
[09:32:59] true...
[09:34:16] sjoerddebruin: yeah :/ doesn't help that I haven't been very well either. running around after a string of bots all making mistakes based on the mistakes of the previous one is frustrating enough even when I feel great
[09:34:36] Aww.
[09:34:43] :(
[09:37:47] I need energy for my backlogs, the jetlag isn't helping.
[09:48:57] I'm also frustrated at mix'n'match, so many bad matches coming from it...
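The pre-check suggested above ("should be easy to check whether the statement exists before adding it") is cheap to express in SPARQL. A minimal sketch, assuming the bot handles GeoNames ID (P1566) and can query the WDQS endpoint; Q21130143 is the item from the diff above, while "1234567" is a placeholder identifier, not the real value:

    # Ask whether the statement already exists before adding it (and before
    # attaching an "imported from" reference to it). The wd:/wdt: prefixes
    # are predefined on WDQS.
    ASK {
      wd:Q21130143 wdt:P1566 "1234567" .
    }

If this returns true, the statement predates the bot run, so the bot should leave the existing provenance alone instead of stamping it with "imported from".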
[09:49:44] that's mostly the game in the game 2.0 right?
[09:50:05] the edits I've seen have been a mixture of both
[09:51:32] the problem seems to be that it suggests implausible things and then shows almost no context, so they look the same from what little data you're presented with
[09:52:20] Those tools are so "dangerous" in terms of usability, particularly when having to select between two options
[09:52:46] There should be a quorum > 1 (more than one user selecting the same match) to apply the edit
[09:53:01] like I've seen people adding ids for something in alaska to something in antarctica... yeah, they have the same name, but it should never have suggested it as a match since it's clearly too far apart to be plausible
[09:54:06] or the id for a river being added to a city, because they have the same name
[09:56:34] but the more I think about it, the less sense it makes to do the matches with humans, a computer can check whether the coordinates look sane, check whether they're in the same country, check whether the types match, etc. the only thing a human can do well that a computer can't is look at a map and say "yeah, those two coordinates point to the same city" or "no, those are two separate villages with the same name"
[09:56:39] but mix'n'match doesn't have a map :P
[09:57:21] A computer can do it too xD
[09:58:23] yeah, I didn't mean it can't do it at all, just that it's harder to do
[09:59:14] Not really, it's almost immediate to calculate the straight-line distance between two coordinates
[09:59:36] but how do you know if it's a really big city or two small villages quite close together?
[10:00:27] If they are veeery far apart, they can't represent the same city; if they are closer, there can be doubts, yeah :S
[10:02:42] yeah, there are definitely things which are pretty obvious even for a computer
[10:03:27] but some where I only noticed they were different because I looked at a map... I don't remember how far apart they were but there was nothing else to suggest they were different
[10:03:57] silly people and their uncreative place names :P
[10:06:28] Silly conquerors :)
[10:06:49] Will we repeat the mistake of naming settlements after ones on our planet if we conquer the Moon?
[10:07:09] probably
[10:08:33] Only with the purpose of confusing Magnus' tools ;)
[11:27:57] PROBLEM - High lag on wdqs1002 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [1800.0]
[11:32:57] RECOVERY - High lag on wdqs1002 is OK: OK: Less than 30.00% above the threshold [600.0]
[12:24:08] Lydia_WMDE: will there be extended support for the new recent changes (and soon watchlist) for Wikidata? Would love to filter for new/edited/removed claims/descriptions/sitelinks :)
[12:25:38] sjoerddebruin: i've not looked into it yet. but as always this kind of filtering is tricky because worst case a whole entity might be changed
[12:25:41] in one edit
[12:25:53] and there is no easy way to tell if the description was changed for example
[12:25:53] Yeah, I know the system can be complicated. :(
[12:26:07] i'll ask around though
[13:42:41] Lydia_WMDE: I saw https://www.wikidata.org/wiki/Topic:Tw81of56l98m2l24, I don't agree that language fallback works just fine but that person's talk page doesn't seem like the right place to discuss it, where would be better?
[13:44:45] nikki: write to the mailing list. best summarize or quote the conversation so far.
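Back to the mix'n'match thread above: the distance sanity check is directly expressible with WDQS's GeoSPARQL function. A minimal sketch; both Point literals (longitude latitude) are made-up stand-ins for the Alaska/Antarctica pair, and the 50 km cutoff is an arbitrary assumption:

    # geof:distance returns kilometres on WDQS; the geo:/geof: prefixes
    # are predefined there.
    SELECT ?distKm ?plausible WHERE {
      BIND("Point(-149.9 61.2)"^^geo:wktLiteral AS ?a)   # roughly Alaska
      BIND("Point(166.7 -77.8)"^^geo:wktLiteral AS ?b)   # roughly Antarctica
      BIND(geof:distance(?a, ?b) AS ?distKm)
      BIND(?distKm < 50 AS ?plausible)                   # arbitrary cutoff
    }

A matching tool could run this kind of check before ever showing the pair to a human, which is the point made above.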
[13:45:10] nikki: if you think there is a specific problem with the fallback chain, you can of course also file a phab ticket
[13:45:31] also if you're concerned about descriptions bloating the db, does that mean you're finally willing to let us have dynamically generated descriptions instead of us having an army of bots adding statically generated ones?
[13:47:27] it's not the fallback list itself that's the problem, lydia is suggesting that language variants should be treated differently to different languages, but we don't actually make a distinction between language variants and different languages
[13:49:04] i think it's quite important to make that distinction
[13:49:08] i just added a comment on the page
[13:49:10] e.g. if I switch to british english, I start getting "english" appended to lots of things. the only way to get rid of it is to fill in the british english fields
[13:49:43] yea, that added hint is useful when falling back from German to English, but not really when going from British English to "default" English.
[13:49:53] I think we have a ticket for suppressing that for variants
[13:49:57] wouldn't be hard.
[13:50:15] and if I edit the labels, it focuses the empty british english field... I have to do lots of tabbing to get to the english field
[13:50:25] Lydia_WMDE: ^--- we discussed that before, do you remember whether we have a ticket for that?
[13:50:55] nikki: yes, that sucks. editing labels & descriptions in multiple languages generally sucks. I agree that should be improved.
[13:51:06] flooding the database and rc streams isn't the solution, though
[13:51:10] I would rather have something where variants which are normally the same are collapsed by default, and you can expand it to add labels when they actually differ
[13:51:35] yea, sounds reasonable to me, suggest it to Lydia and Jan :)
[13:51:54] Hello! Can someone explain to me how search works on Wikidata? I was looking for the article "dolphin" and could never find it until it appeared magically in the suggestions while typing (but not on the results list at all - a million other dolphin things appeared before that)
[13:52:01] automatic descriptions are also something i'm personally not totally opposed to. But we have to think hard to get them right.
[13:52:49] delphine: "full text" search on Special:Search is essentially useless on wikidata. suggestions in the "quick search" box (top right) work much better.
[13:52:59] over half of our descriptions are just "wikimedia category", "wikimedia disambiguation page" and "wikimedia template" in loads of languages, just supporting those three things would already go a long way towards improving things
[13:54:29] about language fallback, I have another pink pony request. When using English as the default language, there is no fallback at all, which is sad! For instance, if an item has only one Japanese label, I would like to see that label (with the language name appended at the end) even if I cannot read japanese.
[13:54:39] delphine: we are working on fixing this. https://phabricator.wikimedia.org/T46529
[13:54:44] as you can see, there is a lot to do
[13:55:15] pintoch: yes, but if there are 10 non-english labels, which one do we show?
[13:55:38] something arbitrary and deterministic?
[13:55:42] it could start by showing ones the user says they speak :P
[13:55:43] nikki: yes, i see the problem. best take it to the mailing list, or phab
[13:55:49] * DanielK_WMDE_ is about to run away in a few minutes
[13:55:57] Ahhh, language fallback.
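For what it's worth, WDQS already exposes a per-query fallback chain through its label service, which is a handy way to experiment with the chains discussed here. A minimal sketch; the house cat class (Q146) is just an arbitrary example, the language list is what matters:

    # The label service walks the listed languages in order, so an item
    # without a British English label falls back to English, then Japanese.
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31 wd:Q146 .   # arbitrary example: instances of house cat
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en-gb,en,ja" . }
    }
    LIMIT 10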
[13:56:06] (like I don't get fallbacks from english to german, except on standard mediawiki pages)
[13:56:10] but it would be even better if people could indeed input their own ranked list of languages
[13:56:28] pintoch: alphabetic? number of speakers? should it depend on your UI language, or should it always be the same? the reason english has no fallback is really that we never found an answer to those questions.
[13:56:34] Babel is requestable via API now, right?
[13:56:58] pintoch: personal language fallback means no caching. no caching means the servers go up in smoke.
[13:57:06] If we use usage on Wikidata, we end up with Dutch fallback :P
[13:57:16] DanielK_WMDE_: say, the label of the first language code in alphabetic order (on the language codes)
[13:57:41] that's arbitrary, but already a lot more informative than no label at all
[13:57:48] hm. lots of good ideas here, lots to discuss. i have to go now, though.
[13:57:55] please take it to the mailing list and/or phabricator
[13:57:56] o/
[13:58:02] okay, see you :)
[14:01:00] Would it make sense to look at other languages in the same group?
[14:01:12] Like, fallback to German if my language is Dutch.
[14:01:58] sjoerddebruin: that is something that can already be addressed with the current system (just need to tweak the fallback graph), right?
[14:02:35] isn't ULS split up into continents
[14:04:43] I wonder if it can add languages after english based on languages further up the chain
[14:04:47] I don't know - what I mean is that what you are proposing can be implemented without changing any line of code, it is just a matter of configuration
[14:04:52] (that was for sjoerddebruin)
[14:06:51] like iirc ukrainian used to fall back to russian but it was changed to fall straight back to english... but if there's no english, then russian is still probably the next best choice
[14:07:52] and dutch english german would make sense to me too
[14:09:23] it would be easier if we had countries too
[14:13:04] like in the uk, french is a common language to learn at school, but I would expect spanish to be the most common in america
[14:14:45] I assume french is more common than spanish in canada... no idea what they do down in australia and new zealand
[14:16:25] I seem to remember one person saying indonesian and japanese, which are pretty exotic from a european point of view, but pretty logical from a geography point of view
[15:54:44] nikki: try this in your user css: .wb-language-fallback-variant { display: none; }
[15:54:50] should hide all fallback indicators between variants
[15:57:27] huh, so there's a distinction in the class names already?
[16:05:27] anyway, yes, that works :)
[16:11:42] now I just need an improved terms box (or at least the ability to have english first instead of third) and I could finally switch to british english permanently
[17:31:13] SMalyshev, around?
[17:31:28] yes
[17:31:44] i just did one more minor update to the code - could you briefly skim over it, see if this is the right way to do it?
[17:32:10] https://github.com/wikimedia/wikidata-query-rdf/compare/master...nyurik:master
[17:33:16] you can add prefixes by putting them into prefixes.conf
[17:34:16] SMalyshev, i am doing that https://github.com/wikimedia/wikidata-query-rdf/compare/master...nyurik:master#diff-54a859a91a2b752874913851c3be7e48R5
[17:34:44] then I don't think you need the context listener
[17:35:48] ldf-config.json is missing a comma after osmroot
[17:36:02] also for some reason it deletes xsd?
[17:36:04] yeah, i already fixed that
[17:36:20] deletes xsd?
[17:36:29] - "xsd": "http://www.w3.org/2001/XMLSchema#",
[17:36:40] ah, yes, see my PR for that - you had a dup there
[17:36:50] ah, ok
[17:37:03] there are 3 PRs in gerrit
[17:37:09] pls take a look
[17:37:56] ok I will
[17:38:15] i have a question about osm tags: some of them are very common, while the long tail is infinite
[17:38:24] so i hardcoded the top 100 - is that a good approach?
[17:38:35] yeah the diff looks fine to me, except that I don't think you need a new context listener for this...
[17:38:52] they will all be osmtag:name (common) and osmtag:something_weird (1-offs)
[17:39:03] will the system work ok for that?
[17:39:27] yurik: yeah, 100 may even be overkill - the difference would probably be that the vocabulary ones get shorter storage, but it only makes sense if you have lots of them
[17:39:49] if you have not so many it may not be worth the bother
[17:39:59] SMalyshev, https://taginfo.openstreetmap.org/keys
[17:40:08] see the number of objects
[17:40:25] by 100s, it drops to a few million
[17:41:27] yeah I see. I'd probably just experiment and see if changing the cutoff makes any difference on db size :)
[17:42:27] oki, thanks :)
[17:43:11] as for the listener -- you do addDecl for the wd: namespace in your listener, but also list it in the ldf-config.json -- https://github.com/wikimedia/wikidata-query-rdf/blob/f6b758b2f0e06778c55a6a256aa0e810c0c9d06f/blazegraph/src/main/java/org/wikidata/query/rdf/blazegraph/WikibaseContextListener.java#L192
[17:44:40] ldf-config is a separate config
[17:44:47] multichill: what is your opinion? https://www.wikidata.org/w/index.php?title=Q15636126&action=history
[17:44:51] but prefixes.conf should cover all new prefixes
[17:45:10] yurik: prefixes.conf was introduced so you don't have to edit code each time you add a prefix
[17:45:26] but the old code remained
[17:45:48] the ldf server has its own set of configs, so you need to add it there too
[17:46:08] ok, so i can safely remove my own listener, and use yours. thx
[17:47:10] SMalyshev, what about constants, e.g. "osmmeta:has" -- it's not an integer - why should i use an unsigned integer handler for it?
[17:47:29] you should not
[17:47:31] i understand why integer for "osmnode:123"
[17:48:00] sjoerddebruin: Hard one, you should probably discuss it on the talk page
[17:48:47] Ugh, I hate to explain this. :(
[17:49:04] Not sure how repeating my stuff would make sense.
[17:51:00] SMalyshev, i mean - i simply supply the full URLs -- https://github.com/wikimedia/wikidata-query-rdf/compare/master...nyurik:master#diff-b398e6525fd7d772b38b15086561432cR34
[17:53:04] i guess i'm not using the unsigned one for it, but my tests do check for unsigned -- https://github.com/wikimedia/wikidata-query-rdf/compare/master...nyurik:master#diff-57f791b8ae043560c54f233b20767f5eR41
[17:53:31] "VocabURIByteIV"
[17:55:38] SMalyshev, lastly - i think i do need my own listener -- unit tests fail without the addDecl registration
[18:23:02] yurik: hmmm I'm not sure tests load prefixes.conf
[18:23:13] tests are a kinda hacky setup....
[18:23:14] SMalyshev, yep, i'm working on fixing that
[18:23:48] their bootstrap process is completely different from what happens in the real app
[18:23:49] wikibase loads declarations in a static {} block, before my tests, but i think i can still call it after that
[18:24:08] what's a good way to get a path to the prefixes file in a unit test?
[18:24:56] hmm... that depends on where the file is...
it'd be either some kind of resource loading via classpath or maybe a variable supplied by pom?
[18:25:21] again, the env in tests is pre-deployment so the file may not even exist yet for all we know...
[18:25:34] or be in a different module
[18:27:55] oh well, might as well hardcode the addDecl calls then. thx
[18:39:12] SMalyshev, unrelated: i need to store osm relation members -- each "relation" object can link to any number of other objects (nodes, ways, relations). Each link to another object also has an optional text label. I was storing it as "relation osmm:has:* targetobject ." statements, but this makes it very slow to query "show me all objects that are part of a relation" - because i have to do a "startswith" filter. i'm thinking of storing it instead as two
[18:39:13] statements: relation osmm:has target and relation target "label". What do you think?
[18:39:51] in other words to use the target as a predicate.
[18:39:57] yurik: can you give an example? I'm not sure I understand yet
[18:45:34] SMalyshev, e.g. a lake could be described as an outline (way geometry object #1), and two islands (two way objects #2 and #3). So a lake relation object would have a few tags like name, and also a list of objects that make up this relation (relations cannot have geometries, only ways and nodes (points) can). So the relation would contain #1 labeled as "outer", and #2, #3 labeled "inner"
[18:46:51] in short, a relation is { tags: [], members: [] }. Each member is itself an osm object.
[18:48:11] but the member also has a label, so each member in that list is actually an object { label: ..., osmobject: ref }
[18:49:24] ok now I am confused :) can I see the rdf maybe?
[18:52:46] SMalyshev, https://wiki.openstreetmap.org/wiki/Relation:multipolygon#Examples
[18:53:21] the "id" field is an ID of an OSM object - either relation, way, or node.
[18:53:44] the "role" is the label
[18:54:07] a relation is simply a container that combines multiple other objects
[18:56:25] hmm if you need shape objects maybe geosparql or something?
[18:56:48] or it's not shapes? I'm still not clear what we're dealing with here :)
[18:57:00] SMalyshev, no no, don't worry about the geometry part of it - i simply need to record which relations contain which objects
[18:57:08] and under what roles
[18:57:47] ok, so you've got a relation and two roles... sounds fine so far
[18:58:12] relation "lake" contains two islands, which means it would have 3 members: #1 with the "outer" role, and #2 & #3 with "inner" roles.
[18:58:20] problem is, the role field is free text
[18:59:05] so if the subject is the relation ID, and the object is the ID of the member, i used "osmm:has:*" for the predicate
[18:59:36] where * is anything
[18:59:47] such as inner, outer, ...
[18:59:58] ahh... well, in that case you have to either use containers or materialize the role... or use non-standard tricks like this: https://wiki.blazegraph.com/wiki/index.php/Reification_Done_Right
[19:00:19] so: ?lake osmm:has:outer ?outerBorder; osmm:has:inner ?island1, ?island2. ?
[19:00:26] generally it's a bit of a problem in the triple world when you have properties on relations
[19:00:34] you'd have to reify the relation then
[19:00:45] WikidataFacts, correct.
[19:00:52] I mean role in your case
[19:01:25] or generate predicates from freetext
[19:01:29] but if you ever want to say: "give me all members of a relation", you have to do "?lake ?pred ?obj.
FILTER(startswith(?pred, "osmm:has"))"
[19:01:30] with all that involves
[19:02:21] so what i wonder is: what if i store it as two statements: ?lake osmm:has ?id . ?lake ?id "label"
[19:02:28] yurik: you can do ?lake osmm:has:outer ?x; osmm:related ?x
[19:02:40] exactly!
[19:02:47] oh, oops, no
[19:02:50] have a generic predicate that contains all relations... duplicates it but makes faster queries
[19:03:25] yurik: using the same thing as predicate and object is generally frowned upon
[19:03:32] why so?
[19:03:56] in fact, in some systems I don't think it even can be so (it is treated as two different things having the same name)
[19:04:23] you mean i won't be able to query it with the same variable?
[19:04:25] OWL likes to distinguish between what is a predicate and what is an object I think
[19:04:48] yurik: no, you can query it, the engine doesn't care. some people would wince though :)
[19:04:52] SMalyshev: I hope describing our ontology with wd:P31 wikibase:directClaim wdt:P31 is an exception then :)
[19:05:32] WikidataFacts: I think it's a case of punning. https://www.w3.org/2007/OWL/wiki/Punning
[19:05:37] thing is, using the "osmm:has:inner" style is not very good, because sometimes users type in crazy role strings, making it impossible to use them in the "osmm:has:blah blah" form
[19:05:49] whereas storing them as objects seems fine
[19:07:15] yurik: so, the problem is, by Linked Data standards, predicates have semantics. so when you use a random URI as a predicate, that doesn't have semantics
[19:07:51] querying will work, but it's not nice for systems that may use OWL, etc. Not sure how much it is a concern for what you're doing
[19:08:10] hm, oki, seems like it's not an issue for this case. Thanks!!!
[19:57:24] hey! I got lost :(
[19:58:28] https://tinyurl.com/y8pjrxxw
[19:58:34] that's a query made by my friend for me
[19:59:11] "?country wdt:P30 ?continent" etc. - where can I get the wdt number?
[19:59:42] I'm trying to read the manual, but I'm too tired and I never understood even the basics of these queries
[20:00:00] I know MySQL rather well, tho, so I'd be grateful if somebody could explain it to me
[20:00:22] the number is the ID of the property, e.g. https://www.wikidata.org/wiki/Property:P30
[20:00:25] I just want a database of countries and I thought I could get some more linked things too
[20:00:46] if you look at a wikidata item and you see the property you like, the URL for that property would contain the number
[20:01:10] SMalyshev, but assuming I'm on the https://www.wikidata.org/wiki/Q36 page, is there a way to get the number from it?
[20:01:12] https://www.wikidata.org/wiki/Special:AllPages/Property:
[20:01:15] or only by api?
[20:01:31] oh
[20:01:40] thanks, SMalyshev and Lockal_!
[20:01:52] Q36 is the ID of the item, if you look at the statements, properties are in the left column, each has a link which contains its ID
[20:02:39] got it, thanks :D
[20:02:42] by the way, do you think that AI would be OK for wikidata? I'm asking about the private opinion of experienced people. The thing is that the "main name" of plants and animals is sometimes English and sometimes Latin. It's a real mess.
[20:03:02] and I've seen a guy on YT who has a nice language recognition tool using neural networks
[20:03:17] it would add a few mistakes, but still it would be much better than the current situation
[20:03:40] so, do you think that such a sacrifice of a few things could be possible at all, or is it not worth official discussion?
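To make the trade-off in the osmm:has discussion above concrete, here are the two query shapes side by side as a sketch; the osmm:/osmrel: namespaces and the relation ID 123 are hypothetical stand-ins for yurik's actual vocabulary:

    PREFIX osmm:   <https://example.org/osm/meta/>   # hypothetical namespace
    PREFIX osmrel: <https://example.org/osm/rel/>    # hypothetical namespace

    # Option 1 - role baked into the predicate (osmm:has:outer, osmm:has:inner):
    # listing all members needs a string scan over every predicate of the node.
    SELECT ?member ?pred WHERE {
      osmrel:123 ?pred ?member .
      FILTER(STRSTARTS(STR(?pred), "https://example.org/osm/meta/has:"))
    }

    # Option 2 - the two-statement form: a generic membership predicate plus a
    # role triple that reuses the member as predicate (the punning case above).
    # Membership becomes a plain indexed lookup and the role stays queryable.
    SELECT ?member ?role WHERE {
      osmrel:123 osmm:has ?member .
      OPTIONAL { osmrel:123 ?member ?role . }
    }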
[20:04:01] I'm using Wikidata for my photo gallery and it's sometimes very hard to get anything :/
[20:04:15] I'm getting the English name, Latin name and aliases
[20:04:56] http://up.krzysiu.net/3ECJ10XdFl/golab.png - that's the example (input could be the id, en.wiki name or pl.wiki name; I used the last)
[20:05:34] Labels are not obliged to contain latin names. We have https://www.wikidata.org/wiki/Property:P225 for this
[20:05:39] all fields, except those marked blue, are automatic
[20:06:08] Lockal_, the problem is there's a main name and sometimes it's English, sometimes Latin, and I think it would be better if there were one way
[20:06:32] because now I have to compare the main name to the Latin name - if they're the same, then the main name is Latin, if not it's English. But it's not a bulletproof way.
[20:06:57] in fact it's much more complicated on the backend. I'm willing to release it under an open license, but for now it just doesn't work.
[20:07:25] I spent 15 minutes giving random and well known plants and animals to show my friend that example above
[20:07:57] Why are you not using P225 only, if you want to display latin names only?
[20:08:01] still it's easier for me to fix wikidata than to make descriptions by hand (because I usually describe the same species), so wikidata gets some use out of my describing :P
[20:08:18] Lockal_, let me check my backend
[20:08:43] also I don't want latin only - I need English, Latin and aliases
[20:09:31] and when the main name is English, let's say foo, and the aliases are bar and far, then the English names are foo, bar, far
[20:09:48] but if it's latin, then the first alias would probably be the main English name
[20:10:05] I need a bunch of ifthenelses to get it working
[20:10:11] Many taxons don't have any english names at all. Same goes for any language. See also https://www.wikidata.org/wiki/Property:P1843
[20:11:37] Lockal_, almost every one has
[20:11:47] in English it's not like in other langs
[20:12:08] it's mostly extinct species that have no English name, but I'm photographing living ones only
[20:12:47] https://www.wikidata.org/wiki/Q26158
[20:12:58] here's an example where the English name is Latin
[20:13:18] and the English aliases are both the same, with one of them being the real English name
[20:13:42] and there's no Latin name at all
[20:13:46] as a property
[20:13:50] only English which is Latin
[20:13:55] and that's a very common problem
[20:14:12] and that's why I want to fix it
[20:18:52] in fact that Latin name is also English and for us it's clear, but for automatically getting data - nope
[20:19:35] if I tried to get the Latin name from Q26158, I'd get nothing
[20:22:22] http://txt.krzysiu.net/view/c51b4e92 - 128 to 151
[20:22:37] that's my only idea so far for the current situation
[22:11:47] Hello all…with wikidata query, is there any way to get the query results to sort as numeric instead of alphabetic?
[22:59:22] good evening. I'm using query and since items have more than one occupation I get as many duplicates as an item has occupations. Is there any way to combine the occupations of an item in the output of the query? Would be thankful for any useful info
[23:00:52] i searched in the existing wikidata examples but without success
[23:05:29] guest24: search for GROUP BY and GROUP_CONCAT
[23:06:31] thanks WikidataFacts, I will do that now
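A minimal sketch of the GROUP BY / GROUP_CONCAT approach WikidataFacts points to, using wd:Q254 (Mozart) as an arbitrary example item; the prefixes are predefined on WDQS:

    # One row per person, all occupations combined into a single string.
    SELECT ?person (GROUP_CONCAT(DISTINCT ?occLabel; separator=", ") AS ?occupations)
    WHERE {
      VALUES ?person { wd:Q254 }   # arbitrary example item: Mozart
      ?person wdt:P106 ?occ .      # P106 = occupation
      ?occ rdfs:label ?occLabel .
      FILTER(LANG(?occLabel) = "en")
    }
    GROUP BY ?person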