[04:49:00] hello everyone
[04:51:30] https://www.wikidata.org/wiki/Q28020787 says it is a Wikimedia disambiguation page, while the Wikipedia articles and not disambiguations. Is it possible to fix it?
[04:51:43] !admin
[04:53:00] *are not
[09:20:08] hrm. I'd appreciate it if the Web interface interpreted "today" as today's point in time. That'd make it easier for me to reference Web sites correctly.
[09:22:02] muelli: Enable the currentDate gadget
[09:22:09] On https://www.wikidata.org/wiki/Special:Preferences#mw-prefsection-gadgets
[09:23:45] multichill: that seems to only apply to "retrieved"
[09:24:06] I also have that enabled.
[09:24:16] Where else would you want to have today?
[09:26:30] multichill: imported from - German Wikipedia. Then another qualifier point in time.
[09:26:52] muelli: Last time I checked you were a human, not a bot
[09:26:58] You shouldn't be using imported from ;-)
[09:27:49] hrm.
[09:28:19] See https://www.wikidata.org/wiki/Help:Sources#Different_types_of_sources
[09:28:36] true. "Useful for bots but not for wikidata: if you can't use "stated in" (P248) (English)". okay..
[09:28:44] Imported from is just to keep track of what Wikipedia bots grabbed it from so it can be replaced by a real source
[09:29:09] but.. I mean. I'm not checking anything there. Just blindly copy and pasting. So it's closer to being a bot than to being a human with a brain ;-)
[09:30:42] Simple imported from references get replaced by bots, see for example https://www.wikidata.org/w/index.php?title=Q18508034&type=revision&diff=561878267&oldid=561569383
[10:47:34] https://www.wikidata.org/wiki/Wikidata:Request_a_query#Combine_two_queries_to_match_paintings_with_painters <- could use a hand with this query puzzle :-)
[10:48:57] multichill: two joined subqueries maybe?
[10:53:56] multichill: http://tinyurl.com/yd4mcxn5
[10:55:09] No matching records found ?
[10:56:25] hm
[10:56:41] The limit is causing that
[10:57:00] probably
[10:57:06] I've got one more idea though
[10:59:20] Aargh
[10:59:23] I see a mistake
[11:00:19] there are too many painters imo
[11:00:28] Yeah, that was the mistake :-)
[11:00:51] so?
[11:01:24] http://tinyurl.com/y8gzfyto is the painter query that does complete
[11:01:53] 20744 Results in 9996 ms, only painters that have a work in a Dutch collection
[11:02:08] And only English labels/aliases
[11:02:24] I see
[11:04:24] It almost completes, it gets killed with already some decent output
[11:07:12] matej_suchanek: I think I got it to work now :-)
[11:07:21] did you?
[11:07:24] nice
[11:10:33] multichill: the subquery for painters still times out for me even when run by itself...
[11:11:43] matej_suchanek: http://tinyurl.com/y95jc5ch completes, put the query on the query page
[11:14:58] And now on https://www.wikidata.org/wiki/User:Multichill/Kladblok :-)
[11:15:12] good!
[11:16:39] Thanks for your help
[11:18:12] ;)
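The tinyurl queries above are not expanded here, but the general shape of the approach matej_suchanek suggested (restricting the painter set in an inner subquery before labels and aliases are fetched) can be sketched roughly as below. This is a reconstruction under my own assumptions, not the linked query itself: the IDs used (Q3305213 painting, P170 creator, P195 collection, P17 country, Q55 Netherlands) are only guesses at what the original query matched on.

    # Rough sketch: restrict painters to those with a painting in a Dutch
    # collection inside a subquery, then fetch English labels for that
    # much smaller set. Runs against the public WDQS endpoint.
    import requests

    QUERY = """
    SELECT ?painter ?painterLabel WHERE {
      {
        SELECT DISTINCT ?painter WHERE {
          ?painting wdt:P31 wd:Q3305213 ;    # instance of: painting
                    wdt:P170 ?painter ;      # creator
                    wdt:P195 ?collection .   # collection
          ?collection wdt:P17 wd:Q55 .       # collection located in the Netherlands
        }
      }
      ?painter rdfs:label ?painterLabel .
      FILTER(LANG(?painterLabel) = "en")
    }
    """

    r = requests.get("https://query.wikidata.org/sparql",
                     params={"query": QUERY, "format": "json"})
    for row in r.json()["results"]["bindings"]:
        print(row["painter"]["value"], row["painterLabel"]["value"])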
[11:22:06] multichill: btw have you seen https://www.wikidata.org/wiki/User:Mat%C4%9Bj_Such%C3%A1nek/Same_labels? that's what *I* thank you for
[11:23:21] Oh, nice!
[12:13:26] apparently, we've got a spammer on mediawiki.org, anyone here who can stop them?
[12:14:16] I can
[15:04:21] Amir1: Do you know what is going on here https://fa.wikipedia.org/w/index.php?title=%D9%88%DB%8C%DA%98%D9%87:%D8%B3%DB%8C%D8%A7%D9%87%D9%87%E2%80%8C%D9%87%D8%A7/block&page=%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1%3AErfgoedBot ?
[15:59:01] multichill: that's fantastic
[15:59:56] the abuse filter can block users on fawiki; we have a troll who made life a living hell for all wikipedians, so the filter is very general and has too much collateral damage
[16:01:40] So a misconfigured abuse filter disabled that bot?
[16:36:42] matej_suchanek: What triggers https://www.wikidata.org/w/index.php?title=Q11872962&type=revision&diff=468096812&oldid=466132073 ?
[16:39:59] see the edit summary
[16:40:01] https://www.wikidata.org/w/index.php?diff=325742024
[16:40:34] The link was on the wrong painting so the user moved it
[16:40:41] Why does that trigger a merge?
[16:40:57] probably because no sitelinks were left
[16:41:09] That would be stupid
[16:41:19] not in 99% of cases
[16:41:31] How many paintings did you merge here?
[16:41:52] paintings? I didn't focus on them
[16:42:42] Can you check how many paintings were (incorrectly?) merged by the bot?
[16:43:05] Basically items that are a redirect to a painting and whose last edit was by your bot
[16:43:05] probably...
[17:37:17] multichill: I found 18
[17:37:32] http://tinyurl.com/ydalzmyq
[17:43:20] I'm wondering how to create a SPARQL query for generating a table similar to https://de.wikipedia.org/wiki/Liste_der_Hochschulen_in_Deutschland. I'm having a few issues. One is determining the state. I can make use of P131, but then I don't know how to sort of switch-case over the result in order to get, say, "HH", "BY", etc. Any hints? Another issue is that the budget column is not being rendered: http://tinyurl.com/ya3jzubk
[17:43:59] "budget" or "bugdet"? :D
[17:47:20] ah. heh
[17:47:22] -.-
[17:50:01] muelli: anyway, "BY" seems to be a short form of Bayern
[17:50:15] matej_suchanek: yeah. It is.
[17:50:43] short form is [[Property:P1813]] but I can't see it in [[Q980]]
[17:50:44] [1] https://www.wikidata.org/wiki/Property:P1813
[17:50:47] [2] https://www.wikidata.org/wiki/Q980
[17:51:38] it's the second part of the ISO code (P300)
[17:51:43] so it's doable
[17:51:56] well. I'd be happy with "Hamburg" or "Bayern" FWIW. I'm having a knot in my brain regarding the syntax to use.
[17:52:10] ok
[17:53:18] generally, it can be ?university wdt:P131 ?above
[17:54:46] http://tinyurl.com/y8of6t7m
[17:55:20] oh. wow. that was quick
[17:55:28] not sure how to get rid of the leading "DE-" though
[17:55:43] but yeah if you're happy with the land item it's easier
[17:55:44] STRAFTER( ?iso, 'DE-' )
[17:56:01] matej_suchanek: thanks!
[17:56:07] funny hack.
[17:56:23] yeah "hack", that's the word...
[17:56:55] perhaps safer would be REPLACE( ?iso, '^DE-', '' )...
[17:59:12] is there a syntax for getting the "lower bound" of the class of an instance? Like.. right now P131 gives the most specific answer. Of course. But I'm not interested in that level of detail. So is there a syntax for getting the values of these transitive properties only up to a certain level? Like "casting" the result... Just being curious here. pintoch's answer looks very good to me for now.
[18:00:15] muelli: what you can do is force the target item (here the land) to be an instance of something specific, like a municipality instead of a land
[18:00:50] in theory that should let you pick the right administrative level
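Putting the hints above together, a query along the following lines should produce the university/state pairs muelli is after. It is only a sketch under my own assumptions (Q3918 for university, Q183 for Germany): the state is found by walking up P131 and keeping the ancestor that carries a "DE-…" ISO 3166-2 code (P300), which is then shortened with STRAFTER as suggested.

    # Sketch: German universities with the two-letter Bundesland code,
    # derived from the ISO 3166-2 code (P300) of a P131 ancestor.
    import requests

    QUERY = """
    SELECT ?uni ?uniLabel ?state WHERE {
      ?uni wdt:P31/wdt:P279* wd:Q3918 ;   # instance of (a subclass of) university
           wdt:P17 wd:Q183 ;              # country: Germany
           wdt:P131+ ?land .              # walk up the administrative hierarchy
      ?land wdt:P300 ?iso .               # ISO 3166-2 code, e.g. "DE-BY"
      FILTER(STRSTARTS(?iso, "DE-"))      # keep only the Bundesland level
      BIND(STRAFTER(?iso, "DE-") AS ?state)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
    }
    ORDER BY ?state ?uniLabel
    """

    r = requests.get("https://query.wikidata.org/sparql",
                     params={"query": QUERY, "format": "json"})
    for row in r.json()["results"]["bindings"]:
        print(row["state"]["value"], row["uniLabel"]["value"])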
[18:03:27] matej_suchanek: I assume you did a SQL query to find that? Thanks! I'll check them
[18:16:22] multichill: no
[18:16:54] I tried but I couldn't get results in a short time
[18:17:38] so I used SPARQL to query for redirects to paintings that were last modified between January and May
[18:18:26] then let my bot iterate over them and print those last modified by my bot
[18:18:56] anyway, I'm glad I'm not a dead man now :)
[19:30:41] pintoch: yeah, that's what you're doing in that query, right? Is that how one does that kind of thing?
[19:31:02] muelli: that's the only solution I can think of
[19:31:09] cool. thanks.
[19:37:36] is pywikibot my best bet for updating Wikidata without going through the Web interface? I've parsed, e.g., the membership list of the DFN and I'd like to update all wikidata entries of the relevant universities to reflect that they are a member of the DFN.
[19:40:11] pywikibot has many issues IMHO. there's https://github.com/SuLab/WikidataIntegrator which is a bit nicer in some regards, but still has some rough edges
[19:41:11] the quick & dirty (and probably easiest) way is just to generate your edits in the QuickStatements format https://tools.wmflabs.org/wikidata-todo/quick_statements.php?
[19:43:42] ah. interesting.
[19:46:07] I think the Location and Quantity statements could use some examples.
[19:51:09] muelli: You can easily do it in Pywikibot, if you know Python
[19:51:33] I do :)
[19:53:10] What do you want to do exactly? I'm not sure what DFN stands for
[19:54:41] I generally just make a generator that returns all data on a per-record basis and have the bot update items if needed
[19:56:44] muelli: ^
[19:56:50] multichill: there is a list of universities here: https://www.dfn.de/verein/mv/mitglieder/ I think I want to update wikidata to reflect that.
[19:57:39] How do you plan to match these with Wikidata items?
[19:58:55] multichill: I don't know yet. Do you have suggestions?
[19:59:25] I can imagine matching the hostnames of the URLs to the list I have generated from Wikidata already.
[19:59:32] muelli: I propose OpenRefine for that :)
[19:59:41] Yeah, would use that too
[20:00:00] Bots are very good and fast at matching with lookup tables etc., but you need IDs for that
[20:00:10] https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation
[20:00:12] In this case you might be able to use the official website to match most of them
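For the pywikibot route multichill describes (a generator yielding one record at a time, with the bot updating items as needed), a minimal sketch could look like the following. It assumes the matching step has already produced a file of Q-ids, and it uses P463 (member of) plus the reference properties that come up later in this log (P854 reference URL, P813 retrieved); the DFN item id is left as a placeholder, not looked up.

    # Sketch only: add "member of" (P463) with a reference to each matched item.
    import pywikibot

    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()
    DFN_ITEM = pywikibot.ItemPage(repo, "Q0")  # placeholder: the DFN item, to be looked up

    def member_qids():
        """Yield one matched Q-id per line, e.g. exported from OpenRefine."""
        with open("dfn_members.txt") as f:
            for line in f:
                yield line.strip()

    for qid in member_qids():
        item = pywikibot.ItemPage(repo, qid)
        item.get()
        existing = item.claims.get("P463", [])
        if any(c.getTarget() == DFN_ITEM for c in existing):
            continue  # membership already recorded, skip
        claim = pywikibot.Claim(repo, "P463")          # member of
        claim.setTarget(DFN_ITEM)
        url = pywikibot.Claim(repo, "P854")            # reference URL
        url.setTarget("https://www.dfn.de/verein/mv/mitglieder/")
        retrieved = pywikibot.Claim(repo, "P813")      # retrieved
        retrieved.setTarget(pywikibot.WbTime(year=2017, month=9, day=23))
        item.addClaim(claim, summary="add DFN membership")
        claim.addSources([url, retrieved])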
[20:01:02] pintoch: ah, it has already entered my bubble. I haven't followed up on that yet though. Is that an "end to end" software? Like, does it load that HTML and parse out the stuff I might be interested in? And does it then look in Wikidata for what items the data could represent?
[20:01:36] muelli: it is not very good at parsing HTML
[20:02:01] it can handle clean XML, JSON, and any mainstream tabular format fairly well
[20:02:04] however
[20:02:31] if you just copy-paste the list you have from your browser to OpenRefine, it should just work
[20:02:55] you will lose the URLs though :-/
[20:03:59] that being said, matching with URLs in OpenRefine is a bit dangerous
[20:04:23] I haven't found a satisfactory scoring method for URLs in the reconciliation backend
[20:05:18] (the problem is that the URLs of two organizations can be quite close, for instance parent / child organizations)
[20:07:00] yeah, given the list you have I wouldn't bother including the URLs, they will not add much
[20:07:57] just make sure you use the German version of the reconciliation interface for your matching (https://tools.wmflabs.org/openrefine-wikidata/de/api) and you should have decent results
[20:11:12] hi, I've been doing some reading up on entity linking and all of the existing implementations are restricted to Wikipedia. Is there an entity linker available for Wikidata?
[20:12:39] ah pintoch. cool. let me play around. I've started "reconciliation" already. Let's see what it yields.
[20:12:40] not really, but there are many which output Wikipedia links, which can be converted to Wikidata identifiers easily
[20:12:46] newbie123: ^
[20:12:53] newbie123: here is a list: https://meta.wikimedia.org/wiki/User_talk:Hjfocs#Wikidata_Entity_Linking
[20:13:14] muelli: cool, I'm curious to see what you find!
[20:14:58] (newbie123: sorry that I read your question too quickly! looks like we are facing the same problem ^^)
[20:15:23] pintoch: But that is still entity linking on Wikipedia (just that the returned output is the corresponding wikidata_id instead of the Wikipedia link)
[20:15:26] pintoch: it seems to have matched well with the default API endpoint. Wow.
[20:15:32] same problem?
[20:16:07] newbie123: I agree with you that having to go via Wikipedia is a problem
[20:16:21] (for instance, you can't match any item that does not have sitelinks…)
[20:16:29] pintoch: Yes! exactly
[20:16:40] Thus restricting the linker to Wikipedia-only entities
[20:17:06] pintoch: Is someone from Wikidata working on this problem?
[20:17:13] not that I am aware of
[20:17:46] it's one of the things that I would love to do if I had time (and I don't understand why nobody has done it before)
[20:18:17] I'm a researcher and am interested in this area. Building an entity linker for Wikidata is a challenge (mainly because of the lack of rich textual information and manually annotated hyperlinks that act as training data)
[20:19:05] pintoch: I asked the same question when I started out (Why hasn't anyone done it before?) Turns out they had good reason not to, since it is very challenging
[20:19:29] newbie123: still, I expect you would get a decent result with off-the-shelf models
[20:20:02] like, take a Standbol entity hub
[20:20:13] *stanbol
[20:21:11] Most of the techniques I have come across make use of the hyperlink data (to compute prior probabilities) and the descriptive text for comparison (and these two are very important features in most methods I've seen)
[20:21:20] Wikidata lacks this information ^^
[20:21:24] ^*
[20:22:12] well it does have 1/ labels and aliases 2/ links between items
[20:22:28] and AFAICT that's all you need in most heuristics
[20:23:14] wikidata has alias information?
[20:23:39] yes! https://www.wikidata.org/wiki/Q34433
[20:24:09] on the right-hand side of the descriptions you will see various alternative surface forms ("aliases") for the item, in many languages
[20:25:17] these aliases are mostly manually curated
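As a concrete illustration of those surface forms, the labels and aliases of an item can be pulled straight from the wbgetentities API. A small sketch for Q34433, the item linked above:

    # Fetch the labels and aliases ("surface forms") of Q34433 from the Wikidata API.
    import requests

    r = requests.get("https://www.wikidata.org/w/api.php", params={
        "action": "wbgetentities",
        "ids": "Q34433",
        "props": "labels|aliases",
        "format": "json",
    })
    entity = r.json()["entities"]["Q34433"]
    print(entity["labels"]["en"]["value"])               # the English label
    for alias in entity.get("aliases", {}).get("en", []):
        print(alias["value"])                            # the English aliases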
[20:26:59] newbie123: if you want to work on this I am happy to help, I have started experimenting with a re-ranking heuristic for mentions
[20:27:12] it's not at all what my PhD is supposed to be about but oh well…
[20:29:27] pintoch: I'll get back to you in 5. Discussing something. Sorry
[20:30:58] sure!
[20:35:40] pintoch: I now have a list of more or less properly reconciled data (OpenRefine seems to be cool!). Do you happen to know how I would now update Wikidata? Is it even possible? https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine doesn't mention anything.
[20:36:41] muelli: the latest stable version does not have any feature for that, but I have been working on that
[20:36:55] pintoch: can you explain this line "well it does have 1/ labels and aliases 2/ links between items"?
[20:37:39] muelli: if you want to try the current dev version (which is quite unstable) you can try this branch: https://github.com/OpenRefine/OpenRefine/tree/wikidata-extension
[20:38:20] pintoch: do I have other options? -.-
[20:39:20] muelli: you can also export the Qids you have got via reconciliation and generate quickstatements yourself
[20:39:42] (see the recipe at https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine )
[20:40:13] newbie123: what I mean is that each item has (potentially) multiple surface forms
[20:40:42] like "University of Oxford", "Oxford University", "Université d'Oxford" for Q34433
[20:40:45] pintoch: No, I get that. Each entity has a list of aliases
[20:41:02] By links between items did you mean the edges in the wikidata graph?
[20:41:10] yes :)
[20:41:41] and the good news is that it's semantic information, so it's much cleaner than hyperlinks between articles
[20:41:54] it's much more relevant (but more sparse)
[20:42:07] pintoch: ah, good idea. Yeah, maybe I do that. Like get the Qids, load them in LibreOffice, copy and paste loads of these "member of" statements, and run them through the quick statements web interface.
[20:42:58] pintoch: Yes, it is semantically rich, but I guess no one has found out how to replace textual comparison (as done in most entity linkers for Wikipedia) with methods that exploit the graph structure (relations between nodes), basically
[20:43:18] I got this message "Your actions in #wikidata tripped automated anti-spam measures, but were ignored based on your time in channel; stop now, or automated action will still be taken. If you have any questions, please don't hesitate to contact a member of staff"
[20:43:21] am I in trouble
[20:43:35] ?
[20:44:15] muelli: yeah, that should work. Just make sure you add references to the statements (something like "S854 "https://www.dfn.de/verein/mv/mitglieder/" S813 +2017-09-23T00:00:00Z/11")
[20:44:41] yeah, that would have been my next pain point. But I'm glad that you've given me something to work with :)
[20:44:54] 813 is "retrieved" I supposed
[20:45:04] s/supposed/suppose/
[20:45:09] yes
[20:45:41] newbie123: yeah I have some ideas for that :)
[20:46:00] (I have no idea for the IRC warning)
[20:47:06] Will you be available on the channel always? or should we exchange email ids? I would like to discuss this problem
[20:48:40] yeah let's discuss that further outside and not swamp this channel with NLP ^^
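Picking up the QuickStatements plan from a few lines above: once the reconciled Q-ids are exported, the statements (with the reference pintoch spelled out) can be generated with a few lines of Python instead of copy-and-paste in LibreOffice. A sketch, again with the DFN item id left as a placeholder and P463 (member of) assumed as the property:

    # Print one QuickStatements (version 1) line per matched university.
    DFN_QID = "Q0"  # placeholder: the DFN item, to be looked up
    SOURCE = '\tS854\t"https://www.dfn.de/verein/mv/mitglieder/"\tS813\t+2017-09-23T00:00:00Z/11'

    with open("dfn_members.txt") as f:      # one matched Q-id per line
        for line in f:
            qid = line.strip()
            print(f"{qid}\tP463\t{DFN_QID}{SOURCE}")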
[20:52:50] hm. funny. OpenReconcile successfully matched an item but getting the Qid doesn't seem to work. The cell is empty for certain items, e.g. Q19835368. It works for others though.
[20:53:16] hmm, that's interesting
[20:55:17] https://muelli.cryptobitch.de/tmp//2017-09-23-Screenshot_from_2017-09-23_22-54-02.png you can see that I hovered over item "2". It shows the Wikidata link in the bottom left. But no Qid in the second column.
[21:04:26] muelli: did you match this one after creating the column with the qids?
[21:04:40] if so, it is normal that the column is not updated
[21:04:54] (OpenRefine does not work like usual spreadsheet software in that regard)
[21:07:18] pintoch: very possible. Do I have to remove and re-add that column with the expression?
[21:10:03] yes, you should finish the matching on the first column before creating the Qids column
[21:10:30] (by finishing I mean reaching the level of completeness you aim for, right ^^)
[21:13:30] hrhr
[21:21:05] hm. I've tried the following quick statement: Q4115189 P2196 123345 S143 Q48183 but it doesn't give me a student count imported from German Wikipedia. What am I doing wrong?
[21:28:49] muelli: it's probably the quantity
[21:29:34] ah no it seems correct according to the manual
[21:29:44] let me try it
[21:30:46] muelli: it seems to have worked? https://www.wikidata.org/w/index.php?title=Q4115189&diff=564568802&oldid=564568505
[21:30:50] muelli, are you in QuickStatements 1 or 2?
[21:31:17] 1: https://tools.wmflabs.org/wikidata-todo/quick_statements.php 2: https://tools.wmflabs.org/quickstatements
[21:31:21] pintoch: yeah, the statement worked, but not fully, i.e. it doesn't have the "imported from German Wikipedia".
[21:31:43] Jhs: 1.
[21:31:57] try with 2, i think there might be some differences there in source handling
[21:32:01] the syntax is exactly the same
[21:33:13] hm Jhs. With that one it's even less clear how to use it :-/
[21:33:22] where do I paste my statements?
[21:33:48] Import commands at the very top, then Version 1 format
[21:34:13] (also, you will need to remove the existing statement on the sandbox before, otherwise your QuickStatements line will be ignored)
[21:35:27] ah, better :)
[21:35:29] cool.
[21:35:55] any reason for version 1 to still exist? ;)
[21:37:00] just because it's the only place where you can find information about the file format :-P
[21:37:23] yeah, someone should consolidate, I guess
[21:38:17] this doesn't have the desired effect though: Q4115189 P2196 123 S585 +2017-09-23T00:00:00Z/9 S143 Q48183 :-/ Specifically, it makes the point in time appear, but not the imported from. Anything I'm doing wrong?
[21:40:21] I am not sure I ever managed to use QS for multi-statement references actually, maybe this syntax I suggested to you came from one of my dreams
[21:41:11] although you seem to have two tabs between the date and the S143
[21:41:20] maybe that's what's confusing it
[21:43:59] pintoch: yeah, that looks better :) Although not entirely good, I think. The result now looks different from what I've seen before. The "point in time" is now shown under the "reference" rather than the number "123".
[21:45:15] ah you should use P585 rather than S585 if you intend to create a qualifier
[21:45:36] (it's great that you are adding one btw!)
[21:47:29] ah. ouf. okay. So it's a property of the property, kind of?
[21:47:36] * muelli still needs to wrap his head around all this
[21:47:49] yeah the wikibase datamodel is rich
[21:48:05] a qualifier is indeed a property of a statement
[21:48:30] it makes the scope of the statement more precise, or adds some details to it
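The same edit muelli is building, written with pywikibot rather than QuickStatements, makes the split explicit: P585 (point in time) is attached to the statement as a qualifier, while P143 (imported from) with Q48183 (German Wikipedia) goes into the reference. A sketch against the sandbox item, not a recommendation to prefer one tool over the other:

    # Sketch: a statement with one qualifier and one reference on the sandbox item.
    import pywikibot

    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()
    item = pywikibot.ItemPage(repo, "Q4115189")           # Wikidata sandbox
    item.get()

    claim = pywikibot.Claim(repo, "P2196")                # students
    claim.setTarget(pywikibot.WbQuantity(amount=123, site=repo))

    when = pywikibot.Claim(repo, "P585")                  # point in time (qualifier)
    when.setTarget(pywikibot.WbTime(year=2017, month=9, day=23))

    source = pywikibot.Claim(repo, "P143")                # imported from (reference)
    source.setTarget(pywikibot.ItemPage(repo, "Q48183"))  # German Wikipedia

    item.addClaim(claim)
    claim.addQualifier(when)
    claim.addSources([source])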
[21:50:12] how would I then query for that? Can I make SPARQL quintuples then?
[21:50:16] rather than triples
[21:50:31] (again, curiosity, no real need, I think)
[21:54:32] This is the best for queries right? https://www.wikidata.org/wiki/Q38066676#P31
[21:54:44] muelli, i've never tried it myself, but there seems to be a pq: prefix for checking qualifiers of a statement
[21:55:01] search this page for "pq:" and you'll see some examples: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Missing_labels_for_a_target_language
[21:56:37] cool
[21:59:25] speaking of queries, I could use some advice myself. I need a query to find all surnames (P31:Q101352) *without* a label in language "smj". i've tried this: http://tinyurl.com/ybkh9bsf but it always times out. any suggestions?
[22:00:41] i need it for PetScan in conjunction with an arbitrary manual list of items, so what i really need is which of those ~3500 items don't have a label in that language
[22:03:53] AAAAH never mind, i just used the wrong "Combination" setting in PetScan, where the query works
[22:04:04] i used "manual NOT sparql" where I should have used "manual AND sparql"
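To close the loop on muelli's "quintuples" question above: in the query service a qualifier is reached by going through the statement node with the p:/ps:/pq: prefixes instead of the wdt: shortcut. A small sketch that would read back the sandbox statement from the earlier examples:

    # Read a statement's main value and its "point in time" qualifier via SPARQL.
    import requests

    QUERY = """
    SELECT ?students ?pointInTime WHERE {
      wd:Q4115189 p:P2196 ?statement .   # the full statement node
      ?statement ps:P2196 ?students ;    # its main value
                 pq:P585 ?pointInTime .  # its "point in time" qualifier
    }
    """

    r = requests.get("https://query.wikidata.org/sparql",
                     params={"query": QUERY, "format": "json"})
    for row in r.json()["results"]["bindings"]:
        print(row["students"]["value"], row["pointInTime"]["value"])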