[15:30:04] Hi everyone! I have some questions about the Wikidata dumps. I'm importing the JSON dumps into a database, but the dumps do not have redirect information. Because of this, some relationships point to non-existent entities, which is causing data consistency issues. I'm ignoring the missing entities for now, but I'd like to find a longer term solution that takes redirects into account.
[15:30:04] I found Phabricator tasks mentioning that the JSON dumps don't include redirect info, but there is not any recent activity on them.
[15:30:04] To work around the issue, I tried parsing the "owl:sameAs" predicates out of the RDF files and creating an entity for each subject. That created some duplicate entities, which leads me to believe that some of the subject entities in the list of redirects exist in the JSON dump and some do not. This makes sense if the dump files are not snapshots in time, or if they are but the JSON and RDF dumps are run at different times.
[15:30:04] To try to understand this better, can someone answer the following for me:
[15:30:05] (1) Within a single dump file (JSON and/or RDF), is every entity that appears in a relationship guaranteed to be defined in the file? In other words, does the dump file represent a self-consistent snapshot of the database at a moment in time?
[15:30:05] (2) If the above is true, then is it also true across the JSON and RDF files in a given dump directory?
[15:30:06] (3) I looked at various API calls to access JSON data on entities, and I have seen that I can get redirect data via the API (e.g. https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q390537) but not via the persistent URI (e.g. https://www.wikidata.org/wiki/Special:EntityData/Q390537.json). Is the latter what is used for the JSON dump files?
[15:30:06] (4) My understanding is that a redirect is created when duplicate entities are discovered in the database, at which point the duplicate entities are redirected to one canonical entity. This is done so that links that point to the duplicate entities will automatically resolve to the canonical entity without having to update the links (e.g. in relationships). Is this understanding correct?
[15:30:07] (5) If the above is true, is it also true that, once an entity is changed to a redirect entity, it will never be changed back?
[15:30:08] (6) Should I be using the RDF dump instead of the JSON dump? I'd prefer JSON because of the availability of standard parsing tools, and because the one-entity-per-line format makes parsing much easier.
[15:30:08] If there is a more appropriate place for me to ask these questions, or if you need more information from me, please let me know. Thanks in advance for the help!
[15:38:00] Is there a way to know if a group of Mexican people do not have data on Wikidata? Or can it only be known in general? (for Spanish Wikipedia)
[17:22:56] hispano76> "group of Mexican people"? any example?
[17:38:01] Alaa|away: persons of Mexican nationality who do not have items in Wikidata
[17:38:28] and their articles exist on eswiki?
[17:42:32] yes
[17:44:37] I'm sorry if it's confusing
[17:44:48] hispano76> I'll give it a try, give me 5 min, okay?
[17:47:31] ok no problem :)
[17:54:21] hispano76> try this https://petscan.wmflabs.org/?psid=14666177
[17:54:59] you can change "Depth". But remember to choose "Has no statements" on the Wikidata tab
[17:59:00] thanks Alaa|away :)
[17:59:39] Hispano76 wlc ^^
[18:04:41] Hi, is anyone on who's familiar with the format of Wikidata dump files?
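For reference, the "owl:sameAs" workaround described at 15:30:04 could be sketched roughly as below. This is only an illustration, not something posted in the channel. It assumes the RDF dump is read as N-Triples, that redirects appear as triples with the redirected entity as subject and the target as object, and that entity IRIs start with http://www.wikidata.org/entity/. The function names and the sample IDs (Q100, Q200, Q300) are made up for the example.

```python
import re

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
ENTITY_PREFIX = "http://www.wikidata.org/entity/"  # assumed entity IRI prefix

# Matches an N-Triples line whose subject, predicate and object are all IRIs.
# Lines with literal objects (labels, dates, ...) do not match and are skipped.
TRIPLE_RE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")


def build_redirect_map(nt_lines):
    """Collect {redirect_id: target_id} from owl:sameAs triples."""
    redirects = {}
    for line in nt_lines:
        match = TRIPLE_RE.match(line.strip())
        if not match:
            continue
        subj, pred, obj = match.groups()
        if (pred == OWL_SAME_AS
                and subj.startswith(ENTITY_PREFIX)
                and obj.startswith(ENTITY_PREFIX)):
            redirects[subj[len(ENTITY_PREFIX):]] = obj[len(ENTITY_PREFIX):]
    return redirects


def resolve(entity_id, redirects):
    """Follow redirects to the final target, stopping if a cycle is found."""
    seen = set()
    while entity_id in redirects and entity_id not in seen:
        seen.add(entity_id)
        entity_id = redirects[entity_id]
    return entity_id


# Hypothetical sample data: Q100 redirects to Q200, which redirects to Q300.
sample = [
    "<http://www.wikidata.org/entity/Q100> <http://www.w3.org/2002/07/owl#sameAs> <http://www.wikidata.org/entity/Q200> .",
    "<http://www.wikidata.org/entity/Q200> <http://www.w3.org/2002/07/owl#sameAs> <http://www.wikidata.org/entity/Q300> .",
    '<http://www.wikidata.org/entity/Q300> <http://www.w3.org/2000/01/rdf-schema#label> "example"@en .',
]
redirect_map = build_redirect_map(sample)
print(resolve("Q100", redirect_map))
```

Rewriting relationship targets through such a map at import time, instead of creating a new entity for every "owl:sameAs" subject, might avoid the duplicate entities mentioned above. Whether it fully closes the gap depends on how far apart the JSON and RDF dumps were taken, which is what questions (1) and (2) ask.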