[03:45:19] I am loading my file with ./loadData.sh -n wdq -d /download/wiki/service-0.3.0/data/split -s , I am at wikidump-000000290.ttl.gz of wikidump-000000507.ttl.gz and the current wikidata.jnl is 233G, what is the expected final size :(
[08:12:26] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1980 bytes in 0.078 second response time
[08:17:26] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1953 bytes in 0.093 second response time
[10:03:25] does anyone know what the current status / policy / whatever of English verb forms is? is it still undecided, as discussed in https://www.wikidata.org/wiki/Wikidata_talk:Lexicographical_data#English_verbs ?
[10:06:53] and as a follow-up question, are there other languages where there are already several verbs with a full set of forms?
[13:52:13] ಠ_ಠ apparently troubleshooting is enough to disconnect the dme wireless :/
[14:21:33] WikidataFacts: I imagine very little has been decided at such an early stage :) personally I would use separate forms (I would interpret the grammatical features to *all* apply simultaneously, otherwise it just gets unnecessarily complicated working out which are "and"s and which are "or"s)
[14:25:33] nikki: +1
[14:26:13] +1 from me as well, but I’m not sure if it’s a good idea to encode that opinion into my tool while there’s no consensus yet :)
[14:26:21] (context: https://twitter.com/LucasWerkmeistr/status/1006863422569492481 )
[14:26:22] what's your tool?
[14:26:39] (or https://www.wikidata.org/wiki/Wikidata_talk:Lexicographical_data#New_tool:_Wikidata_Lexeme_Forms )
[14:27:15] heh, "habe" would be a good example...
[14:28:21] ich habe, du hast, du hast mich, … ;)
[14:28:33] :P
[14:28:34] 1st person singular indikativ, 1st person singular konjunktiv I, 3rd person singular konjunktiv I, 2nd person singular imperativ
[14:28:37] heh
[14:29:17] btw, there should be two lexemes for the German word "haben": one for the auxiliary verb, and one for the proper verb
[14:29:25] same for "to have" in English
[14:34:29] WikidataFacts: it would probably be easy enough to merge the forms later with a bot if people insist on it... splitting would be harder
[14:35:07] very good point
[14:35:33] pssst.. it's "du hast mich gefragt und ich habe nichts gesagt"
[14:36:00] I see that VIGNERON has started to map out French verb forms, so perhaps that will be the next language the tool supports
[14:40:31] it would be nice if the tool could offer suggestions, since often the forms are predictable (e.g. iirc nominative and accusative are always the same except for a small group of masculine nouns that seem to exist to trip me up)
[14:42:45] Hi! I've detected vandalism on https://www.wikidata.org/wiki/Q115119 and https://www.wikidata.org/wiki/Q6035335
[14:43:47] Technical Advice IRC meeting starting in 15 minutes in channel #wikimedia-tech, hosts: @chiborg & @amir1 - all questions welcome, more info: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[14:47:25] Hey, JLInfante :)
[14:47:54] Hi!
[14:48:56] JLInfante: You can use !admin when you want an admin to block vandals or protect an item
[14:49:07] !admin :)
[14:49:07] Attention requested  HakanIST sjoerddebruin revi
[14:49:40] let me see...
[14:50:48] JLInfante: it has been quite heavily edited within the last 24 hours; which edits (do you think) are problematic, and why?
[14:51:16] I mean...
this one https://www.wikidata.org/wiki/Q115119
[14:52:08] revi: J101vergaj's edit
[14:52:53] seems to be reverted
[14:53:25] but yeah, things seem to be messed up
[14:53:27] yes, I reverted the vandalism
[14:54:04] Edits made by 46.37.82.246, 95.60.28.140, 31.4.214.222, 2.155.136.193 and 176.83.107.138 are vandalism too
[14:54:29] I guess the item should be semiprotected for some hours :/
[14:54:38] it seems something happened in his real life
[14:54:44] that attracts anons
[14:54:55] Right, he's a trending topic in Spain today
[14:55:12] semiprotected for 36 hours
[14:55:24] yeah, that explains it
[14:55:37] and I guess https://www.wikidata.org/wiki/Q6035335 is attracting anons as well
[14:55:47] Q115119 vandalism from IPs: 78.30.22.138, 88.12.79.94, 85.192.92.97, 46.37.82.246, 95.60.28.140, 31.4.214.222, 2.155.136.193, 176.83.107.138
[14:55:49] Another trending topic xD
[14:56:13] Q6035335 vandalism from IP 81.37.184.5
[14:56:24] both semi’ed for 36 hours
[14:56:40] Thanks, revi :)
[14:56:58] if those two people are trending after 36 hours and people are still interested in their wikidata pages, [[WD:AN]] is also always available
[14:56:59] [1] https://www.wikidata.org/wiki/Wikidata:Administrators%27_noticeboard - Redirección desde https://www.wikidata.org/wiki/WD:AN?redirect=no
[14:57:18] thank you, revi!
[14:57:30] no problem
[15:01:20] bye!!
[15:05:59] revi, BTW, are you familiar with the Translate extension?
[15:07:09] is AsimovBot speaking Spanish?
[15:07:31] abian: sure
[15:07:37] WikidataFacts: Sadly
[15:07:41] AsimovBot is developed by a Spanish dev
[15:07:43] so...
[15:07:44] -sugus Learn English
[15:07:44] Error: No se ha indicado un usuario con sugerencias, o el texto de la sugerencia no tiene los 20 caracteres necesarios para ser considerada válida.
[15:07:54] -sugus Learn English for #wikidata channel
[15:07:55] Sugerencia anotada con el número 306. Para información sobre la misma, escribe -sugus 306 o deja un mensaje en https://meta.wikimedia.org/wiki/User_talk:-jem-
[15:08:14] huh, okay
[15:08:28] jem is its human operator :)
[15:08:28] I thought it might use a per-project language and be confused about multilingual Wikidata ;)
[15:09:26] revi: Cool :)
[15:09:42] revi: Would you mark https://www.wikidata.org/wiki/Wikidata:Property_proposal/Proposal_preload for translation, please?
[15:14:07] WikidataFacts: btw, did you miss my question on Sunday about https://phabricator.wikimedia.org/T185895?
[15:14:34] done abian
[15:14:55] nikki: hm, now that you mention it, I recall seeing that ticket again recently… but I don’t remember what your question was anymore, sorry
[15:15:47] ok, found it in the logs
[15:15:52] oh, ok :)
[15:15:55] was gonna repeat it
[15:16:11] I think I answered you, but you were already offline
[15:16:24] https://wm-bot.wmflabs.org/browser/index.php?start=06%2F10%2F2018&end=06%2F10%2F2018&display=%23wikidata, 14:16:57
[15:16:27] meh, stupid ISP
[15:16:45] revi: Thank you! :D
[15:17:05] and there I was thinking you were ignoring me :P
[15:18:36] WikidataFacts: The CSS improvements of the WDQS...
[15:18:45] You're gonna hate me :(
[15:20:12] oh no!
[15:20:14] what’s wrong?
[15:20:24] The pending patch :)
[15:20:32] so if I understand it right, it would only fix the case where you provide a language code the software doesn't know about, and not the ones where it does know about the code, since those return a translated "Main Page" which doesn't work because there's no Wikidata: prefix?
[15:22:18] abian: oh, sorry, I confused you with nikki for a second :D
[15:22:20] nikki: yes
[15:22:36] Heh, five characters too :)
[15:22:42] our names are the same length and both contain an n and an i, I guess they're similar :P
[15:22:51] and nikki also had opinions about the WDQS CSS in the past, I think
[15:22:59] or the UI in general
[15:23:12] I get upset when things move around, basically :P
[15:23:12] so I was afraid we’d made more changes that broke your workflow :S
[15:24:27] If nikki and I are twins, who's the evil one? :)
[15:24:44] aww, nope, I got someone to help me put things back where they were and it hasn't broken again since, so I've been happy for a while
[15:25:31] probably me >:D
[15:27:16] My horns are bigger }:D
[15:27:52] those were eyebrows, not horns. I don't have horns
[15:28:00] I guess I'm not evil enough then :<
[15:28:25] Bwahaha
[15:48:54] I wonder whether decimal or DMS versions of coordinates are considered better
[15:51:42] I keep coming across things where the decimal versions are different because the site rounded to a different number of decimal places, but the DMS form is the same
[15:56:12] so they could be two different statements, because .3333 isn't exactly the same as .3 recurring or .33333... but at the same time it's a pointless distinction saying that one site says the coordinates are .3333 and another says .33333, especially when both sites have the same DMS form and the interface converts them all to DMS anyway
[15:56:36] and they both got the coordinates from a third site which says .333333 :P
[15:59:39] I think I'm convincing myself that we should ignore decimal versions if the site provides DMS, because decimal is often an approximation
[16:17:49] loadData has spent 3 days! and I am now at 310G Jun 13 11:17 wikidata.jnl
[16:17:57] where am I going with space!
[16:18:24] #/dev/sdb1 440G 298G 120G 72% /disk2
[16:19:15] the disk is all for this job!
[16:19:56] >Processing wikidump-000000392.ttl.gz
[16:20:23] >Processing wikidump-000000392.ttl.gz, out of 507 files!
[16:21:07] do you think the next 110 files may fill my HDD?
[16:24:02] do you think the next 110 files may fill my HDD??
[16:24:11] well, if 392 files require 310G, then by extrapolation all 507 files would require about 400GB
[16:24:50] how did you calculate that?
[16:25:01] units '310GB / 392 * 507' 'GB'
[16:25:31] or units '310GB / 392 * 507' 'GB'
[16:25:35] sorry
[16:25:37] https://www.wolframalpha.com/input/?i=310GB+%2F+392+*+507
[16:25:49] but each wikidump has its own size
[16:25:56] not the same!
[16:26:12] wait, where are those wikidumps coming from?
[16:26:19] I thought those were split up from one big dump
[16:26:25] and then I would expect them to have roughly the same size
[16:27:01] ./loadData.sh -n wdq -d /download/wiki/service-0.3.0/data/split
[16:27:20] okay, then they should all have roughly the same size, I thin
[16:27:22] *think
[16:27:47] no, see
[16:28:03] https://pastebin.com/4immaAYU
[16:29:19] and BTW the total of all dumps is around 33 GB, but
[16:29:20] -rw-r--r-- 1 root root 310G Jun 13 11:29 wikidata.jnl
[16:29:35] so it is not the same at all!
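The ~400GB figure above is just a linear projection: current journal size divided by the number of files loaded, times the total file count. A minimal sketch of that arithmetic, assuming the split files are roughly equal in size (which, as the pastebin suggests, is only approximately true):

    # linear extrapolation of the wikidata.jnl size (values in GB; awk used only as a calculator)
    awk 'BEGIN {
      current = 310; loaded = 392; total = 507
      final = current / loaded * total
      printf "expected final size: ~%.0f GB, remaining growth: ~%.0f GB\n", final, final - current
    }'
    # prints: expected final size: ~401 GB, remaining growth: ~91 GB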
[16:31:07] beshoo: I'm still confused about where you are loading the data from
[16:31:36] Well, I downloaded the big dump RDF file
[16:31:48] then divided it into chunks
[16:32:09] the big dump is something like 30 gig compressed
[16:32:14] Yes
[16:32:17] so maybe 300 gig uncompressed
[16:32:25] yes
[16:32:44] so now why am I going to 400 GB?
[16:33:00] well, I don't know what you are actually doing :)
[16:33:26] ./loadData.sh -n wdq -d /download/wiki/service-0.3.0/data/split -s
[16:33:42] presumably BlazeGraph doesn’t compress the data as much
[16:33:51] yeah, I read that, but I don't know what exactly that does
[16:33:55] which makes sense, because otherwise you’re paying CPU time for decompression on every query
[16:34:02] blazegraph is a database
[16:34:04] DanielK_WMDE: it’s the command to run your local copy of WDQS
[16:34:07] databases generally trade space for speed
[16:34:10] well, part of it
[16:34:16] they have redundant indexes to allow for fast lookups
[16:34:37] so yeah, a database is always going to be quite a lot bigger than the raw data
[16:34:59] as a rule of thumb I'd go for roughly a factor of 2 or so
[16:35:18] WikidataFacts: yeah, I guessed that, but what does it *
[16:35:23] my SSD is 480 GB
[16:35:24] ...what does it *do* ;)
[16:35:42] beshoo: that's going to be tight
[16:35:47] it lists all the files in the specified directory and instructs BlazeGraph to start loading them
[16:35:59] (it doesn’t actually feed the files into BlazeGraph itself as far as I know)
[16:36:03] (it just tells it the filename)
[16:36:07] why are they split up first?
[16:36:28] probably so that you can pause and resume the import process more easily
[16:36:33] DanielK_WMDE: you cannot feed the big file at once
[16:36:36] but the split-up step also includes the “munging”, I believe
[16:36:47] hmhm
[16:36:50] which is the step where wdata: nodes are merged into wd: nodes, `a wikibase:Item` is dropped, etc.
[16:37:04] (but I’m not quite sure if that’s part of the split or a separate step)
[16:37:11] we should have a better solution for that...
[16:38:12] ./munge.sh -f ./data/latest-all.ttl.gz -d ./data/split -s
[16:38:17] that's what I used
[16:38:40] latest-all.ttl.gz is around 33 GB
[16:38:41] can someone quickly verify my question https://phabricator.wikimedia.org/T195615#4280121 and the status of that issue? Thanks!
[16:38:59] beshoo: anyway, your general answer is: databases have indexes. indexes need room.
[16:39:55] yes, but is the final size going to crack 480 GB?
[16:42:00] greg-g: addshore probably knows better than I do, but this is WIP
[16:42:09] Currently stuck on https://gerrit.wikimedia.org/r/438005
[16:42:28] o/
[16:43:01] greg-g, hoo: hm... the WikibaseLexeme extension isn't enabled on clients? That will probably cause problems with clients trying to process Wikidata statements that reference lexemes, because the lexeme IDs can't be parsed.
[16:43:16] hoo, greg-g: it's not blocked on that patch, there is another one we want to backport
[16:43:18] *looks for it*
[16:43:19] DanielK_WMDE: Yes :/
[16:43:56] greg-g: hoo DanielK_WMDE https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/437495/ that being backported should apparently fix it
[16:44:16] addshore, hoo: I suppose the client-side data access code should handle this case more gracefully, and WikibaseLexeme should probably be deployed on all client wikis.
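To illustrate the loadData.sh step described above ("it just tells BlazeGraph the filename"), here is a rough, hypothetical sketch of what such a loop could look like. The endpoint URL, the use of SPARQL Update's LOAD operation, and the loop itself are assumptions for illustration; the namespace "wdq" is taken from the command shown earlier, and the real script may work quite differently:

    # hypothetical sketch: ask a local Blazegraph instance to load each munged chunk
    # (assumes a SPARQL endpoint at localhost:9999 and the "wdq" namespace from the command above)
    for f in /download/wiki/service-0.3.0/data/split/wikidump-*.ttl.gz; do
      echo "Processing $f"
      curl -s -X POST "http://localhost:9999/bigdata/namespace/wdq/sparql" \
           --data-urlencode "update=LOAD <file://$f>"
    done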
[16:44:19] the long-term thing is enabling lexemes on clients, but that won't happen for 2 weeks (because of the no-deploy week next week)
[16:44:37] DanielK_WMDE: yup, the graceful patch is the one above, and the deploy on clients will happen :)
[16:45:05] hoo: greg-g there is an AbuseFilter on wikidata.org that tries to stop people adding references to lexemes in statements / snaks, but some slip through
[16:45:26] hoo: what item is linked to the test.wikipedia main page?
[16:45:49] test.wikidata perhaps? or main wikidata? there is no AbuseFilter on the testwikidata site
[16:45:56] Q33
[16:46:05] https://test.wikidata.org/wiki/Q33
[16:46:16] :P
[16:46:21] https://test.wikidata.org/w/index.php?title=Q33&action=historysubmit&type=revision&diff=395685&oldid=390788
[16:46:23] that broke it
[16:46:24] that's on.... ALL THE MAIN PAGES :D
[16:46:25] I removed the lexeme, should be fixed
[16:46:41] DanielK_WMDE: Luckily just testwikidata… this time
[17:00:24] not 100% following the backscroll and just about to go back into meetings, can someone do the needful here? cc DanielK_WMDE hoo addshore :) thanks!
[17:07:05] greg-g: Should be fine for now
[18:21:58] :)
[18:50:54] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1966 bytes in 0.068 second response time
[19:01:05] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1970 bytes in 0.092 second response time
[19:33:08] abian: https://www.wikidata.org/w/index.php?title=Special:Translate&group=page-Wikidata%3AProperty+proposal%2FProposal+preload&action=page&filter=translated&language=es take a look, and if you have suggestions, go ahead :)
[19:49:05] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1967 bytes in 0.195 second response time
[19:49:13] hi, where can I report vandalism?
[19:52:18] frickel: Vandalism? Where?
[19:52:29] https://www.wikidata.org/wiki/Special:Contributions/2001:464E:A0A7:0:4C8A:37E7:6B14:D9D0
[19:52:46] I already reverted all edits in Q788822
[19:54:45] I have reverted more
[19:54:58] dunno where to report though
[19:55:11] thanks for the heads up
[19:57:04] I expected like a big red threatening button on a user's profile page
[19:57:15] where is AIV??
[19:57:40] frickelL69
[19:57:52] sorry, a baby touched my keyboard
[19:58:07] AIV?
[19:58:29] I would report at Wikidata:Administrators' noticeboard
[19:58:40] https://www.wikidata.org/wiki/Wikidata:Administrators%27_noticeboard
[19:58:50] WD:AIV redirects there
[19:58:56] what needs to be reported
[19:59:01] ?
(I'm an admin)
[19:59:17] https://www.wikidata.org/wiki/Special:Contributions/2001:464E:A0A7:0:4C8A:37E7:6B14:D9D0
[19:59:35] may not need to be blocked just yet
[19:59:50] ah, vandalism on 2 different days, yes
[20:00:14] blocked for 1 day
[20:00:36] ok
[20:00:56] I gave a warning
[20:01:13] I guess it's redundant now though
[20:01:26] they are usually always redundant for IPs
[20:01:32] they don't care about warnings
[20:01:32] true
[20:01:49] especially schoold
[20:01:53] *schools
[20:01:59] schools are the worst
[20:02:42] I think it's time to block all schools
[20:03:08] yeah, we should email the schools, or in the block summary write something funny
[20:03:26] why don't we
[20:03:41] I would be 100% supportive
[20:04:15] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1966 bytes in 0.080 second response time
[20:05:00] I would rather just block all Spanish-speaking IP addresses.
[20:05:08] why?
[20:05:20] if I see vandalism, it's mostly in Spanish
[20:05:46] on Wikidata?
[20:06:00] that doesn't seem to be
[20:06:01] Yes.
[20:06:07] the case on enwiki
[20:21:45] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1961 bytes in 0.071 second response time
[20:42:14] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1952 bytes in 0.066 second response time
[20:49:35] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1967 bytes in 0.077 second response time
[20:54:35] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1948 bytes in 0.106 second response time
[21:12:05] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1974 bytes in 0.186 second response time
[21:22:24] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1970 bytes in 0.071 second response time
[21:55:04] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1969 bytes in 0.096 second response time
[22:00:05] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1949 bytes in 0.083 second response time