[03:44:17] Hello good people
[11:00:35] can an admin please block 2001:999:21:df52:c4e0:d31d:4668:6fa for vandalism
[11:01:28] !admin ^^
[11:01:28] Please visit https://www.wikidata.org/wiki/WD:AN
[11:01:44] Jianhui67:
[11:03:15] Done
[11:44:14] hey, kind of a dumb question, but how do you actually develop using https://github.com/wmde/wikibase-docker ? It looks like all the source code is in weird folders. Do you just clone a local copy of what you want to hack on and then point your docker-compose build file at it? Is there a tutorial on that part anywhere?
[12:21:53] search seems to be lagged again :(
[12:23:12] addshore: a question about wikibase-docker ^
[12:23:31] *reads*
[12:23:45] mvolz: I don't develop using it :)
[12:23:52] / it is not designed for development at all
[12:24:23] addshore: ha, well that explains a lot
[12:24:34] What are you trying to do? :)
[12:25:05] Test changes. I can't get the vagrant / wikibase setup working, so I thought I'd change tactics.
[12:25:23] is it just the query service you really want, or?
[12:25:36] I use https://www.mediawiki.org/wiki/MediaWiki-Docker-Dev
[12:25:39] for my dev env
[12:26:17] addshore: core, actually, and also to add extensions.
[12:26:23] thanks, that helps a lot!
[12:26:47] I'll give it a go.
[12:27:20] well, the Wikibase extension, not MediaWiki core.
[12:29:53] I'll probably need the query service working on it as well?
[12:46:52] addshore: https://www.mediawiki.org/wiki/Wikibase/Installation#For_developers
[12:47:22] just because I read it a few minutes ago: the docker setup is linked in the "for developers" section of that page
[12:50:20] aha, I knew I didn't pull that out of thin air :). Maybe it should link to https://www.mediawiki.org/wiki/MediaWiki-Docker-Dev instead and we could add specific directions for wikibase / query service to it?
[13:18:47] addshore: btw. if the docker images at https://github.com/wmde/wikibase-docker are *not* intended for development, what are they actually intended to be used for?
[13:19:11] can I use them to set up a "production grade" instance of wikidata?
[13:20:07] e.g. to set up a wikidata instance that is open and reachable from the internet?
[13:20:45] Or are they just thought of as a quick way to set up a wikidata instance for personal evaluation usage?
[13:22:32] hjb: if you do it right then yes, they could run a real-life production setup
[13:22:35] that's something I need to know to decide which way to go when setting up our own wikidata instance
[13:23:04] hjb: I can link you to some reading about the images that might help you learn about them and decide etc
[13:23:52] hjb: basically the whole story of them is covered @ https://addshore.com/tag/wikibase-docker/
[13:23:58] addshore: yes, links with further reading sounds useful
[13:27:37] addshore: say I've already got an up-and-running production environment with mysql, apache and php in place for setting up applications
[13:27:53] but no infrastructure for running production docker containers yet...
[13:28:58] is it easier to set up wikidata (and the stuff around it, like wdqs, elasticsearch (is that really needed for a small instance?), quickstatements...) on that "old" stack
[13:29:16] or is it worth setting up a docker environment?
[13:29:20] for small instances you probably don't need elasticsearch
[13:29:39] good information.
[13:29:52] so search works without having elasticsearch? perfect
[13:29:57] you can also set up wdqs without using the docker images of course; if anything the Dockerfiles can just be used as a form of documentation
[13:30:05] hjb: yes, it just won't be as good
[13:30:09] "small instance" is, btw., ~100 users max, I estimate
[13:31:14] how about frequency of edits? or pageviews, or item / property count?
[13:33:02] I have no real clue, but it tends to be more like one edit per minute instead of an edit per second
[13:33:11] pageviews I'd guess are about the same
[13:34:02] item / property counts might be more
[13:34:59] something like 14 million entities at the end of the project
[13:35:17] each with a subset of ~50 properties
[13:35:34] sorry if I'm not using the correct vocabulary, I'm quite new to wikidata
[13:38:11] btw. what's the way to go to import foreign data into wikidata? do I need to use quickstatements for that purpose or is there something like an "import" API?
[14:31:30] addshore: https://hub.docker.com/r/wikibase/wikibase/dockerfile
[14:32:02] the content looks debian/ubuntu based - can I still use it with docker on RHEL?
[14:32:16] ok, not building the images, but using them
[14:33:09] hjb: yeah, 1 per min sounds fine :)
[14:33:38] so, when you're talking about 14 million entities there are some things you might want to take into consideration, but they aren't documented very well
[14:33:59] for example, not storing the text in the mysql table directly, but using some external storage (as is done for wikidata)
[14:34:24] Was OpenRefine always so slow? :|
[14:34:27] hjb: it depends what kind of foreign data, stuff not from a wikibase?
[14:34:41] hjb: yes, it should work just fine on RHEL
[14:35:10] and hjb, if you read the list of blog posts you can see that in some situations you may need to build your own Dockerfile, depending on what customizations you want etc
[14:47:08] addshore: those 14 million entities are actually stored in a database table (but not mysql). the data itself isn't that much, ~6-7GB I guess
[14:47:55] addshore: yes, foreign data not from a wikibase. authority records in whatever bibliographic format suits mapping and import best (marc, marc-xml, rdf, ...)
[14:48:41] addshore: so if I need to build custom images, I'd either need an ubuntu/debian machine or adapt the docker scripts for RHEL, right?
[14:48:50] yup, then quickstatements etc might be good candidates for importing data
[14:49:08] hjb: nope, you can build them from any machine :)
[14:49:17] ok, so I'll add quickstatements to my setup for sure
[14:49:22] depending on what changes you might be making, we may be able to incorporate them into the base images too
[14:49:49] there are other tools as well (besides quickstatements); if you're looking at the code level then in python the wikidata integrator is good
[14:51:26] wikidata integrator - I'll look it up
[14:51:34] Pywikibot as well?
[14:52:00] are those 6-7GB enough to consider using a different datastore than mysql tables?
[14:54:02] well, that depends on how big your mysql server is ;)
[14:55:00] but it is planned to be around for a while and to continue being edited, so with the added revisions of entities you'll end up with a growing data set
[14:56:07] good answer :)
[14:56:52] addshore: is it possible to switch over to another datastore later in case I need to? Without too much hassle?
[14:59:36] okay, another dumb (probably?) sparql question: I need to filter entities without english labels. I added FILTER((LANG(?businessLabel)) = "en") but that's.. filtering out everything, so I'm clearly missing something in the query language
[15:00:57] got it
[15:01:02] thank you, rubber ducks!
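For anyone hitting the same label problem later: a minimal sketch (not from the log) of one way to list items that lack an English label. The usual gotcha is that ?businessLabel comes from the wikibase:label SERVICE, which is evaluated after the rest of the WHERE clause, so a FILTER on it sees an unbound value; filtering on rdfs:label directly avoids that. The triple pattern binding ?business below is only a placeholder so the query runs, and the Python wrapper simply posts the query with requests.

```python
# Minimal sketch, not from the log: items with no English rdfs:label.
# Assumptions: the `requests` library is available, and the wdt:P31 triple below
# is a placeholder for whatever actually binds ?business in the real query.
import requests

QUERY = """
SELECT ?business WHERE {
  ?business wdt:P31 wd:Q4830453 .        # placeholder pattern binding ?business
  FILTER NOT EXISTS {                    # keep only items without an English label
    ?business rdfs:label ?enLabel .
    FILTER(LANG(?enLabel) = "en")
  }
}
LIMIT 20
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json",
             "User-Agent": "label-filter-example/0.1 (sketch)"},
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["business"]["value"])
```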
[15:01:56] Technical Advice IRC meeting starting in 60 minutes in channel #wikimedia-tech, hosts: @nuria & @Thiemo_WMDE - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[15:08:19] hjb: yes :)
[15:08:29] it is possible, and although it will of course be work, it shouldn't be too much work
[15:09:38] perfect, thanks
[15:10:10] btw. I've got the first instance up and running now using the docker-compose file under RHEL
[15:10:45] I can access it, but that's all I know. Is there some kind of test suite to check that everything's working fine?
[15:28:32] hjb: no, there's not really a test suite for it
[15:28:53] hjb: if you're running full steam into the docker images I do recommend reading that list of posts that I sent you earlier :)
[15:29:04] also, if you're working with wikibase, there is a telegram group and a user group
[15:29:20] hjb: https://meta.wikimedia.org/wiki/Wikibase_Community_User_Group
[15:51:25] Technical Advice IRC meeting starting in 10 minutes in channel #wikimedia-tech, hosts: @nuria & @Thiemo_WMDE - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[16:04:05] James_F: any idea how many mediainfo entities there currently are?
[16:04:11] is there any tracking for that?
[16:04:34] addshore: No, and no.
[16:04:37] :D
[16:04:49] addshore: If you wanted to know, you shouldn't have helped us stop using wb_terms. ;-P
[16:04:58] haha, true
[16:12:43] addshore: thanks for the ug link
[16:12:47] "Create documentation about upgrading Mediawiki and Wikibase, as no process exists for this as Wikibase software is currently not upgradable."
[16:12:48] hjb: np :)
[16:12:55] What? Not upgradable?
[16:13:02] it is upgradable :) and there are docs :P
[16:13:14] the upgrade process is just the mediawiki upgrade process
[16:13:26] https://www.mediawiki.org/wiki/Manual:Upgrading
[16:15:20] ah :) then that other page is just not up to date :)
[16:18:58] "This is a discussion group for users of Wikibase who are running their own instance with the docker-based distribution available at https://github.com/wmde/wikibase-docker "
[16:19:06] https://lists.wikimedia.org/mailman/listinfo/wikibaseug
[16:19:21] Is the UG really only for users of the docker-based distribution?
[16:28:47] James_F: looks like you have 273k mediainfo entities, unless I forgot how to write sql queries
[16:29:00] James_F: and 694k mediainfo revisions
[16:29:16] hjb: I don't think it should be
[16:29:41] addshore: Kk. Interesting to know.
[16:30:02] addshore: me too. reading several of the mails on the list, it obviously isn't
[16:30:14] just a misleading description
[16:30:28] I'll check out the telegram channel
[16:31:01] addshore: you've helped me a lot today, thanks! I'll probably be back in the near future ;)
[16:32:10] hjb: no problem :) (I'm also in that telegram group ;))
[16:33:15] James_F: fyi https://quarry.wmflabs.org/query/34303 and https://quarry.wmflabs.org/query/34304
[16:36:13] addshore: Those are technically "273k pages which have ever had their mediainfo entity edited" (including things which were vandalism, now reverted and blank), and "694k edits to pages which had a mediainfo entity at the time, including the creation of captions but also edits to wikitext pages which have an associated mediainfo entity that wasn't edited".
[16:36:25] addshore: Isn't MCR fun? :-(
[16:38:14] James_F: yes, that is technically true :)
[16:38:23] mcr is fun :D
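The Quarry queries linked above are not reproduced in the log, but for context: counts like these come from MediaWiki's multi-content-revision (MCR) tables, where each revision row can carry several named slots, which is exactly why "revisions carrying a mediainfo slot" over-counts actual mediainfo edits. A rough, illustrative sketch of that kind of query (not necessarily what is behind those links), kept as strings so it can be pasted into Quarry against commonswiki_p:

```python
# Illustrative only -- assumed shape of the counts discussed above, based on the
# standard MCR tables (revision, slots, slot_roles). Not the actual Quarry queries.

PAGES_WITH_MEDIAINFO = """
SELECT COUNT(DISTINCT rev_page) AS pages
FROM revision
JOIN slots      ON slot_revision_id = rev_id
JOIN slot_roles ON slot_role_id = role_id
WHERE role_name = 'mediainfo';
"""

REVISIONS_WITH_MEDIAINFO = """
-- Counts every revision that carries a mediainfo slot, even when only the
-- wikitext slot changed in that edit (the caveat above).
SELECT COUNT(*) AS revisions
FROM revision
JOIN slots      ON slot_revision_id = rev_id
JOIN slot_roles ON slot_role_id = role_id
WHERE role_name = 'mediainfo';
"""

if __name__ == "__main__":
    # Run these via https://quarry.wmflabs.org/ (database: commonswiki_p) or a
    # Toolforge replica connection; printing them here keeps this file runnable.
    print(PAGES_WITH_MEDIAINFO)
    print(REVISIONS_WITH_MEDIAINFO)
```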
[19:54:58] is it true that there's no vagrant role for Wikibase lexemes?
[19:55:52] James_F: btw, would you like to remove -2 from https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/490648 ? Not deploying it today yet, but soon
[19:56:35] addshore: ^^ about vagrant - do you know?
[20:01:45] SMalyshev: I'm not sure to be honest
[20:02:21] I would guess no
[20:02:45] Adding it shouldn't be hard for someone who knows how vagrant works; you don't need to set any config or install any tables
[20:02:51] Just load it next to wikibase
[20:02:59] (I'm pretty sure)
[20:24:16] addshore: ideally also create some basic entities maybe?
[20:24:37] IIRC Lexeme requires some to work properly...
[20:24:41] like languages, etc.
[20:27:29] SMalyshev: AHH, vagrant does things like that too?
[20:27:47] So, all you need is a single item, and you can use that for everything that requires an item id
[20:27:54] Such as language, lexical category
[20:28:01] addshore: vagrant can do anything you tell it, there's puppet config which can be run on role provision
[20:28:25] what happens there is whatever you tell it to happen :) you can run scripts etc.
[20:29:37] I'm probably not going to spend time on it now, but if it doesn't exist and I have some time I might try to do it later
[20:30:57] aha, there's already T206175 for it
[20:31:02] T206175: mediawiki-vagrant role for WikibaseLexeme - https://phabricator.wikimedia.org/T206175
[20:45:44] SMalyshev: Done. :-)
[20:45:51] 10x :)
[22:05:14] hello hello! I'm looking into adding better stats support for wikidata to Wiki Education Dashboard... for an editathon or other time-bound program with a discrete set of editors, we want to be able to show things like the number of wikidata statements edited and added, the number of references added, the number of descriptions added or edited, and so on.
[22:05:20] Any advice on where to start?
[22:13:11] addshore: it appears that sense search does not have code to use CirrusSearch at all. I thought DB search wasn't working for Wikibase now, so does it work for senses?
[22:13:32] or is sense search not actually working?
[22:17:02] I mean searching senses by anything but ID
[22:23:14] SMalyshev: no idea without checking
[22:23:44] There is no search in wikibase PHP for senses if it isn't provided by cirrus or mediawiki
[22:24:21] @ragesoss: hmm, stats as in what?
[22:24:38] Live things, something just for this event? Something that will live on for longer?
[22:37:25] @addshore as in being able to calculate these things for an arbitrary set of editors over an arbitrary date range. Here's one example: https://dashboard.wikiedu.org/courses/Wayne_State_University/Wikidata_workshop_(Fall_2018)
[22:38:14] The dashboard shows stats for what the participating editors did, which was mainly editing Wikidata (creating two new items and editing a total of 23, plus 3 wikipedia pages)
[22:38:27] I'd like to have more granular stats for the wikidata contributions...
[22:39:19] such as how many new statements were added, how many references were added.
[22:49:15] I think most of the data I care about is captured in edit summaries; for example, this edit added a reference to a claim: https://www.wikidata.org/w/index.php?title=Q42&diff=619838943&oldid=619838920
[22:49:36] hmmm
[22:49:56] (but obviously, I don't want to be trying to extract that from the text of edit summaries)
[22:50:09] one possible way would be to use the editor's personal edit history as a start
[22:51:24] that's basically what the dashboard does already, importing a list of revisions (from the wmflabs replica DB) with some metadata such as the byte change.
[22:53:07] but it's not clear to me whether that is the best approach for answering these questions about the finer-grained details of a wikidata item.
[22:53:17] hm, having a regex run over the edits could be enough for a start and should be quite fast, unless it's a very large number of edits.
[22:53:55] have you seen xtools yet? https://xtools.wmflabs.org/
[22:54:25] yeah... I was hoping for something better than using a regex over diffs or edit summaries.
[22:54:44] does xtools do things at the level of statements and references, for wikidata?
[22:56:03] It can: https://xtools.wmflabs.org/topedits/www.wikidata.org/Sotho%20Tal%20Ker/0/Q12876404
[22:57:07] you just have to know a username and the edited page.
[22:57:25] that's showing the edit summaries and diff sizes, but that's not the same as answering the question of how many statements were added.
[22:58:09] just count them :D
[22:59:46] you mean, by reading each edit summary to see which ones have some text that corresponds to adding a statement (like `wbcreateclaim-create:1`)?
[23:00:19] which is, basically, applying a regex to the edit summaries?
[23:00:29] I could go that route if needed.
[23:01:00] well, hm, do you plan to do the statistics by hand or via a (semi-)automated tool?
[23:01:15] but I'm hoping for something a little more structured (and flexible for edits that come from a variety of different sources, which may or may not match the regex patterns)
[23:01:45] SothoTalKer: I'd like to add support for counting these stats to the Wiki Education Dashboard.
[23:02:41] So I'm asking about the best way to do a large amount of this kind of stats generation programmatically.
[23:02:47] I personally don't know of any tool that currently does exactly what you need.
[23:03:37] I'm not necessarily looking for a tool. More like, advice on the best places to start with what's available from the API and/or queryable from the replica DB.
[23:03:49] The tool is what I'm planning to build.
[23:05:17] good night
[23:06:42] wikidata uses a predefined set of edit summaries, e.g. wbcreateclaim-create for creating a claim, or wbsetreference-add for an added reference. You should be able to extract all of those for any user, parse them and display them. It will be centered on wikidata, though.
[23:09:14] SothoTalKer: even for edits that come through the API rather than the web interface?
[23:10:19] it seems so...
[23:10:42] ragesoss: yes, the summaries are probably the best bet
[23:10:42] I guess those could change over time, but in practice they'll probably be pretty stable.
[23:11:03] There is a docs/summaries.wiki documenting what to look for in the summaries, I think
[23:11:17] The UI and API will use those summaries for most cases
[23:11:36] cool... I guess I just assumed there would be something more... structured-data-y.
[23:11:39] The exception is wbeditentity, which is an API module, but that is only used by bots and tools etc and will never be produced by regular UI edits
[23:11:50] There isn't anything, unfortunately :/
[23:12:02] There is also https://www.wikidata.org/wiki/Wikidata:Statistics and SQID or Wikidata-Toolkit on github: https://github.com/Wikidata
[23:12:20] but being able to get pretty reliable info out of edit summaries like that should make it actually pretty easy to implement.
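A rough sketch (not from the log) of the summary-counting approach being discussed: pull one user's contributions through the MediaWiki API (list=usercontribs with ucprop=comment) and tally the Wikibase "magic" summary keys. The same counting could just as well run over revisions already imported from the replica DB. The two keys printed at the end are the ones mentioned above; the full list should be checked against Wikibase's docs/summaries.wiki, and the username is a hypothetical placeholder.

```python
# Rough sketch, not a finished tool: tally Wikibase "magic" edit-summary keys for
# one user via the MediaWiki API. list=usercontribs / ucprop / uclimit are real
# API parameters; which summary keys matter should be checked against
# Wikibase's docs/summaries.wiki.
import re
from collections import Counter

import requests

API = "https://www.wikidata.org/w/api.php"
# Wikibase autocomments look roughly like "/* wbsetreference-add:2| */ ...".
SUMMARY_KEY = re.compile(r"/\*\s*([\w-]+)")

def summary_counts(username, start=None, end=None):
    """Count edit-summary keys (wbcreateclaim-create, wbsetreference-add, ...) for a user."""
    counts = Counter()
    params = {
        "action": "query", "format": "json",
        "list": "usercontribs", "ucuser": username,
        "ucprop": "comment|timestamp", "uclimit": "max",
    }
    if start: params["ucstart"] = start   # ISO timestamps; newest first by default
    if end:   params["ucend"] = end
    while True:
        data = requests.get(API, params=params).json()
        for contrib in data["query"]["usercontribs"]:
            m = SUMMARY_KEY.match(contrib.get("comment", ""))
            if m:
                counts[m.group(1)] += 1
        if "continue" not in data:
            break
        params.update(data["continue"])   # standard API continuation
    return counts

if __name__ == "__main__":
    c = summary_counts("ExampleUser")     # hypothetical username
    print("claims created:", c["wbcreateclaim-create"])
    print("references added:", c["wbsetreference-add"])
```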
[23:12:25] I might try and calculate some data like that for all of history on Hadoop, but it's a bit complicated
[23:13:32] And won't get done any time soon
[23:15:07] addshore: do you have an example edit with wbeditentity?
[23:27:31] I see, there is "wbeditentity-update:0|: Moving claim from Q27603, using moveClaim.js" :D
[23:49:50] Ah! There's another possible approach I see... ORES extracts a lot of useful features for a given revision of a wikidata item: https://ores.wikimedia.org/v3/scores/wikidatawiki/840608564/itemquality?features
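To round that last idea off, a minimal sketch (not from the log) of fetching the itemquality features ORES computes for a Wikidata revision, using the same URL pattern as the link above. The response layout assumed here is the standard ORES v3 one, and the revision ID is just the example from the log; the sketch simply prints whatever features come back.

```python
# Minimal sketch: pull ORES "itemquality" features for one Wikidata revision.
# Assumes the standard ORES v3 response shape; revision 840608564 is the example
# linked above.
import requests

def itemquality_features(rev_id):
    url = f"https://ores.wikimedia.org/v3/scores/wikidatawiki/{rev_id}/itemquality?features"
    data = requests.get(url).json()
    score = data["wikidatawiki"]["scores"][str(rev_id)]["itemquality"]
    return score.get("features", {})

if __name__ == "__main__":
    for name, value in sorted(itemquality_features(840608564).items()):
        print(f"{name}: {value}")
```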