[11:19:35] hi
[11:26:09] hi
[11:31:09] @zhuyifei1999, so for the NA glam we can expect a URL of the form 'http://www.gahetna.nl/beeldbank-api/opensearch/?q='+ searchstring only?
[11:31:43] i.e. we will upload based on the given searchstring.
[11:31:50] I'm not very familiar with NA glam
[11:32:54] Ok Is there any glam you would suggest whose API I should look into?
[11:33:00] though, I think handling a specific image based on id should come first, before working on search query support
[11:33:10] basvb: ^
[11:33:31] Hi, in the middle of something, will react in a few minutes, saw something on the strings from yesterday
[11:34:26] infobliss: I'm more familiar with the wikimedia side stuff :) but if there is technical documentation on glam apis I should be able to help as well
[11:34:53] ok
[11:35:31] The images have a share link like http://proxy.handle.net/10648/a9946ea0-d0b4-102d-bcf8-003048976d84
[11:35:49] so that is the most clear/easiest for users and consistent
[11:36:02] I think it is best to see if we can look those up in the api
[11:36:38] I believe 10648 is to indicate Nationaal Archief, so we could check for a string which starts with "http://proxy.handle.net/10648/"
[11:36:47] and then use the second part to find the image in the api
[11:37:55] hmm that doesn't seem to work directly like that
[11:38:20] I think regex is better though
[11:39:26] @Basvb I am trying to relate to the NationaalArchief.py you had written.
[11:39:52] That was a bit hacky
[11:39:53] If we expect a url like this from the user, can we get a useful xml from this?
[11:40:12] my script can be abused heavily
[11:40:22] if you type in "de" you will get thousands of uploads
[11:41:04] http://www.gahetna.nl/beeldbank-api/opensearch/?q=910-3888&count=100&startIndex=1
[11:41:11] that is the XML of the previously linked image
[11:41:22] the "a9946ea0-d0b4-102d-bcf8-003048976d84" isn't simply a field
[11:41:39] and I can't query based on URLs, maybe you know a trick for that zhuyifei?
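A minimal sketch of the check discussed above: accept only share links under the 10648 handle prefix and pull out the trailing identifier with a regex. The function name `extract_na_identifier` is hypothetical, and the 8-4-4-4-12 hex shape of the identifier is an assumption based on the example link, not something the archive documents here. How to look the identifier up in the API is a separate question.

```python
import re

# 10648 is believed (per the discussion) to be the handle prefix of the
# Nationaal Archief. The identifier shape is assumed from the example link.
HANDLE_RE = re.compile(
    r"^https?://proxy\.handle\.net/10648/"
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})/?$",
    re.IGNORECASE,
)


def extract_na_identifier(share_url):
    """Return the identifier part of a share link, or None if it does not match."""
    match = HANDLE_RE.match(share_url.strip())
    return match.group(1).lower() if match else None
```

Rejecting everything that does not match the full pattern also keeps free-text input such as "de" from ever reaching a search query.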
[11:42:08] ?
[11:42:10] also we probably should not use q= (query), but a specific field to match
[11:42:34] http://www.gahetna.nl/beeldbank-api/opensearch/?q=hdl://10648/a9946ea0-d0b4-102d-bcf8-003048976d84&count=100&startIndex=1
[11:42:38] I want to do something like that
[11:42:43] as it is one of the fields
[11:43:44] but maybe this is the wrong entry point and we should be using the "bestandsdeelnummer" from http://proxy.handle.net/10648/a9946ea0-d0b4-102d-bcf8-003048976d84 which is 910-3888
[11:43:52] however I'm not sure whether those are unique
[11:44:25] I don't think I understand these. What's bestandsdeelnummer?
[11:44:52] It's the unique identifier I think.
[11:45:17] a9946ea0-d0b4-102d-bcf8-003048976d84 technically is a UUID
[11:46:08] http://www.gahetna.nl/beeldbank-api/zoek/a9946ea0-d0b4-102d-bcf8-003048976d84
[11:46:12] that seems to be more like it
[11:47:15] well, me is nl-0 so :/
[11:47:28] ok got it.
[11:47:31] nl-0?
[11:47:35] It contains Bestanddeelnummer too.
[11:47:35] it is in Dutch yes
[11:49:00] hmm their copyright parameters are a mess
[11:49:34] it's listed as cc-by-sa, but then a lot of reuses under cc-by-sa are denoted as false (commercial publication, use own website, etc.)
[11:49:58] oh, we can ignore that
[11:50:14] if it's cc-by-sa we assume dual licensing
[11:50:32] true, but those kinds of things will be collection specific
[11:50:46] here it is pretty easy: we should check the cc-by-sa, cc-by and pd fields
[11:50:55] if one is true it is ok, if all are false we can't upload
[11:51:19] yeah
[11:51:45] we could probably add multiple license tags if more than one is true
[11:51:55] what is pd field?
[11:52:09] pd = public domain
[11:52:18] auteursrechten_voorwaarde_Public_Domain
[11:52:24] though, which tag should we use if it is pd?
[11:53:19] it's based on pd-old I believe
[11:54:05] pd-old-auto?
then we would need to know the deathyear of the author
[11:54:45] yep or anonymous publication
[11:56:50] http://proxy.handle.net/10648/af9645e4-d0b4-102d-bcf8-003048976d84 stuff like that is in there
[11:57:29] aah apparently they moved their cc-by stuff to cc-0
[11:58:13] http://proxy.handle.net/10648/a96dc9b2-d0b4-102d-bcf8-003048976d84 is listed as cc-0
[11:58:26] but the api http://www.gahetna.nl/beeldbank-api/zoek/a96dc9b2-d0b4-102d-bcf8-003048976d84 lists it as unfree
[11:58:54] actually, if it's not freely-licensed, we can force a pd-scan on this image right?
[11:59:18] wmf legal said they don't care if the museum protest or not iirc
[11:59:39] what do you mean with a pd-scan?
[12:00:06] in this case the archive wants to license everything it can freely (PD) I believe
[12:00:25] but for museums with 300 year old works: yes they can be uploaded, even if not claimed to be fre
[12:00:26] free
[12:00:38] however you'll be breaking the API user agreements likely
[12:00:43] https://commons.wikimedia.org/wiki/Template:PD-scan
[12:00:50] hmm
[12:03:24] Looks like finding the correct PD is going to be a bit complicated
[12:03:31] http://www.gahetna.nl/beeldbank-api/zoek/ac4d068e-d0b4-102d-bcf8-003048976d84 cc-0 file
[12:05:06] It has "auteursrechten_voorwaarde_Public_Domain":true
[12:05:35] So is it in the public domain?
[12:07:16] yep but there are many reasons why something is in the public domain
[12:07:26] and we have to denote the correct one
[12:07:43] ok
[12:07:47] so in this case it is because the author (or rights holder) released it as cc-0
[12:08:04] ok CC0 means “No Rights Reserved”
[12:08:14] scanning doesn't qualify as above TOO
[12:08:51] but when it is about older pictures (70 years after the author's death in the EU) or taken before 1923 in the US it becomes PD-old
[12:09:01] and then there are dozens of other variants still
[12:09:05] yeah
[12:09:06] but those are the main 2 here
[12:09:26] zhuyifei1999_: depends on the country, in NL scanning is not considered creative work
[12:09:37] in France or Spain I believe it is
[12:10:14] yeah, so if it is PD-old, licensing it again under cc-0 doesn't make sense
[12:10:25] However I once tried to argue that all of Google's satellite images (as they are not created by a human and mere reproductions of reality) should be considered below TOO
[12:10:43] but I guess as Commons we don't follow that stance
[12:10:51] :/
[12:11:10] well, I'd agree that those satellite images aren't copyrightable
[12:11:30] But let's get a bit more on topic
[12:12:06] infobliss: I think it is good to get a bit of a grasp what's going on in this API, but in the end it's probably good to focus on the general structure
[12:12:24] we have to decide what we have to do for the main structure, and what should be done in GLAM specific mappings
[12:13:27] and make sure it works for as many APIs as possible, not just this type of API
[12:14:25] Is there any other API you would suggest?
[12:15:22] well I think as part of the https://phabricator.wikimedia.org/T164710 task it is a good idea to search a bit
[12:15:34] alright.
[12:15:47] Sure I will do that and report to you soon.
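The licence rule agreed on earlier (check the cc-by-sa, cc-by and pd fields; any one true is enough, all false means no upload) could be sketched as below. Only `auteursrechten_voorwaarde_Public_Domain` is a field name quoted in the log; the other two keys are guesses and would need to be checked against real API output. The labels are placeholders, not Commons template names, and a true pd flag still leaves open which PD reason applies (CC0 release, PD-old, ...).

```python
# Maps API boolean fields to placeholder licence labels.
# ASSUMPTION: the first two field names are hypothetical; only the
# Public_Domain field name appears in the discussion.
LICENSE_FIELDS = {
    "auteursrechten_voorwaarde_CC_BY_SA": "cc-by-sa",  # hypothetical field name
    "auteursrechten_voorwaarde_CC_BY": "cc-by",        # hypothetical field name
    "auteursrechten_voorwaarde_Public_Domain": "pd",
}


def allowed_licenses(record):
    """Return the labels of all licence flags that are explicitly true."""
    return [
        label
        for field, label in LICENSE_FIELDS.items()
        if record.get(field) is True
    ]


def can_upload(record):
    """An image is uploadable only if at least one licence flag is true."""
    return bool(allowed_licenses(record))
```

Keeping the field-to-label table separate from the two functions is one way to make this part of a GLAM-specific mapping, while the "at least one flag true" rule stays in the general structure.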
[12:15:49] but it's not an easy task
[12:16:08] I think it's best if we co-operate on that one
[12:16:18] yeah
[12:16:37] I know some interesting Dutch ones off the top of my head, but maybe we could look into some UK and US GLAMs
[12:16:51] not sure if there are GLAMs in India who have APIs with open images
[12:17:15] https://commons.wikimedia.org/wiki/Commons:Batch_uploading is a good place to start from
[12:17:51] I am not sure either.
[12:19:45] the batch uploading page is not really up to date, and a lot of it is complicated (which is the reason it never got uploaded)
[12:22:25] In the Netherlands we are spoiled with the good open data structure, I'm looking for some international equivalent of https://data.overheid.nl/
[12:22:39] that website lists 1000s of Dutch (government) open datasets
[12:24:02] okay
[12:24:35] http://labs.europeana.eu/api europeana is also an entrance
[12:24:41] some uploading work has been done on some of the GLAMs
[12:24:56] yep some are completed, some are not feasible
[12:25:17] http://labs.europeana.eu/data
[12:25:18] is interesting
[12:26:32] hm taking a look.
[12:26:50] but with europeana we should watch out for circular collections, I see a WLM sweden dump there listed as a collection
[12:27:00] those are images where commons is the original source
[12:28:07] ok
[12:29:46] BTW I was supposed to do some stuff with the renaming, correct?
[12:29:51] what is the page number for this WLM sweden dump?
[12:30:40] http://labs.europeana.eu/data/wiki-loves-swedish-monuments
[12:30:42] renaming of the tool?
[12:31:39] to glam2commons?
[12:32:12] yeah have you found a more expressive name?
[12:33:03] nope, I'll request on phabricator to rename the project, and from there we can go using glam2commons in our communications
[12:33:31] I was testing yesterday the sibutest on tools
[12:33:42] you got the oauth working?
[12:33:47] yes
[12:33:53] uploading is pending
[12:33:54] nice
[12:34:10] aah yes, I was trying to upload something but it didn't go through
[12:34:21] but watch out with exposing the na.py upload
[12:34:28] because it can upload hundreds of images
[12:34:34] and create quite a mess
[12:34:43] Yeah I will use site.upload.
[12:34:47] and one at a time
[12:34:51] I was not sure about the URL?
[12:34:58] url for what?
[12:35:12] So I asked what are the possible URLs to be taken from the user.
[12:35:27] aah ok
[12:35:46] now I got the answer :)
[12:35:50] well I think for the National archive case this is the answer
[12:35:53] but that is for now
[12:36:04] we should think about a structure which works for multiple GLAMS
[12:36:09] and how to communicate it
[12:36:28] So how I see it is the user can select a GLAM to upload from
[12:36:53] they select/click on that GLAM (depends a bit on whether there are fewer or more than 10 GLAMS)
[12:37:21] ok
[12:37:38] Suppose user chooses the NA glam
[12:37:44] then they get instructions on what information to provide (eg.: URL looking like "http://proxy.handle.net/10648/af9645e4-d0b4-102d-bcf8-003048976d84" or bestandsdeelnummer looking like "908-1823")
[12:37:56] with a box they can enter that in
[12:37:57] ok
[12:38:07] there are also some instructions on what we expect them to provide for the image
[12:38:43] eg.: "For this glam we would like you to add some categories after uploading"
[12:38:59] ok
[12:39:15] those are going to manually given?
[12:39:19] *to be
[12:39:46] or: "Some of the files might show derivative works, those files, although listed as CC-BY can not be freely reused, please make sure that you do not upload those"
[12:39:53] yep some GLAM specific instructions
[12:40:30] ok
[12:41:19] then they enter the images and press continue
[12:41:20] If the user gives the bestandsdeelnummer for NA glam how will we get the corresponding URL like "http://proxy.handle.net/10648/af9645e4-d0b4-102d-bcf8-003048976d84" ?
[12:41:34] oh no, we instruct them to give the proxy-url to start
[12:41:36] with
[12:41:41] "http://proxy.handle.net/10648/af9645e4-d0b4-102d-bcf8-003048976d84" and the bestandsdeelnummer are not directly related
[12:41:53] it's just for another glam we might ask something else
[12:41:55] oh ok
[12:42:12] then they click next
[12:42:22] maybe later on we support multiple images
[12:42:46] if possible we could show a preview/thumb of the image after next
[12:43:09] if we support multiple images then multiple images can show up which they can select
[12:43:18] hmm yes.
[12:43:21] we can also list some proposed categories (it will be a recurring theme)
[12:43:28] or ask the question about derivative
[12:43:39] or anything which has to be provided for this collection
[12:44:58] ok
[12:47:06] then they fill that in
[12:47:11] and then there is a button: UPLOAD
[12:47:18] the files get uploaded
[12:47:25] and the page refreshes with links to the file
[12:47:39] first all singular, later on maybe multiple files
[12:47:49] that's just what looks suitable to me
[12:47:53] I have one observation
[12:47:55] we should formalize that a bit
[12:48:06] yeah
[12:48:08] and of course think about whether it works, ask some users to test etc.
[12:48:28] sure
[12:48:42] what's the observation?
[12:48:59] the uploading takes some finite time. During that time the browser page becomes unresponsive.
[12:49:08] We have to do something about that.
[12:49:22] you mean, in the proposed flow?
[12:49:31] no
[12:49:42] say the image is 7 MB
[12:49:42] aah as a technical problem you have now?
[12:50:16] Yes it takes say 2-3 sec to upload
[12:50:49] during that time the user may be shown some progress bar etc
[12:52:37] yes
[12:53:23] maybe zhuyifei1999 can better help with that.
[12:53:37] v2c has a progress bar during upload.
[12:54:08] oh it doesn't show anything
[12:54:52] Does it not show the % of upload done?
[12:55:26] to wikimedia?
no
[12:55:42] pywikibot doesn't provide it
[12:57:25] I'm going to rename the project on phabricator now, so from now on let's go with glam2commons, unless there are some last sec objections
[12:57:39] k
[12:57:44] tom29739: ^
[12:58:38] Why is the project being renamed?
[12:59:18] single vs batch
[13:00:06] there was some agreement that the name is confusing
[13:00:06] glam2commons seems fine.
[13:00:29] It's in keeping with other tools like video2commons and flickr2commons.
[13:00:57] it was meant as a bit of a contradictio in terminis and half joke, but seems like that didn't come across very well ;)
[13:01:08] yep glam2commons is a bit more straightforward
[13:01:12] ok I'll go with that
[13:01:26] https://phabricator.wikimedia.org/project/profile/2690/
[13:03:27] Shall I rename the subtasks too?
[13:03:41] yes
[13:04:24] ok
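On the unresponsive page during upload raised earlier (12:48 to 12:55): since no upload percentage is available, one option is to run the upload in a background thread and let the page poll a coarse state (running, done, failed). This is only a sketch under assumptions: `start_upload`, `job_status` and the in-memory `_jobs` table are hypothetical names, `upload_func` stands in for whatever ends up wrapping the pywikibot upload call, and a dictionary inside one process would not be enough for a tool served by several worker processes.

```python
import threading
import uuid

# Hypothetical in-memory job table; only valid within a single process.
_jobs = {}
_jobs_lock = threading.Lock()


def start_upload(upload_func, *args):
    """Run upload_func(*args) in a background thread.

    Returns (job_id, thread) so the caller can answer the HTTP request
    immediately and poll job_status(job_id) afterwards.
    """
    job_id = uuid.uuid4().hex
    with _jobs_lock:
        _jobs[job_id] = {"state": "running", "error": None}

    def worker():
        try:
            upload_func(*args)
        except Exception as exc:  # report any failure to the user
            result = {"state": "failed", "error": str(exc)}
        else:
            result = {"state": "done", "error": None}
        with _jobs_lock:
            _jobs[job_id] = result

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return job_id, thread


def job_status(job_id):
    """Return a copy of the job's state, or state 'unknown' for a bad id."""
    with _jobs_lock:
        return dict(_jobs.get(job_id, {"state": "unknown", "error": None}))
```

With this shape the page can show a spinner while the state is "running" and switch to the file link or an error message once it changes.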