[11:33:01] hi
[11:33:48] hello
[11:34:05] I have read your comments.
[11:34:14] nice
[11:34:26] https://github.com/infobliss/sibutest2/blob/master/libraries/gen_lib.py
[11:34:33] I removed the NA specific details
[11:35:15] I am doing further testing to make sure that the proper wikitext is generated
[11:39:19] infobliss: {{nl}} is a language template. you might not want to have the language be always nl
[11:40:31] so I have to determine the language based on the glam.
[11:40:51] that's one method
[11:41:15] you can also try to determine it from the text if it is long enough
[11:41:22] I think v2c does that
[11:42:44] wow. But two languages may have the same character set. for e.g., many EU languages have.
[11:43:14] so how to tell the difference?
[11:43:30] not character set, but character appearance distribution
[11:43:51] eg. in English "e" is the most common character
[11:44:08] in other languages it's different
[11:44:36] ok good to know.
[11:44:51] or you can use google translate :) I have no idea how its language detection works
[11:45:54] I think it can be hard to tell the difference between 2 languages
[11:46:00] https://github.com/toollabs/video2commons/blob/master/video2commons/frontend/urlextract.py#L138
[11:46:11] well, yeah
[11:46:51] actually, can you give me some sample sentences in various languages? I wanna play with it
[11:47:07] sure I can
[11:47:34] * zhuyifei1999_ ssh-ing into labs
[11:47:34] for eg there are a few Indian languages with the same character set.
[11:48:03] I can send you samples.
[11:48:58] uh, just paste some here, or you can PM
[11:49:55] hi infobliss
[11:50:00] sorry my internet dropped
[11:50:16] o.O
[11:50:32] hi
[11:50:57] let's continue
[11:51:25] >>> guess_language.guessLanguage("abcdefghijklmnopqrstuvwxyz")
[11:51:25] 'nl'
[11:51:42] so guessing languages needs more text
[11:51:59] https://en.wikipedia.org/wiki/Eastern_Nagari_script
[11:52:08] is that an existing python library zhuyifei1999_ ?
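Note on the "character appearance distribution" idea from 11:43:30: a minimal sketch of how such a comparison could work, using only the standard library. The function names and the reference samples passed in by the caller are hypothetical and exist only for illustration; this is not how guess_language or v2c is implemented, and a real detector is built from far more text.

```python
from collections import Counter
import math


def char_profile(text):
    """Return the relative frequency of each letter in text (case-insensitive)."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    total = len(letters)
    if total == 0:
        return {}
    return {ch: count / total for ch, count in Counter(letters).items()}


def cosine_similarity(p, q):
    """Cosine similarity between two frequency profiles (dicts of letter -> share)."""
    dot = sum(value * q.get(ch, 0.0) for ch, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def closest_language(text, reference_samples):
    """Return the language code whose sample text has the most similar profile.

    reference_samples is a hypothetical mapping such as {"en": "...", "nl": "..."}
    and must contain at least one entry.
    """
    target = char_profile(text)
    scores = {
        lang: cosine_similarity(target, char_profile(sample))
        for lang, sample in reference_samples.items()
    }
    return max(scores, key=scores.get)
```

With very short input the profile is dominated by noise, which matches the observation above that guessing needs more text.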
[11:52:12] this is the script for 4 languages
[11:52:24] basvb: yes
[11:52:28] v2c uses this
[11:52:31] https://github.com/toollabs/video2commons/blob/master/video2commons/frontend/urlextract.py#L142
[11:52:35] nice, we can use that here as well
[11:52:52] infobliss: sample text?
[11:53:24] wait a min
[11:54:41] I'm thinking, we can use the language the glam is most likely to provide when the text is short, and guess it when the text is long
[11:55:02] >>> guess_language.guessLanguage("I'm thinking, we can use the the language the glam is most likely to provide when the text is short, and guess it when the text is long")
[11:55:02] 'en'
[11:55:12] :)
[11:55:18] nice
[11:55:41] yes
[11:55:51] about the libraries/mappings
[11:55:56] actually the machine I am working on does not have Bengali keyboard installed.
[11:56:06] @Basvb Yeah
[11:56:11] why not do it like https://github.com/infobliss/sibutest2/blob/master/libraries/infobox_templates.py ?
[11:56:18] and add this infobox template there
[11:56:34] currently you're still focussing too much on this specific use case
[11:56:55] the " nl " for example is specific for this case
[11:57:11] and maybe in another case we want to use the references parameter
[11:57:23] https://github.com/infobliss/sibutest2/blob/master/libraries/infobox_templates.py <= I still don't see the point of the two functions
[11:57:31] = {archiefinventaris} (archive inventory number), {identifier} (file number)
[11:57:45] |accession number = {archiefinventaris} (archive inventory number), {identifier} (file number)
[11:57:54] everything behind the = is glam specific
[11:58:03] because in another glam they will just have one number
[11:58:10] how do you mean zhuyifei1999_ ?
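Note on the heuristic proposed at 11:54:41 (use the GLAM's usual language for short text, guess for long text): a minimal sketch under stated assumptions. `choose_language`, its parameters, and the threshold of 40 characters are hypothetical; the guesser is passed in as a callable so the sketch does not depend on any particular library. Treating the string "UNKNOWN" as "no answer" is an assumption about what the guesser returns when it cannot decide.

```python
def choose_language(text, glam_default, guesser, min_length=40):
    """Pick a language code for a description.

    text         -- the description taken from the GLAM's metadata
    glam_default -- language the GLAM most likely uses, e.g. "nl"
    guesser      -- callable taking text and returning a language code
    min_length   -- hypothetical cut-off below which guessing is not attempted
    """
    if len(text.strip()) < min_length:
        # Too short to guess reliably, so fall back to the GLAM's language.
        return glam_default
    guess = guesser(text)
    if not guess or guess == "UNKNOWN":
        # Assumed convention for "could not decide"; fall back as well.
        return glam_default
    return guess
```

Usage could then look like `choose_language(description, "nl", guess_language.guessLanguage)`, with the default supplied per GLAM.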
[11:58:40] they are indeed a bit empty, maybe providing those templates in another manner is cleaner
[11:58:49] but not sure what's the better way for it
[11:59:03] from sibutest2.libraries.infobox_templates import art_photo_parameters
[11:59:22] ok, then that's the way to go
[11:59:22] is much clearer than:
[11:59:49] from sibutest2.libraries.infobox_templates import get_art_photo_parameters
[11:59:56] art_photo_parameters = get_art_photo_parameters()
[12:00:25] same for the fill function? that still provides some functionality
[12:00:32] or also prefer to do that directly?
[12:00:48] I prefer directly
[12:00:53] or with OOP
[12:01:09] well I think good points for improvement
[12:01:36] but I think forming the current photograph template to a more blank template is what we need for the nationaal archief
[12:01:42] i.e. each GLAM can be a class that provides its own fill method
[12:02:31] the idea is aah you disconnected
[12:03:51] sorry net problem
[12:03:57] no problem
[12:04:23] we were discussing some improvements upon my templating method
[12:04:41] @zhuyifei so there's one class for each glam that has the same set of functions?
[12:06:08] no, "inheritance"
[12:07:32] https://docs.python.org/3/tutorial/classes.html
[12:07:46] the base class is going to have the methods
[12:08:02] yes
[12:08:52] but a few steps back: infobliss you get the idea behind the templating?
[12:09:05] because the template you build now will only work for this one GLAM
[12:09:27] and each actual glam class will define eg. what data api can provide, how to fetch them, how to build the infobox template, etc.
[12:09:34] and the idea is to provide functionality which works everywhere, and we fill in the specific details on top of that
[12:10:11] yes zhuyifei1999_, I was doing that in a different file per glam, class inheritance can also work for that
[12:10:20] yes @Basvb I think it was not generic since I did not understand that many variables are NA specific.
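Note on the inheritance idea discussed from 12:01:42 to 12:09:27: a minimal sketch of a base class that holds the shared methods, with one subclass per GLAM supplying the specifics. All class, method, and attribute names here are hypothetical and are not taken from the sibutest2 repository; the metadata returned by `fetch_metadata` is placeholder data standing in for a real API call.

```python
class GLAM:
    """Base class: logic shared by every GLAM lives here."""

    name = ""
    # Wikitext with {placeholders}; literal braces are doubled for str.format.
    infobox_template = ""

    def fetch_metadata(self, identifier):
        """Return a dict of metadata for one item. Each GLAM overrides this."""
        raise NotImplementedError

    def fill_template(self, metadata):
        """Substitute the metadata into this GLAM's infobox template."""
        return self.infobox_template.format(**metadata)

    def build_wikitext(self, identifier):
        """Shared pipeline: fetch the metadata, then fill the template."""
        return self.fill_template(self.fetch_metadata(identifier))


class NationaalArchief(GLAM):
    """Everything behind the '=' signs is specific to this GLAM."""

    name = "Nationaal Archief"
    infobox_template = (
        "{{{{Photograph\n"
        "|title = {title}\n"
        "|accession number = {archiefinventaris} (archive inventory number), "
        "{identifier} (file number)\n"
        "}}}}"
    )

    def fetch_metadata(self, identifier):
        # Placeholder values; a real implementation would query the GLAM's API.
        return {
            "title": "Example title",
            "archiefinventaris": "0.00.00",
            "identifier": identifier,
        }
```

Adding another GLAM would then mean adding another subclass with its own template and `fetch_metadata`, without touching the base class.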
[12:10:35] Now I have a better understanding
[12:10:39] everything behind the = in the wikitemplate is glam specific
[12:11:05] infobliss: highly recommended: check out how youtube-dl works
[12:11:08] maybe except for a few must have (source, title)
[12:11:38] it does extraction for not only youtube, but many other sites
[12:11:48] and provides a common api
[12:12:04] https://github.com/rg3/youtube-dl/tree/master/youtube_dl/extractor
[12:12:04] ok @zhuyifei
[12:14:50] ok maybe we should go over the tasks and what to do the next few weeks a bit more
[12:14:55] what the focus should be on
[12:15:05] I liked the link to the file
[12:15:13] after uploading
[12:15:37] about the help tab: it shows nationaal archief specific information
[12:15:55] from wherever we access it
[12:16:07] that won't scale up with 10s of GLAMS
[12:16:08] @Basvb yeah
[12:16:53] the idea is that the end product is a framework (with maybe a few glam mappings) where we can easily add a new mapping and don't have to make any changes to the general structure
[12:17:20] we may have dynamic populating of that help info
[12:17:29] so it's important that the pages don't focus on a certain structure applicable to only this case
[12:17:57] yeah sure the help page I made is temporary
[12:17:59] Ideally the form at home page is changeable per glam
[12:18:15] so for each glam you have to provide an identifier
[12:18:54] but the help text above that field can be changed per glam (using some templated function or file, likely zhuyifei has better structure suggestions on that than I can give)
[12:19:07] and maybe even the fields can change per GLAM
[12:19:15] so for one GLAM we'd ask for categories
[12:19:45] but for another we might not ask for categories (I think it can never harm to do so but maybe) and ask instead for a public domain rationale
[12:20:01] I'd say that's part of MVP
[12:20:18] what is zhuyifei1999_ ?
[12:20:42] minimum viable product
[12:20:57] what is "that" referring to
[12:21:15] oh being scalable
[12:22:02] well the part where it worked for the nationaal archive was pretty much there at the start, that's supposed to be the example which allows to show the capabilities of the framework
[12:22:24] so i think it's important to focus a bit more on the general case and not on this (or multiple) specific glam mappings
[12:23:25] ok we may ask the user to choose the glam from the dropdown and show specific forms for each glam
[12:23:36] sure
[12:24:04] ideally, but maybe start with this and make the help text (in or above the fields) easily changeable per glam
[12:24:37] the help text in the same page as the form?
[12:24:48] yes
[12:24:55] if it's about the form
[12:25:11] we can make it a bit more concise as well
[12:26:33] I've to go in 10 minutes btw, I'll be here tomorrow for a bit as well
[12:26:48] (sorry in the middle of busy day with all meetings because of nearing vacation)
[12:32:24] Hi I am sorry the net is giving problems
[12:33:27] So for the first evaluation what all things are expected to be done by tomorrow?
[12:36:59] np, let me see
[12:37:09] https://phabricator.wikimedia.org/T161670
[12:40:39] I think the quickest win is on documentation + templating a bit more
[12:41:09] but you shouldn't focus too much on a deadline tomorrow or the day after and more on the progress in general
[12:41:42] On what to do in the next week: try to get this really cleanly working as an MVP which can easily be extended with the next mapping
[12:42:31] I am really sorry for the bad internet connectivity today
[12:42:33] so for now maybe some more documentation for the code
[12:42:39] np, I had issues as well
[12:42:42] we'll manage
[12:42:42] I am unable to access phabricator
[12:42:48] aah ok
[12:43:04] May 4 to May 29 Community bonding period. Studying existing tools for uploading media to commons. Studying the Commons metadata fields in depth. Planning the design of the tool including both the frontend and the backend. Compiling a list of GLAMs. Adding and structuring the corresponding tasks in Phabricator. Requesting access to Tool Labs. May 30 to June 5 Designing the basic UI templates for the FLASK app. Study the API of a number of GLAMs to de
[12:43:22] June 6 to June 18 Design the core elements of the FLASK app including the modules for license checking, metadata mapping and batch upload. Learn how to do the OAuth authentication using Wikimedia Commons login credentials and integrate that into the tool. June 19 to June 21 Contacting with a number of GLAMs to consider the viability of having a "Upload to Wikimedia Commons" button on their image collection site. Design an action plan based on the in
[12:43:26] So docs as pdf files?
[12:43:31] no
[12:43:39] pydoc
[12:44:22] Ok
[12:44:24] that, and in a later phase maybe some documentation on a wikipage
[12:45:09] where does what happen in the code (markdown on github?). But I think zhuyifei1999_ will be more knowledgeable about this than I am
[12:45:24] I'll be off now for the next meeting
[12:45:33] just like:
[12:45:37] def abc():
[12:45:47]     """some docs here"""
[12:45:50]     pass
[12:46:03] aah yes, general code commenting
[12:46:05] Ok
[12:46:19] but also a bit beyond that, some info on the github (where to find it at work)
[12:47:05] I'm off for now, good day and we'll see each other tomorrow I think
[12:47:14] Su
[12:47:21] or for more sophisticated docs you can use something like sphinx
[12:47:46] http://www.sphinx-doc.org/en/stable/
[12:48:07] Ok I will go through this to know about this
[12:48:21] but I don't think that's required for this project though
[12:48:22] I am also unable to ssh to tool labs now
[12:48:39] having one would definitely be a huge plus
[12:48:44] uh :/
[12:49:04] Looks like some configuration issue
[12:49:31] Yeah I will see this in more detail
[14:37:31] hi zhuyifei I have got into some system related trouble. I can't login to my ubuntu machine. It says failed to start session.
[14:37:52] after some effort I am now working on a new machine
[14:38:14] from this machine I want to ssh to tool labs
[14:38:57] ssh -i ~/.ssh/id_rsa username@login.tools.wmflabs.org
[14:39:29] here do I have to use the same id_rsa file as before?
[14:41:39] I am getting "ssh: connect to host tools-login.wmflabs.org port 22: Connection refused"