[00:03:40] hi
[00:04:05] do we already harvest any data in wikisources?
[00:44:24] Hello friends. I am playing around on Test Wikidata and I am trying to make a statement say "0.1 ppm". But when I enter "ppm" or "parts per million" as a unit, it tells me that I cannot do that.
[00:44:30] See https://test.wikidata.org/wiki/Q1635 for my experiment
[00:44:37] What should I be doing so that I can use ppm as a unit of measurement?
[00:49:05] harej: Worked for me: https://test.wikidata.org/w/index.php?title=Q1635&type=revision&diff=19774&oldid=19772&uselang=en
[00:49:25] hoo_: yes, because I created the parts per million thing just now :P
[00:49:32] I didn't realize that the unit had to be an existing property on Wikidata
[00:49:36] :P
[00:49:43] An item, but yes :D
[00:49:59] Also, does Wikidata have a preference between ppm or mg/m^3? I could provide either, but my source quotes in ppm.
[00:50:16] mh, dunno
[00:50:21] see what's already there, I guess
[00:50:33] I'm pretty sure the label is parts per million on WD
[00:50:44] I see no such property on (production) Wikidata
[00:51:17] Also, can I create items for these measurement levels as needed, since it's a new concept for Wikidata?
[00:51:53] https://www.wikidata.org/wiki/Q27084 yikes
[00:52:05] That's the parts per million item
[00:52:11] but also the parts per whatever one :S
[00:52:40] The National Institute for Occupational Safety and Health considers parts per million and parts per billion to be entirely different things :P
[00:52:42] oh, and it also seems to describe the concept of the parts per notation
[00:53:53] https://www.wikidata.org/wiki/Q21006887 =]
[00:54:20] Now have fun unwinding the mess in the first item :P
[00:54:43] are people using Q27084 as an actual unit of measurement? :v
[00:54:53] Quite probably
[00:55:12] This is why I am spending the government's precious money on researching the lay of the land *before* I import this manual into Wikidata.
[00:55:16] in most languages the label seems to be "part per million"
[00:55:48] In all except english AFAICT
[00:55:54] (in all that I can read)
[00:56:18] oh, latin is also about the notation
[00:56:37] and maybe ms :P
[00:58:22] hoo_: they should be separate items, no? ppm is semantically different from parts-per-whatever, since english wikipedia decided to do them all in one article
[00:58:51] (i cleaned up this mess in another instance, for an article about some thai steel manufacturer)
[00:59:09] ppm being a subclass of pp-whatever
[00:59:14] harej: Yeah
[01:00:23] alright, i'll take care of it
[01:00:35] i'm from the government and i'm here to help =D
[01:01:04] Wait a minute… that sounds suspicious
[01:01:05] :P
[01:02:12] The Arabic article is officially about PPM but it pays tribute also to PPB and PPT
[01:02:55] Also, it fascinates me how much English has infiltrated other languages.
[01:36:14] Now, mr. hoo_, how can I see which items are using the pp-whatever as a unit of measurement so I can fix it? :D
[01:37:13] I fear that's not even part of our RDF output, thus can't even be queried using the SPARQL endpoint (but I *might* be wrong there)
[01:37:31] apart from that... no idea... loop through all linked pages? :/
[01:37:54] Not even sure we log these as links, very well possible that we don't
[01:37:57] * hoo_ shivers
[01:40:09] Well, in any case, when I do my data mass-importation, I will not be getting it wrong :D
[01:40:21] Thank you for your assistance.
[01:44:13] You're welcome :)
[02:15:01] I have another question.
[02:16:25] There is a property for "symptoms", currently used on disease items. I would like to also use it as symptoms for exposure to chemicals, but it seems "symptoms" is too generic for my purposes because these are specifically symptoms *to unhealthy levels of exposure* (whether that level is a little or a lot depends on the substance). But would it be redundant to create another property for "symptom for unhealthy level of exposure" or is it okay to just use the "symptoms" property?
[02:17:19] It may be weird to see "Wikidata has information on symptoms related to influenza, ebola virus, and... benzene."
[02:17:24] hm... such a thing could either be expressed with the generic property + qualifiers or with a custom property (and then you would still probably need qualifiers to be specific)
[02:17:54] Yeah, that should probably be a different thing
[02:18:09] see http://www.cdc.gov/niosh/npg/npgd0665.html for example
[02:18:20] that is the entry for warfarin, an anticoagulant
[02:18:43] now normally, it's a medication. but healthcare workers who are exposed to this stuff all the time can get symptoms related to overexposure, including: "hematuria (blood in the urine), back pain; hematoma arms, legs; epistaxis (nosebleed), bleeding lips, mucous membrane hemorrhage; abdominal pain, vomiting, fecal blood; petechial rash; abnormal hematologic indices"
[02:24:25] Thinking about it, a specific property would be best
[02:24:42] guess you will then use qualifiers to express how much is toxic or so
[02:25:09] that information will be within the item itself
[02:25:26] specifically the properties for IDLH level, REL level, and PEL level
[02:25:48] those are, in descending order of danger: immediately dangerous to life and health; recommended exposure level; permissible exposure level
[02:25:54] IDLH == you're gonna die
[02:26:10] actually, switch recommended exposure level and permissible exposure level, since permissible level is usually higher
[02:26:25] permissible exposure level is the level that is legally required as the maximum
[02:26:36] and recommended exposure level is the level, usually lower, that is recommended by my employer
[02:27:33] would that be satisfactory?
[02:28:07] Modeling it like that sounds good to me
[02:29:48] like this: https://test.wikidata.org/wiki/Q1635
[02:31:23] Are these like per kilogram or do they have recommendations per gender or so?
[02:31:42] If that's the case, I think one property per type would be better, otherwise it's going to get messy
[02:32:11] the dataset does not have different recommendations for body weight or gender
[02:32:43] In that case one property for all of it sounds good enough for me
[02:32:48] which is a valid criticism of the data! but it's the data we have
[02:33:17] i don't think there is data for children since generally children aren't allowed to work with hazardous chemicals :P
[02:34:36] Which doesn't mean they can't/won't get into contact with them in some other way
[02:35:21] no, but they are not our audience. (except children working on farms, which is very common in the US -- you can thank our agrarian heritage for having them exempt from child labor laws)
[02:36:12] in any case, as soon as there is a better data set we can use it. in the meantime, the (immediately harmful|legal|recommended) exposure limits and symptoms are generalized for all people and that's just how it is :/
[02:37:07] :/
[02:37:17] It's past 4am here... think I should call it a day :P
[02:37:18] Good night
[02:37:32] goodnight! thank you for all your help, especially since it's much later there than it is here
[02:37:54] :)
[05:45:19] So many misconceptions about Wikidata still. :( https://gerrit.wikimedia.org/r/#/c/193681/2
[11:38:46] Lydia_WMDE: around?
[11:38:57] addshore: jep
[11:39:17] is the number of mails sent in the mailing list last month actually useful?
[11:39:25] / should I both with it :P
[11:39:54] *bother
[11:40:16] addshore: i think it is useful. not a top-line metric though
[11:40:29] if the rest is automated this is easy enough to do by hand as well
[11:40:51] okay, well I just bashed together a script to do all of the other social ones automatically
[11:41:06] also I did the site_stats ones earlier too ;)
[11:42:52] \o/
[11:47:31] :)
[12:32:32] DanielK_WMDE: https://phabricator.wikimedia.org/T99795#1626519
[12:34:10] and last comment in https://gerrit.wikimedia.org/r/#/c/236537/
[12:51:44] Lydia_WMDE: is that and mukunda's comment enough for you to create the tags?: https://phabricator.wikimedia.org/T93499#1666854
[12:52:50] jzerebecki: ok will do tomorrow
[13:12:40] * aude waves
[13:12:53] DanielK_WMDE: do we still need https://gerrit.wikimedia.org/r/#/c/234916/ or is everything covered with the other patches?
[13:31:04] I asked the late-night crowd last night; will now ask the day crowd. Last night I saw that a Wikidata item conflated "parts per million" with a generic concept of "part-per notation," so I created a new item for the specific parts per million measurement for use as a unit of measurement on statements. I now want to go through all the items inappropriately using the "parts-per notation" item as a measurement unit; is it possible for me to do that?
[13:32:47] Only with the API atm I think.
[13:33:14] Will the API generate a list of all the items using a given item as a unit of measurement? If so, good enough for my purposes.
[13:34:41] * harej waves at his fellow government staffer
[13:34:44] harej, Using "parts per million" is very very discouraged.
[13:35:20] Okay, but that doesn't answer my question.
[13:35:35] harej, Can you point me to a statement that uses parts per million?
[13:35:52] That's what I'm asking for: a list of such statements.
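Editorial sketch, not part of the log: nobody in the channel names an API call that lists every item using a given unit, so the snippet below only covers the second half of the job, checking entity JSON that has already been fetched by some other means. It assumes the usual Wikibase JSON layout, where a quantity value carries its unit as an entity URI. The property ID `P1` in the sample is a made-up placeholder, and `statements_using_unit` is a hypothetical helper name.

```python
# Unit URI for the "parts-per notation" item discussed above.
PARTS_PER_NOTATION = "http://www.wikidata.org/entity/Q27084"


def statements_using_unit(entity, unit_uri):
    """Yield (property_id, amount) for main statements whose quantity unit is unit_uri.

    `entity` is one entity in Wikibase JSON form (a dict with a "claims" key).
    Qualifiers and references are not inspected in this sketch.
    """
    for property_id, statements in entity.get("claims", {}).items():
        for statement in statements:
            datavalue = statement.get("mainsnak", {}).get("datavalue")
            if not datavalue or datavalue.get("type") != "quantity":
                continue  # no value, or not a quantity
            if datavalue["value"].get("unit") == unit_uri:
                yield property_id, datavalue["value"]["amount"]


# Hand-written sample entity; "P1" is a placeholder property ID.
sample_entity = {
    "claims": {
        "P1": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P1",
                    "datavalue": {
                        "type": "quantity",
                        "value": {"amount": "+0.1", "unit": PARTS_PER_NOTATION},
                    },
                }
            }
        ]
    }
}

for property_id, amount in statements_using_unit(sample_entity, PARTS_PER_NOTATION):
    print(property_id, amount)
```

Swapping the unit on each hit for the new Q21006887 item would still need a separate edit step, which is out of scope here.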
[13:36:35] This is the item: https://www.wikidata.org/wiki/Q27084
[13:37:15] The new item I created for parts per million is here: https://www.wikidata.org/wiki/Q21006887 but as far as I know it is not being used as a unit of measurement on Wikidata (it's less than 24 hours old)
[13:37:22] harej, "What links here" does not show any kind of relevant usage.
[13:37:33] Would it show up there necessarily?
[13:38:26] I think so.
[13:42:26] harej, I think your description and statement in the newly created item are already misleading. ppm are not only used for particles, and they are not really a unit of measurement.
[13:43:00] The description can be changed.
[13:45:57] harej, Can you tell me, what kind of statements you want to add? Maybe we can come up with a better description.
[13:49:01] Jonas_WMDE: http://php.net/manual/en/function.ignore-user-abort.php http://php.net/manual/en/misc.configuration.php#ini.ignore-user-abort
[13:49:36] I'm importing a chemical safety handbook. However the handbook has a conversion between ppm and... grams per cubic meter or something like that. So I can convert to that.
[13:49:59] (The handbook is in the public domain and considered authoritative)
[13:50:35] Jonas_WMDE: http://tron.wikia.com/wiki/MCP
[13:51:10] harej: it's PD? How come? That's unusual...
[13:51:24] harej, In that case I would create an item "gram per cubic meter" and use that as a unit. Using just ppm could also be mistaken for "gram per 1000 kg" which is also common for chemical concentrations.
[13:51:31] (not that it matters much, as long as you don't copy prose)
[13:51:35] United States federal government :)
[13:52:08] And I'm not planning on copying over prose.
[13:52:26] ah, good old pd-gov :)
[13:52:51] Not that there's much of it. It's the Pocket Guide to Chemical Safety, which is mostly a bunch of data.
[13:52:55] * DanielK_WMDE didn't know that ppm is ambiguous wrt mass vs volume
[13:53:17] harej, Copyright law is fun: Copying the life's work of 1000s of chemits is free. Drawing a mouse with big ears will cost you millions in fines :)
[13:53:24] *chemists
[13:53:32] hehe... so true...
[13:54:49] The U.S. government publishes recommended and permissible chemical exposure limits in terms of PPM, so I would want to use PPM to stay true to the source. But it also gives you the equivalency in terms of grams per cubic meter, so I could do quick dimensional analysis and import that value into Wikidata instead. Whichever is considered best practice.
[13:54:51] DanielK_WMDE, Metrology for concentrations is a mess: https://en.wikipedia.org/wiki/Parts-per_notation#Mass_fraction_vs._mole_fraction_vs._volume_fraction
[13:57:05] DanielK_WMDE: do we still need https://gerrit.wikimedia.org/r/#/c/234916/ or is everything covered with the other patches?
[14:00:30] harej, I think best practice is the unit that can be automatically converted. For ppm that is even impossible without having the original source.
[14:01:32] aude: good question. i suppose most is done by other patches, and there will be a lot of conflicts in any case.
[14:02:22] tobias47n9e: so should I do the conversion on my end?
[14:02:48] aude: i completely forgot this patch existed :) let me try to rebase it...
[14:03:51] harej, I think that a simple conversion does not fall under original research. Plus one can set the original value in the source (but I am not sure if we have a property for that yet)
[14:03:57] DanielK_WMDE: ok
[14:03:58] * Nemo_bis hates ppm
[14:04:16] * aude wants to clean up more of gerrit (one way or another)
[14:05:35] * DanielK_WMDE wants more bpm
[14:05:41] :D
[14:06:05] That would be great data to import. Unfortunately, not within the scope of my job.
[14:06:18] aude: conflicts across the board. i'll abandon the change
[14:06:50] hi
[14:07:02] do we already harvest any data on wikisources?
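Editorial sketch, not part of the log: the "quick dimensional analysis" mentioned above can be written down for the one case where it is well defined, a gas or vapour concentration given as a volume fraction. The usual rule of thumb is mg/m³ = ppm × molar mass / molar volume, with a molar volume of about 24.45 L/mol at 25 °C and 1 atm. That reference condition and the benzene molar mass used below are assumptions supplied here, not figures from the chat, and the function name is hypothetical. When the handbook prints its own conversion factor, that figure should take precedence.

```python
# Approximate molar volume of an ideal gas at 25 degrees C and 1 atm, in L/mol.
MOLAR_VOLUME_L_PER_MOL = 24.45


def ppm_to_mg_per_m3(ppm, molar_mass_g_per_mol):
    """Convert a gas-phase volume fraction in ppm to mg/m^3.

    Only meaningful for gases and vapours at the reference conditions above;
    ppm by mass or by mole count needs a different conversion, which is the
    ambiguity raised in the discussion.
    """
    return ppm * molar_mass_g_per_mol / MOLAR_VOLUME_L_PER_MOL


# Benzene, molar mass taken as 78.11 g/mol (assumed value).
print(round(ppm_to_mg_per_m3(1, 78.11), 2))
```

Because the result depends on temperature, pressure and the substance's molar mass, the converted value cannot be turned back into ppm without those inputs.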
[14:07:24] as in the website Wikisource, or sources used on Wikimedia projects?
[14:08:43] DanielK_WMDE: ok
[14:08:59] i think most of it is covered in the other patches
[14:09:47] tobias47n9e: now that my new work computer is up, here's an example: http://www.cdc.gov/niosh/npg/npgd0420.html exposure limits are expressed in ppm but there is a conversion factor for converting from ppm to mg/m^3
[14:13:27] harej: wikisource
[14:13:33] harej: In that case I would just grab the value that is in brackets: NIOSH REL = 10 mg/m³ - The rounding to whole numbers is probably within the margin of measurement.
[14:17:30] Also, our nation of 350 million weirdos uses the degrees Fahrenheit system for temperature. The Wikidata item for "Fahrenheit" is https://www.wikidata.org/wiki/Q42289 but should it also be used for units of measurement? One does not say "32 Fahrenheit," but "32 degrees Fahrenheit". Is that an issue?
[14:19:13] It seems in languages other than English people have set the label to say some variant of "degrees Fahrenheit" so perhaps I will do that
[14:19:25] harej, I think you can use the normal Fahrenheit item. "degrees of temperature" is just a figure of speech that for some reason "°C" and "°F" have kept.
[14:21:19] I am also not sure how the measurement stuff will evolve. Pure SI-system would make a lot of stuff easier, but Fahrenheit and Yards have a big fan club, even on Wikipedia.
[14:21:55] Yes. Unfortunately there's the 4% of us that use the weirdo measurements :/
[14:23:02] Though even in the States using SI is more common for scientific applications but there's a lot of legacy stuff that still uses degrees F and so on.
[14:24:54] harej, The US-system does not stand on its own feet (no pun). There is no lab-reproducible definition of US measurements. So they are all defined as conversions from SI-system. So most machines will measure metric, but convert the measurement to US-units for display. Converting back already introduces errors because of rounding and because not all digits are displayed.
[14:27:26] harej, Even NASA reports a lot of things in rounded and truncated "miles". No idea how we should derive the original values from press releases.
[14:29:39] I would love to have NIOSH re-do the Pocket Guide to Chemical Safety to use exclusively SI measurements, but that may not happen any time soon.
[14:32:42] What's your opinion on "percent by volume" as a unit (or "unit") of measurement?
[14:33:07] Or "grams per 100 mL"
[14:33:14] (convert to grams per mL?)
[14:34:47] harej, I just saw that we have https://www.wikidata.org/wiki/Q834105 (gram per litre), if you don't mind doing the conversion.
[14:34:57] Works for me.
[14:35:17] Shouldn't affect the precision, since we're just adding zeros up front. 0.0003 or whatever instead of 0.3
[14:36:49] harej, And we can't do trailing zeros anyway, so the number of digits is not a statement on the precision.
[14:36:50] No idea how the world survives while being infected with approximated measures from USA
[14:38:02] Juandev: ricordisamoa imported some stuff from it.wikisource; not much else happened AFAIK, but there is a wikiproject on wikidata
[14:38:13] tobias47n9e: any plans for changing that, or somehow defining the number of significant digits?
[14:40:14] harej, I am not sure if there is a bug report yet for that. There is just too much going on, and the devs are also pretty busy. I think nobody even knows what the +/- is supposed to mean when entering quantities. It could be 2-sigma of a normal distribution, but who knows :)
[14:43:30] there is in fact two of them :P
[14:43:43] (they're slightly different, but both about quantity precision)
[14:52:09] What about "percent by volume"? It's dimensionless and as far as I know should be the same whether working in SI or something else
[14:55:51] harej, I believe that that should have an item for "millilitre per litre" or something similar?
[14:56:15] I suppose. And should I convert mg/m^3 to g/l?
[14:57:29] harej, I think both would work (because the we can already do conversions).
[14:57:36] *then
[14:58:10] there's no mg/m^3 item as far as I know but there is definitely g/l
[14:58:31] and it's really the same thing, just different orders of magnitude
[15:14:20] Jonas_WMDE: https://de.wikipedia.org/wiki/30._Februar
[15:34:18] Found a silly bug https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/master/repo/includes/store/sql/WikiPageEntityRedirectLookup.php#L108
[15:34:29] Fixing it..
[15:49:53] aude: jzerebecki so, the social stats and site stats that we wanted for the dashboard are not auto generated
[15:49:55] *now
[15:50:28] not puppeted yet, also need to import the old data from the spreadsheet, but now it should run daily
[15:51:20] they will be in http://datasets.wikimedia.org/aggregate-datasets/wikidata/ after an rsync
[15:58:57] jzerebecki: https://imgur.com/6clWoAg
[16:08:40] DanielK_WMDE: two chars? NS - ECN-nonce concealment protection?
[16:10:07] addshore: oh how does it get there? running from the limn data repo?
[16:11:24] Currently just running on my user on the analytics cluster
[16:12:07] Using the sample cron in the last patch
[16:12:36] Oh, and yes, the code for it is in a chain on gerrit in the limn repo
[16:15:09] jzerebecki: the two characters are a priest and a rabbi. https://www.youtube.com/watch?v=cH7mEN6Yl3U
[16:21:41] DanielK_WMDE: now dispatching of changes to arbitrary kittens for user pages is working
[16:21:45] https://en.wikipedia.org/w/index.php?title=Special:RecentChangesLinked&hidewikidata=0&target=Talk%3ANew_York_City%2FArchive_10
[16:24:52] jzerebecki: yay! thanks!
[16:25:22] * DanielK_WMDE renames the script to dispatchKittens.php
[16:27:15] addshore: aaaaaaaaarg a shell script :(
[16:28:12] that does sql :(
[16:28:36] jzerebecki: does it use sed or awk to build the sql, too?...
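Editorial sketch, not part of the log: going back to the g/l question earlier in the afternoon, both conversions discussed are pure power-of-ten shifts (1 g per 100 mL = 10 g/L, and 1 mg/m³ = 10⁻⁶ g/L), so they can be done on decimal digits without touching the significant figures. The helper names are hypothetical; `Decimal.scaleb` from Python's standard library shifts the exponent and leaves the digits alone.

```python
from decimal import Decimal


def g_per_100ml_to_g_per_l(value):
    """Grams per 100 mL to grams per litre: multiply by 10."""
    return Decimal(value).scaleb(1)


def mg_per_m3_to_g_per_l(value):
    """Milligrams per cubic metre to grams per litre: multiply by 10**-6."""
    return Decimal(value).scaleb(-6)


# Values are passed as strings so no binary floating-point rounding creeps in.
print(g_per_100ml_to_g_per_l("0.3"))
print(mg_per_m3_to_g_per_l("10"))
```

Passing the source figures as strings keeps exactly the digits the handbook prints, which matters if the number of digits is later used as a hint about precision.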
[16:28:47] Does Wikidata support properties where the value is a *range* of numbers, or should I have separate property proposals for the upper bound and lower bound?
[16:28:54] * DanielK_WMDE has been there, done that, got the core dump...
[16:29:39] harej: that depends on the kind of range you are talking about. is it a range of uncertainty? or a period of time? or...?
[16:29:40] DanielK_WMDE: no only variable expansion, among them those that execute other mysql commands during expansion
[16:29:48] DanielK_WMDE: it's a range of temperatures
[16:30:20] namely, the lowest temperature at which a gas can catch on fire, and the highest
[16:30:26] fire!!
[16:32:51] harej: i'd suggest separate properties
[16:33:07] harej: uh, there is a highest temperature for catching fire? what happens above that temperature?
[16:33:24] we use separate properties for start and end dates, too
[16:33:43] i'd also expect separate properties for min and max temperatures of places, etc
[16:34:14] huh, hadn't thought of that. but supposedly there *is* an upper explosion limit: https://en.wikipedia.org/wiki/Flammability_limit#Upper_explosive_limit
[16:34:18] harej: at one point we considered "range snaks", and the idea isn't completely off the table, but it's not implemented, and people seem to be doing fine without it
[16:34:50] Alright. I will propose separate properties then.
[16:34:51] harej: that's concentration, not temperature
[16:35:05] that's what I meant, then. sorry, confusing different things
[16:35:06] harej: if there's too much "stuff", that probably means there's too little ocygen for combustion
[16:35:08] it's a range of concentrations
[16:35:12] *oxygen
[16:35:25] ah, i see, that makes more sense
[16:36:28] also, how does Wikidata handle alternate names for properties? it's called the upper/lower explosive limit, but also the upper/lower flammable limit
[16:36:58] You have a 6 months discussion period to decide which one to pick? ;p
[16:38:28] so we pick one and it becomes canon; no option for aliases I take it
[16:38:36] properties can have aliases
[16:39:09] Tests on WB.git master are broken after latest Time release
[16:39:28] harej: sure, properties can have aliases
[16:40:09] harej: property aliases *should* be unique, but we currently don't enforce this in software, because there are quite a few violations of this, that need to be cleaned up first
[16:43:38] aude: DanielK_WMDE jzerebecki Anyone fixing the broken tests on master?
[16:46:18] JeroenDeDauw: not currently
[16:48:45] JeroenDeDauw: hm, broken tests on master?
[16:49:00] * DanielK_WMDE is digging into core code
[16:49:16] This probably fixes it https://gerrit.wikimedia.org/r/#/c/234481
[16:49:30] Just rebased it, so let's see what Jenkins says
[16:49:50] DanielK_WMDE: got your hazmat suit on?
[16:51:11] JeroenDeDauw: and asbestos underwear
[16:51:49] Dunno what that is, though am not going to google it :)
[16:52:06] JeroenDeDauw: whut, the release broke our code? that wasn't supposed to be a breaking change.
[16:52:15] hm, we should better test against that
[16:52:37] You can argue it fixes a bug
[16:52:44] And that the tests that broke relied on that bug
[16:52:56] Looks like Thiemo and Jonas were well aware of this
[16:53:05] http://dictionary.reference.com/browse/asbestos+longjohns
[16:53:31] JeroenDeDauw: fixed 1, for the other one a similar change to lib/tests/phpunit/formatters/MwTimeIsoFormatterTest.php might be needed
[16:53:43] JeroenDeDauw: i discussed the release with jonas, but he didn't mention it would *break* tests. sigh...
[16:53:57] JeroenDeDauw: jenkins says NO
[16:54:12] JeroenDeDauw: I don't think Jonas was aware, otherwise he would have merged the fixes, before heading home
[16:54:22] I'll have a go at fixing the other one
[16:54:30] thanks
[16:54:42] ah, nice, it breaks a provider
[16:54:46] why does it break a provider?!
[16:54:54] jzerebecki: he had seen this commit... https://gerrit.wikimedia.org/r/#/c/234481/
[16:55:30] JeroenDeDauw: seeing and being aware of the implications are two different things
[16:55:55] JeroenDeDauw: yea, but we assumed that was just needed to make use of the new release, and requires the new release. we did not expect that current master would break when used with the new release
[16:58:10] djeeez that provider >_>
[17:08:59] Tests fixed by https://gerrit.wikimedia.org/r/#/c/234481/
[17:12:09] JeroenDeDauw: thx
[17:27:34] Can somebody help me with a category and badge intersection?
[18:12:56] * aude brings back the kitten :)
[18:25:56] aude: you stole the kitten?!
[18:26:42] wikidata stole the kitten!
[18:27:44] Are we having kittens at the party?
[18:29:32] JeroenDeDauw: Are you going to be dressed like this at the party? http://amzn.com/B00F2OONZW
[19:11:58] * eurodyne waves
[19:32:24] sjoerddebruin: more like https://s-media-cache-ak0.pinimg.com/736x/78/b5/c1/78b5c1263c74de7570c4c07f361dff22.jpg
[19:32:32] sjoerddebruin: what party anyway? When do I need to hide?
[19:32:49] Not going to tell you after this.
[19:39:11] lol http://www.wmf.com
[20:12:47] Hm, how high is the queue for page move processing?
[20:21:59] hi --anyone here who knows about the design of the wikidata pages? specifically, I'm wondering about those boxes that show a property of an item
[20:22:22] is that something you can customize with mediawiki?
[20:23:39] What sort of customisation do you have in mind?
[20:25:48] ah, I just like the look of the property boxes
[20:26:26] and wasn't sure if that's something I use the MediaWiki software to create, or if it's something any user of Wikidata makes
[22:40:47] I have more data modeling questions!
[22:43:38] I would like to model this information (namely the things in the left column like "soap flush" and "fresh air") into Wikidata. Would this be an appropriate thing to model?