[02:09:27] i wished we could allow two links to the same wiki :p
[10:48:50] MisterSynergy: assume you have seen that PAWS is back?
[10:49:26] MisterSynergy: https://wikitech.wikimedia.org/wiki/Incident_documentation/20190213-PAWS
[10:49:50] MisterSynergy: "In general, PAWS is in a weird state. It's a service that WMF/WMCS wants to offer to the Wikimedia Movement, but we don't have enough human capacity to fully support it."
[10:55:20] Thanks fuzheado
[10:55:58] MisterSynergy: Actually I had no idea this was the state of affairs before toolsdbpocalypse :)
[10:57:49] I'm wondering how important PAWS actually is
[10:58:35] at some point in the past, I have evaluated PAWS edit numbers for Wikidata, and it appeared that it is not used by that many users
[10:58:58] no idea about use in other projects
[11:01:59] I just started using it recently and find it super useful. In fact I started experimenting with it as a teaching tool for Wikidata, but now have to reconsider
[11:04:57] MisterSynergy: Also, the only reason I ran into PAWS is because it was the recommended way to learn Pywikibot
[11:04:58] https://www.wikidata.org/wiki/Wikidata:Pywikibot_-_Python_3_Tutorial
[11:05:22] Yeah it is super useful and I am one of the most prolific users of it; however, I also received a block for my main account once in the past, for running bot code without a bot flag
[11:05:23] That seemed to indicate this was a well supported tool
[11:05:36] MisterSynergy: gulp!
[11:06:30] Since then I think we have a crappy bot policy ;-)
[11:06:52] it is difficult to distinguish between bot editing and batch editing anyways
[11:07:19] MisterSynergy: was frowning on your bot-like edits on Wikidata or other wiki?
[11:07:47] on Wikidata
[11:08:13] hmm, i'm surprised
[11:08:20] well not surprised...
more disappointed :)
[11:08:28] So, be careful when using PAWS for larger batches
[11:08:37] I think it would be okay for smaller ones
[11:08:38] yeah we have LOTS of batch stuff being done that's not "bot" per se
[11:08:56] Hey fuzheado I'm not going anywhere on PAWS maintenance
[11:09:21] https://www.wikidata.org/wiki/Special:Log?type=block&user=&page=User%3AMisterSynergy&wpdate=&tagfilter=
[11:09:35] And although there are few resources, the cloud team does its best on emergencies
[11:09:53] in case you are interested
[11:10:18] chicocvenancio: thanks for all your work on this. "not going anywhere" = you are having a tough time, or you're going to continue doing work on it?
[11:10:43] I'm going to continue working on it
[11:10:47] chicocvenancio: Until this toolsdb crisis, I had no idea the cloud team actually has no obligation to support it
[11:11:44] chicocvenancio: much appreciated. I think it has a very powerful role of straddling the world of editors and developers... I'm a fan of tools that can bring dev power to folks who don't typically code
[11:11:54] The whole issue is that yuvipanda created it and maintained it as a pet project while at WMF
[11:12:10] And when he left it was orphaned
[11:12:33] He kept maintaining it as a volunteer, but with very limited time
[11:12:35] chicocvenancio: I wonder how to raise the significance of this on the priority list
[11:13:13] Part of the problem is paws does not need much maintenance :)
[11:14:04] When I was in the cloud team I saw it was a niche where I could solve high impact bugs when I had free time
[11:14:30] So I did. Once my contract was not renewed I kept at it
[11:16:15] It's very hard to raise its priority at the moment because a lot of more basic stuff in the cloud world needs a lot of work
[11:16:30] This toolsdb crisis reminds us of how fragile our whole environment can be.
[11:16:40] Quickstatements seems to be still dead
[11:17:06] Well, yes.
On the other hand, it's very unusual for something like this to happen
[11:17:57] Back on the toolserver days, outages and replag were a constant
[11:18:40] chicocvenancio: true
[11:19:35] Honestly, I was kind of astonished when I first started getting deeper into the dev stuff that we had one massive MariaDB installation where everyone was scribbling to shared databases :)
[11:20:04] Simple, yes. Wiki-like, yes. Safety, no. :)
[11:20:30] Lol, yeah
[11:20:41] Cloud team agrees,
[11:21:27] I kind of love it for the innocent AGF, IAR and SOFIXIT ethos that we have. But good god, we have critical services mingling with *my* crappy database tests! :)
[11:21:36] But it's always hard to change things amateur developers depend on
[11:21:45] (Not that I was responsible for any of this, but just saying)
[11:23:11] Yes. It's a very well-known issue. Moving to everyone bringing their own database would be a step backwards, as no DBA would be involved
[11:23:34] And Toolforge storage is too slow for it
[11:23:39] chicocvenancio: It does remind me of the early 2003 era Wikipedia when crashes and resource exhaustion were not uncommon. the good old days
[11:23:57] Lol, yeah, it's much closer to that
[11:24:48] I think the best we can do is repeatedly communicate the cloud team could use more resources
[11:25:43] They do an amazing job with what they have, outages are an uncommon occurrence.
[11:25:46] chicocvenancio: it's amazing how much the things stay the same - I remember a lot of these same issues in the timesharing computing days of the 1980s with IBM mainframes and needing to be kind to others when running batch jobs (er, punchcard like jobs back then)
[11:26:53] Yeah, part of the problem is the WMF cloud came before modern tools to manage some of the services provided
[11:27:39] Leading to a lot of in-house development to get things to do the needed bits
[11:29:23] Yeah I'm sympathetic – I used to be a devops type (before the term was even coined) back in the 1990s.
The technology and tools have advanced so much that it's dizzying. I cannot even imagine having to stage and test all this
[11:31:37] fuzheado: did you ever edit Wikidata using PAWS?
[11:31:54] just found my usage statistics, and you do not appear there :-)
[11:34:09] found 61 user accounts who have used PAWS until now, making 10.7M edits; 90% of these edits stem from four users with five user accounts
[11:34:31] (numbers for Wikidata only)
[11:35:13] MisterSynergy: Actually I don't think I've done any serious bot-like edits to Wikidata from PAWS, though I do a bunch of prototyping on it. I actually ran most of the bot-like edits from my own laptop in Python
[11:35:26] https://quarry.wmflabs.org/query/28295
[11:35:31] okay I see
[11:36:18] What I have done is to work with Commons and a machine learning image categorization engine on MS Azure, so you may see lots of activity there
[11:37:10] fuzheado: you're aware of cloud vps?
[11:38:07] chicocvenancio: aware of it, but have not delved into it. Most of the documentation around it seemed to steer people away from it unless you really need a heavy footprint of installed software that toolforge doesn't have already?
[11:38:37] why is "fictional character" (Q95074) not listed as superclass in https://tools.wmflabs.org/sqid/#/view?id=Q15632617? The wikidata page shows Q15632617 as subclassOf Q95074
[11:39:15] MisterSynergy: so my PAWS activity is more to load up a toolsdb table with candidates for the Distributed Game - Depicts. But that is currently borked because of the toolsdb problems
[11:39:30] fuzheado: Well, yes. That's by design because the most common request could be done in Toolforge.
[11:39:41] But if you're using ms azure
[11:40:13] And machine learning is not a use case Toolforge is particularly good at
[11:42:45] chicocvenancio: Ah yes. I don't think the Cloud VPS folks are ready to run a whole machine learning system yet.
I'm collaborating with Microsoft Research on the computer vision part, so it's good for now but certainly it'd be more ideal for us to have our own open system for that. Have been in conversation with WMF ML people already about using a GPU in some configuration in the future
[11:43:25] Ahh, you need gpu...
[11:44:14] The software is MS's as well?
[11:44:50] chicocvenancio: MisterSynergy - So far, the PAWS work has been: read Wikidata item that has an image; feed MS Azure machine learning system with a scaled down Commons image; read back a list of predicted image labels from their AI; insert these candidates into toolsdb for Distributed Game
[11:45:32] chicocvenancio: MS Azure has a product called Custom Vision that is very similar to Google/AWS "AI as a service" offerings
[11:46:11] chicocvenancio: I've worked with The Met Museum to help add training data for recognizing features in artworks, so MS is hosting that
[11:47:11] chicocvenancio: If we get the capacity to do this on our servers (maybe after fiscal year starts?) that would be more ideal, certainly
[11:47:59] You could probably create a project grant for the extra hardware
[11:48:53] (for gpu)
[11:49:35] chicocvenancio: My understanding is that WMF already has a GPU at their disposal, but integration of it into our infrastructure is nontrivial. I suppose I can understand that. I'm not sure we ever had a hardware specific requirement like this
[11:50:34] I'm sure we don't have gpu in cloud, maybe in prod or analytics or something
[11:51:06] chicocvenancio: Oh it's definitely not even plugged in and operating anywhere :) I know it's on the shelf somewhere, but in possession of WMF :)
[11:51:22] Hahaha, :)
[11:52:11] Maybe you can direct it to cloud with your use case
[11:52:57] chicocvenancio: In general, the overall strategy of how our movement engages with more visually rich media is a huge question - we have neglected video for a long time. Sucky patent problems don't help.
But even in the area of images, there hasn't been great movement. SDC is a great step. But we need a better strategy
[11:53:37] Agree wholeheartedly
[11:54:17] Fortunately, we are kind of leading by example with our AI work... it has helped bring attention to the need. And, kind of like Cunningham's Law, there's no better way to prod more innovation in our community than to say, "Since we don't have it. I have to work with Microsoft and their expensive, closed, proprietary system..."
[11:54:54] LOL
[11:56:14] chicocvenancio: https://outreach.wikimedia.org/wiki/GLAM/Newsletter/January_2019/Contents/USA_report
[11:56:39] "Andrew worked with Jennie Choi, The Met's General Manager of Collection Information and Nina Diamond, Managing Editor and Producer along with Microsoft Researchers Patrick Buehler, J.S. Tan and Sam Kazemi Nafchi to train a machine learning model on Microsoft Azure that could predict labels for artworks."
[11:57:20] The reality is, the software for doing such predictions is nearly commodity - it's not hard or mysterious. It's just that to perform at scale, we need something cloud based and MS was a partner in doing that
[12:02:07] Is there a way to get labels in the output of wikidata's sparql query service that are linked to the wikidata page, without needing two columns (one for the item and another for the label)?
[12:10:13] bennofs: you don't need the item?
[12:10:31] well if it was linked to the page i don't need the Qxxxxx number
[12:11:46] chicocvenancio: all I would use the Qxxxxx item for is clicking on the link :)
[12:11:51] Remove it from the select then
[12:12:01] but the labels aren't links
[12:12:05] Ahhh
[12:12:32] That sounds possible, but I'm not sure how to do it
[12:15:08] I think it is not possible
[12:15:24] at least in the WDQS web interface
[12:15:53] it outputs pretty raw data, which cannot really be re-formatted (such as link to some item with its label as link text)
[19:01:18] isn't the blocking of open proxies because of copyright issues?
[19:02:02] wouldn't it be possible to unblock Wikidata for Tor users since it's public domain anyway?
[19:24:46] vsucduxp[m]: I think it's more of a sock puppet problem.
[19:31:15] Yeah