[03:00:54] Hello, I'm writing a bot to collect image data (licenses, direct links, etc.) from Wikimedia Commons. I'm not planning to make any edits, but I'd like to be able to use the larger bot-size limit (5000) on generator requests. Is it appropriate to apply for bot permission in this case?
[03:01:34] I ask because the page about getting bot permission seems to assume your bot will be editing the wiki, which mine isn't.
[03:02:14] You're welcome to ask if you think it would be useful, but be prepared to justify why you need that rate
[03:03:20] I'd suggest developing with the normal limit, and seeing if it's actually worth going up to 5000 later
[03:03:51] And if you didn't know, the Wikimedia projects have a user-agent policy: https://meta.wikimedia.org/wiki/User-Agent_policy
[03:27:22] AntiComposite: Okay, thanks for the heads up. I've been experimenting with the normal limit of 500 a bit, and it seems empirically to get me about 1 day of data per 30 minutes => approx. 50 days of data per day. I need to get up to around 200 days of data per day, so that it becomes feasible to keep things like global usage updated for the collection over time.
[03:29:26] It might be faster to get some of the data from the [[meta:Data dumps]]. They don't have global usage, but they would have author, license, source, etc.
[03:30:17] However, that means spending time parsing the dumps into useful data and dealing with the slow release rate.
[03:30:20] https://meta.wikimedia.org/wiki/Data_dumps
[03:31:49] I'll read that as well, thanks!
[16:27:44] Hi there. I hope I'm in the right place to report this; if not, please let me know. I just noticed a small issue with one of Wikimedia's servers: 91.198.174.192 does not appear to accept HTTP connections. On two test machines where dyna.wikimedia.org resolves to that IP, this causes requests to http://en.wikipedia.org/ to time out instead of being redirected to HTTPS. Command to reproduce:
[16:27:50] `curl -sv -H 'Host: en.wikipedia.org' http://91.198.174.192/`. (Only IPv4 is affected; on the same two machines, the dyna.wikimedia.org AAAA record resolves to 2620:0:862:ed1a::1, which accepts HTTP connections and redirects to HTTPS as expected. HTTPS connections to the IPv4 address also work as expected.)
[16:37:02] JAA: yes, known issue since yesterday; doesn't affect many end-users because of HSTS preload; will be reverted today
[16:37:24] cdanis: Good to hear, thanks. :-)
[16:37:47] thanks for the report :)
[16:38:45] Sure thing.
[18:06:11] addshore: you around?
[18:19:58] hi hauskatze
[18:20:06] hi addshore
[18:20:23] addshore: I was told you wrote the 'clear watchlist' feature
[18:20:33] but it looks like there's a problem with it, cfr. T243449
[18:20:33] T243449: Please clear my watchlist on commons - https://phabricator.wikimedia.org/T243449
[18:21:02] it looks like it is not calling the job queue nor batching the query, so in cases of large watchlists the feature breaks
[18:21:09] and the error message is a Varnish error
[18:21:30] i.e. those "Wikipedia is down" messages
[18:21:52] do you think you could take a look when you're free?
[18:40:41] hauskatze: I just looked at the code a bit and I think I can see what needs fixing
[18:40:57] addshore: thanks, that was quick :)
[18:42:07] * hauskatze bbl
[18:42:07] Gonna go get some food and then will write a patch
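
A note on the generator/limit discussion above: the log does not show the bot's actual queries, but a minimal sketch of the kind of request involved might look like the following, assuming `generator=allimages` with `prop=imageinfo` and a placeholder User-Agent per the policy linked at 03:03:51. The bot name, contact address, and batch handling below are illustrative, not taken from the log.

```python
import requests

API = "https://commons.wikimedia.org/w/api.php"
# Identify the client per https://meta.wikimedia.org/wiki/User-Agent_policy (placeholder values).
HEADERS = {"User-Agent": "ExampleImageDataBot/0.1 (operator@example.org)"}


def iter_file_info(batch_size=500):
    """Yield imageinfo (direct URL + extmetadata) for Commons files, batch_size at a time.

    500 is the normal per-request cap; 5000 requires the apihighlimits right
    that comes with a bot flag.
    """
    params = {
        "action": "query",
        "format": "json",
        "generator": "allimages",
        "gailimit": batch_size,
        "prop": "imageinfo",
        "iiprop": "url|extmetadata",  # extmetadata carries license/author/source fields
    }
    while True:
        resp = requests.get(API, params=params, headers=HEADERS, timeout=60)
        resp.raise_for_status()
        data = resp.json()
        for page in data.get("query", {}).get("pages", {}).values():
            yield page
        if "continue" not in data:
            break
        params.update(data["continue"])  # standard API continuation


if __name__ == "__main__":
    # Print a few titles to check the query works before scaling up.
    for i, page in enumerate(iter_file_info()):
        if i >= 5:
            break
        print(page["title"])
```

A production bot would also want to handle `batchcomplete` (prop data can be split across continuation batches) and to throttle requests, but the loop above is the basic shape of the 500-per-request collection being discussed.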
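
On the T243449 discussion: the log only states that the clear-watchlist feature neither batches its delete nor defers to the job queue, so large watchlists hit a request timeout surfaced as a Varnish error. The sketch below is a generic illustration of the batching idea, not the actual MediaWiki code; `delete_watchlist_rows` is a hypothetical helper standing in for whatever storage call the real feature uses.

```python
def clear_watchlist_in_batches(delete_watchlist_rows, user_id, batch_size=1000):
    """Delete a user's watchlist rows in bounded chunks instead of one huge query.

    delete_watchlist_rows(user_id, limit) is a hypothetical helper that removes up
    to `limit` rows and returns how many it deleted; repeating until a short batch
    keeps each individual query small. In MediaWiki itself this kind of loop would
    typically be deferred to the job queue rather than run inside the web request.
    """
    total = 0
    while True:
        deleted = delete_watchlist_rows(user_id, batch_size)
        total += deleted
        if deleted < batch_size:
            return total
```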