[02:33:00] I'm trying to scope out a way of mirroring Signpost content. I have a (technical, but non-Wikimedian) friend helping me... he's trying to use wget, but is concerned about violating anti-scraping standards. Can anybody help me figure out what the boundaries of acceptable practice are?
[02:34:33] The idea is to come up with a system that can mirror the contents of en.wikipedia.org/wiki/WP:Wikipedia_Signpost to a WordPress site, in order to generate an RSS feed. We publish more or less every two weeks, so this would be a matter of mirroring about 10 pages of Wikipedia content every two weeks.
[02:39:28] Hi peteforsyth
[02:39:43] Didn't the Signpost already have an RSS feed?
[02:41:14] Looks like it's broken.
[02:41:30] We should probably set one up using FeaturedFeeds instead of trying to rely on an external site...
[02:43:56] Hey -- yes, it's broken.
[02:44:14] Tell me more about FeaturedFeeds?
[02:45:47] I'm reading https://www.mediawiki.org/wiki/Extension:FeaturedFeeds
[02:45:51] https://www.mediawiki.org/wiki/Extension:FeaturedFeeds#Configuration is a little technical, but we'd add some configuration and localization messages and we should get an RSS feed
[02:46:02] Tech News has this set up too, so it should be possible for the Signpost as well.
[02:47:32] https://phabricator.wikimedia.org/T65596
[02:47:43] If you file a Phabricator task, we can probably figure it out
[02:48:39] Awesome. I'm not sure I know enough to give precise specs... do you think a Phabricator ticket discussion is a reasonable place to sort that out?
[02:50:11] yeah
[03:04:05] @legoktm This looks great. I think it offers half of what I want to accomplish -- which is fantastic. However, I'm wondering how it might fit with the other half...
[03:04:46] Machine-readable RSS is huge, and opens many possibilities; but I'm also interested in creating an instance of the Signpost that's nicely human-readable outside of Wikipedia.
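[Editor's note: rather than mirroring rendered pages with wget, the MediaWiki action API can return the parsed HTML of any page directly, which sidesteps the scraping concern raised above. A minimal sketch using only the Python standard library; `action=parse` is a real API module, but the helper names and the exact page title are illustrative:]

```python
"""Sketch: fetch rendered Signpost pages via the MediaWiki action API
instead of scraping the desktop site with wget."""
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://en.wikipedia.org/w/api.php"

def parse_url(title):
    """Build an action=parse API URL that returns rendered HTML for one page."""
    return API + "?" + urlencode({
        "action": "parse",
        "page": title,
        "prop": "text",           # only the rendered HTML body
        "format": "json",
        "formatversion": "2",
    })

def fetch_html(title):
    """Fetch the rendered HTML of a wiki page (makes a network call)."""
    with urlopen(parse_url(title)) as resp:
        data = json.load(resp)
    return data["parse"]["text"]
```

[For roughly ten pages every two weeks, one request per page through this endpoint is well within normal usage.]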
[03:05:42] Considering getting a separate domain, and having one WordPress page for each section of the Signpost (plus one front page for each issue).
[03:06:13] Dynamically loading images from enwp (rather than downloading them into the WordPress install).
[03:06:43] And setting a "canonical" link that identifies the on-wiki version as the canonical one.
[03:07:37] The main goal of that would be to have something that is more accessible to non-Wikimedians, for when we have stories of wider interest. Does that seem like a worthwhile goal to you? Or does it make the project infinitely more complicated?
[03:09:39] My friend has created a basic JS-based web app that will grab the front page, and each section page, via wget, and permit copy-pasting them into WordPress. (He was worried that using wget would trigger anti-scraping mechanisms.)
[03:48:48] peteforsyth: why is the content on Wikipedia less accessible compared to a random WordPress site?
[04:46:14] peteforsyth: I'll tell you a secret that all the nasty scrapers already know. We don't really have any protections against that at all. If things get really, really bad, a root would block your IP at the ingress routers, but you'd have to be scraping more than, say, 10x Googlebot for that to happen, in all likelihood.
[04:59:52] @legoktm When non-Wikimedians see the sidebar and tabs, I would tend to doubt they think "ah, here is a news article." It's a subtle difference, but I believe an important one to a certain segment of our audience. I'd like to have a less-cluttered option.
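[Editor's note: the "canonical" link mentioned above is a standard HTML `<link rel="canonical">` tag. A hypothetical helper (not an existing WordPress or MediaWiki function) that points a mirrored copy back at the on-wiki original might look like:]

```python
from urllib.parse import quote

def canonical_tag(title):
    """Emit a rel=canonical tag naming the on-wiki page as authoritative.

    Hypothetical helper: search engines then index the enwp original
    rather than the WordPress mirror.
    """
    url = "https://en.wikipedia.org/wiki/" + quote(title.replace(" ", "_"), safe="/:")
    return f'<link rel="canonical" href="{url}">'
```

[The tag would go in the `<head>` of each mirrored WordPress page.]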
[05:00:22] @bd808, thanks much -- good to know, and your secret is... somewhat... safe with me ;)
[16:49:53] peteforsyth: we used to use a WordPress install on wikipediasignpost.com to generate such a feed; see e.g. https://web.archive.org/web/20120126055133/http://www.wikipediasignpost.com/blog
[16:50:05] ...but I guess the solution outlined by legoktm is preferable
[16:50:46] back then these blog posts were generated by a template on-wiki, and copied over manually
[16:51:15] unfortunately the person holding the domain let the registration lapse at some point
[16:51:50] @HaeB, I believe I missed some discussion while sleeping -- could you catch me up? (Maybe by email -- I won't be able to engage fully with this today.)
[16:52:21] peteforsyth: I was solely referring to what you and lego had typed above
[16:53:29] Ah, OK, thanks.
[16:54:19] I knew about the old WordPress site, but did not realize archive links existed -- very glad to know.
[16:55:00] wiki-embed is a WordPress extension that works rather well
[16:55:26] There are also MediaWiki skins, such as https://www.wikimedia.es/wiki/Portada
[16:56:18] @HaeB, do you know who registered wikipediasignpost.com, and whether they still have the registration? (whois shows a private registration, so it's not easy to determine from there)
[16:56:24] I saw "machine-readable RSS" and am just saying: if you want any feeds added to en.planet.wikimedia.org, just let me know
[16:57:41] Thanks mutante, will definitely keep that in mind.
[16:58:13] Nemo_bis, good info, thanks. I believe we've looked at and ruled out wiki-embed, but I will double-check; I've been looking at a number of tools.
[16:58:16] peteforsyth: BTW, the template that generated the feed output / blog posts: https://en.wikipedia.org/w/index.php?title=Wikipedia:Wikipedia_Signpost/Newsroom/Coordination&oldid=452399507#Blog_output_for_this_issue
[16:58:38] (I have to step away from this discussion, sadly -- but welcome emails or talk page notes if further ideas emerge...)
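[Editor's note: for context on what the broken feed would need to contain, an RSS 2.0 document is simple to generate from a list of section titles and links. A bare-bones sketch with Python's standard library; the item data and function name are made up for illustration:]

```python
import xml.etree.ElementTree as ET

def signpost_rss(items):
    """Build a minimal RSS 2.0 feed from (title, url) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "The Signpost"
    ET.SubElement(channel, "link").text = (
        "https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost"
    )
    for title, url in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
    return ET.tostring(rss, encoding="unicode")
```

[A real feed would also want `description`, `pubDate`, and `guid` per item; FeaturedFeeds, as discussed above, handles all of this server-side.]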
[16:58:51] the domain was registered by Ral315, but as I said, he let it lapse
[16:59:04] now it appears to be held by a domain squatter
[16:59:24] Thanks
[18:01:58] hello all. Is the Wikimedia Foundation going to be a part of GSoC 2017?
[18:07:45] qgil: hello sir. do you have any info about the question I asked above?
[18:08:15] andre__: ^
[18:08:45] kaartic: yes, I believe so
[18:08:55] (based on seeing a mailing list thread)
[18:08:59] [Wikitech-l] Introduction for GSoC 2017
[18:10:09] "We don't have any information about GSoC 2017 yet, but you could get an impression by looking at the page for GSoC 2016 at https://www.mediawiki.org/wiki/Google_Summer_of_Code_2016 and https://www.mediawiki.org/wiki/Outreach_programs for general info on workflows and expectations for GSoC, Outreachy and other programs."
[18:10:43] kaartic: https://www.mediawiki.org/wiki/Google_Summer_of_Code_2017
[18:11:19] kaartic: We won't know until the end of February whether we are picked by Google to be a GSoC partner, but yes, we will be applying
[18:11:19] kaartic: you might want to add your name there in the "Who is interested" section; hope that helped
[18:11:31] and it is likely that we will be accepted
[18:12:07] mutante: Thanks for all the info. I'll look into it soon.
[18:12:35] bd808: Thanks for your info too.
[18:12:40] kaartic: I guess Wikimedia will apply. Whether any org will take part will be decided by Google.
[18:13:09] Don't wait for GSoC though! Start contributing now :)
[18:15:37] bd808: I am contributing very little currently, and I will try to do my best in the coming days. I wasn't actually waiting for GSoC. I thought I could suggest a project for GSoC. I don't know if it's that good. Could I?
[18:17:06] kaartic, you might want to ask / chat in #wikimedia-devrel (as well?), since that is where you can find our GSoC / Outreachy org admins
[18:17:44] qgil: okay. I'll post my question there too.
[18:17:58] qgil: Thanks for the info