[02:33:00] I'm trying to scope out a way of mirroring Signpost content. I have a (technical, but non-Wikimedian) friend helping me... he's trying to use wget, but is concerned about violating anti-scraping standards. Can anybody help me figure out what the boundaries of acceptable practice are?
[02:34:33] The idea is to come up with a system that can mirror the contents of en.wikipedia.org/wiki/WP:Wikipedia_Signpost to a WordPress site, in order to generate an RSS feed. We publish more or less every two weeks, so this would be a matter of mirroring about 10 pages of Wikipedia content every two weeks.
[02:39:28] Hi peteforsyth
[02:39:43] Didn't the Signpost already have an RSS feed?
[02:41:14] Looks like it's broken.
[02:41:30] We should probably set one up using FeaturedFeeds instead of trying to rely on an external site...
[02:43:56] Hey -- yes, it's broken.
[02:44:14] Tell me more about FeaturedFeeds?
[02:45:47] I'm reading https://www.mediawiki.org/wiki/Extension:FeaturedFeeds
[02:45:51] https://www.mediawiki.org/wiki/Extension:FeaturedFeeds#Configuration is a little technical, but we'd add some configuration and localization messages and we should get an RSS feed
[02:46:02] Tech News has this set up too, so it should be possible for the Signpost as well.
[02:47:32] https://phabricator.wikimedia.org/T65596
[02:47:43] If you file a Phabricator task, we can probably figure it out
[02:48:39] Awesome. I'm not sure I know enough to give precise specs... do you think a Phabricator ticket discussion is a reasonable place to sort that out?
[02:50:11] yeah
[03:04:05] @legoktm This looks great. I think it offers half of what I want to accomplish -- which is fantastic. However, I'm wondering how it might fit with the other half...
[03:04:46] Machine-readable RSS is huge, and opens many possibilities; but I'm also interested in creating an instance of the Signpost that's nicely human-readable outside of Wikipedia.
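[Editor's note: rather than mirroring rendered pages with wget, the MediaWiki action API can return the parsed HTML of any page directly, which sidesteps the scraping concern raised above. A minimal sketch using only the Python standard library; `action=parse` is a real API module, but the helper names and the exact page title are illustrative:]

```python
"""Sketch: fetch rendered Signpost pages via the MediaWiki action API
instead of scraping the desktop site with wget."""
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://en.wikipedia.org/w/api.php"

def parse_url(title):
    """Build an action=parse API URL that returns rendered HTML for one page."""
    return API + "?" + urlencode({
        "action": "parse",
        "page": title,
        "prop": "text",           # only the rendered HTML body
        "format": "json",
        "formatversion": "2",
    })

def fetch_html(title):
    """Fetch the rendered HTML of a wiki page (makes a network call)."""
    with urlopen(parse_url(title)) as resp:
        data = json.load(resp)
    return data["parse"]["text"]
```

[For roughly ten pages every two weeks, one request per page through this endpoint is well within normal usage.]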
[03:05:42] Considering getting a separate domain, and having one WordPress page for each section of the Signpost (plus one front page for each issue).
[03:06:13] Dynamically loading images from enwp (rather than downloading them into the WordPress install).
[03:06:43] And setting a "canonical" link that identifies the on-wiki version as the canonical one.
[03:07:37] The main goal of that would be to have something that is more accessible to non-Wikimedians, for when we have stories of wider interest. Does that seem like a worthwhile goal to you? Or does it make the project infinitely more complicated?
[03:09:39] My friend has created a basic JS-based web app that will grab the front page, and each section page, via wget, and permit copy-pasting them into WordPress. (He was worried that using wget would trigger anti-scraping mechanisms.)
[03:48:48] peteforsyth: why is the content on Wikipedia less accessible compared to a random WordPress site?
[04:46:14] peteforsyth: I'll tell you a secret that all the nasty scrapers already know. We don't really have any protections against that at all. If things get really, really bad, a root would block your IP at the ingress routers, but you'd have to be scraping more than, say, 10x Googlebot for that to happen, in all likelihood.
[04:59:52] @legoktm When non-Wikimedians see the sidebar and tabs, I would tend to doubt they think "ah, here is a news article." It's a subtle difference, but I believe an important one to a certain segment of our audience. I'd like to have a less-cluttered option.
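[Editor's note: the "canonical" link mentioned above is a standard HTML `<link rel="canonical">` tag. A hypothetical helper (not an existing WordPress or MediaWiki function) that points a mirrored copy back at the on-wiki original might look like:]

```python
from urllib.parse import quote

def canonical_tag(title):
    """Emit a rel=canonical tag naming the on-wiki page as authoritative.

    Hypothetical helper: search engines then index the enwp original
    rather than the WordPress mirror.
    """
    url = "https://en.wikipedia.org/wiki/" + quote(title.replace(" ", "_"), safe="/:")
    return f'<link rel="canonical" href="{url}">'
```

[The tag would go in the `<head>` of each mirrored WordPress page.]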
[05:00:22] @bd808, thanks much -- good to know, and your secret is... somewhat... safe with me ;)
[16:49:53] peteforsyth: we used to use a WordPress install on wikipediasignpost.com to generate such a feed; see e.g. https://web.archive.org/web/20120126055133/http://www.wikipediasignpost.com/blog
[16:50:05] ...but I guess the solution outlined by legoktm is preferable
[16:50:46] back then these blog posts were generated by a template on-wiki, and copied over manually
[16:51:15] unfortunately the person holding the domain let the registration lapse at some point
[16:51:50] @HaeB, I believe I missed some discussion while sleeping -- could you catch me up? (Maybe by email -- I won't be able to engage fully with this today.)
[16:52:21] peteforsyth: I was solely referring to what you and lego had typed above
[16:53:29] Ah, OK, thanks.
[16:54:19] I knew about the old WordPress site, but did not realize archive links existed -- very glad to know.
[16:55:00] wiki-embed is a WordPress extension that works rather well
[16:55:26] There are also MediaWiki skins, such as https://www.wikimedia.es/wiki/Portada
[16:56:18] @HaeB, do you know who registered wikipediasignpost.com, and whether they still have the registration? (whois shows a private registration, so it's not easy to determine from there)
[16:56:24] I saw "machine-readable RSS" and am just saying: if you want any feeds added to en.planet.wikimedia.org, just let me know
[16:57:41] Thanks mutante, will definitely keep that in mind.
[16:58:13] Nemo_bis, good info, thanks. I believe we've looked at and ruled out wiki-embed, but I will double-check; I've been looking at a number of tools.
[16:58:16] peteforsyth: BTW, the template that generated the feed output / blog posts: https://en.wikipedia.org/w/index.php?title=Wikipedia:Wikipedia_Signpost/Newsroom/Coordination&oldid=452399507#Blog_output_for_this_issue
[16:58:38] (I have to step away from this discussion, sadly -- but welcome emails or talk page notes if further ideas emerge...)
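[Editor's note: for context on what the broken feed would need to contain, an RSS 2.0 document is simple to generate from a list of section titles and links. A bare-bones sketch with Python's standard library; the item data and function name are made up for illustration:]

```python
import xml.etree.ElementTree as ET

def signpost_rss(items):
    """Build a minimal RSS 2.0 feed from (title, url) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "The Signpost"
    ET.SubElement(channel, "link").text = (
        "https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost"
    )
    for title, url in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
    return ET.tostring(rss, encoding="unicode")
```

[A real feed would also want `description`, `pubDate`, and `guid` per item; FeaturedFeeds, as discussed above, handles all of this server-side.]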
[16:58:51] the domain was registered by Ral315, but as I said, he let it lapse
[16:59:04] now it appears to be held by a domain squatter
[16:59:24] Thanks
[18:01:58] hello all. Is the Wikimedia Foundation going to be a part of GSoC 2017?
[18:07:45] qgil: hello sir. do you have any info about the question I asked above?
[18:08:15] andre__: ^
[18:08:45] kaartic: yes, I believe so
[18:08:55] (based on seeing a mailing list thread)
[18:08:59] [Wikitech-l] Introduction for GSoC 2017
[18:10:09] "We don't have any information about GSoC 2017 yet, but you could get an impression by looking at the page for GSoC 2016 at https://www.mediawiki.org/wiki/Google_Summer_of_Code_2016 and https://www.mediawiki.org/wiki/Outreach_programs for general info on workflows and expectations for GSoC, Outreachy and other programs."
[18:10:43] kaartic: https://www.mediawiki.org/wiki/Google_Summer_of_Code_2017
[18:11:19] kaartic: We won't know until the end of February whether we are picked by Google to be a GSoC partner, but yes, we will be applying
[18:11:19] kaartic: you might want to add your name there in the "Who is interested" section; hope that helped
[18:11:31] and it is likely that we will be accepted
[18:12:07] mutante: Thanks for all the info. I'll look into it soon.
[18:12:35] bd808: Thanks for your info too.
[18:12:40] kaartic: I guess Wikimedia will apply. Whether any org will take part will be decided by Google.
[18:13:09] Don't wait for GSoC though! Start contributing now :)
[18:15:37] bd808: I am contributing very little currently, and I will try to do my best in the coming days. I wasn't actually waiting for GSoC. I thought I could suggest a project for GSoC. I don't know if it's that good. Could I?
[18:17:06] kaartic, you might want to ask / chat in #wikimedia-devrel (as well?), since that is where you can find our GSoC / Outreachy org admins
[18:17:44] qgil: okay. I'll post my question there too.
[18:17:58] qgil: Thanks for the info