[08:16:54] Hi. In the MediaWiki API, action=parse renders the main content of a page (without the navigation skin) to HTML. Is it possible to send such a query for multiple pages on a wiki at the same time?
[08:20:24] I'd like to parse a large number of pages this way.
[08:38:00] mediawiki.org just went down?
[08:38:02] Request: GET http://www.mediawiki.org/wiki/Code_of_conduct_for_technical_spaces/Draft, from 10.20.0.176 via cp1067 cp1067 ([10.64.0.104]:3128), Varnish XID 836871031
[08:38:02] Forwarded for: 2003:5b:ee50:69b:39fa:e953:89e3:b705, 10.20.0.175, 10.20.0.175, 10.20.0.176
[08:38:02] Error: 503, Service Unavailable at Mon, 28 Sep 2015 08:37:38 GMT
[08:39:16] English Wikipedia too, mark
[08:39:25] akoopal,
[08:39:31] I mean akosiaris
[08:40:17] I'm getting 503s
[08:40:31] qgil: we already got pages, working on it
[08:40:35] ok
[08:41:00] What's wrong with the servers?
[08:42:49] there's a DDoS
[08:42:54] the operations folks are on it
[08:43:06] who in their right mind would DDoS Wikipedia
[08:43:51] recovering now...
[08:43:57] yep
[08:44:11] Thank you!
[08:44:13] thedj, are you with Wikimedia?
[08:44:42] KDDLB: long-time volunteer
[08:44:49] great
[08:46:07] b_jonas: no, not at the same time. Also note that you sometimes need page-specific CSS/JS; you will have to load those from the attached metadata.
[08:47:23] thedj: I don't think I need that. Let me explain what I want to do.
[08:48:40] I'd like to download about a hundred thousand entries for English words from en.wiktionary and extract all the pronunciations given in them, if any, together with the dialect (UK or US most often; in any of IPA, SAMPA, or enPR, which I'd try to transliterate between).
[08:49:17] These are marked up properly in the HTML with element classes, so I can find them (I'll have to locate the English entry from the headings first),
[08:49:50] but there are templates involved, so it would be much easier to get the pronunciations from the HTML than from the page source, since the page source can use different macros.
[08:50:05] You can also retrieve HTML from the RESTBase API: http://rest.wikimedia.org/en.wikipedia.org/v1/page/html/Wikipedia/683015905
[08:52:56] b_jonas: sounds like a simple scraping bot; that's fine as long as you program the bot to keep a reasonable pace (see the bot etiquette).
[08:53:57] thedj: sure, I can certainly do it, but if it were possible to fetch multiple pages per HTTP request, that would mean less strain on the server.
[08:55:10] Also, as my word list is case-insensitive, I'll have to download a list of page titles from the dumps and find all page titles with matching case (and which pages exist in the first place, though most of them will exist on en.wiktionary because they're English words).
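A minimal sketch of the bot pieces discussed above, assuming Python with the requests and beautifulsoup4 libraries. Since action=parse handles one page per request, the loop fetches titles one at a time at a polite pace. The span.IPA selector, the User-Agent string, the dump file name, and the example words are illustrative assumptions to verify against real rendered entries, not details confirmed in the conversation:

    import time
    from collections import defaultdict

    import requests
    from bs4 import BeautifulSoup

    API = "https://en.wiktionary.org/w/api.php"
    # Identify the bot, per bot etiquette; the contact address is a placeholder.
    HEADERS = {"User-Agent": "pronunciation-bot/0.1 (contact: example@example.org)"}

    def build_title_index(titles_path):
        # Map lowercased titles to every real page title with that spelling,
        # so a case-insensitive word list can be matched against the list of
        # page titles downloaded from the dumps.
        index = defaultdict(list)
        with open(titles_path, encoding="utf-8") as f:
            for line in f:
                title = line.rstrip("\n")
                index[title.lower()].append(title)
        return index

    def parse_page(title):
        # action=parse renders one page per request; there is no batch form,
        # so callers must loop over titles.
        params = {"action": "parse", "page": title,
                  "prop": "text", "format": "json"}
        r = requests.get(API, params=params, headers=HEADERS, timeout=30)
        r.raise_for_status()
        data = r.json()
        if "error" in data:          # e.g. the page does not exist
            return None
        return data["parse"]["text"]["*"]

    def pronunciations(html):
        # The rendered entries mark pronunciations with element classes;
        # "IPA" is the assumed class name here, to be checked against a
        # real entry before relying on it.
        soup = BeautifulSoup(html, "html.parser")
        return [el.get_text() for el in soup.select("span.IPA")]

    # titles = build_title_index("enwiktionary-all-titles-in-ns0")  # hypothetical dump file
    for word in ["dictionary", "colour"]:    # stand-ins for the real word list
        html = parse_page(word)
        if html:
            print(word, pronunciations(html))
        time.sleep(1)                        # keep a reasonable pace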
[10:17:24] b_jonas: the difference in strain would be minimal.
[10:17:56] 'parse' is far more inefficient for the servers than the connection overhead is.
[10:20:30] thedj: ok, thanks
[10:20:56] why is it inefficient? wouldn't it just re-use cached formatted pages most of the time?
[10:21:08] I'm parsing pages, not text I supply
[10:21:29] though...
[10:21:52] I could parse text that transcludes twenty pages, but that wouldn't use the cache, so it would be a lot less efficient; I don't want to do that.
[10:23:48] it would hit the cache most of the time, but far from always
[10:24:19] and whenever a page needs to be reparsed, that is very expensive
[10:24:47] also, the steps the data server needs to take simply to retrieve all the information and format the JSON output are 'expensive' as well
[10:25:05] more expensive than a normal HTML page view
[10:29:14] but isn't it the webserver that does the JSON-formatting part?
[10:29:42] If the normal HTML view is that much cheaper, I could just try to download that instead.
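If the plain rendered HTML turns out to be the cheaper path, the RESTBase endpoint mentioned earlier serves stored page HTML directly, with no JSON envelope to take apart. A sketch under the same Python/requests assumptions as above; the URL shape just follows the example quoted in the conversation (which pins a specific revision), and fetching the latest revision by title alone is an assumption to verify:

    from urllib.parse import quote

    import requests

    def fetch_rest_html(domain, title):
        # Stored, ready-rendered page HTML straight from the REST content
        # API, following the URL shape quoted earlier in the conversation.
        url = "http://rest.wikimedia.org/{}/v1/page/html/{}".format(
            domain, quote(title, safe=""))
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        return r.text

    html = fetch_rest_html("en.wiktionary.org", "dictionary")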
[11:49:05] [[Tech]]; Crosstor; /* searching */ new section; https://meta.wikimedia.org/w/index.php?diff=13844636&oldid=13809758&rcid=6836958
[11:51:16] [[Tech]]; Crosstor; /* searching */; https://meta.wikimedia.org/w/index.php?diff=13844665&oldid=13844636&rcid=6836961
[11:53:12] [[Tech]]; Crosstor; /* searching */; https://meta.wikimedia.org/w/index.php?diff=13844687&oldid=13844665&rcid=6836968
[11:55:06] [[Tech]]; Crosstor; /* searching */; https://meta.wikimedia.org/w/index.php?diff=13844711&oldid=13844687&rcid=6836976
[11:57:57] [[Tech]]; Crosstor; /* searching */; https://meta.wikimedia.org/w/index.php?diff=13844751&oldid=13844711&rcid=6836980
[17:02:51] qgil: I'm probably going to merge https://www.mediawiki.org/wiki/User:RobLa-WMF/WikiDev16 into https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit_2016 very soon
[17:03:14] that is, assuming I don't get distracted by something else :-)
[17:03:55] robla, ok! :)
[17:05:54] in looking at it, I think I'm going to change the "focus areas" subpage into "scope"
[17:12:13] better to have a consistent vocabulary, yes
[17:27:44] qgil: https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit_2016/Scope
[17:28:28] robla, a clean merge, thank you
[17:28:40] "Global, duh."
[17:30:44] ori: wut? :-)
[17:31:05] I saw "Scope" and "MediaWiki"
[17:32:00] we're referring to "Scope" the mouthwash, not anything applied to MW. Sorry for the confusion :-)
[17:33:51] robla: Oh! In that case: Cool peppermint!
[17:35:48] guillom may remember when I was a WMF noob and several of us had the bikeshed debate over which wiki Wikimedia's development activity should live on.
[17:36:22] * guillom does remember.
[17:36:32] as if it were yesterday
[17:36:38] robla: It's all easier now: PHABRICATOR ALL THE THINGS!!!
[17:37:07] does one of you know why the servers were down today around 11am?
[17:37:22] * guillom doesn't.
[17:38:13] * robla recently explained (this weekend) to his daughter where the "ALL THE THINGS" meme came from
[17:39:05] ApolloWissen: 11am in what timezone?
[17:39:46] https://wikitech.wikimedia.org/wiki/Server_Admin_Log is usually a good starting point for finding information about outages.
[17:39:55] 11am GMT+1
[17:47:37] ApolloWissen: I don't know this either, but I'll still ask a question or two that might prompt someone who knows what was going on at that time to chime in. What do you mean by "down"?
[17:48:57] well, it said something like: "The servers currently do not work. We try to fix the problem as fast as possible."
[17:49:35] ApolloWissen: what were you trying to do, and which servers said that?
[17:50:29] I tried to reach de.wikipedia.org/wiki/ and meta.wikimedia.org/wiki/
[17:53:52] ApolloWissen, I'm only aware of connectivity issues around 08:47 UTC this morning, but I don't have any details yet either, as I wasn't around on IRC.
[17:54:10] at some point I expect an entry at https://wikitech.wikimedia.org/wiki/Incident_documentation
[17:55:49] well, I don't know. But in fact it's not such a big problem. If one of you finds out, great; if not, no big deal.
[21:40:09] [[Tech]]; 5.200.97.46; /* مسکسسکس */ new section; https://meta.wikimedia.org/w/index.php?diff=13855065&oldid=13844751&rcid=6840117
[21:44:46] [[Tech]]; Matiia; Undo revision 13855065 by [[Special:Contributions/5.200.97.46|5.200.97.46]] ([[User talk:5.200.97.46|talk]]); https://meta.wikimedia.org/w/index.php?diff=13855111&oldid=13855065&rcid=6840125