[14:00:18] Technical Advice IRC meeting starting in 60 minutes in channel #wikimedia-tech, hosts: @amir1 & @ottomata - all questions welcome, more info: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[14:01:18] :)
[14:26:37] did someone break sorting
[14:26:45] oh it's fixed
[14:50:17] Technical Advice IRC meeting starting in 10 minutes in channel #wikimedia-tech, hosts: @amir1 & @ottomata - all questions welcome, more info: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[15:00:20] o/
[15:00:56] Hello and welcome to the TAIM, ottomata and I are your hosts this evening/morning/whatever and we wish you a pleasant meeting
[15:01:26] Hello, thanks and the same to you :)
[15:02:16] hello!
[15:02:26] Hi
[15:02:53] Do we have a specific topic for this meeting?
[15:03:35] sario528: not so far, you can ask any questions you like
[15:08:58] Ok, it seems nobody has questions so I'll ask an easy one (I think)
[15:09:15] sure
[15:09:33] It's more a thing to solve than a question, I think
[15:09:46] The Spanish locale isn't available on the Toolforge bastions
[15:10:09] (I run locale -a and I can see others but es_ES isn't there)
[15:10:22] And to get proper sorting I have to use fr_FR at the moment
[15:11:20] jem: technically, fixing it should not be hard, you can make a change in Puppet to get it installed (I can find the exact module for you)
[15:11:48] but before moving forward I think it's better to make a Phabricator ticket (if one doesn't exist) and discuss it there
[15:12:10] Hum
[15:12:24] I don't think it can create any doubt
[15:12:44] Spanish for sure isn't less than Portuguese or French :)
[15:13:00] Or Norwegian or Polish or Romanian
[15:13:40] Was there a discussion about adding all of them? :)
[15:14:06] I have no idea, bstorm_ might be able to answer that
[15:18:25] Ok, there is no hurry at the moment so I'll keep checking
[15:19:36] jem: In general, I would recommend making a Phabricator ticket so a) it's tracked and b) it's seen by people who know more
[15:20:23] I understand it's important and useful but maybe it's not required for every little detail
[15:20:42] (Or maybe this isn't as little as I think, but we don't know yet)
[15:22:01] If all locale additions are in fact tracked, then it's clear, that's the right way
[15:22:19] An audit trail is always good - "why did we do this?"
[15:25:04] There are now many JS errors on Commons
[15:25:48] I was told the community is responsible for fixing that, but I don't see anyone willing or able to do it.
[15:27:25] yannf: There are also some errors due to https://phabricator.wikimedia.org/T227504
[15:29:52] Hello! Not sure if this is the right channel/time to ask. I am trying to build a pure text version of the current articles in the English Wikipedia. It seems to be quite some work: parse the XML dumps and then parse the wiki markup. Is there a more elegant way?
[15:31:07] no_gravity: if you're not asking for all articles, the REST API endpoint would be a good place to hit
[15:31:33] https://en.wiktionary.org/api/rest_v1/#/Page_content/get_page_definition_term
[15:31:37] Amir1: In fact, this API endpoint produces the exact output I need: https://en.wikipedia.org/w/api.php?action=query&format=json&prop=extracts&explaintext&exlimit=1&exsectionformat=plain&pageids=600744
[15:32:13] Amir1: I am just not sure how often I can call it. I would be fine with calling it once every second for 3 months if that is OK.
[15:32:41] It is to train an NLP algorithm I am working on.
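A minimal Python sketch of the extracts call no_gravity quotes above, paced at roughly one request per second. The requests library is assumed, and the tool name and contact address in the User-Agent header are placeholders to be filled in per the User-Agent policy linked just below:

    import time
    import requests

    API = "https://en.wikipedia.org/w/api.php"
    # Placeholder identification per the User-Agent policy linked below;
    # replace with your real tool name and contact address.
    HEADERS = {"User-Agent": "nlp-corpus-builder/0.1 (contact: you@example.com)"}

    def fetch_plaintext(pageid):
        # The same prop=extracts call quoted above, returning plain text.
        params = {
            "action": "query",
            "format": "json",
            "prop": "extracts",
            "explaintext": 1,
            "exlimit": 1,
            "exsectionformat": "plain",
            "pageids": pageid,
        }
        r = requests.get(API, params=params, headers=HEADERS, timeout=30)
        r.raise_for_status()
        return r.json()["query"]["pages"][str(pageid)].get("extract", "")

    for pageid in (600744,):  # the page id from the example URL above
        print(fetch_plaintext(pageid)[:200])
        time.sleep(1)  # the ~1 request/second pacing proposed above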
[15:33:25] no_gravity: it should not be a problem, especially if you follow the UA policy
[15:33:38] https://meta.wikimedia.org/wiki/User-Agent_policy
[15:34:21] Amir1: Great, I will try that then.
[15:34:30] Because these are supposed to be cached in the ParserCache, so it's not a big deal on the server side
[15:35:20] Amir1: "these" = the API calls and the responses?
[15:35:25] yup
[15:35:52] Is it cached at different levels? Because I would think nobody else will make the *exact* same API call.
[15:36:42] yes, they are cached at different levels
[15:37:00] Varnish (VCL) and then we have the ParserCache
[15:37:04] Nice. OK. Then I will try that.
[15:38:26] Amir1: I just found https://phabricator.wikimedia.org/T223777 about ca_ES also missing, but the original title was "UTF-8 support in toolforge, after migration to stretch", which would fit perfectly
[15:39:21] bd808 changed it but maybe a generic title is possible; for sure there are a lot of missing languages and the problem is really one, not many
[15:39:36] Amir1, I got errors only when I edit
[15:40:58] yann_: oh, maybe you should talk to ComTech?
[15:41:11] jem: but we migrated to stretch already
[15:41:14] Anyway... almost two months open now isn't encouraging
[15:41:50] Amir1: I understand the problem is in the stretch nodes
[15:43:12] that would take a long time to fix; the new release (buster) just came out ten days ago
[15:43:54] But the bastion nodes aren't migrating again soon... or are they?
[15:44:01] jem: do you have this issue in the grid as well?
[15:45:00] in general, use of the bastion should be kept to a minimum
[15:47:37] yann_: can you give more details about the errors you're getting?
[15:51:04] Amir1: The node I was on is login-stretch = tools-sgebastion-07, but let me check the others
[16:03:24] any reason why splitting up database load across multiple replicas would share CPU but not connections?
[16:12:11] c, not sure I understand your question
[16:14:27] ottomata: okay, here's the scenario: I have 1 master and 5 replica databases for a single wiki with a load balance of 0, 1, 1, 1, 1, 1. During a load spike (editing a high-use template, for example), all DBs share the increased CPU load but not the increase in DB connections. In this specific scenario all the extra connections went to the master instance
[16:15:41] c, all the writes have to go to the master directly, and then are replicated, right?
[16:15:52] so MW will have to apply all the template updates directly to the master
[16:16:14] the replicas only help balance read connections/load
[16:16:40] that is assuming that all the connections are writes; there have been scenarios before where connections were spread across the board
[16:22:13] oops, everybody's gone
[16:22:23] sorry, it was dinner time
[16:26:50] we use Amazon RDS but perhaps I should find a way to better analyze DB traffic; their metrics for type of connections show me counts over a time period but don't get more forensic than that https://imgur.com/a/Ja1xtvY with regard to trying to come up with an ideal and cost-effective balance
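As an illustration of the read/write split ottomata describes above, a small Python sketch of weighted read routing under the quoted load array of 0, 1, 1, 1, 1, 1. The server names are hypothetical, and MediaWiki's actual LoadBalancer (PHP) is considerably more involved; this only shows why a write-heavy spike lands on the master alone:

    import random

    # Hypothetical mirror of the load array above: weight 0 keeps the master
    # out of the read pool; the five replicas share reads equally.
    SERVERS = [("master", 0), ("replica1", 1), ("replica2", 1),
               ("replica3", 1), ("replica4", 1), ("replica5", 1)]

    def pick_read_server():
        # Weighted random choice over the read pool; the master's weight
        # of 0 means it never receives ordinary read connections.
        names, weights = zip(*SERVERS)
        return random.choices(names, weights=weights, k=1)[0]

    def pick_write_server():
        # Writes always go to the master, which is why a write-heavy spike
        # (e.g. editing a high-use template) piles connections onto it alone,
        # while replication of those writes still raises CPU load everywhere.
        return SERVERS[0][0]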
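And, circling back to the locale thread, a short Python sketch of the locale-dependent sorting jem describes, assuming es_ES.UTF-8 is installed on the host; on the bastions discussed above it is not, which is why the fr_FR workaround was needed:

    import locale

    # Raises locale.Error where es_ES.UTF-8 is missing, as on the
    # Toolforge bastions discussed above (check with `locale -a`).
    locale.setlocale(locale.LC_COLLATE, "es_ES.UTF-8")

    words = ["zanahoria", "ñu", "nube", "árbol"]
    # locale.strxfrm yields collation keys following Spanish rules,
    # e.g. "ñ" sorts between "n" and "o" rather than after "z".
    print(sorted(words, key=locale.strxfrm))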