[08:35:47] did someone experience a recent problem while reading/editing the wikis?
[08:48:04] mark, [[Qinyongr]], _fortis, _joe_, abartov, abbey___, abian, addshore, akoopal, AlexZ, AlF3, andre__, ankry, anomie, apergos, APexil, ashley, Athyria, aude, avar:
[08:48:36] b0stik, badon, Barras, basile, bblack, bd808, bearND|afk, Betacommand:
[08:48:47] brendan_campbell, brion, Bsadowski1, byron, c, camerin, Chenzw, Cherif_, Christian75, CKoerner_WMF, closedmouth, comets, CyberJacob, Daeghrefn, daenerys, Danny_B:
[12:31:29] Is it possible to set a collation algorithm for lists the MediaWiki software generates, tables, etc.?
[12:32:32] I know there's $wgCategoryCollation, but that only affects categories.
[12:33:01] Guest471114, do you have a list in mind?
[12:35:05] Something like https://en.wikipedia.org/wiki/Special:LongPages ?
[12:39:22] jynus: I noticed it here: https://sr.wikipedia.org/wiki/Википедија:Википројекат_Жена/Чланци. It seems like the Cyrillic Ј is being parsed as a Roman J, and if you click on "Чланак" to sort by name, it puts it at the bottom instead of between Д and К in that list. I'll try to find an actual list as opposed to a table, but can that be fixed there?
[12:40:03] table sorting? that's a jQuery module iirc
[12:41:25] yes, that is "not mediawiki"
[12:41:37] it is done in your browser
[12:43:10] May be a library we bundle
[12:43:43] well, sure, he is executing specific code we integrate, that is why I used the quotes
[12:44:23] there seem to be some options for javascript internationalization
[12:44:34] this is jquery.tablesorter
[12:45:13] but it seems the support in general, across browsers, is not great
[12:45:38] Needs a bug in our phab, and likely upstreaming
[12:46:07] Reedy: jquery.tablesorter is actually MediaWiki-specific
[12:46:48] Guest471114: the collation here is really dumb, and it would be fairly difficult to change it, i'm afraid. (it just sorts by unicode codepoint or something)
[12:48:23] Guest471114: although, hmm.
you can apparently use mw.config.set('tableSorterCollation', …) to define your own rules.
[12:48:51] ...sounds like... good news, yes?
[12:49:21] Guest471114: see the example at https://pl.wikipedia.org/wiki/MediaWiki:Common.js - this defines that 'ą' is sorted like 'azz', 'ć' is sorted like 'czz', etc.
[12:49:28] well. relatively good, i guess ;)
[12:50:14] Oh. It bumps characters with carons down the list as well, I've just noticed.
[12:50:59] yeah, it only really works for the English alphabet (or ASCII)
[12:51:20] So, how does it know to sort Cyrillic up to a point?
[12:51:50] There are only like 4-5 characters it gets wrong, which is interesting to see.
[12:53:00] mostly accidentally, partially due to the unicode committee's foresight ;) i think it sorts in this order: https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode#Basic_Cyrillic_alphabet
[12:53:46] Yep, seems great for Russian, but not that great for Serbian.
[12:56:07] Guest471114: https://en.wikipedia.org/wiki/Serbian_Cyrillic_alphabet#Modern_alphabet this is the right order? let me see if i can hack something up
[12:57:30] MatmaRex: It is indeed. I'd appreciate it if you did it (though I'd have to ask you to do another one for Croatian after you're done with that as well, if it's not too much of a hassle :P)
[13:00:47] Guest471114: croatian is latin-based, though, isn't it?
[13:01:43] MatmaRex: Yes, but it also bumps characters with carons or special marks down the list (Č, Ć, Š, etc.).
[13:02:06] https://en.wikipedia.org/wiki/Gaj's_Latin_alphabet#Letters this is right?
[13:02:16] I'm guessing that's why the Polish wiki has ć in there.
[13:02:21] MatmaRex: Yes, it is. :)
[13:02:26] yeah, same issue
[13:11:27] Guest471114: so i think this works for serbian. you can just paste the code at the end of MediaWiki:Common.js on the wiki (or into your own User:…/common.js for testing).
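For illustration only (the actual pastebin code isn't quoted in the log), a tableSorterCollation-style map for Serbian Cyrillic along the lines discussed might look like the following. This is a standalone sketch: outside MediaWiki there is no tablesorter to consume the map, so a comparator is written out by hand, and the 'яя' substitution strings are my guess at the same trick the Polish 'azz' mapping uses.

```javascript
// Hypothetical sketch of a 'tableSorterCollation'-style map for Serbian
// Cyrillic. Each key sorts as its value; 'яя' (the last basic Cyrillic
// lowercase letter, doubled) pushes the mapped letter past all words
// starting with the preceding base letter, like 'azz' does on pl.wikipedia.
const tableSorterCollation = {
  'ђ': 'дяя', // Ђ sorts right after Д
  'ј': 'ияя', // Ј sorts right after И (not as a Latin J at the very bottom)
  'љ': 'ляя', // Љ sorts right after Л
  'њ': 'няя', // Њ sorts right after Н
  'ћ': 'тяя', // Ћ sorts right after Т
  'џ': 'чяя'  // Џ sorts right after Ч
};

// Build a sort key by substituting mapped characters, then compare keys --
// roughly what jquery.tablesorter does with the configured map.
function collationKey(s) {
  return s.toLowerCase().split('').map(c => tableSorterCollation[c] || c).join('');
}

const sorted = ['Крагујевац', 'Јагодина', 'Земун'].sort(
  (a, b) => (collationKey(a) < collationKey(b) ? -1 : 1)
);
// Without the map, lowercase 'ј' (U+0458) sorts after 'я', which is the
// bug reported above; with it, Ј-words land between И-words and К-words.
```

Usage-wise, the real mechanism only needs the map itself passed to mw.config.set('tableSorterCollation', …); the comparator here just makes the sketch self-contained.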
http://pastebin.com/0pu7a36K
[13:13:49] Guest471114: croatian is going to be worse, since i don't think the mw.config.set('tableSorterCollation') thingy will work for digraphs… :/
[13:14:10] Oh yeah. That might be an issue...
[13:19:46] Guest471114: hmm, oh well. we can still fix the accented characters at least
[13:20:16] Hm... with that new latin list, could I place it in front of the Cyrillic one for Serbian?
[13:20:22] roman*
[13:20:40] Since Serbian also uses Gaj's Latin alphabet in some articles.
[13:22:40] croatian: http://pastebin.com/u5dnxgtM
[13:23:24] Guest471114: yes, you can just merge the lists (pay attention to commas)
[13:23:44] Thanks.
[13:23:53] I guess I'm SOL when it comes to digraphs.
[13:24:49] (like this: http://pastebin.com/MCfCGH37)
[13:25:29] yeah… i'm afraid so
[13:26:03] Ah, I've noticed another thing.
[13:27:08] What do these kinds of pages use for sorting? https://sr.wikipedia.org/w/index.php?title=Special:WithoutInterwiki&limit=50&offset=2450
[13:28:22] Guest17606: they also sort by unicode code point, and here it can't be overridden :( there's a bug about it: https://phabricator.wikimedia.org/T32753
[13:29:08] oh. it's been open since nov 22, 2014. damn.
[13:29:29] probably longer than that
[13:29:39] it was imported from bugzilla (the previous bug tracker)
[13:30:15] oh well. might as well sub to it.
[13:31:42] Guest471114: want to file a bug for those digraphs in tablesorter? that would probably be rather easy to fix
[13:32:07] That'd be good, yeah.
[13:32:34] (and we'd be able to map 'dj' to 'dzzzz' or something to make it sort correctly 99.9% of the time, like we do with 'č' to 'czz' and others)
[13:33:25] (sorry, ugh, 'lj' to 'lzzzz' and 'nj' to 'nzzzz'. got my languages mixed ;) )
[13:34:55] How do I report it, though? Do I open a phab task?
[13:35:05] Yeah, that'd be best
[13:44:40] Hm, well, not sure how to title it or describe what needs to be corrected.
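The digraph limitation discussed above could plausibly be handled by substituting whole substrings before single characters, which is roughly what the proposed tablesorter fix would need to do. A standalone sketch — the substitution strings follow the 'lzzzz'/'nzzzz' and 'czz' values from the discussion, and the function name is hypothetical:

```javascript
// Sketch: map digraphs first (they are single letters in Gaj's alphabet),
// then single accented letters, and sort by the resulting key. The exact
// substitution strings are illustrative, in the 'č' -> 'czz' style.
const digraphs = { 'lj': 'lzzzz', 'nj': 'nzzzz' };
const singles  = { 'č': 'czz', 'ć': 'czzz', 'đ': 'dzz', 'š': 'szz', 'ž': 'zzz' };

function collationKey(s) {
  let key = s.toLowerCase();
  for (const [d, repl] of Object.entries(digraphs)) key = key.split(d).join(repl);
  for (const [c, repl] of Object.entries(singles)) key = key.split(c).join(repl);
  return key;
}

const sorted = ['Novi Sad', 'Njegoš', 'Niš'].sort(
  (a, b) => (collationKey(a) < collationKey(b) ? -1 : 1)
);
// 'Nj' words now sort after all plain 'N' words, as in Gaj's alphabet
```

As the chat notes, character-level maps mis-sort 99.9%-rare cases either way (e.g. words that genuinely contain 'n' followed by 'j' across a morpheme boundary), so this is a heuristic, not a full collation.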
Maybe it'd be better if MatmaRex or someone who has done it before reports it.
[13:44:46] @seen issara
[13:44:46] Josve05a: I have never seen issara
[13:45:11] @seen isarra
[13:45:11] Josve05a: Last time I saw Isarra they were changing the nickname to Athyria and Athyria is still in the channel because he quitted the network 1.13:50:59.0790260 ago. The nick change was done in #wikimedia-collaboration at 7/12/2016 12:20:13 AM (1d13h24m58s ago)
[13:45:23] Guest17606: "Make sortable table collations support digraphs". or something. :)
[13:45:52] I will let you know when I see isarra and I will deliver that message to them
[13:45:52] @notify isarra Sorry...but this made me think of you https://youtu.be/HSu2pKMbqBI?t=503
[13:46:46] hmph, i got disconnected.
[13:46:49] Guest471114: "Make sortable table collations support digraphs". or something. :)
[13:48:11] MatmaRex_: What should I put under tags? What project would it be assigned to?
[13:48:18] *should it
[13:49:06] Guest471114: it's okay to file tasks without a project. someone will surely add one
[13:49:16] Guest471114: although, i think we have a "tablesorter" project
[13:51:35] we have, yeah
[13:53:33] Guest471114: thanks :)
[13:53:46] MatmaRex_: Nah, thank you. :)
[14:53:29] Josve05a: Hey, I'm still here. Just hiiiiding.
[14:55:05] Lol. Well at least you're not named root
[14:59:25] Not toodaaaay.
[14:59:36] Actually I don't even know.
[15:01:40] ...was it the bucket of sloths?
[17:52:16] Elitre: I'm getting an error on the test page you created
[17:52:51] (Our servers are currently under maintenance or experiencing a technical problem)
[17:53:36] 503 at https://www.mediawiki.org/wiki/Analytics/Wikistats/DumpReports/Future_per_report too
[17:55:29] They're fixing it in -operations
[18:01:05] Stryn: hope I didn't break the wikis :p
[18:01:27] :D
[22:05:18] jynus: TimStarling: We could continue a bit about the possible schema and migration. Though I'm also fine doing that async on Phab.
[22:05:33] https://www.mediawiki.org/wiki/Requests_for_comment/image_and_oldimage_tables / https://www.mediawiki.org/wiki/Manual:Image_table / https://www.mediawiki.org/wiki/Manual:Oldimage_table
[22:06:27] basically my point is, rather than creating 2 new tables in parallel, slowly convert image into another
[22:06:45] less painful
[22:06:48] and faster
[22:07:27] jynus: Do you know offhand which one is larger?
[22:07:36] I imagine most files don't have more than 1 revision.
[22:07:37] oldimage vs image on commons?
[22:07:40] I was going to run SHOW TABLE STATUS
[22:07:40] But we do have some files with a lot.
[22:07:43] yes
[22:07:48] I can do that
[22:07:49] that is still the best way to find out such data, right?
[22:08:09] yeah, if you could just run it on commons and pastebin it, that would be fine
[22:08:13] actually, a physical recount may be more interesting
[22:08:21] allow me to do both
[22:09:07] do you know the background?
[22:09:08] I don't know, I think all my commons uploads have multiple revisions
[22:09:24] the replication issue with commons?
[22:09:45] I saw that there was an incident involving the filearchive table
[22:09:56] but filearchive is not even addressed in the RFC yet
[22:10:04] we do INSERT...SELECT on delete and undelete
[22:10:19] highly dangerous
[22:10:28] but yes, larger in scope
[22:10:34] we do it for page deletion also, right?
[22:11:23] probably, I cannot recall
[22:11:30] I think it is less of a problem there
[22:11:33] Yeah
[22:11:33] because of the indexing
[22:11:44] that's when an admin deletes (archives) a page or file.
[22:11:46] one uses a file name
[22:11:51] In the case of a file "delete", both happen.
[22:11:54] and the other a page
[22:12:02] id
[22:12:07] I see
[22:12:27] but it is a similar problem
[22:12:46] whenever there is a difference in locking, autoincrement ids get corrupted
[22:12:58] filearchive has primary keys
[22:12:59] interesting
[22:13:00] (and I can create those while doing maintenance)
[22:13:23] on a slave, generating (potentially) different ids
[22:13:41] then, as it is a deleted page, at some time in the future, on undelete, things break
[22:13:47] but that is a different story
[22:13:48] filearchive was brion's work I think, he was a bit better at DB design than Lee
[22:15:32] so it is a bug involving replication of INSERT SELECT while allocating autoincrement IDs?
[22:15:43] turning image into the new file table should be straightforward. It's essentially adding a primary key and (eventually) dropping old columns.
[22:15:44] to be fair, this is not something that normally happens, but you know about my other proposal
[22:15:56] I'm more concerned about the image revision table.
[22:15:56] migrating to strict sql and row based replication
[22:16:06] which would solve the issue technically
[22:16:21] so it is a combination of devel + config (that I cannot change right now)
[22:16:45] https://phabricator.wikimedia.org/P3426
[22:16:48] gtg. Catch up on Phabricator. Can one of you summarise the proposed schema changes there?
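The hazard described above — statement-based replication replaying an INSERT … SELECT whose read order isn't deterministic, so autoincrement ids pair up with different rows on the replica — can be modelled in a few lines. A toy illustration, not MySQL semantics in detail; the readOrder parameter stands in for whatever order the storage engine happens to return rows:

```javascript
// Toy model of the INSERT...SELECT replication hazard. With statement-based
// replication the replica re-runs the statement itself, so a different read
// order yields a different id-to-row pairing than on the master.
function insertSelect(sourceRows, target, readOrder) {
  for (const row of readOrder(sourceRows)) {
    target.rows.push({ fa_id: target.nextId++, name: row.name });
  }
}

const source  = [{ name: 'A.png' }, { name: 'B.png' }];
const master  = { rows: [], nextId: 1 };
const replica = { rows: [], nextId: 1 };

insertSelect(source, master, rows => rows);                 // one plan's order
insertSelect(source, replica, rows => [...rows].reverse()); // another order

// master: fa_id 1 -> A.png; replica: fa_id 1 -> B.png — silent divergence.
// Doing a SELECT and then an INSERT with explicit ids on the application
// side makes the replicated statement deterministic.
```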
[22:16:49] so this is the master^
[22:17:22] I cannot give you physical sizes because it has a weird filesystem organization
[22:17:46] oldimage is only 5 million rows
[22:17:47] strawman: rename oldimage to imagerevision, insert all rows from "image" into imagerevision
[22:18:51] (sorry, I cannot say anything without looking at the specific fields of each of the 4 tables, but that would be the spirit)
[22:19:36] convert 1, generate the other
[22:20:23] but you would need to do it on an unpooled slave or something, because while you are running the old code, it has to see the oldimage table without the extra revisions
[22:20:30] otherwise duplicates will be shown on the file description pages
[22:20:35] if imagerevision is narrower, more metadata-y, I would do it in the other direction
[22:21:51] suppose you first added a column to oldimage which told you whether the row was a current revision or not
[22:22:31] then you could have temporary code which filtered out the current revisions from the oldimage table while migration is in progress
[22:23:03] or do you think this is unnecessary? would you just do it on unpooled slaves somehow?
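The strawman plus the flag-column idea can be sketched end to end. A toy model with in-memory arrays standing in for tables, and a hypothetical is_current column (the chat doesn't name one):

```javascript
// Toy model of the strawman: oldimage becomes imagerevision, rows from
// image are copied in as current revisions, and transitional code filters
// on a (hypothetical) is_current flag so old read paths still see only the
// old revisions — avoiding duplicates on file description pages.
const image    = [{ img_name: 'Foo.png', ts: '20160712' }]; // current revision
const oldimage = [{ oi_name: 'Foo.png', ts: '20150101' }];  // older revision

// Step 1: "rename" oldimage to imagerevision, marking rows as non-current.
const imagerevision = oldimage.map(
  r => ({ name: r.oi_name, ts: r.ts, is_current: false })
);

// Step 2: insert every row from image as the current revision.
for (const r of image) {
  imagerevision.push({ name: r.img_name, ts: r.ts, is_current: true });
}

// Transitional read path for unmigrated code expecting oldimage semantics:
const legacyOldimage = imagerevision.filter(r => !r.is_current);
```

The flag column is what lets the migration run on pooled slaves; without it, the old code would have to be kept away from the half-converted table, hence the unpooled-slave alternative discussed above.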
[22:23:09] I think there are several possibilities, we just have to choose the easiest, both in data movement and code
[22:23:41] I do not have my mind set on a specific one, I would have to research the fields a bit
[22:23:49] yeah, considering image is much larger, it would make sense to rename it to imagerevision and insert all the rows from oldimage into it
[22:23:51] (or someone else can)
[22:24:20] then I would always choose the one with less data movement, and maybe you the one easier on code
[22:24:29] and then we would reach an agreement :-)
[22:24:57] and again you could have temporary code to filter out the old revisions, thus allowing you to run the migration script on pooled slaves
[22:25:01] (it is just that I got alarmed when you said you wanted to duplicate the tables)
[22:25:19] yeah, not duplicate, just proxy
[22:25:24] that is ok
[22:25:29] also
[22:25:35] I think we have not used it ever
[22:25:39] but with views
[22:25:43] * Krinkle returned
[22:25:58] there are tricks to maintain b/c during a transition
[22:26:27] yeah, we've never used it in production to my knowledge, but views were used on the old toolserver to do similar things
[22:26:51] we still use them for row-based filtering on tools-db here
[22:27:23] not for production, but it can be useful for a short period of time
[22:28:15] I can research that part and follow up with a more specific proposal to discuss offline
[22:28:29] basically following your directions
[22:28:43] that would be useful, thanks
[22:29:02] (I just was not prepared for the field-by-field analysis)
[22:29:19] I'm wary of just ignoring migration and designing the DB from scratch, because in my experience, migration is the largest part of this sort of project
[22:30:07] we don't want to create massive amounts of DBA work unnecessarily
[22:30:17] again, it is not discarding you, it is just that it hurts when sometimes I hear people saying "we probably cannot do that"
[22:30:29] sure
[22:30:36] there is a high chance that we can
[22:30:44] and I will complain if it is hard!
[22:30:56] (you do not need to tell me, I already do)
[22:31:23] to be fair
[22:31:34] the image table on commons is not an easy task
[22:31:48] not because it is large, which it is
[22:32:13] but because it is heavily used, which means an online alter table is almost impossible
[22:33:31] once that is taken into account - we have more flexibility - we will group several alters into one and failover the full datacenter
[22:33:59] that's an interesting option, we couldn't do that in the olden days ;)
[22:34:36] still have to worry about new changes happening while migration is in progress
[22:34:46] yes
[22:35:15] usually, it means 3 code deployments
[22:35:42] the current one, a transitional one that fills in both formats, and the final one
[22:36:35] I haven't even checked if that is possible, and that is the part where the DBA may not be very helpful
[22:37:57] my proposal is, I research the field-by-field migration/table transformation
[22:38:13] then I bounce it back, not only for your review
[22:38:25] but to see how that would be possible in code, if it could be
[22:38:50] we can work on the ticket, ok with that?
[22:40:34] sounds good
[22:41:09] regarding that replication bug, would it help to get rid of the autoincrement in the filearchive table?
[22:41:17] making it more like revision/archive?
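The three-deployment sequence mentioned above (current → transitional dual-write → final) is a standard online migration pattern. A minimal sketch, with a hypothetical stage switch and helper name standing in for the deployed configuration:

```javascript
// Minimal dual-write sketch of the three-deployment transition. The stage
// values and the writeFileRevision helper are hypothetical illustrations.
function writeFileRevision(db, stage, row) {
  if (stage !== 'final') db.oldSchema.push(row);   // deployments 1 and 2
  if (stage !== 'current') db.newSchema.push(row); // deployments 2 and 3
}

const db = { oldSchema: [], newSchema: [] };
writeFileRevision(db, 'transitional', { name: 'Foo.png' });
// During the transitional deployment every write lands in both formats,
// so the final deployment can switch reads to the new schema with no gap.
```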
[22:42:46] the "patch" solution for that should probably be, in my opinion, eliminating the INSERT...SELECTs
[22:43:06] and doing SELECT + INSERT on the application side
[22:43:31] that way we know, both on master and slaves, that we are inserting manually specified fields
[22:43:43] I know that sounds weird and can be inefficient
[22:43:51] but it is more secure
[22:44:09] maybe there is a proposal for that in the revision table
[22:44:16] as a more general solution
[22:44:29] "do not delete rows"
[22:44:59] but that is a very general statement
[22:45:14] that cannot and should not be followed every time
[22:45:39] we have rev_deleted already, it's just that we don't use it for page deletion, only for deleting individual revisions
[22:45:42] however, 90% of the replication problems I found were "deletions and undeletions of things that are moved between tables"
[22:45:47] I know
[22:46:14] I think getting rid of INSERT SELECT would fix those
[22:47:23] or row-based replication, whichever can be done earlier
[22:47:56] which is also blocked on PKs on all tables and some other mediawiki-dependent issues
[22:48:36] (I think there is no perfect solution, just pushing in the good direction and, slowly, getting better)
[22:48:37] I don't think there's much developer time allocated to fixing this sort of thing on the mediawiki side
[22:48:44] oh
[22:48:46] not at all
[22:48:49] none, I would say
[22:48:53] :-)
[22:48:56] progress seems to be measured in decades
[22:49:12] I discussed this with many people
[22:49:24] it is more interesting to create new features
[22:49:29] than to do maintenance
[22:49:29] yup
[22:49:47] but if ops (or performance, or security, etc.)
[22:49:57] don't just blame it on developers, it's been a deliberate decision of management to ignore mediawiki core maintenance
[22:49:59] are the only ones doing maintenance, that is a bad idea
[22:50:08] sure
[22:50:14] even the people who really care about MW core maintenance have been discouraged from doing so
[22:50:22] I actually do not blame it on anyone
[22:50:30] I was saying
[22:50:35] that we are really bad at that
[22:50:58] I was self-loathing
[22:51:25] maybe in the next year I will have more time for this sort of thing, we will see
[22:51:36] I think that the speed of change is too high, actually
[22:51:43] not too low
[22:52:15] too many deployments, too many changes - I would like to have time to breathe and check what has been done
[22:52:55] I also think that mediawiki-core is not a good idea - correction, it is a very good idea
[22:53:20] but it can be the place where all maintenance goes to die, and that would be a problem
[22:53:44] mediawiki needs a strong core, and we all should participate in it
[22:55:23] so I am not denying structural problems, but I think despite them, we could do things better (and I include myself in that)