[01:17:44] I'd like to create a MySQL query that is probably relatively simple to regular quarry users, but there may be pitfalls I'm not aware of, and I'll have to dig into the database structure before being able to do this, so I'm asking just in case someone has fun doing this or giving me tips which tables or commands to use. I'm a MySQL beginner. [01:17:44] From a table of all edits with a specific tag, I'd like to count per user and sort the result descending. Practical idea: Trying to find out who uses Huggle more frequently than me for fun and curiosity. [01:18:02] also, a good first training query, I guess. [01:19:10] change_tag and user tables, probably [01:19:18] * ToBeFree has a look at these [01:19:29] and change_tag_def possibly [01:19:53] need to get the ctd_id from change_tag_def for huggle [01:20:00] you can use that to query change_tag [01:20:04] ah! [01:20:08] hey thanks :) [01:20:25] you'll need to then join change_tag to... I guess revision to get the user/actor [01:20:58] then just do your order/grouping operations [01:21:17] https://www.mediawiki.org/wiki/Manual:Database_layout [01:21:24] staring at that image right now ^.^ [01:21:45] probably don't need to look at it speicfically, just drill down to the specific tables you need [01:21:55] Just note the big diagram is kinda out of date [01:22:36] oh, I see [01:22:37] it's for MW 1.28, we're on 1.34 now [01:23:04] found the pages about the tables you've mentioned [01:23:48] Just the only thing to be aware, it might end up being a slow query due to the number of rows and stuff, so it might struggle ot run on quarry [01:25:26] * ToBeFree nods [01:26:09] if I'm only interested in, say, the last x months, I guess I can limit the amount of rows I'm reading at the very beginning [01:26:37] yeah, if you filter by rev_timestamp [01:26:53] https://quarry.wmflabs.org/query/36907 [01:26:53] ah, wonderful. :) [01:27:10] oh cool :D [01:27:33] and works as an optimisation for your eventual query, as you don't need to join that table [01:28:33] https://quarry.wmflabs.org/query/36905 [01:28:48] 1860468 change_tag rows on enwiki [01:31:21] Biggest issue you might have is the ongoing actor migration stuff [01:31:41] oh, I've read something on phabricator about that, deprecation of old syntax iirc? [01:31:45] what's going on there? [01:31:53] user_id and user_text in many tables is being deprecated/replaced by actor fields (or joining tables, meaning revision_actor_temp is use) [01:32:00] It's to make things like renames trivial [01:32:10] ie if a user has 100,000 edits, we don't have to edit 100,000 rows [01:32:11] user renames? [01:32:12] ah [01:32:23] the sysop assistance for global rename problem [01:32:31] pretty much, yeah [01:33:01] I think labs has some views to make this a bit nicer, but I don't know offhand [01:33:19] * ToBeFree thinks for a moment [01:33:24] why is user_id a problem, even [01:33:44] that seems to be pretty permanent [01:33:55] anon users [01:34:18] 0 is used for all anon users, and then storing the IP in the user_text field [01:34:29] ouch :) [01:34:39] we store the user_id and the user name in the user_text... [01:34:55] Bare in mind... This has worked "okish" for... 15+ years? :P [01:34:59] ^.^ [01:35:08] never fix a half-broken system [01:36:45] so, the impact on my personal experiment here is that I should avoid using user_id and user_text, because these fields will go away from whichever tables I happen to find them in, thinking "oh nice, that's what I need"? [01:37:29] basically, yeah [01:37:31] okay [01:37:41] if it's something you're going to want to run in the future [01:37:54] * ToBeFree nods [01:38:20] I like code that doesn't get outdated too quickly; in the end, I'll probably add your first query as a subquery [01:39:00] (or not! I thought doing so would make "huggle" survive a tag number change, but actually using the ID would make huggle survive a name change. No idea which is more likely) [01:39:35] It probably won't change on enwiki, but it's probably different number on other wikis [01:39:42] interesting aspect [01:39:52] so adds some portability [01:49:06] hey, I'm getting somewhere I think. Interestingly, the query runs quickly despite my limit happening *after* the join. I think I may have a wrong understanding of how the server handles these operations [01:49:11] https://quarry.wmflabs.org/query/36908 [01:49:42] I thought it joins the tables, here a potentially long operation, then selects 100 from them [01:52:44] it could select 100 rows, then join those if it found how to do that [01:53:51] server-side optimization fascinates me [01:54:18] I guess in an ideal world the server always does that work and I don't need to care about performance when querying [01:55:14] well, nice, that means I can "just continue working" here; performance issues currently seem not to occur. I'm currently learning about the "actor" table [01:57:27] Be mindful of performance, but you shouldn't have to worry about it too much indeed [01:57:57] https://tools.wmflabs.org/sql-optimizer can be useful if you're running into issues [02:01:24] this seems to be a pretty cool tool, but it seems to fail with SHOW EXPLAIN failed: Unknown thread id: 25361269 [02:01:36] on the example query as well as mine [02:10:31] heh, now I think I have accidentally broken optimization. [02:10:31] WHERE ct_tag_id = 7 AND rev_timestamp BETWEEN UNIX_TIMESTAMP('2019-01-01 00:00:01') AND UNIX_TIMESTAMP('2019-02-01 00:00:01') [02:11:10] the limit doesn't help, the query takes a long time now. I guess this is because the huge table is being read from the beginning to the end [02:19:47] ToBeFree: They're not stored as unix timstamps [02:20:08] MariaDB [enwiki]> select rev_timestamp from revision limit 1\G [02:20:08] *************************** 1. row *************************** [02:20:08] rev_timestamp: 20010116200833 [02:20:08] 1 row in set (0.00 sec) [02:20:11] * ToBeFree blinks [02:20:49] BETWEEN '20190101000001' AND '20190201000001' [02:21:05] I have no idea how I managed to interpret 20150910111027 as a unix timestamp. :) [02:21:06] thanks :D [02:21:22] (I also had no idea that strings can be used in this way either) [02:21:34] it's fairly flexible with times and dates [02:21:38] YYYYMMDDHHMMSS [02:21:49] isn't there a specific date format for this in SQL? [02:22:00] Not really... [02:22:05] You can use the inbuilt time stuff... [02:22:11] You can store dates as strings... [02:22:12] You can... [02:25:06] ah okay [02:25:08] Huh. I just clicked "Explain" while the query is running, and it's listing "dependent subqueries" like filearchive, image and ipblocks [02:25:45] some kind of magic must be happening behind the scenes. [02:34:02] https://quarry.wmflabs.org/query/36909 [02:34:17] I think I have it, but I can't say for sure because it is still running. :) [03:17:49] heh. Query status: killed -- This query took longer than 30 minutes to execute and was killed. [03:29:41] well, thank you very much -- I'll have a look at performance optimization later [09:48:10] Is there a way to see the timestamp (~ x 5) format for each wiki? [11:24:38] McJill: it's defined in MessagesXx.php $dateFormats https://www.mediawiki.org/wiki/Manual:Date_formatting [15:23:18] A little delayed, but thanks Nemo_bis! Any clue how the specific format is selected or constructed for use in the signature? That's not super clear to me, especialy looking at something like Zh_hans or thinking about the appending of "(UTC)" or "(CEST)" [15:32:12] (or anyone, really) [15:49:01] is there a special process for updating “development guidelines” on mw.org? does it need to go through review or something? [15:49:11] (specifically, I’m thinking about adding a section to https://www.mediawiki.org/wiki/Manual:Coding_conventions/Database) [15:52:16] Lucas_WMDE: if its a major change, maybe proposing it on the talk page first is a good idea [15:54:24] not a change to anything currently on the page, an addition to hopefully reflect the status quo [15:54:29] summarizing an IRC discussion months ago ^^ [18:12:45] If anyone wants to lock https://test.wikipedia.org/wiki/Special:CentralAuth?target=Thisisatestforsomething then feel free to as I'm literally never going to use it again - was testing account creation [18:14:11] We don't normally lock people's test accounts [18:14:36] Well we don't lock accounts at all, that's stewards deal, but they don't lock test accounts without cause [18:14:40] generally [18:15:55] bawolff: yeah, some filter keeps triggering so I can't even put a note on saying THIS ACCOUNT WILL NEVER EDIT [18:16:04] on global userpgae [18:17:04] bawolff: do you know who isn't blocked by that filter [18:17:32] nope [18:17:50] bawolff: trying to check [18:21:25] bawolff: stews, T&S, bots, patroller and renamers apparently so I've asked in -stewards [20:11:21] hello [20:12:21] may I ask a question regarding accounts on WikiMedia/Wikipedia? [20:17:01] PeCeT_full, yes. [20:17:16] PeCeT_full: yes, what is it? [20:21:00] okay, so basically I registered an account on Polish Wikipedia back in 2006 and forgot my password for it [20:21:34] dunno if you could have adjusted your e-mail address for it back then but the lack of it prevents me from reminding it [20:22:39] I don't have the committed identify but the problem is I cannot register a new account like "PeCeT-full" or "PeCeTfull" because the registering system reports that the username "is too close to an already existing one" [20:23:24] PeCeT_full: are you checking the right inbox? Spam? Any errors? And what's your username? [20:23:52] whenever I try to "remind the password", it just says back the account doesn't have an adjusted e-mail address [20:24:11] so whatever e-mail address of mine I try to use for reminding it, it just returns the same message to me [20:24:25] PeCet_full: what username? [20:24:29] the username I'm talking about is "PeCeT full" [20:24:32] (save quotes) [20:24:49] it's literally PeCeT full [20:26:04] One moment [20:26:06] PeCeT_full, links always help! https://pl.wikipedia.org/wiki/Wikipedysta:PeCeT_full [20:26:20] the space is %20 [20:26:31] so I preferred to say it that way but I get you [20:28:15] PeCeT_full: What link are you using when resetting your password? [20:28:29] You should be able to use your username to reset it. [20:28:33] this one: https://pl.wikipedia.org/wiki/Specjalna:Wyczy%C5%9B%C4%87_has%C5%82o because there is no other way for me [20:29:05] there was no "linking" stuff back in 2006 [20:30:02] PeCet_full: Are you just not getting the email? [20:30:24] If so, then contact ca@wikimedia.org and they will have to look into it. [20:30:33] not at all [20:31:01] I'll just retry it and copy and paste the message I get onto the chat along with the English translation [20:31:23] but I've gotta wait first, apparently, because the system has registered too many attempts in it [20:32:06] btw, what does CA on the e-mail address stand for? [20:32:12] PeCet_full: You'll have to ask Trust & Safety via ca@wikimedia.org and they might be able to help you out by checking things in the dB [20:32:29] Honestly, I don't know why it's ca [20:32:34] I see [20:33:03] Tzatziki: still around? ^^ This will be one for you guys [20:33:28] PeCeT_full, if you didn't set an email on your old account (it is not a requirement to do so) then I'm afraid you will have to create a new account, using a more different username. See the details at https://en.wikipedia.org/wiki/Help:Logging_in#What_if_I_forget_the_password%3F [20:33:43] "Community Advocacy" [20:33:50] The former name of our team [20:34:06] It will be much quicker to just create a new account [20:34:11] Thanks tzatziki [20:34:13] But you can email us and we can have a look [20:34:40] yeah, got that eventually [20:35:32] I just wanted to ask if there could be a possibility to recreate a new account but using less complex methods, as in avoiding usernames like 'PeCeTfull1' or 'PeCeTfullOfficial' etc. [20:36:05] I will e-mail then, anyway [20:53:02] I've just sent the issue to CA [20:54:34] if someone's curious why I care about it - it's because I'd like to register an acc on WikiMedia and possibly upload some CC0 photos of mine in the future [20:54:47] and I'd like to have a proper identity for that [20:56:07] I'm just done with websites like Pixabay or other websites that host free-to-use-anywhere (apparently wannabe) public domain stock pictures [20:56:11] Cool, PeCeT_full, You should hear back within 48 hours from then unless tzatziki has a chance to while he's around. [20:56:26] sure, I'll stay patient from now on [20:56:47] PeCeT_full: You've been a pleasure [20:57:03] thanks for the help [20:57:21] No problem [21:04:12] have a nice day/night, everyone, I'm logging off [21:04:27] PeCeT_full: same to you [21:04:34] thanks