[18:05:40] Hm, anyone know if there is already a supported YAML parser for mediawiki? [18:06:10] i was using symfony/yaml, but if there is another (already security reviewed) one (pecl yaml_parse?) that I can use [18:06:14] i'd be happy to use it [18:17:28] ottomata: Translate parses YAML [18:19:24] ah ha and it uses https://github.com/mustangostang/spyc [18:19:26] will try that [18:21:08] We had to change at least once due to some type misunderstanding or whatever, can't remember [18:21:36] i used that some years back. it was somewhat broken at the time, but i think it was sort of the standard how-to-yaml-in-php answer then. [18:27:02] i'm only using it for a test case, so anything that works and will run in jenkins hhvm tests is fine [18:29:09] ottomata: Be careful with parsing YAML. YAML is a scary format [18:29:30] Most libraries have a safe_parse method you need to use instead of the default parsing method [18:30:47] aye, it should be ok here, i'm parsing a single hardcoded fixutre path in a test case [21:22:22] hauskatze: There is still the third item to prepare: "add the securepollglobal computed version" [21:22:26] at https://phabricator.wikimedia.org/T173342 [21:22:36] Would you remember what this means? [21:23:24] ah yes ensure securepollglobal dblist is expanded from the formula to a computed list of databases [21:24:10] dereckson: old ticket :) [21:24:27] but if Performance is still against the proliferation of computed dblists... [21:24:33] Yes I'm browsing old tickets I've still assigned to see if there is some follow-up I can do [21:24:36] yes [21:24:43] and so the idea was to expand it automatically [21:25:00] also, is that dblist still needed? SecurePoll votes now happen at vote.wikimedia [21:25:12] even board ones [21:25:27] so... we should perhaps evaluate if that dblist is still needed [21:25:30] nowikidatadescriptiontaglines-computed.dblist wikipedia-english-computed.dblist [21:25:33] ah we only have two [21:26:07] and a special case with flow-labs.dblist [21:28:28] commit 34eac5fe65dddc76accdf497122d8ebacce79cda [21:28:28] Author: James Alexander [21:28:28] Date: Mon Nov 10 16:02:54 2014 -0800 [21:28:55] Yes the idea was to list wikis with SecurePoll extension [21:30:05] then I'd go with content projects only [21:30:15] otherwise it's just duplicating all.dblist [21:30:47] wikinews.dblist + wikiversity.dblist + wikisource.dblist + wikibooks.dblist + wiktionary.dblist + special.dblist [21:31:06] - loginwiki [21:31:16] and wikipedias perhaps [21:31:23] - zerowiki [21:31:35] - labswiki [21:31:37] I'd remove special.dblist and zerowiki is thankfully dissapearing soon [21:31:54] votewiki is in special :p [21:39:00] dereckson: is still angwiki with so many broken namespaces (namespaceDupes I mean)? [21:39:27] en.wikisource, hiwikiversity, sawikiquote, satwiki, shnwiki, yuewiktionary, zhwikiversity [21:39:34] The content projects without the extension ^ [21:40:33] so perhaps securepoll.dblist should just contain votewiki [21:40:43] it is only being used there [21:40:48] iirc [21:42:41] how can we check if some wiki uses it? [21:42:59] do we need to archive content of past votes? [21:46:39] I've opened https://phabricator.wikimedia.org/T218162 [21:46:52] so angwiki [21:52:18] https://phabricator.wikimedia.org/P8189 [21:52:23] we can solve most [21:58:22] Is it possible to find out the number of unblocked, human active confirmed English Wikipedia editors? [21:59:58] yes [22:00:10] if you define what you consider active [22:00:29] (and assuming that human is defined, too, probably as not being flagged as a bot) [22:01:25] Human = no bot flag. Active = at least 1 edit in last 30 days [22:01:46] hauskatze: if I run the script, who can we contact to check the merged pages? [22:02:43] and confirmed = at least 4 days old with 100 edits, right? [22:03:02] dereckson: angwiki doesn't have any active community I think [22:03:26] 4 days 10 edits Platonides (has the confirmed/autoconfirned right) [22:03:32] I know someone (Ælfgar) who can read ang at least [22:03:36] er, sorry [22:03:52] I mixed it with the edit count for tor edits [22:04:11] No problem Platonides [22:05:37] so, you want users at least 4 days old and more than 10 edits, with an edit in the last 30 days not blocked and not flagged as bot [22:05:43] in enwiki [22:06:59] Just the number of them Platonides, an actual list of them isn't needed. [22:10:35] in fact, the complex bit is adding all those conditions [22:10:47] the actual list would be easy to extract while doing it [22:11:32] I see 259721 registered users that did anything since 10 Feb 2019 on enwiki [22:12:14] 259631 excluding bot edits [22:12:24] Platonides - How would it be possible to get that figure on demand? [22:12:43] I did this query: select count(distinct rc_user) from recentchanges_userindex where rc_user != 0 AND rc_timestamp > '201902100000' AND rc_bot = 0; [22:13:55] Does that take into account confirmed and not blocked? Do you have a link to the WMF Query thing where it could be ran on? [22:14:42] not yet [22:15:09] I'm adding now the condition of being confirmed [22:15:21] select count(distinct rc_user) from recentchanges_userindex join user on (rc_user = user_id) where rc_user != 0 AND rc_timestamp > '201902100000' AND rc_bot = 0 AND user_registration>= '201903080000' AND user_editcount > 10; [22:15:38] also, I'm being a bit sloppy with these timestamps :P [22:16:53] that should be >= 10, not strict [22:20:12] Platonides - Is it just blocked to add? Could you do that then add to Quarry? [22:20:31] yes [22:20:37] I'm now calculating the above [22:20:40] but it takes more [22:21:29] this join slowed it [22:21:32] 386 users [22:21:36] 4 minutes to get that [22:21:39] wow [22:21:49] so many non-confirmed users [22:23:03] The nonconfirmed users will the be the large chunk That create accounts but only read. [22:24:11] hmm [22:24:18] account creation log entries [22:24:23] I didn't think on that [22:28:12] Platonides - Hows it going? [22:29:03] I'm running select count(distinct rc_user) from recentchanges_userindex join user on (rc_user = user_id) LEFT JOIN ipblocks ON (rc_user = ipb_user) WHERE rc_user != 0 AND rc_timestamp > '201902100000' AND rc_bot = 0 AND ipb_id IS NULL AND user_registration>= '201903080000' AND user_editcount >= 10 ; [22:29:17] I expect it to take ~5 min [22:29:26] exactly: 5 min 23.82 sec [22:29:29] 334 users [22:30:18] I may have counted perhaps some users with just 9 edits, or some bots that didn't perform bot actions [22:30:29] but I expect that will be roughly accurate [22:30:30] Link me to its run page on Quarry [22:30:43] I did that directly [22:30:56] but you should be able to set that up on quarry [22:31:42] How would you get the full result as to the number? [22:32:09] E.g. Could a total that meet the criteria be given in one query? [22:32:21] the above gives just a number [22:33:01] So what would be the number that meet the criteria and how do I get to the final result? [22:35:14] I don't understand what the output of each Query is. [22:39:57] ^^ Platonides [22:41:15] you don't need all of them [22:41:36] this one may be better: select count(distinct rc_user) from recentchanges_userindex join user on (rc_user = user_id) LEFT JOIN ipblocks ON (rc_user = ipb_user) WHERE rc_user != 0 AND rc_timestamp > '201902100000' AND rc_type = 0 AND rc_bot = 0 AND ipb_id IS NULL AND user_registration>= '201903080000' AND user_editcount >= 10 ; [22:44:15] that returns 331 [22:44:29] so earlier I was counting 3 extra users [22:45:02] Only 331 meet my criteria???? [22:45:12] I think so [22:46:02] That's low [22:46:53] same query on dewiki: 19 [22:47:27] and 43 on eswiki [22:48:01] Wow [22:49:52] I'm not seeing myself listed there :S [22:51:27] oh, wait [22:51:43] I was counting just users with *less* than 4 days [22:51:52] not users with *more* than 4 days [22:52:15] Just spotted that [22:55:24] Platonides, https://quarry.wmflabs.org/query/34268 shows 67650 [22:55:37] 7546 users on eswiki, 13540 on dewiki [22:56:19] Thanks, Platonides [22:56:55] I am running against enwiki this now: [22:56:57] SELECT COUNT(DISTINCT rc_user) FROM recentchanges_userindex JOIN user ON (rc_user = user_id) LEFT JOIN ipblocks ON (rc_user = ipb_user) WHERE rc_user != 0 AND rc_timestamp > '201902100000' AND rc_type = 0 AND rc_bot = 0 AND ipb_id IS NULL AND (user_registration IS NULL OR user_registration < '201903080000') AND user_editcount >= 10; [22:57:28] Is that better than my quarry one [22:57:57] 70750 [22:58:21] some users will have a null user_registration [22:58:29] What does that mean? [22:59:17] that's why I do user_registration IS NULL OR [22:59:47] What is a null user_registration though? Don't know much about SQL [23:00:12] RhinosF1: I like long walks on the beach and bourbon. [23:00:20] Mostly bourbon... [23:01:05] with automatic date calculation: SELECT COUNT(DISTINCT rc_user) FROM recentchanges_userindex JOIN user ON (rc_user = user_id) LEFT JOIN ipblocks ON (rc_user = ipb_user) WHERE rc_user != 0 AND rc_timestamp > DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 30 DAY), '%Y%m%d%H%i%S') AND rc_type = 0 AND rc_bot = 0 AND ipb_id IS NULL AND (user_registration IS NULL OR user_registration < [23:01:11] DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 4 DAY), '%Y%m%d%H%i%S')) AND user_editcount >= 10; [23:02:12] Very funny, sorry about the ping (although you shouldn't use SQL as a nick) [23:02:42] RhinosF1: I'm very used to it and it isn't a bother at all. Just an opportunity to make a joke. [23:03:10] True, true [23:03:12] Null user_registration.... did we always keep that field? [23:03:25] And/or could it be an issue with oversight? [23:03:41] Don't even know what I'm running, just what it's supposed to do. [23:04:51] user_registration has only been in use since mw 1.6 per https://www.mediawiki.org/wiki/Manual:User_table#user_registration [23:05:06] Specifically mentions old users may have a NULL value [23:05:10] so that's likely the issue [23:05:38] Platonides - Jsut doing https://quarry.wmflabs.org/query/34268 then I'll add auto - Its at the 5 min mark [23:08:54] Running auto date [23:10:14] Was 68091 [23:10:22] Auto date running now [23:13:31] That got 70843, thanks for the help [23:16:48] That only works out at 50% of active users [23:50:42] does anyone know of recent problems in edit conflict detection? my tool seems to be overwriting conflicting edits despite me sending basetimestamp and starttimestamp correctly afaict [23:50:52] (there are some existing issues on Phabricator but I don’t think my case matches any of them) [23:51:21] otherwise I’ll move to Discourse and describe my situation in more detail there