[10:53:18] something is going wrong in http://www.sub-bavaria.de/w/mw-config/index.php?page=Language ...... should rewrite localsettings manually?
[14:54:40] Hi, I was told I can find people who know about database dumps here. I'm trying to read plwiktionary-...-pages-articles-multistream with Python (but I think it's a language-agnostic question). The first line of the index file is "670:1:czytać". But if I open the dump file like so: with open(bz2.BZ2File(path)) as f: and then f.seek(670), I'm not getting anywhere close to "czytać", which actually starts at 2730th byte. According to https://en.wikipedia.org/wiki/Wikipedia:Database_download#Should_I_get_multistream?, " The first field of this index is # of bytes to seek into the archive"
[15:00:14] Technical Advice IRC meeting starting in 60 minutes in channel #wikimedia-tech, hosts: @Thiemo_WMDE & @CFisch_WMDE - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[15:50:14] Technical Advice IRC meeting starting in 10 minutes in channel #wikimedia-tech, hosts: @Thiemo_WMDE & @CFisch_WMDE - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[16:00:22] \o/
[16:00:40] Welcome to the Technical Advice IRC meeting!
[16:01:35] This time with /me and a still has to join Thiemo!
[16:02:29] There are no questions pre-posted on the wiki so just fire away!
[16:02:35] Hi Thiemo_WMDE :-)
[16:02:42] Hi tech!
[16:04:03] Hi CFisch_WMDE and Thiemo_WMDE! Can I start? It's a question about (poorly documented) multistream database dumps. I'm trying to read plwiktionary-...-pages-articles-multistream with Python (but I think it's a language-agnostic question). The first line of the index file is "670:1:czytać". But if I open the dump file like so: with open(bz2.BZ2File(path)) as f: and then f.seek(670), I'm not getting anywhere close to "czytać", which actually starts at 2730th byte.
[16:04:21] According to https://en.wikipedia.org/wiki/Wikipedia:Database_download#Should_I_get_multistream?, " The first field of this index is # of bytes to seek into the archive"
[16:05:26] hmm *looks*
[16:05:50] o/
[16:06:32] (see https://dumps.wikimedia.org/plwiktionary/20181101/ for the latest dump)
[16:07:32] I have never done bz2 reading before so I might be completely wrong in my interpretation though!
[16:08:23] So you got the index and the multistream, but when you try to access the position in the multistream that the index points you to, you're not where you expect to be, right?
[16:09:00] I have never done bz2 reading before so I might be completely wrong in my interpretation though! <- Same here, but maybe we can solve this together ^^
[16:09:15] Or any volunteer that knows how to do that in the channel?
[16:09:16] CFisch_WMDE, yes, that's correct
[16:09:23] does the offset refer to the compressed or decompressed stream?
[16:09:28] I suspect that might be the cause
[16:09:36] I just wanted to say the same
[16:09:38] :-)
[16:09:56] From what I read on the wiki I would assume to the compressed stream
[16:10:26] but might also be the other way around
[16:11:00] Oh, that would make sense
[16:11:16] let me try
[16:18:43] works like a charm, thanks a lot guys!
[16:19:11] I’m a bit surprised seeking into a compressed stream works, but okay :)
[16:19:49] alkamid: so how did you change your code? it would probably be nice to update the documentation to clarify this
[16:20:30] alkamid: you're welcome
[16:22:27] Anyone else? :-)
[16:26:37] Thiemo_WMDE: Last meeting we had a discussion about the AutoloaderStructureTest
[16:27:36] First approach was trying to fix the AutoloaderStructureTest itself but no luck yet
[16:28:18] Looking for the pending patch right now....
[16:28:18] I remember. You said you had 3 ideas.
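[editor's note] On the multistream question settled between 16:04 and 16:18: the index offset counts bytes of the compressed file, so the seek belongs on the raw file handle and not on a bz2.BZ2File wrapper, whose seek() counts decompressed bytes. The following is a minimal sketch of that idea, not the code alkamid actually ended up with. The helper name read_stream_at and the in-memory two-stream archive are hypothetical stand-ins for a real dump file.

```python
import bz2
import io


def read_stream_at(raw, offset):
    """Decompress the single bz2 stream that starts at `offset` in `raw`.

    `raw` is a binary file object over the *compressed* archive
    (for a real dump: open(path, "rb"), not bz2.BZ2File(path)).
    """
    raw.seek(offset)  # offset counts bytes of the compressed file
    decompressor = bz2.BZ2Decompressor()
    chunks = []
    # Feed compressed bytes until the decompressor reports end of stream.
    while not decompressor.eof:
        data = raw.read(64 * 1024)
        if not data:
            break  # input ended before the stream did; return what we have
        chunks.append(decompressor.decompress(data))
    return b"".join(chunks)


# Synthetic stand-in for a multistream dump: two independent bz2 streams
# concatenated back to back (made-up content, not real dump data).
first = bz2.compress(b"<header/>\n")
second = bz2.compress("<page><title>czytać</title></page>\n".encode("utf-8"))
archive = io.BytesIO(first + second)

# The "index offset" of the second stream is simply the compressed
# length of everything before it.
block = read_stream_at(archive, len(first))
print(block.decode("utf-8"))
```

Only the one stream at the given offset is decompressed; bytes belonging to later streams are left unread or ignored.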
[16:28:33] https://gerrit.wikimedia.org/r/c/mediawiki/core/+/472202
[16:28:54] The 3 ideas came out as a result of the discussion of last meeting
[16:29:01] Lucas_WMDE, I'm still figuring out what's the idiomatic way, but something along these lines: https://dpaste.de/LvUh
[16:29:02] It is a bunch of suggestions, not from me :)
[16:29:39] I was trying to understand how the class was written because any other way will be temporary
[16:30:02] But solving the AST problem will be a permanent solution or say long term
[16:30:40] I'm happy to make a contribution to the documentation, but is that page (Wikipedia:Database_download) an appropriate place to put code samples? For me, those 4 lines in Python are more meaningful than the whole paragraph on that page...
[16:30:52] Thiemo_WMDE: I'm wondering if you could just briefly look at that class and tell me if you see something that can guide me to fix it, I've tried
[16:32:54] Reedy suggested that something is happening here: $expected = $wgAutoloadLocalClasses + $wgAutoloadClasses;
[16:33:07] alkamid: okay, so the offset refers to the compressed stream, not the decompressed contents?
[16:33:11] But till now, I still can't see it :(
[16:33:18] (it wasn’t clear to me what your previous code did, so I couldn’t guess which way the fix went)
[16:33:24] Lucas_WMDE, correct
[16:33:28] ok thanks
[16:33:55] ah, but I don’t have permission to edit the Wikipedia documentation page to add that clarification :D
[16:34:04] I’ll leave a message on the talk page
[16:35:06] d3r1ck: Where is the patch with the failing test?
[16:35:11] So the current patch doesn't solve the problem though it makes sense, still digging to track down what the issue is honestly
[16:35:19] * d3r1ck looks...
[16:35:33] Thiemo_WMDE: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GuidedTour/+/466675
[16:35:49] woops
[16:35:56] I was here and not paying attention because work
[16:36:44] apergos: o/
[16:37:06] Did you try splitting the alias to a separate file named "GuidedTourLauncher.php"?
[16:37:15] the whole point of having the multistream index is to be able to seek into the compressed file
[16:37:21] which is fast
[16:37:35] seeking into the uncompressed stream means you have to decompress everything before it (slow)
[16:37:36] Thiemo_WMDE: No, I've not tried that yet, so far, I've been trying option 1
[16:37:49] Which is to fix the AST
[16:38:10] I should not use AST again :(, it's misleading
[16:38:11] once you're there you uncompress the block and you may have multiple pages in the block, but this is ok, dealing with one block is pretty quick
[16:38:12] d3r1ck: Do you mind if I upload a PS9 to that patch?
[16:38:27] Thiemo_WMDE: Please feel free, I don't mind :)
[16:38:31] * apergos checks out again for a bit, sorry but I am in the middle of sorting out the last part of a script
[16:42:29] fixed, back
[16:44:55] d3r1ck: Done, let's see if this works better https://gerrit.wikimedia.org/r/466675
[16:45:22] Okay!
[16:46:25] so all of that was to alkamid but it seems you got it all worked out anyways
[16:48:36] Thiemo_WMDE: Is this normal? https://integration.wikimedia.org/ci/job/wmf-quibble-vendor-mysql-hhvm-docker/7925/console
[16:48:50] That report was available but on checking again later (today) it's not there :(
[16:49:01] apergos: still: thanks for your input in that ;-)
[16:49:34] sure, as the writer of that poorly maintained doc (and code, heh), figured it was ok to chime in
[16:49:51] hehe
[16:51:47] d3r1ck: Yea, these CI reports get deleted after a few weeks.
[16:52:20] Okay! Now I know :)
[16:52:25] Thanks!
[16:52:26] You can retrigger the tests by writing a comment with nothing but "recheck" on the patch on Gerrit.
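[editor's note] On apergos's remarks at 16:37 and 16:38 about the multistream index: once a block has been decompressed it may hold several pages, so the wanted page still has to be picked out by title. A rough sketch of that last step follows, assuming index lines of the form offset:page_id:title (as in the "670:1:czytać" line quoted earlier) and assuming a block is a run of page elements with no enclosing root. The helper names, the second index entry and the block content are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_index_line(line):
    """Split 'offset:page_id:title' into its three fields."""
    # Titles may themselves contain colons, so split at most twice.
    offset, page_id, title = line.rstrip("\n").split(":", 2)
    return int(offset), int(page_id), title


def find_page(block_xml, title):
    """Return the <page> element with the given title, or None."""
    # Assumption: the decompressed block has no root element of its own,
    # so wrap it in a throwaway one before parsing.
    root = ET.fromstring("<block>" + block_xml + "</block>")
    for page in root.iter("page"):
        if page.findtext("title") == title:
            return page
    return None


# First entry is the line quoted in the log; the second is hypothetical.
index_lines = ["670:1:czytać", "670:2:Kategoria:Przykład"]
entries = [parse_index_line(line) for line in index_lines]

# Made-up decompressed block holding both pages.
block_xml = (
    "<page><title>czytać</title><id>1</id></page>"
    "<page><title>Kategoria:Przykład</title><id>2</id></page>"
)
page = find_page(block_xml, "Kategoria:Przykład")
print(entries[1], page.findtext("id"))
```

Both entries carry the same offset here because, in this sketch, both pages sit in the same compressed block.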
[16:52:40] apergos: But then you might be able to improve that documentation ... see request https://en.wikipedia.org/wiki/Wikipedia_talk:Database_download#Clarify_multistream_description ;-D
[16:53:11] Thiemo_WMDE: Yeah, was trying to access that particular report, wanted to see something, it seems it was different since it was a different PS
[16:53:18] oh heh I maintain the 'official' (:-P) docs
[16:53:24] not the stuff on any given project about it
[16:53:26] um
[16:53:54] Thiemo_WMDE: Things are a little different now: https://integration.wikimedia.org/ci/job/wmf-quibble-vendor-mysql-hhvm-docker/10150/console
[16:54:58] ahhh :-) k
[16:56:05] well I added one word to the page, they probably won't mind :-D
[16:59:28] do I really not have that format documented anywhere... hmm maybe it's in a readme only, that would be unfortunate
[17:03:25] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/dumps/mwbzutils/+/master/xmldumps-backup/mwbzutils/README meh
[17:03:36] documented in the util used to dump an arbitrary block
[17:03:43] remind me to fix that at some pint
[17:03:45] *point
[17:04:14] Ok the official part of the Technical Advice IRC meeting is done for the week - see you in one of the next sessions or online :-)!
[17:04:48] And don't forget: https://discourse-mediawiki.wmflabs.org/ is open 24/7 for your questions ;-)
[17:05:06] As always, thank you for doing these :)
[17:05:38] :-)
[17:08:53] meanwhile... anyone familiar with MW-Vagrant?
[17:09:21] I know the advice stuff ended just around 10 minutes ago but 10 minutes ago I was not getting this problem >_>
[17:11:26] I did `vagrant up`, and then I did `vagrant roles enable checkuser` and what I receive is 'WARNING: This has been deprecated in favor of `vagrant cloud auth login`', which looks unrelated to MW-Vagrant
[17:12:34] vagrant version 2.2.0
[17:16:30] SleepyOne: ^ :-p
[17:19:33] did `vagrant provision`, failed, did roles enable, 'Ok' this time
[17:19:36] lol huh
[17:44:16] apergos, thanks for multistream clarifications!
[17:44:28] sure!
[17:45:14] it seems I should write an 'output file formats' page someplace and link to it prominently from either meta or wikitech
[22:45:03] @tgr: could you approve this updated dashboard.wikiedu.org consumer? It's the same as the previous consumer except that it adds the high-volume editing grant. https://meta.wikimedia.org/wiki/Special:OAuthListConsumers/view/b20ec7f1dae4fab37c83fc30aea07bad
[22:45:24] (This matches what outreachdashboard.wmflabs.org already has: https://meta.wikimedia.org/w/index.php?title=Special:OAuthListConsumers/view/5709c54e5e241577730e27c13e1a56cf&name=&publisher=Ragesoss&stage=1 )
[22:46:39] We're enabling the same account creation feature on Wiki Education Dashboard that we added to Programs & Events Dashboard earlier this year.
[22:55:41] ragesoss: Hej! On an unrelated note, are the two unpublished GCI tasks (Capybara feature tests; no-undef eslint rule) ready to be published, or do they need more tuning? (I'm unable to tell, unfortunately)
[23:06:17] andre__ they're ready. Should I have done something to signal that?
[23:09:50] Bryan approved the consumer already, tgr.
[23:18:42] ragesoss: Ah, thanks, will publish. For future reference, feel very free to add a "[READY TO PUBLISH] " prefix or such :)
[23:18:56] (For all those years I've never come up with a better solution, meh.)