[00:02:33] * robla and AaronSchulz start contemplating switching video thumbnailing to ffmpeg
[00:15:55] hi anomie! good to see you
[00:16:06] hi sumanah!
[00:18:03] anomie: I was just going to email you actually - the new Bug Wrangler and I were looking through BZ and we realized that we should ask you whether you want to be on the default CC list for API bugs
[00:18:29] I suppose I may as well be
[00:18:44] :) ok
[00:23:20] anomie: added.
[06:04:32] Hi, I have a design question I could use some help with. I'm pushing client-side event data to bits (bits.wikimedia.org) from *.wikipedia.org pages. I want some protection from CSRF, but the machine this data is ending up on doesn't have easy access to the production MediaWiki cluster, so I can't simply randomly generate some value, bury it in the session, and then retrieve it again. I figure I ought to either pre-generate some large quantity of keys and have them available at both ends (MediaWiki + log receiver) or derive keys from some shared secret value, but do so in a way that won't be easily crackable.
[06:05:31] My questions are: can I use one of the existing token interfaces (edit tokens, API auth tokens) for this? If not, can some existing implementation be easily extended to fulfill this purpose? And if not, how should I implement it?
[06:06:15] you can use the referrer
[06:06:21] easily spoofed
[06:06:34] how can it be spoofed?
[06:07:00] curl -H Referer ?
[06:07:10] CSRF doesn't affect curl
[06:07:45] well, that's a good point, but i guess CSRF isn't all i'm worried about
[06:07:53] what exactly do you want protection from?
[06:08:05] i don't want someone to mess up the result of A/B tests by just curling fake events in a loop from the command line
[06:08:30] you want to make it slightly more difficult?
[06:08:59] if you require a session, then someone could write a bot to get lots of sessions
[06:09:20] presumably you're not intending on using a captcha
[06:09:27] no :)
[06:10:02] ideally, having to retrieve a page for every ~5 events you send would make it impractical
[06:10:24] so what you want is security by obscurity
[06:10:33] don't get me wrong
[06:10:41] it's often maligned but it's better than no security
[06:11:36] you want to have a JavaScript module that makes requests that are somehow difficult to generate without understanding the relevant JavaScript code
[06:12:16] i mean, each event must declare an event ID which references a data model that is available both to MediaWiki and the log endpoint
[06:12:25] i could hash that using some gnarly js code with lots of bitwise operators and what have you
[06:12:49] you can't use the existing interfaces (edit tokens etc.) because you said that you can't use sessions
[06:12:51] but that doesn't seem very smart
[06:13:01] and the existing interfaces rely on sessions
[06:13:33] what level of security do you think is reasonable for something like this?
[06:13:50] the ClickTracking API endpoint has no such security and we don't have evidence that anyone has been screwing with it. of course, they could be doing it in a subtle way that we're not detecting. but at least there's no blatant junk being written by anyone.
[06:15:00] what generates the event ID?
[06:15:08] the server or the client?
[06:16:02] in clicktracking, the client generates an event ID, correct?
[06:16:17] well, there's a reference to a data model. the data model is put in place by us and assigned an id by us, but events generated on the client reference that id. there's also a uuid assigned to each event instance, but that only happens on the log collector
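A minimal sketch of the "derive keys from some shared secret value" idea floated above, assuming a secret distributed out-of-band to both the MediaWiki cluster and the log receiver; the function names and the $secret parameter are invented for illustration and are not an existing MediaWiki interface:

    <?php
    // Hypothetical shared-secret token scheme: MediaWiki embeds the token in
    // the page it serves, the client echoes it with each event, and the log
    // receiver recomputes it. No session storage is needed at either end.
    function makeEventToken( $eventId, $secret ) {
        return hash_hmac( 'sha256', (string)$eventId, $secret );
    }

    function isValidEventToken( $eventId, $token, $secret ) {
        // A production version would use a constant-time comparison here.
        return makeEventToken( $eventId, $secret ) === $token;
    }

Note the limitation the discussion converges on: anyone who fetches a page gets a valid token, so a scheme like this only forces an attacker to retrieve a page per batch of forged events.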
[06:16:18] yeah
[06:17:28] I think having no security is a reasonable way to do this
[06:17:49] what's your rationale?
[06:18:18] well, there is no profit motive, not much of a motive of any kind for fudging the numbers
[06:18:35] lots of similar statistics gathering is already live with no security, and there are no reports of people screwing with it
[06:18:49] but if you're uncomfortable with that...
[06:19:13] the next step up is probably IP-based rate limiting
[06:19:35] just discard events from an IP if it seems to be sending you an unlikely number of events
[06:19:50] yeah, already doing that
[06:19:52] you can adjust for the effect at the analysis stage
[06:20:17] that's (partly) the basis for me saying no one is currently screwing with us, to my knowledge
[06:20:23] so if you're doing that and you're still worried, what's the attack scenario?
[06:20:46] someone with a botnet?
[06:21:06] i'm not paranoid so much as self-doubting. i'm just wondering if there's some standard and simple way of securing a setup like this that i'm not reaching for out of ignorance.
[06:21:16] if you say there isn't one, maybe that's good enough
[06:21:49] well, you can tie it to wiki user accounts
[06:22:00] that could be made to be reasonably secure
[06:22:10] but I don't think there's any way to secure anonymous events
[06:22:40] yeah, that just opens a different can of worms (privacy, legal issues, ethical issues, etc.)
[06:22:51] unless you want to talk about increasingly complex methods which do increasingly little to stop abuse
[06:23:15] like client-side hashing of event IDs
[06:23:46] if someone already has a botnet and understands the protocol and what they want to achieve, a few shifts and rotates probably won't slow them down much
[06:24:14] yes, you're right
[06:26:33] do you know how third-party analytics providers handle this? presumably there could be a profit motive (if your competitors A/B test heavily, you could systematically skew their results)
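A hedged sketch of the IP-based rate limiting suggested above ("just discard events from an IP if it seems to be sending you an unlikely number of events"); the Memcached backend, window size, threshold, and function name are all illustrative choices, not an existing implementation:

    <?php
    // Count events per IP in fixed time windows; stale counters just expire.
    function shouldDiscardEvent( $ip, Memcached $cache ) {
        $window = 60;     // window size in seconds (illustrative)
        $threshold = 100; // events per window considered "unlikely" (illustrative)
        $key = 'eventcount:' . $ip . ':' . (int)( time() / $window );
        if ( $cache->add( $key, 1, $window * 2 ) ) {
            return false; // first event from this IP in the current window
        }
        return $cache->increment( $key ) > $threshold;
    }

As noted in the discussion, over-threshold traffic can also be kept and adjusted for at the analysis stage rather than dropped outright.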
[06:27:03] I don't know how they handle it, but I would be surprised if they did anything secure
[06:27:37] you would think that if there was any event interface that was secure, it would be advertising referrals
[06:27:57] true
[06:28:05] but I've read articles that say that even that is not secure, despite widespread fraud
[06:28:12] costing large amounts of money
[06:28:29] okay, so i'm really going for no security at all, aside from the post-hoc sanity checks on the data which we already do
[06:28:42] sounds good
[06:29:18] i think mangling the data client-side won't improve security, and it'll hurt the nice debuggability we get from having pretty readable query strings flying around
[06:29:43] and it might also mislead data analysts into thinking the setup is more secure than it actually is
[06:30:59] so it's probably best to just have it be nakedly insecure, and if that seriously unnerves someone, they probably shouldn't be using it
[06:33:21] thanks TimStarling
[06:33:34] yw
[10:34:02] New patchset: Hashar; "Ext-Wikibase now report back to Gerrit (non voting)" [integration/jenkins] (master) - https://gerrit.wikimedia.org/r/30138
[10:34:26] New review: Hashar; "sync with production" [integration/jenkins] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/30138
[10:34:26] Change merged: Hashar; [integration/jenkins] (master) - https://gerrit.wikimedia.org/r/30138
[10:59:45] New patchset: Hashar; "Skip Dumps tests when other group fail" [integration/jenkins] (master) - https://gerrit.wikimedia.org/r/30142
[11:00:00] Change merged: Hashar; [integration/jenkins] (master) - https://gerrit.wikimedia.org/r/30142
[18:02:45] ^demon: WTF, what does this even MEAN? "Your change requires a recursive merge to resolve." https://gerrit.wikimedia.org/r/#/c/29901/
[18:03:30] <^demon> It's a way of saying "Can't merge, plz rebase"
[18:03:32] <^demon> http://www.mediawiki.org/wiki/Git/Workflow#Your_change_requires_a_recursive_merge_to_resolve
[18:03:45] <^demon> (First Google result for that phrase, might I add)
[18:04:05] <^demon> More interestingly: "The problem is the recursive merge strategy. This strategy could be necessary due to file renames etc. Gerrit however uses JGit as Git implementation, and JGit only supports the resolve merge strategy (at least at the moment). So you have to do it locally (and there you may want to use Git and not EGit, since EGit also uses JGit)."
[18:04:18] Ah OK
[18:04:33] I was able to trivially rebase locally, with no conflicts
[18:05:15] Heh, sorry for not Googling, I wasn't aware that this was something that you'd seen before, I never had
[18:06:45] <^demon> Yeah. JGit doesn't support recursive merges yet :\
[18:06:46] <^demon> Which is why it trivially resolved.
[18:06:49] <^demon> Locally.
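The local fix that ^demon's link describes amounts to rebasing the change yourself, since JGit (and therefore Gerrit) can only do resolve merges server-side. A typical command sequence, assuming the usual "origin" remote and "master" target branch rather than anything specific to change 29901:

    git fetch origin                      # get the current tip of the target branch
    git rebase origin/master              # replay your change on top of it; resolve any conflicts
    git push origin HEAD:refs/for/master  # upload the rebased change as a new patchset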
Add "die( var_dump( get_defined_functions() ) );" to your extension and check :) [19:54:52] sure enough, it's not there. [19:55:14] * ori-l has a funny feeling he's being _really_ dumb. [19:55:18] <^demon|busy> Weird. I could've sworn that got setup before LocalSettings. [19:55:33] <^demon|busy> I should know this offhand. [19:56:14] i'll add a die() to both GlobalFunctions & my extension entrypoint and see which one gets outputted [19:56:53] * ^demon|busy actually is |busy now [19:58:07] extension loads first. [20:40:16] New patchset: Stefan.petrea; "Integration of new iPad code" [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/30197 [21:21:07] Any Jenkins/phpunit experts around? I added an API to the E3Experiments extension with a failing test case. Jenkins failed when Ori updated the submodule in core, https://integration.mediawiki.org/ci/job/MediaWiki-Tests-API/7847/ , but I can't tell if it was our code. [21:21:48] 09:02:36 [exec] 3) Wikibase\Test\ApiBotEditTest::testCreateItem with data set #3 (2, true, true, '{}') [21:21:48] 09:02:36 [exec] Must have the value 'q8' for the 'id' in the result from the API [21:21:51] No [21:27:59] Reedy, so Jenkins doesn't run an extension's API tests automatically? [21:37:32] https://integration.mediawiki.org/ci/job/MediaWiki-Tests-API/7847/testReport/(root)/ [21:37:37] seemingly not... [21:58:34] bsitu: yeah, I get nothing when trying to view the Gerrit change to that i18n file too [21:58:45] I figured it was just Gerrit [21:59:31] <^demon|away> Gerrit doesn't like large i18n files. [21:59:45] <^demon|away> Long files make the diff code barf. [22:00:30] Well I guess we can git fetch it and see if it's really there or not (there's a bug where text isn't loading, so it might not be Gerrit's fault) [22:00:48] stevenw: I guess the change was gone during a merge [22:00:50] New review: Diederik; "Ok. Erik Z: please merge at your earliest convenience." [analytics/wikistats] (master); V: 1 C: 1; - https://gerrit.wikimedia.org/r/30197 [22:10:32] bsitu, so did that change not get merged into the repo properly? [22:10:55] I know the diff code doesn't want to let me view it but it should've still gone in properly.. right? [22:11:28] krenair: it was merged and deployed to production [22:11:58] So why is it missing? [22:12:34] I think it's gerrit [22:12:40] https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/extensions/MoodBar.git;a=blobdiff;f=MoodBar.i18n.php;h=c84b2febf92c449502a7da8860e25b9433b6e2ac;hp=1817c61debf7f212cf85a51b39934e2209345476;hb=cb79959e145f904bdc5440146b9a0e87c09ff4d8;hpb=d844a9062d46a47c600ea5f3d3988ad1cdfe676f [22:12:41] You can diff in gitweb ^ [22:13:34] it's probably due to this big maintenance update: https://gerrit.wikimedia.org/r/#/c/24571/ [22:14:29] oops, nope, there was no language in this patch [22:14:36] language update [22:21:30] krenair, stevenw: lt got reverted by L10n-bot: https://gerrit.wikimedia.org/r/#/c/27596/ [22:21:38] * StevenW facepalms [22:21:46] Did l10n-bot screw up again? [22:22:02] Apparently so. [22:22:07] * RoanKattouw reverst [22:22:25] Don't bother, I basically already have in a semi-related commit [22:22:33] Ha [22:22:35] Crap, I just reverted [22:22:47] Oh wait, no I didn't [22:22:52] BEcause it conflicted :D [22:23:03] is this the Gerrit equivalent of slapstick? [22:23:07] I suppose [22:23:09] :-D [22:23:09] https://gerrit.wikimedia.org/r/#/c/30286/ [22:23:40] cool, thanks [22:28:00] Ok... 
[20:40:16] New patchset: Stefan.petrea; "Integration of new iPad code" [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/30197
[21:21:07] Any Jenkins/phpunit experts around? I added an API to the E3Experiments extension with a failing test case. Jenkins failed when Ori updated the submodule in core, https://integration.mediawiki.org/ci/job/MediaWiki-Tests-API/7847/ , but I can't tell if it was our code.
[21:21:48] 09:02:36 [exec] 3) Wikibase\Test\ApiBotEditTest::testCreateItem with data set #3 (2, true, true, '{}')
[21:21:48] 09:02:36 [exec] Must have the value 'q8' for the 'id' in the result from the API
[21:21:51] No
[21:27:59] Reedy, so Jenkins doesn't run an extension's API tests automatically?
[21:37:32] https://integration.mediawiki.org/ci/job/MediaWiki-Tests-API/7847/testReport/(root)/
[21:37:37] seemingly not...
[21:58:34] bsitu: yeah, I get nothing when trying to view the Gerrit change to that i18n file too
[21:58:45] I figured it was just Gerrit
[21:59:31] <^demon|away> Gerrit doesn't like large i18n files.
[21:59:45] <^demon|away> Long files make the diff code barf.
[22:00:30] Well, I guess we can git fetch it and see if it's really there or not (there's a bug where text isn't loading, so it might not be Gerrit's fault)
[22:00:48] stevenw: I guess the change got lost during a merge
[22:00:50] New review: Diederik; "Ok. Erik Z: please merge at your earliest convenience." [analytics/wikistats] (master); V: 1 C: 1; - https://gerrit.wikimedia.org/r/30197
[22:10:32] bsitu, so did that change not get merged into the repo properly?
[22:10:55] I know the diff code doesn't want to let me view it, but it should've still gone in properly... right?
[22:11:28] krenair: it was merged and deployed to production
[22:11:58] So why is it missing?
[22:12:34] I think it's Gerrit
[22:12:40] https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/extensions/MoodBar.git;a=blobdiff;f=MoodBar.i18n.php;h=c84b2febf92c449502a7da8860e25b9433b6e2ac;hp=1817c61debf7f212cf85a51b39934e2209345476;hb=cb79959e145f904bdc5440146b9a0e87c09ff4d8;hpb=d844a9062d46a47c600ea5f3d3988ad1cdfe676f
[22:12:41] You can diff in gitweb ^
[22:13:34] it's probably due to this big maintenance update: https://gerrit.wikimedia.org/r/#/c/24571/
[22:14:29] oops, nope, there was no language update in this patch
[22:21:30] krenair, stevenw: it got reverted by L10n-bot: https://gerrit.wikimedia.org/r/#/c/27596/
[22:21:38] * StevenW facepalms
[22:21:46] Did l10n-bot screw up again?
[22:22:02] Apparently so.
[22:22:07] * RoanKattouw reverts
[22:22:25] Don't bother, I basically already have in a semi-related commit
[22:22:33] Ha
[22:22:35] Crap, I just reverted
[22:22:47] Oh wait, no I didn't
[22:22:52] Because it conflicted :D
[22:23:03] is this the Gerrit equivalent of slapstick?
[22:23:07] I suppose
[22:23:09] :-D
[22:23:09] https://gerrit.wikimedia.org/r/#/c/30286/
[22:23:40] cool, thanks
[22:28:00] Ok... About to do my first +2
[22:29:26] Ah, so I need to +1 Verified as well
[22:29:41] yay
[22:29:44] \o/
[22:37:33] WTF, why is Gerrit 503ing on me
[22:40:13] ^ this (in less strong words)
[22:43:15] When/where?
[22:45:50] Reedy: about 37 minutes ago, all over Gerrit
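On the "+2 then +1 Verified" step above: besides the web UI, Gerrit of this era also accepts review scores over its SSH interface, along these lines, where USER and the 12345,1 change,patchset reference are placeholders rather than anything from this log:

    # Score a change from the command line (Wikimedia's Gerrit listens on SSH port 29418)
    ssh -p 29418 USER@gerrit.wikimedia.org gerrit review --code-review +2 --verified +1 12345,1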