[00:02:14] Krinkle: related-ish to save timing is https://phabricator.wikimedia.org/T278206
[02:30:24] drastic increase where? only on dewiki? everywhere?
[02:31:22] de wikivoyage
[02:31:38] for the lua issue
[02:34:12] expensive regex in a hot module?
[02:45:03] certainly possible
[03:04:39] https://phabricator.wikimedia.org/T278274
[03:04:53] * Krinkle fixes wikibugs to report Radar to -perf-bots
[03:35:16] legoktm: this particular latency regression seems to also correlate with appserver latencies regressing more generally, so not specific to the small portion of requests that are edits; https://phabricator.wikimedia.org/T278274#6939988
[08:09:41] The branch is not maintained, and in the browser world things break when a new version comes out. So I've mentally accepted using the master branch if it's ok with legal. But I don't feel comfortable contributing to it.
[13:26:34] It would probably not be acceptable for prod or cloud. If it's ok now, it's only for AWS. And legal are probably evaluating it as an office tool (like Betterworks and Google) rather than a system we deploy
[15:07:40] Well, I've asked them anyway, since eventually we'd like to run this on our own infrastructure one day
[20:47:18] what backend(s) do statsd counters from MediaWiki get sent to?
[20:50:11] ori: graphite1004 / graphite2003
[20:50:36] at least it was a few weeks ago when I updated https://wikitech.wikimedia.org/wiki/Graphite#Service_operation
[20:51:25] thanks. do we have good/bad experience with setting a sample rate on counters?
[20:52:37] I'm writing the CL to make Wikibase Lua only increment counters 1% of the time
[20:52:51] (there's day-job real work to be avoided)
[20:52:51] I'm not sure we've ever actually used it.
[20:53:09] There's a config variable for it, which was misconfigured for a few years until we removed its config from prod
[20:53:22] (for wanobjectcache stats)
[20:53:24] so I'm basically wondering what the advantages / disadvantages are between incrementing the counter by 1 with a reported sampling rate of 0.01, vs incrementing the counter by 100 (with no sample rate)
[20:53:48] ah you mean when calling increment() itself?
[20:55:34] Yes. would it be better to call $stat->increment( $key ), $stat->setSampleRate( 0.01 ) vs. $stat->updateCount( $key, 100 )
[20:57:12] what does the L stand for in CL btw? I've seen it before among googlers, and I assume it means something like a patch or pull request, change something, I guess?
[20:57:27] sorry, it's changelist
[20:57:34] ah, cool
[20:57:39] googlism that invaded my lexicon
[20:57:43] :)
[20:58:32] the actual sampling is done in Lua regardless; it's basically a question of what we want to send to the backend: "foo.bar:100|c", or "foo.bar:1|c|@0.01"
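For reference, a rough PHP sketch of the two alternatives being weighed here, e.g. as one might try them from eval.php. The metric key is made up, and whether the per-entry sample rate in option A actually survives MediaWiki's SamplingStatsdClient is exactly the open question in the rest of this exchange:

<?php
use MediaWiki\MediaWikiServices;

$stats = MediaWikiServices::getInstance()->getStatsdDataFactory();

// Option A: report each in-sample hit as 1 and attach the 0.01 sample rate,
// letting the backend scale it back up. On the wire this would be
// "wikibase.foo.calls:1|c|@0.01". Speculative: SamplingStatsdClient may
// still discard some of these entries on its own (see below).
$data = $stats->produceStatsdData( 'wikibase.foo.calls', 1 );
$data->setSampleRate( 0.01 );

// Option B: keep the update unsampled but pre-scaled by 1/0.01 = 100.
// On the wire: "wikibase.foo.calls:100|c".
$stats->updateCount( 'wikibase.foo.calls', 100 );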
[20:59:06] ori: I think in practice the difference should be that setSampleRate means SamplingStatsdClient will do the multiplication for you and discard it in some cases when not in-sample, and a direct counter update means you're doing the sampling.
[20:59:32] see SamplingStatsdClient::sampleData
[20:59:45] I don't think we use statsd's sampling rate natively at any rate
[21:00:40] I'm not sure actually, because I'm not aware of this code ever having been active
[21:01:04] looks like it might go deeper than I thought, it's allocating objects for each increment that are aware of their sampling rate down to the statsd php lib
[21:01:54] I don't want any samples to be discarded in PHP because they're already discarded in Lua. The vanilla statsd client doesn't discard samples, it's only the MediaWiki-specific SamplingStatsdClient that does
[21:02:25] ahhh but that's the client class we use
[21:02:29] yeah
[21:02:39] ok, so I can't set the sampling rate, because it'll discard some data
[21:02:50] I have to scale the count instead
[21:03:09] I was confused as well, because SamplingStatsdClient extends the composer lib but we don't return it as our stats service, we re-wrap it afterward in MediaWiki.php
[21:03:22] yeah, I just came across that too
[21:03:59] I'm not sure how the two interact. I would expect SamplingStatsdClient to only apply its own stuff from wgStatsdSamplingRates
[21:04:07] so you might be able to set the native one still
[21:05:01] personally, I'd just increment normally and then later on in Grafana state in the legend that it's sampled, or rate/sample_rate.scale(N) there as needed for visualisation. That's what we do e.g. for the mw-js-deprecate data from statsv.js
[21:05:17] but yeah, that's not a good separation of concerns :)
[21:06:40] the issue is that these are existing metrics, so I don't want to introduce a discontinuity in the data
[21:06:56] I see, so they don't sample today but you want to sample them going forward
[21:07:03] yes
[21:11:29] ok, well I'd say dig into the statsd lib and determine whether you can set sampling on a per-metric or per-update level there and whether that is indeed preserved throughout
[21:11:30] seems like it might be
[21:11:40] that is, I think the two layers play nicely with each other
[21:11:42] but..
[21:12:00] I don't actually see any way to set the sampling rate on a metric directly from the regular factory.
[21:12:59] unless you call produceStatsdData() directly and then call setSampleRate() on that individual entry data object
[21:13:11] but the general setSampling method is global
[21:14:23] maybe confirm empirically from eval.php in beta or mwdebug/testwiki, I don't have a lot of confidence in my static analysis of so many layers
[21:20:58] I think I'm just going to scale the count in Wikibase and report it as unsampled to the statsd client
[21:22:18] sounds good to me :)
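For the record, a minimal sketch of the approach settled on above. This is not the actual Wikibase change: the helper name and metric key are made up, and it assumes the 1-in-100 decision has already happened on the Lua side.

<?php
use MediaWiki\MediaWikiServices;

/**
 * Hypothetical helper, called only for the ~1% of Lua invocations that were
 * selected in-sample upstream.
 */
function reportSampledLuaCall( string $key, float $sampleRate = 0.01 ): void {
	// Report a pre-scaled, unsampled count so the existing time series keeps
	// its pre-sampling scale and SamplingStatsdClient has nothing to discard:
	// the wire format stays "<key>:100|c" rather than "<key>:1|c|@0.01".
	MediaWikiServices::getInstance()->getStatsdDataFactory()
		->updateCount( $key, (int)round( 1 / $sampleRate ) );
}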