[19:20:55] wall-clock profiling is live?
[19:22:52] I think it makes sense to have both
[19:26:41] ori: only on fandom.com :(
[19:26:57] oh cool
[19:27:14] also using excimer?
[19:27:24] yeah :) great stuff!
[19:27:40] thanks for building out the pipeline!
[19:33:28] it'd be good to have the ability to annotate profiles with multiple tags
[19:33:57] right now the wmf pipeline tags profiles by entry point (api.php, load.php, etc.)
[19:35:06] but it would be useful to slice the data further: api method (for api.php), page save requests, anon. vs logged-in, etc.
[19:36:51] but the current denormalized storage model wouldn't work well for that
[19:45:50] yeah, I've had some folks bring up that idea as well
[19:47:21] although the search function on the generated SVGs somewhat makes up for it
[20:13:37] ori: we have a many-to-one mapping now between trace logs and SVGs, with presend, postsend, and EditAction having dedicated files
[20:14:07] since previously the compression meant that edit code paths were basically just the entry point for EditPage with no details
[20:14:38] even if you zoom and search, the detail is rather limited, and we don't want the files to become unloadable in a reasonable time in the browser either, so it's a compromise.
[20:15:11] ( https://phabricator.wikimedia.org/T253679 )
[20:15:29] https://performance.wikimedia.org/arclamp/svgs/daily/2020-12-04.excimer.all.fn-EditAction.svgz
[20:59:04] that's pretty neat
[21:26:54] ori: I'm writing about our sampling profiler stack (and the current iteration of it), hopefully for this year's perf calendar.
[21:29:32] In it, I also refer to perf_events and why it didn't work for us. I'm finding a bit of confusion though, maybe you can help me out. As I understand it, a compiled program from C or Rust should work out of the box (assuming debug symbols are available and "frame pointers" are enabled, so I hear). For a runtime that uses a VM like Java, PHP or Node.js, it doesn't work normally, since we'd be seeing the engine's own code execute instead of the virtual execution of our code. So far I get it, I think. Where I get confused is that it can be made to work for Java via an agent, and with Node.js via a perf-map (and maybe with PHP 8's JIT and its perf maps as well).
[21:32:06] I would like to say that perf_events can work with VM'ed runtimes so long as they use a JIT to create machine code for your source and run that directly on the stack. As I understand it, that's what PHP and V8's JIT do, even if they are not strictly "lazy" or "just in time"; they somehow eval new native functions and call those, right? But does that mean that some of the time you'd still see the current frame being something that has no mapping to source code? Or do they somehow hide this from perf_events?
[21:32:35] sometimes I wonder how this collection approach could be extended to other languages - netflix has lots of good literature/software around that, but most of their collection just dumps the data to local files, which is problematic when one has a high volume of logs / containers etc
[21:32:53] although one could mess around with output redirection I suppose and have some collection agent/sidecar send the data to the destination
[21:36:30] mszabo: yeah, arclamp-log is effectively that for the most part today. it consumes a pubsub stream of stack traces which could be from any application or programming language. (It's also written in Python, not PHP.) This is all ori's work btw, we've just maintained it since then (originally for hhvm-xenon rather than php-excimer). And once we add support for external object storage, it would be even more stateless and can then probably be labelled "cloud native".
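A rough sketch of what that receiving side can look like, assuming collapsed stack traces ("main;Foo::bar;baz 1") arrive on a Redis pubsub channel; the channel name and file paths here are illustrative, not the actual arclamp-log configuration:

    # Minimal arclamp-log-style consumer sketch (Python, redis-py).
    import datetime
    import redis

    r = redis.Redis(host='localhost', port=6379)
    pubsub = r.pubsub()
    pubsub.subscribe('arclamp')  # hypothetical channel name

    for message in pubsub.listen():
        if message['type'] != 'message':
            continue
        trace = message['data'].decode('utf-8').strip()
        # Append to a daily log; stackcollapse/flamegraph.pl-style tooling
        # can turn these collapsed stacks into SVGs later.
        today = datetime.date.today().isoformat()
        with open(f'/srv/arclamp/logs/daily/{today}.all.log', 'a') as f:
            f.write(trace + '\n')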
[21:37:49] yup, arclamp-log on the receiving side is pretty straightforward. but one does have to get the samples into redis first, which excimer solves neatly by letting you do it from the application, but most other profilers don't
[21:38:16] so one would probably need another middleman to take a stream of stack traces and put it into redis
[21:40:53] right, yeah, it depends a lot on the program. The approach we have now is that it's internal to the process. That's hard to generalize. I would assume that anything similar to this for other languages/runtimes will also provide a callback and allow you to beacon it off to an open file socket or redis stream. But I think the ideal generalisation would not involve any kind of in-process helper. Rather, I'd like to standardise on perf_events long-term, which already solves all this basically. At which point the "client" would be a sidecar process that uses the Linux 'perf' tool (perf_events) to get samples from whatever application you run, and send it off to Redis.
[21:41:49] PHP 8 has a JIT now and perf_events perf.map support. I haven't played with it, but in theory that would allow us to adopt or create a fully php-agnostic arclamp client.
[21:45:27] yeah, it looks like nodejs's v8 and the jvm both already have support for perf_events based profiling, which would cover two other big ones
[21:45:58] yup
[21:46:05] I
[21:46:17] I'm linking a fair bit to http://www.brendangregg.com/perf.html already from this blog post draft
[22:15:52] beautiful, there are lots of good resources linked from there - nice rabbit hole :)
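A rough sketch of the sidecar idea discussed at [21:40:53]: sample an arbitrary process with the Linux 'perf' tool and publish collapsed stacks to Redis, keeping the target application profiler-agnostic. The sampling window, frequency, channel name, and the simplistic stack folding are all assumptions for illustration, not an existing arclamp client:

    # Hypothetical perf_events sidecar sketch (Python, redis-py).
    import subprocess
    import sys
    import redis

    def collect(pid, seconds=30, freq=99):
        # Record call stacks for the target process for a fixed window.
        subprocess.run(
            ['perf', 'record', '-F', str(freq), '-g', '-p', str(pid),
             '--', 'sleep', str(seconds)],
            check=True,
        )
        out = subprocess.run(
            ['perf', 'script'], check=True, capture_output=True, text=True
        ).stdout
        # Fold each sample into "frame1;frame2;... 1" (root first, leaf last),
        # the same collapsed shape that flamegraph.pl-style tooling expects.
        stacks, frames = [], []
        for line in out.splitlines():
            if not line.strip():
                if frames:
                    stacks.append(';'.join(reversed(frames)) + ' 1')
                    frames = []
            elif line[0].isspace():
                parts = line.split()
                if len(parts) >= 2:
                    frames.append(parts[1].split('+')[0])  # bare symbol name
        if frames:
            stacks.append(';'.join(reversed(frames)) + ' 1')
        return stacks

    def publish(stacks, channel='arclamp'):  # hypothetical channel name
        r = redis.Redis(host='localhost', port=6379)
        for stack in stacks:
            r.publish(channel, stack)

    if __name__ == '__main__':
        publish(collect(int(sys.argv[1])))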