The document discusses various techniques for profiling CPU and memory performance in Rust programs, including:
- Using the flamegraph tool to profile CPU usage by sampling a running process and generating flame graphs.
- Integrating pprof profiling into Rust programs to expose profiles over HTTP similar to how it works in Go.
- Profiling heap usage by integrating jemalloc profiling and generating heap profiles on program exit.
- Some challenges with profiling asynchronous Rust programs due to the lack of backtraces.
The key takeaways are that there are crates like pprof-rs and techniques like jemalloc integration that allow collecting CPU and memory profiles from Rust programs, but profiling asynchronous programs remains challenging.
11. CPU profiling
$ cargo install flamegraph
$ cargo flamegraph --dev
dtrace: system integrity protection is on, some features will not be available
dtrace: failed to initialize dtrace: DTrace requires additional privileges
failed to sample program
13. Just use Linux
$ docker run --rm -it rust:1.51
# apt-get install linux-perf
...
# cargo flamegraph
Finished dev [unoptimized + debuginfo] target(s) in 0.03s
/usr/bin/perf: line 13: exec: perf_5.10: not found
E: linux-perf-5.10 is not installed.
failed to sample program
15. Match the kernel version
$ docker run --rm -it rust:1.51-bullseye
# apt-get install linux-perf
...
# cargo flamegraph --dev
Finished dev [unoptimized + debuginfo] target(s) in 0.03s
….
17. Profiling a running program
$ cargo install inferno
$ perf record -p "$(pgrep profexample)" -F 997 -g
…^C
[ perf record: Captured and wrote 5.535 MB perf.data (81144 samples) ]
$ perf script | inferno-collapse-perf > stacks.folded
$ inferno-flamegraph < stacks.folded > flamegraph.svg
$ open flamegraph.svg
...
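The stacks.folded file produced above holds one line per unique call stack: the frames joined by `;` (root first), then a space and the number of samples. As a rough sketch of how that format can be consumed outside of inferno, the following sums samples per leaf frame, which approximates "self" time. The function names `parse_folded_line` and `self_samples` and the sample data are made up for illustration.

```rust
use std::collections::HashMap;

/// Parse one folded-stack line: `frame1;frame2;...;leaf COUNT`.
/// Returns the frames (root first) and the sample count, or None if malformed.
fn parse_folded_line(line: &str) -> Option<(Vec<&str>, u64)> {
    // The count is the last space-separated field; frame names may contain spaces.
    let (stack, count) = line.trim_end().rsplit_once(' ')?;
    let count = count.parse::<u64>().ok()?;
    Some((stack.split(';').collect(), count))
}

/// Sum samples per leaf frame, i.e. the function that was on-CPU when sampled.
fn self_samples(folded: &str) -> HashMap<String, u64> {
    let mut totals = HashMap::new();
    for line in folded.lines() {
        if let Some((frames, count)) = parse_folded_line(line) {
            if let Some(leaf) = frames.last() {
                *totals.entry(leaf.to_string()).or_insert(0) += count;
            }
        }
    }
    totals
}

fn main() {
    // Hypothetical folded stacks, not taken from a real profile.
    let folded = "main;parse;read_line 30\nmain;compute 50\nmain;parse;compute 20\n";
    let mut rows: Vec<_> = self_samples(folded).into_iter().collect();
    // Hottest leaf first.
    rows.sort_by(|a, b| b.1.cmp(&a.1));
    for (func, samples) in rows {
        println!("{samples:>6} {func}");
    }
}
```

A flame graph carries the same information plus the full ancestry of each frame; a flat per-leaf table like this is only a quick way to sanity-check what the SVG shows.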
20. Small digression (2)
● On x86-64, the default is to omit frame pointers
● The x86-64 ABI says:
  "The conventional use of %rbp as a frame pointer for the stack frame may be avoided by using %rsp (the stack pointer) to index into the stack frame. This technique saves two instructions in the prologue and epilogue and makes one additional general-purpose register (%rbp) available."
● GCC since 4.6 omits frame pointers by default on x86-64
● Rust also omits frame pointers, even in dev builds
● DWARF info is used to figure out the layout of the stack frame for each function. You don’t need full debug info for backtraces:
[profile.release]
debug = 1
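To see that a stack can be walked without frame pointers, a program can capture its own backtrace. The sketch below uses `std::backtrace` from the standard library (stable since Rust 1.65, so newer than the toolchain in the Docker examples). The assumption here is that the platform provides unwind information, as Linux builds normally do; `inner` and `outer` are hypothetical functions that exist only to give the stack some depth.

```rust
use std::backtrace::{Backtrace, BacktraceStatus};

// inline(never) keeps both functions as real frames, even in optimized builds.
#[inline(never)]
fn inner() -> Backtrace {
    // force_capture ignores RUST_BACKTRACE and always attempts to walk the stack.
    Backtrace::force_capture()
}

#[inline(never)]
fn outer() -> Backtrace {
    inner()
}

fn main() {
    let bt = outer();
    match bt.status() {
        // Printed frames include file and line numbers only if debug info is present.
        BacktraceStatus::Captured => println!("{bt}"),
        other => println!("backtrace not available: {other:?}"),
    }
}
```

Comparing the output of a build with `debug = 0` against one with `debug = 1` is a simple way to check how much detail the line tables add to each frame.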
21. Profiling a running program
$ cargo install inferno
$ perf record -p "$(pgrep profexample)" -F 997 -g --call-graph dwarf
…^C
[ perf record: Captured and wrote 461.199 MB perf.data (57251 samples) ]
$ perf script | inferno-collapse-perf > stacks.folded
$ inferno-flamegraph < stacks.folded > flamegraph.svg
$ open flamegraph.svg
...
24. Questions:
● What if I want to run this on k8s where I don’t control my kernel version?
● What if I want to run this on mac without pulling my hair out?
● What if I don’t have a shell on prod?
28. $ go tool pprof --http localhost:4080 'http://localhost:6060/debug/pprof/profile'
Fetching profile over HTTP from http://localhost:6060/debug/pprof/profile
Saved profile in /Users/mkm/pprof/pprof.samples.cpu.001.pb.gz
Serving web UI on http://localhost:4080
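The Go endpoint fetched above samples the CPU for a window given by an optional `seconds` query parameter, falling back to 30 seconds. A Rust service that wants to be scraped the same way has to honor that contract. The following is a minimal, std-only sketch of the request parsing alone: `profile_seconds` is a hypothetical helper, it accepts only GET for simplicity, and it does not collect or encode a profile.

```rust
/// Sampling window used when the client does not pass a usable `seconds` value
/// (Go's net/http/pprof handler falls back to 30 seconds).
const DEFAULT_SECONDS: u64 = 30;

/// Hypothetical helper: given an HTTP request line such as
/// `GET /debug/pprof/profile?seconds=10 HTTP/1.1`, return the sampling
/// duration if the request targets the CPU profile endpoint.
fn profile_seconds(request_line: &str) -> Option<u64> {
    let mut parts = request_line.split_whitespace();
    let (method, target) = (parts.next()?, parts.next()?);
    if method != "GET" {
        return None;
    }
    // Split the request target into path and (possibly empty) query string.
    let (path, query) = target.split_once('?').unwrap_or((target, ""));
    if path != "/debug/pprof/profile" {
        return None;
    }
    let seconds = query
        .split('&')
        .filter_map(|kv| kv.split_once('='))
        .find(|(key, _)| *key == "seconds")
        .and_then(|(_, value)| value.parse::<u64>().ok())
        .filter(|s| *s > 0)
        .unwrap_or(DEFAULT_SECONDS);
    Some(seconds)
}

fn main() {
    for line in [
        "GET /debug/pprof/profile HTTP/1.1",
        "GET /debug/pprof/profile?seconds=10 HTTP/1.1",
        "GET /metrics HTTP/1.1",
    ] {
        println!("{line} -> {:?}", profile_seconds(line));
    }
}
```

In a real service the handler would start a sampling profiler for the returned duration and respond with the profile in pprof's gzipped protobuf format, which is what `go tool pprof` expects to download.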
53. //! The standard API includes: the [`malloc`], [`calloc`], [`realloc`], and
//! [`free`], which conform to ISO/IEC 9899:1990 (“ISO C90”),
//! [`posix_memalign`] which conforms to POSIX.1-2016, and
//! [`aligned_alloc`].
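Heap profiling works by intercepting allocation entry points like the ones listed above. As an illustration of that idea only, and not of jemalloc's profiler, here is a std-only sketch that wraps the system allocator and counts the bytes requested. `CountingAlloc`, `ALLOCATED`, and `bytes_allocated` are names invented for this example; a real heap profiler would also record a sampled backtrace per allocation so the bytes can be attributed to call sites.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Total bytes requested through the allocator since program start.
static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

/// Wraps the system allocator and counts every allocation request.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Route all heap allocations in this program through the counting wrapper.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn bytes_allocated() -> usize {
    ALLOCATED.load(Ordering::Relaxed)
}

fn main() {
    let before = bytes_allocated();
    // black_box keeps the optimizer from removing the allocation.
    let buf = std::hint::black_box(vec![0u8; 1 << 20]);
    let after = bytes_allocated();
    println!("allocated at least {} bytes for a {} byte buffer", after - before, buf.len());
}
```

The counter only ever grows because `dealloc` is not tracked; subtracting `layout.size()` there would turn it into a live-bytes gauge instead.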