Slides of my talk on Devel::NYTProf and optimizing perl code at YAPC::NA in June 2014. It covers use of NYTProf and outlines a multi-phase approach to optimizing your perl code.
A video of the talk and questions is available at https://www.youtube.com/watch?v=T7EK6RZAnEA&list=UU7y4qaRSb5w2O8cCHOsKZDw
4. What To Measure?
              CPU Time    Real Time
Subroutines      ?            ?
Statements       ?            ?
5. Subroutine vs Statement
• Subroutine Profiling
- Measures time between subroutine entry and exit
- That’s the Inclusive time. Exclusive by subtraction.
- Reasonably fast, reasonably small data files
• Problems
- Can be confused by funky control flow (goto &sub)
- No insight into where time spent within large subs
- Doesn’t measure code outside of a sub
6. Subroutine vs Statement
• Line/Statement profiling
- Measures time from the start of one statement to the start
of the next statement, wherever that might be
- Fine grained detail
• Problems
- Very expensive in CPU & I/O
- Assigns too much time in some cases
- Too much detail for large subs
- Hard to get overall subroutine times
7. CPU Time vs Real Time
• CPU Time
- Measures time the CPU spent executing your code
- Not (much) affected by other load on system
- Doesn’t include time spent waiting for i/o etc.
• Real Time
- Measures the elapsed time-of-day
- Your time is affected by other load on system
- Includes time spent waiting for i/o etc.
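The difference between the two clocks can be seen with a minimal sketch, assuming only core Perl (the built-in `times` function and Time::HiRes). The `measure` helper and the two workloads are hypothetical, made up for illustration: one only waits, the other only computes.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Run a code ref and report CPU time (user + system) and real (wall-clock) time.
sub measure {
    my ($code) = @_;
    my $real_start = time;
    my ($user_start, $sys_start) = times;
    $code->();
    my ($user_end, $sys_end) = times;
    my $real_end = time;
    return {
        cpu  => ($user_end - $user_start) + ($sys_end - $sys_start),
        real => $real_end - $real_start,
    };
}

# Waiting uses real time but almost no CPU time.
my $waiting = measure(sub { sleep 0.5 });

# Computing uses CPU time and real time together.
my $working = measure(sub { my $n = 0; $n += $_ for 1 .. 2_000_000; $n });

printf "%-8s cpu=%.2fs real=%.2fs\n", 'waiting', @{$waiting}{qw(cpu real)};
printf "%-8s cpu=%.2fs real=%.2fs\n", 'working', @{$working}{qw(cpu real)};
```

On a busy machine the real time of the second case will grow while its CPU time stays roughly the same.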
9. Public Service
Announcement!
The NYTProf name is an accident of history
I do not work for the New York Times
I have never worked for the New York Times
I have no affiliation with the New York Times
The New York Times last contributed in 2008
10. Running NYTProf
perl -d:NYTProf ...
perl -MDevel::NYTProf ...
Configure profiler via the NYTPROF env var
perldoc Devel::NYTProf for the details
To profile code that’s invoked elsewhere:
PERL5OPT=-d:NYTProf
NYTPROF=file=/tmp/nytprof.out:addpid=1:...
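As one example of configuring the profiler, the Devel::NYTProf documentation describes a `start=no` option together with `DB::enable_profile()` and `DB::disable_profile()` for profiling only part of a run. The sketch below assumes the script is started with something like `NYTPROF=start=no perl -d:NYTProf script.pl`; the `interesting_work` sub is hypothetical, and the guard keeps the script runnable when the profiler is not loaded.

```perl
use strict;
use warnings;

# Hypothetical workload standing in for the code you care about.
sub interesting_work {
    my $total = 0;
    $total += length("item $_") for 1 .. 10_000;
    return $total;
}

# The DB::* profile controls only exist when the profiler is loaded,
# so guard the calls to keep the script runnable without it.
my $profiling = defined &DB::enable_profile;

DB::enable_profile() if $profiling;     # start collecting data here
my $result = interesting_work();
DB::disable_profile() if $profiling;    # stop collecting data here

print "result: $result\n";
```

Check perldoc Devel::NYTProf for the options supported by your installed version.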
11. Reporting: KCachegrind
• KCachegrind call graph - new and cool
- contributed by C. L. Kao.
- requires KCachegrind
$ nytprofcg # generates nytprof.callgraph
$ kcachegrind # load the file via the gui
13. Reporting: HTML
• HTML report
- page per source file, annotated with times and links
- subroutine index table with sortable columns
- interactive Treemap of subroutine times
- generates Graphviz dot file of call graph
- -m (--minimal) faster generation but less detailed
$ nytprofhtml # writes HTML report in ./nytprof/...
$ nytprofhtml --file=/tmp/nytprof.out.793 --open
16. Inclusive vs Exclusive Time
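[Diagram: sub foo calls sub bar twice. Inclusive time for foo() covers the whole call, including both bar() calls; exclusive time covers only the code of foo() itself.]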
17. Inclusive vs. Exclusive
• Inclusive Time is best for Top Down
- Overview of time spent “in and below this sub”
- Useful to prioritize structural optimizations
• Exclusive Time is best for Bottom Up
- Detail of time spent “in the code of this sub”
- Where the time actually gets spent
- Useful for localized (peephole) optimization
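The relationship between the two figures can be illustrated with a toy timer. This is a minimal sketch, not how NYTProf itself works: the `timed` wrapper, the `foo` and `bar` subs and the sleep durations are all hypothetical. Exclusive time is derived by subtracting the time spent in callees from the inclusive time.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

my (%inclusive, %in_callees, @stack);

# Wrap a block of code, recording its inclusive time under $name and
# charging that time to the caller's "time spent in callees" total.
sub timed {
    my ($name, $code) = @_;
    push @stack, $name;
    my $start = time;
    $code->();
    my $elapsed = time - $start;
    pop @stack;
    $inclusive{$name} += $elapsed;
    $in_callees{ $stack[-1] } += $elapsed if @stack;
    return;
}

sub bar { timed(bar => sub { sleep 0.05 }) }
sub foo { timed(foo => sub { sleep 0.02; bar(); bar() }) }

foo();

# Exclusive time is inclusive time minus time spent in called subs.
my %exclusive = map { $_ => $inclusive{$_} - ($in_callees{$_} // 0) } keys %inclusive;

printf "%-4s inclusive=%.3fs exclusive=%.3fs\n", $_, $inclusive{$_}, $exclusive{$_}
    for sort keys %inclusive;
```

Here foo has a large inclusive time but a small exclusive time, because most of the time is spent below it in bar.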
19. Overall time spent in and below this sub
(in + below)
Color coding based on
Median Absolute Deviation
relative to rest of this file
Timings for each location
that calls this subroutine
Time between starting this perl
statement and starting the next.
So includes overhead of calls to
perl subs.
Timings for each subroutine
called by each line
21. Boxes represent subroutines
Colors only used to show
packages (and aren’t pretty yet)
Hover over box to see details
Click to drill-down one level
in package hierarchy
Treemap showing relative
proportions of exclusive time
27. Beware My Examples!
Do your own testing
With your own perl binary
On your own hardware
28. Beware 2!
Take care comparing code fragments!
Edge-effects at loop and scope boundaries.
Statement time includes time getting to the next
perl statement, wherever that may be.
29. Beware Your Examples!
Consider effect of CPU-level data and code caching
Tends to make second case look faster!
Swap the order to double-check alternatives
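Swapping the order can be done mechanically. The following is a minimal sketch, assuming two hypothetical alternatives that sum the same list; the names `explicit_loop`, `list_util_sum` and `time_in_order` are made up for illustration. Each alternative is timed once in each position so that an ordering effect shows up as a difference between the two columns.

```perl
use strict;
use warnings;
use List::Util qw(sum);
use Time::HiRes qw(time);

my @data = (1 .. 50_000);

# Two hypothetical alternatives that compute the same thing.
my %candidates = (
    explicit_loop => sub { my $sum = 0; $sum += $_ for @data; $sum },
    list_util_sum => sub { sum(@data) },
);

# Time each candidate in the given order; returns a hash ref of name => seconds.
sub time_in_order {
    my ($runs, @order) = @_;
    my %seconds;
    for my $name (@order) {
        my $start = time;
        $candidates{$name}->() for 1 .. $runs;
        $seconds{$name} = time - $start;
    }
    return \%seconds;
}

my @order   = sort keys %candidates;
my $forward = time_in_order(200, @order);
my $swapped = time_in_order(200, reverse @order);

for my $name (@order) {
    printf "%-14s original order=%.4fs swapped order=%.4fs\n",
        $name, $forward->{$name}, $swapped->{$name};
}
```

If a candidate only looks faster when it runs second, the difference is probably an ordering effect rather than a real gain.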
32. “The First Rule of Program Optimization:
Don't do it.
The Second Rule of Program Optimization
(for experts only!): Don't do it yet.”
- Michael A. Jackson
34. “More computing sins are committed in the
name of efficiency (without necessarily
achieving it) than for any other single
reason - including blind stupidity.”
- W.A. Wulf
35. “We should forget about small efficiencies,
say about 97% of the time: premature
optimization is the root of all evil.”
- Donald Knuth
36. “We should forget about small efficiencies,
say about 97% of the time: premature
optimization is the root of all evil.
Yet we should not pass up our
opportunities in that critical 3%.”
- Donald Knuth
38. “Throw hardware at it!”
Hardware == Cheap
Programmers == Expensive (& error prone)
Hardware upgrades are usually much less
risky than software optimizations.
39. “Bottlenecks occur in surprising places, so
don't try to second guess and put in a
speed hack until you have proven that's
where the bottleneck is.”
- Rob Pike
42. Low Hanging Fruit
1. Profile code running representative workload.
2. Look at Exclusive Time of subroutines.
3. Do they look reasonable?
4. Examine worst offenders.
5. Fix only simple local problems.
6. Profile again.
7. Fast enough? Then STOP!
8. Rinse and repeat once or twice, then move on.
46. Use faster accessors
Class::Accessor
-> Class::Accessor::Fast
--> Class::Accessor::Faster
---> Class::Accessor::Fast::XS
----> Class::XSAccessor
These aren’t all compatible so consider your actual usage.
(The list above is out of date.)
47. Avoid calling subs that
don’t do anything!
my $unused_variable = $self->get_foo;

my $is_logging = $log->info(...);
while (...) {
    $log->info(...) if $is_logging;
    ...
}
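A runnable version of the logging idea might look like the sketch below. The `My::Logger` class and its `is_info` method are hypothetical stand-ins (many real loggers offer a similar level check, but check your own logger's API). The point is to ask once whether logging is active, then skip the method call inside the loop.

```perl
use strict;
use warnings;

# Hypothetical minimal logger, just enough to count how often info() is called.
package My::Logger {
    sub new {
        my ($class, %args) = @_;
        return bless { enabled => $args{enabled}, calls => 0 }, $class;
    }
    sub is_info { return $_[0]{enabled} }
    sub info {
        my ($self, $message) = @_;
        $self->{calls}++;
        print "$message\n" if $self->{enabled};
        return;
    }
}

my $log = My::Logger->new(enabled => 0);

# Ask once, outside the loop, whether logging would do anything.
my $is_logging = $log->is_info;

for my $item (1 .. 1000) {
    # With logging disabled, neither the method call nor the string
    # interpolation for the message happens.
    $log->info("processing $item") if $is_logging;
}

print "info() was called $log->{calls} times\n";
```

The saving is the avoided method dispatch and argument construction on every iteration, which only matters in hot loops.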
48. Exit subs and loops early
Delay initializations
return if not ...a cheap test...;
return if not ...a more expensive test...;
my $foo = ...initializations...;
...body of subroutine...
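Filled in with concrete code, the pattern above could look like this minimal sketch. The `square_of` and `build_lookup_table` subs are hypothetical; the counter exists only to show how often the expensive initialization actually runs.

```perl
use strict;
use warnings;

my $expensive_setups = 0;

# Hypothetical expensive initialization we want to avoid when possible.
sub build_lookup_table {
    $expensive_setups++;
    return { map { $_ => $_ ** 2 } 1 .. 1000 };
}

sub square_of {
    my ($n) = @_;
    return if not defined $n;            # a cheap test
    return if $n !~ /\A[0-9]+\z/;        # a more expensive test
    my $table = build_lookup_table();    # initialization only when needed
    return $table->{$n};
}

my $missing = square_of(undef);
my $invalid = square_of('abc');
my $valid   = square_of(12);

print "expensive setup ran $expensive_setups time(s)\n";
```

Only the one call that passes both tests pays for the initialization.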
59. Walk up call chain
to find good spots
for caching
Remember cache invalidation!
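A minimal sketch of caching with explicit invalidation, assuming a hypothetical slow lookup (`slow_price_lookup`) behind a hash-based cache. All names here are made up for illustration; the important part is that the writer (`set_price`) also clears the cached entry.

```perl
use strict;
use warnings;

my %prices = (apple => 3, pear => 5);   # stands in for a slow backing store
my %price_cache;
my $lookups = 0;

# Hypothetical slow operation that we want to call as rarely as possible.
sub slow_price_lookup {
    my ($item) = @_;
    $lookups++;
    return $prices{$item};
}

# Return the cached value, doing the slow lookup only on a cache miss.
sub price_of {
    my ($item) = @_;
    return $price_cache{$item} //= slow_price_lookup($item);
}

# Any change to the underlying data must also invalidate the cached entry.
sub set_price {
    my ($item, $price) = @_;
    $prices{$item} = $price;
    delete $price_cache{$item};
    return;
}

price_of('apple') for 1 .. 3;            # one slow lookup, two cache hits
my $lookups_before = $lookups;

set_price(apple => 4);                   # invalidates the cached price
my $new_price = price_of('apple');       # a fresh slow lookup

print "lookups before change: $lookups_before, after: $lookups, price: $new_price\n";
```

Without the `delete` in `set_price`, later calls would keep returning the stale price.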
60. Creating many objects
that don’t get used?
Try a lightweight proxy
e.g. DateTime::Tiny, DateTimeX::Lite, DateTime::LazyInit
61. Reconfigure your Perl
can yield useful gains with little effort
thread support costs ~2..30%
debugging support costs ~15%
Also consider: usemymalloc, use64bitint, use64bitall,
uselongdouble, optimize, disable taint mode.
Consider using a different compiler.
62. Upgrade your Perl
Newer versions often faster at some things
(though occasionally slower at others)
Sometimes have specific micro-optimizations
Many memory usage and performance
improvements from 5.8 thru 5.20
65. Push loops down
- $object->walk($_) for @dogs;
+ $object->walk_these(@dogs);
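The change above trades many method calls for one. A minimal sketch, assuming a hypothetical `Walker` class whose `walk_these` method does the looping internally; the call counter is only there to make the difference visible.

```perl
use strict;
use warnings;

# Hypothetical class with a per-item method and a batch method.
package Walker {
    sub new { my ($class) = @_; return bless { walked => [], calls => 0 }, $class }

    # One method call per dog.
    sub walk {
        my ($self, $dog) = @_;
        $self->{calls}++;
        push @{ $self->{walked} }, $dog;
        return;
    }

    # One method call for the whole list: the loop has been pushed down.
    sub walk_these {
        my ($self, @dogs) = @_;
        $self->{calls}++;
        push @{ $self->{walked} }, @dogs;
        return;
    }
}

my @dogs = qw(rex fido spot);

my $one_by_one = Walker->new;
$one_by_one->walk($_) for @dogs;

my $batched = Walker->new;
$batched->walk_these(@dogs);

printf "one by one: %d calls, batched: %d call\n",
    $one_by_one->{calls}, $batched->{calls};
```

Both objects end up having walked the same dogs, but the batched version pays the method-call overhead once.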
66. Use faster modules
sort -> Sort::Key
Storable -> Sereal
LWP -> HTTP::Tiny -> HTTP::Lite -> *::Curl -> Hijk
These aren’t all compatible or full-featured or ‘better’
Consider your actual needs
See http://neilb.org/reviews/
70. Small changes add up!
“I achieved my fast times by
multitudes of 1% reductions”
- Bill Raymond
71. See also “Top 10 Perl
Performance Tips”
• A presentation by Perrin Harkins
• Covers higher-level issues, including
- Good DBI usage
- Fastest modules for serialization, caching,
templating, HTTP requests etc.
• http://docs.google.com/present/view?id=dhjsvwmm_26dk9btn3g