4. ‣Highly Dynamic
• Very high level operations
• New code can be introduced at any time
• Dynamic typing
• Exclusively late bound method calls
• Easier to implement as an interpreter
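A small Ruby sketch of the first three points (the class name here is made up for illustration): code can arrive while the program runs, and every call is resolved late, against the receiver at call time.

```ruby
# A class that starts out empty.
class Greeter; end

g = Greeter.new

# Introduce new code into an already-loaded class at runtime.
Greeter.class_eval do
  def hello
    "hi"
  end
end

# Late binding: Ruby looks up #hello on g's class at the moment of the
# call, so it finds the method we just defined.
g.hello  # => "hi"
```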
Wednesday, September 16, 2009
7. ‣Prior Work
• Smalltalk
• 1980-1994: Extensive work to make it fast
• Self
• 1992-1996: A primary research vehicle for making dynamic
languages fast
• Java / Hotspot
• 1996-present: A battle-hardened engine for (limited) dynamic
dispatch
9. ‣What Can We Learn From Them?
• Compiled code is faster than interpreted code
• It’s very hard (almost impossible) to figure things out statically
• The type profile of a program is stable over time
• Therefore:
• Learn what a program does and optimize based on that
• This is called Type Feedback
10. ‣Code Generation (JIT)
• Eliminating the overhead of the interpreter instantly increases
performance by a fixed percentage
• Naive code generation results in small improvement over
interpreter
• Method calling continues to dominate time
• Need a way to generate better code
• Combine with program type information!
11. ‣Type Profile
• As the program executes, it’s possible to see how one method
calls other methods
• The relationship of one method and all the methods it calls is the
type profile of the method
• Just because you CAN use dynamic dispatch, doesn’t mean you
always do.
• It’s common that a call site always calls the same method every
time it’s run
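An illustrative Ruby sketch of that last point (the names here are hypothetical, not how Rubinius records profiles): track which receiver class shows up at a call site, and watch it stay the same.

```ruby
# Count the receiver classes seen at one call site.
profile = Hash.new(0)

def shout(profile, s)
  profile[s.class] += 1  # record the class seen at the s.upcase site
  s.upcase               # dynamic dispatch, but always to String#upcase
end

1000.times { shout(profile, "ruby") }

profile  # => {String => 1000}: the site is monomorphic in practice
```

Even though `s.upcase` could dispatch to any class, this site only ever sees `String` — exactly the "same method every time" case.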
13. ‣Type Profiling (Cont.)
• 98% of all method calls are to the same method
every time
• In other words, 98% of all method calls could effectively be
statically bound
14. ‣Type Feedback
• Optimize a semi-static relationship to generate faster code
• Semi-static relationships are found by profiling all call sites
• Allows the JIT to make vastly better decisions
• Most common optimization: Method Inlining
15. ‣Method Inlining
• Rather than emit a call to a target method, copy its body at the
call site
• Eliminates the code to look up and begin execution of the target method
• Simplifies (or eliminates) setup for target method
• Allows for type propagation, as well as providing a wider horizon
for optimization.
• A wider horizon means better generated code, which means
less work to do per method == faster execution.
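A hand-written Ruby sketch of the transformation (the JIT does this to generated code, not to source; method names are made up):

```ruby
# Original: each call to #square pays method lookup and frame setup.
def square(x)
  x * x
end

def sum_of_squares(a, b)
  square(a) + square(b)
end

# "Inlined" version: the body of #square is copied to each call site.
# No lookup, no frame setup, and x * x is now visible to the optimizer
# alongside the addition -- the wider horizon mentioned above.
def sum_of_squares_inlined(a, b)
  (a * a) + (b * b)
end

sum_of_squares(3, 4)          # => 25
sum_of_squares_inlined(3, 4)  # => 25
```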
17. ‣Code Generation (JIT)
• Early experimentation with custom JIT
• Realized we weren’t experts
• Would take years to get good code being generated
• Switched to LLVM
18. ‣LLVM
• Provides an intermediate representation (LLVM IR) for describing
work to be done
• Text form of the IR allows for easy debugging
• Provides ability to compile AST to machine code in
memory
• Contains thousands of optimizations
• Competitive with GCC
19. ‣Type Profiling
• All call sites use a class called InlineCache, one per call site
• InlineCache accelerates method dispatch by caching previous
method used
• In addition, tracks a fixed number of receiver classes seen when
there is a cache miss
• When compiling a method using LLVM, all InlineCaches for a
method can be read
• InlineCaches with good information can be used to accurately
find a method to inline
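A toy Ruby model of the idea — not the actual Rubinius `InlineCache` class, whose real implementation lives in the VM — showing the hit path, the miss path that records receiver classes, and the data the JIT later reads:

```ruby
class InlineCache
  attr_reader :seen_classes

  def initialize(name)
    @name = name            # the method this call site invokes
    @cached_class = nil
    @cached_method = nil
    @seen_classes = []      # receiver classes recorded on misses
  end

  def call(receiver, *args)
    klass = receiver.class
    unless klass == @cached_class
      # Cache miss: do the full lookup and remember what we saw.
      @seen_classes << klass unless @seen_classes.include?(klass)
      @cached_class = klass
      @cached_method = klass.instance_method(@name)
    end
    # Cache hit path: reuse the previously looked-up method.
    @cached_method.bind(receiver).call(*args)
  end
end

cache = InlineCache.new(:to_s)
cache.call(1)
cache.call(2)                # hit: same receiver class as before
cache.seen_classes           # => [Integer] -- good info for inlining
```

When the JIT reads this cache and sees a single receiver class, it has an accurate candidate for inlining at that site.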
20. ‣When To Compile
• It takes time for a method’s type information to settle down
• Compiling too early means not having enough type info
• Compiling too late means lost performance
• Use simple call counters to allow a method to “heat up”
• Each invocation of a method increments counter
• When counter reaches a certain value, method is queued for
compilation.
• Threshold value is tunable: -Xjit.call_til_compile
• Still experimenting with good default values
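A minimal Ruby sketch of the counter scheme (the constant name echoes the `-Xjit.call_til_compile` tunable above; the queue and threshold value here are illustrative):

```ruby
CALL_TIL_COMPILE = 5        # hypothetical threshold value
counters = Hash.new(0)
compile_queue = []

invoke = lambda do |method_name|
  counters[method_name] += 1
  # When the method has "heated up", queue it exactly once.
  if counters[method_name] == CALL_TIL_COMPILE
    compile_queue << method_name
  end
end

7.times { invoke.call(:foo) }  # hot: crosses the threshold
invoke.call(:bar)              # still warming up

compile_queue  # => [:foo]
```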
21. ‣How to Compile
• To impact runtime as little as possible, all JIT compilation happens
in a background OS thread
• Methods are queued, and background thread reads queue to find
methods to compile
• After compiling, function pointers to JIT generated code are
installed in methods
• All future invocations of method use JIT code
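The queue-and-background-thread shape can be sketched in Ruby (the "compiled code" and install step here are stand-ins, not Rubinius APIs):

```ruby
require "thread"

jit_code = {}      # method name => installed "function pointer"
queue = Queue.new  # thread-safe queue of methods awaiting compilation

compiler = Thread.new do
  loop do
    name = queue.pop      # background thread blocks until work arrives
    break if name.nil?    # sentinel so this sketch can shut down
    # "Compile", then install the result so future calls use it.
    jit_code[name] = ->(*args) { "jitted #{name}" }
  end
end

queue << :foo   # a hot method is queued for compilation
queue << nil    # shut the worker down for this sketch
compiler.join

jit_code[:foo].call  # => "jitted foo"
```

The main thread only pays the cost of a queue push; all compilation work happens off to the side.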
22. ‣Benchmarks (seconds, lower is better)

    def foo()
      ary = []
      100.times { |i| ary << i }
    end

Calling foo 300,000 times:
• 1.8: 8.02
• 1.9: 5.30
• rbx: 5.90
• rbx jit: 3.60
• rbx jit + blocks: 2.59
26. ‣Conclusion
• Ruby is a wonderful language because it is organized for humans
• By gathering and using information about a running program, it’s
possible to make that program much faster without impacting
flexibility
• Thank You!