ESUG 2017
Video: https://youtu.be/yDKaHphbFow
At ESUG in Cambridge I introduced Sista, an optimizing JIT design for the Pharo VM. The current implementation now runs 1.5x faster on production applications, and up to 5x faster on specific benchmarks, than the production Pharo VM. In this talk, I will present the overall optimization pipeline and I will try to show the myriad of implementation details, including the interaction between Sista and other optimizations (context-to-stack mapping, closure optimizations, ...), pathological code patterns, and the problems related to stack deoptimization and closures.
Bio: Clement Bera implemented the Sista optimizing JIT in the Cog VM for Pharo. He has worked for 5 years with Eliot Miranda on improving the Cog VM.
21. Run

Run number  | Compilation time | Execution time
1 to 6      |                  | Slow
7           | Low              |
8 to 10,000 |                  | Average
10,000      | High             |
10,001+     |                  | Fast
Tier           | Speculations   | Executes
Interpreter    | Always correct | Bytecode
baseline JIT   | Always correct | Bytecode → Native
optimising JIT | Speculations   | Bytecode → Bytecode → Native
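The tiered promotion in the run-number table can be sketched as a simple hotness counter. This is an illustrative Python sketch, not the Cog/Sista implementation; the class, tier names, and the exact thresholds (7 and 10,000) are taken from the table above but simplified.

```python
# Illustrative sketch: a method is promoted through execution tiers once
# run-count thresholds are crossed, mirroring the run-number table above.

BASELINE_THRESHOLD = 7         # baseline JIT kicks in on run 7
OPTIMISING_THRESHOLD = 10_000  # optimising JIT kicks in around run 10,000

class Method:
    def __init__(self, name):
        self.name = name
        self.runs = 0
        self.tier = "interpreter"

    def run(self):
        self.runs += 1
        if self.tier == "interpreter" and self.runs >= BASELINE_THRESHOLD:
            self.tier = "baseline-jit"    # bytecode -> native
        elif self.tier == "baseline-jit" and self.runs >= OPTIMISING_THRESHOLD:
            self.tier = "optimising-jit"  # bytecode -> bytecode -> native
        return self.tier

m = Method("fib")
tiers = [m.run() for _ in range(10_001)]
print(tiers[0], tiers[6], tiers[9_999], tiers[10_000])
# → interpreter baseline-jit optimising-jit optimising-jit
```

The slow first runs pay no compilation cost; the one high-cost compilation happens only after the method has proven hot.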
49. Decompilation
Bytecode to Scorch IR (an SSA IR over a CFG)
LLVM-style IR, but higher level, with deoptimisation info
Annotations using send and branch data
50. Decompilation
Basic block ordering: reverse post order
(CFG diagram: basic blocks numbered 1–10 in reverse post order)
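Reverse post order itself is a short depth-first traversal. Below is a minimal sketch over a CFG given as an adjacency dict; the block names and edges are made up for illustration and are not the graph from the slide.

```python
# Minimal reverse-post-order numbering over a CFG (adjacency dict).
def reverse_post_order(cfg, entry):
    order, seen = [], set()
    def dfs(block):
        seen.add(block)
        for succ in cfg[block]:
            if succ not in seen:
                dfs(succ)
        order.append(block)   # post order: block appended after its successors
    dfs(entry)
    return order[::-1]        # reversing the post order gives RPO

# A small diamond followed by a loop (hypothetical graph):
cfg = {
    "entry": ["then", "else"],
    "then":  ["merge"],
    "else":  ["merge"],
    "merge": ["loop"],
    "loop":  ["merge", "exit"],   # back edge loop -> merge
    "exit":  [],
}
print(reverse_post_order(cfg, "entry"))
```

RPO guarantees that, back edges aside, every block is visited after its predecessors, which is what forward dataflow passes over the IR want.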
51. Decompilation
Basic block ordering: reverse post order, loop canonicalisation
(CFG diagram: basic blocks numbered 1–10 in reverse post order after loop canonicalisation)
54. Speculative inlining
What to inline?
Heuristics (closures, constants, escapes)
How to inline?
Non-local returns
Context access, exceptions, continuations …
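The "what to inline" decision can be caricatured as a check on the profiling data attached to a send site. Scorch's real heuristics are richer (closures, constants, escape information); in this hedged Python sketch, a send is inlined only when the recorded receiver types are monomorphic and the target is small. All names, fields, and the size threshold are illustrative assumptions.

```python
# Hypothetical speculative-inlining decision based on send-site type data.
MAX_INLINE_SIZE = 40  # assumed bytecode-size budget, not a real Scorch constant

def should_inline(send_site):
    types = send_site["receiver_types"]   # types observed by the lower tiers
    if len(types) != 1:
        return False                      # polymorphic: don't speculate
    return send_site["target_size"] <= MAX_INLINE_SIZE

mono = {"receiver_types": ["SmallInteger"], "target_size": 12}
poly = {"receiver_types": ["SmallInteger", "Float"], "target_size": 12}
big  = {"receiver_types": ["Array"], "target_size": 300}
print(should_inline(mono), should_inline(poly), should_inline(big))
# → True False False
```

Inlining on a single observed receiver type is a speculation: if a different receiver shows up later, the optimised code must deoptimise, which is why the deoptimisation metadata mentioned earlier is kept alongside the IR.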
57. Back-end
From SSA to Bytecode (stack-based)
Liveness analysis and graph colouring
Generate patterns that are efficient for Cogit
(Smi comparison followed by branch, …)
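The liveness analysis feeding the graph colouring reduces, at its core, to a backward pass computing `live = (live - defs) | uses`. Here is a toy Python sketch over straight-line three-address code; the real back-end works on SSA across basic blocks, and the temporaries below are made up.

```python
# Backward liveness over straight-line code: each instruction is (defs, uses).
def liveness(instructions):
    live = set()
    live_after = []
    for defs, uses in reversed(instructions):
        live_after.append(frozenset(live))       # what is live after this instr
        live = (live - set(defs)) | set(uses)    # kill definitions, add uses
    return list(reversed(live_after))

# t1 := a + b ; t2 := t1 * c ; return t2   (hypothetical temporaries)
code = [(("t1",), ("a", "b")),
        (("t2",), ("t1", "c")),
        ((),      ("t2",))]
print(liveness(code))
```

Two temporaries whose live ranges never overlap can share a colour, i.e. the same stack slot or register in the generated bytecode.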
58. Installation
Where to install?
Copy down?
Dependency management
Optimisations such as inlining track dependencies
Register optimised method
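Dependency tracking for inlining can be sketched as a reverse map from inlined callees to the optimised methods that embed them; editing a callee discards its dependents. The data structures and method names below are assumptions for illustration, not the Pharo implementation.

```python
from collections import defaultdict

class DependencyManager:
    """Hypothetical tracker: callee -> optimised methods that inlined it."""
    def __init__(self):
        self.dependents = defaultdict(set)
        self.installed = set()

    def register(self, optimised, inlined_callees):
        self.installed.add(optimised)
        for callee in inlined_callees:
            self.dependents[callee].add(optimised)

    def method_changed(self, callee):
        # Discard every optimised method that inlined the edited callee.
        for optimised in self.dependents.pop(callee, set()):
            self.installed.discard(optimised)

dm = DependencyManager()
dm.register("Point>>area (opt)", ["Number>>*", "Point>>x"])
dm.method_changed("Point>>x")   # editing Point>>x invalidates the optimised method
print(dm.installed)
# → set()
```

This is why "loading / editing code" can trigger a discard further down: the optimised method has baked assumptions about its callees into its own body.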
62. Register allocation
A cheap heuristic was enough for the baseline JIT
We now need a proper linear scan allocator
First version moved to production
Second version with control flow management
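A minimal linear scan allocator (Poletto & Sarkar style, without the control-flow handling the second version adds) fits in a few lines. This Python sketch uses made-up live intervals and a naive spill rule (the newcomer spills), which is simpler than a production allocator.

```python
# Linear scan over precomputed live intervals: {name: (start, end)}.
def linear_scan(intervals, num_regs):
    allocation, active = {}, []          # active: (end, name), kept sorted
    free = list(range(num_regs))
    for name, (start, end) in sorted(intervals.items(), key=lambda i: i[1][0]):
        # Expire intervals that ended before this one starts.
        for e, n in list(active):
            if e < start:
                active.remove((e, n))
                free.append(allocation[n])
        if free:
            allocation[name] = free.pop()
            active.append((end, name))
            active.sort()
        else:
            allocation[name] = "spill"   # naive: real linear scan spills the
                                         # interval with the furthest end
    return allocation

alloc = linear_scan({"a": (0, 4), "b": (1, 3), "c": (2, 6), "d": (5, 7)}, 2)
print(alloc)
```

With two registers, `a` and `b` get distinct registers, `c` arrives while both are busy and spills, and `d` reuses a register freed when `a` and `b` expire.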
69. Discard
If multiple deoptimisations occur on the same code,
discard the optimised code
Scorch may reoptimise it differently next time
Discard can also happen when loading / editing code
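The discard policy amounts to a per-method deoptimisation counter with a cut-off, after which the code is thrown away so Scorch can reoptimise it differently on the next hot run. The class, names, and the threshold value in this Python sketch are illustrative, not the actual mechanism.

```python
DEOPT_LIMIT = 2  # assumed cut-off for illustration

class OptimisedMethod:
    def __init__(self, name):
        self.name = name
        self.deopt_count = 0
        self.discarded = False

    def deoptimise(self):
        self.deopt_count += 1
        if self.deopt_count >= DEOPT_LIMIT:
            self.discarded = True   # next run reoptimises from scratch

m = OptimisedMethod("sortBlock (opt)")
m.deoptimise()
first = m.discarded   # one deopt: keep the code
m.deoptimise()
print(first, m.discarded)
# → False True
```

Keeping the code after a single deoptimisation avoids throwing away work on a one-off cold path; repeated deoptimisations signal that the speculations were wrong for this workload.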
70. Others
Improved closure implementation
Avoids the outerContext issues of the old implementation
Decreases metadata
Write barrier (Read-only objects)
Immutable literals
Compiler informed of object mutation (Global, …)
…
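The write barrier on read-only objects boils down to checking an immutability bit on every field store and signalling before mutating, which is what makes immutable literals enforceable and lets the compiler be told about mutation. The following is a pure Python analogy of that check, not the VM mechanism.

```python
class ModificationForbidden(Exception):
    """Raised when a store hits a read-only object (analogy of the VM signal)."""

class Obj:
    def __init__(self, read_only=False):
        object.__setattr__(self, "_read_only", read_only)

    def __setattr__(self, name, value):
        if self._read_only:                 # the write-barrier check
            raise ModificationForbidden(name)
        object.__setattr__(self, name, value)

literal = Obj(read_only=True)
try:
    literal.x = 1
    outcome = "mutated"
except ModificationForbidden:
    outcome = "trapped"
print(outcome)
# → trapped
```

In the VM the check is a cheap bit test performed by jitted stores, so ordinary mutable objects pay almost nothing for it.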
71. Research directions
Warm-up time to reach peak performance
Sista: Persistence of bytecode
Metacircular optimising JIT
On top of existing C VM
72. Conclusion & Questions
Readable and performant code.
Overall performance boost.
Alpha release: Sista works.
Moving to production is a long task.
High complexity,
Many details and edge cases,
We made it work.