Sista: a Metacircular Architecture for Runtime Optimisations Persistence
Clément Béra
Ph.D defense
Ph.D Setup
International collaboration
Goal: transfer new knowledge to RMoD
France (32 months): RMoD researchers Stéphane Ducasse and Marcus Denker, Inria
California (4 months): domain expert Eliot Miranda, Stellect Systems LLC
Context
Research problems
Solution: Sista
Validation
Sista ?
Speculative Inlining Small-Talk Architecture
Optimising Just-in-time compiler (JIT) for Smalltalk
Need
Program readability
Overall Smalltalk performance
Program readability
Technique name | Program code
do: selector | array do: #yourself.
do: block | array do: [ :elem | elem yourself ].
to:do: | 1 to: array size do: [ :i | (array at: i) yourself ].
[Figure: millions of executions per second (0 to 120) against array size (0 to 100) for do: selector, do: block and to:do:]
Array size = 0: to:do: 5x faster than do: block
Array size = 10: to:do: 2.5x faster than do: block
Readability
Performance
Wanted
Program readability AND performance
Overall Smalltalk Performance
Faster than anything you can write
Optimisations on low-level behavior that cannot be expressed at Smalltalk level
Speculations
Speculative optimisations
First runs non optimised
Optimise based on first runs
Ex: Unused branch in first runs
Speculative optimisations may be incorrect
Ex: Unused branch is used
Deoptimise the code at runtime
Re-optimise differently
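The speculation cycle described on these slides can be sketched as a small Python model. It is only an illustration under stated assumptions: `Speculating`, `WARM_UP_RUNS` and `generic` are hypothetical names that do not come from Sista, and the "unused branch" is the negative-input branch of a toy function.

```python
WARM_UP_RUNS = 3  # hypothetical threshold, chosen only to keep the example short


def generic(x):
    """Always-correct, non optimised version with a rarely used branch."""
    if x < 0:
        return -x
    return x * 2


class Speculating:
    """Runs non optimised first, then speculates that the branch is unused."""

    def __init__(self):
        self.runs = 0
        self.negative_seen = False
        self.state = "non optimised"
        self.deoptimisations = 0

    def call(self, x):
        if self.state == "optimised (speculative)":
            if x < 0:
                # Guard failed: the unused branch is used after all.
                # Deoptimise, then re-optimise differently (keep both branches).
                self.deoptimisations += 1
                self.negative_seen = True
                self.state = "optimised (both branches)"
                return generic(x)
            return x * 2  # fast path compiled without the rare branch
        if self.state == "optimised (both branches)":
            return generic(x)
        # First runs: execute the non optimised code and record what happened.
        self.runs += 1
        self.negative_seen = self.negative_seen or x < 0
        result = generic(x)
        if self.runs >= WARM_UP_RUNS:
            self.state = ("optimised (both branches)" if self.negative_seen
                          else "optimised (speculative)")
        return result
```

Calling `call` three times with non-negative values switches the model to the speculative state; the first negative argument afterwards triggers exactly one deoptimisation and a different re-optimisation.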
Main existing solution
Three-tiers execution model
Java HotSpot, JavaScript V8, …
Run number | Tier | Compilation time | Execution time
1 to 6 | Interpreter | | Slow
7 | baseline JIT | Low |
8 to 10,000 | baseline JIT | | Average
10,000 | optimising JIT | High |
10,001 + | optimising JIT | | Fast
[Slide builds annotate this table with the labels: Speculations, Always correct, Bytecode, Native, Existing, Sista]
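The run numbers in the three-tier table can be read as a simple dispatch rule. The sketch below is an assumption-laden reading of that table, not code from any VM: `describe_run` and the two constants are hypothetical names, and it assumes that the run which triggers a compilation is still executed by the tier below the optimising JIT.

```python
BASELINE_JIT_RUN = 7          # from the table: baseline compilation on run 7
OPTIMISING_JIT_RUN = 10_000   # from the table: optimising compilation on run 10,000


def describe_run(run_number):
    """Return (tier whose code executes this run, compilation triggered or None)."""
    if run_number < 1:
        raise ValueError("run numbers start at 1")
    if run_number < BASELINE_JIT_RUN:
        return ("interpreter", None)
    if run_number == BASELINE_JIT_RUN:
        return ("baseline JIT", "baseline JIT compilation")
    if run_number < OPTIMISING_JIT_RUN:
        return ("baseline JIT", None)
    if run_number == OPTIMISING_JIT_RUN:
        return ("baseline JIT", "optimising JIT compilation")
    return ("optimising JIT", None)
```

For example, `describe_run(3)` reports the interpreter with no compilation, while `describe_run(10_001)` reports code produced by the optimising JIT.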
Alternative architecture
Meta-tracing JIT
Record optimised traces of loop bodies
PyPy, LuaJIT
Existing proposal
Eliot Miranda, with contributions of Paolo Bonzini, Steve Dahl, David Griswold, Urs Hölzle, Ian Piumarta and David Simmons
Adaptive Optimiser for Small-Talk Architecture
Key ideas
Build on top of the existing VM
Split optimising JIT design
Build on top of existing VM
Market acceptance
Lower risk and investment
Split optimising JIT design
[Diagram: bytecode to bytecode (step 1), then bytecode to native code (step 2)]
Lowering maintenance cost
1 is written in Smalltalk
More open source contributors
2 reuses the baseline JIT
One back-end to maintain
Is the split optimising JIT design possible ?
1. Architecture
Can we build a maintainable optimising JIT with two people part-time ?
Practical problem: cannot be proven (empirical study)
Existing solutions
Reusing an existing adaptable JIT
Truffle
RPython toolchain
Non optimising tiers are slower to execute code
Runtime compilation time
Deoptimisation and re-optimisation
Time to reach peak performance
2. Warm-up
How to reduce the time to reach peak performance ?
Existing solutions
Many tiers architecture
Baseline JIT
4 tiers in JavaScript WebKit
Cloneable JVMs
Work of Kawachiya et al.
Clone a running JVM including native code
Persistence of Runtime information
Saving inlining decisions (Strongtalk)
Shared profiling information between VMs (Arnold et al.)
Persistence of Machine code
Azul
JRockit
Metacircular JITs
Written in itself
Compiled AOT (RPython toolchain)
Same runtime (Maxine, Graal)
3. Metacircular
Can the optimising JIT optimise its own code at runtime ?
Existing solution
Truffle-Graal
Built on top of the JVM
Graal optimising JIT in Java
Problems
1. Architecture
2. Warm-up
3. Metacircular
Terminology
Function: Method or Closure
Virtual function (v-function) vs native function (n-function)
Smalltalk runtime: Scorch
non optimised v-functions to optimised v-functions
Smalltalk-specific optimisations
virtual functions (persisted across start-ups)
Virtual machine: Cogit
v-functions to n-functions
Machine-specific optimisations
native functions (discarded on shut-down)
[Diagram labels: Baseline JIT, Optimising JIT; My work alone, My work in collaboration with Stellect Systems LLC]
Optimising JIT
[Diagram: in the virtual machine, Cogit performs hot spot detection; in the Smalltalk image, Scorch decides what to optimise, then performs optimisation and installation of the optimised virtual function; Cogit then produces the optimised native function]
Hot spot detection
Profiling counters on branches
Counters as pinned ByteArray on heap
No CPU i-cache flush
Direct reference to counter
Detection: VM call-back to Scorch
[Figure: machine code of a non optimised native function referencing a pinned ByteArray; each 32-bit word holds two counters (bits 0 to 15 and bits 16 to 31), shown as Counter 1A, Counter 1B, Counter 2A, Counter 2B]
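The counter layout on this slide (two 16-bit counters per 32-bit word, stored in a byte array outside the machine code) can be modelled in a few lines of Python. This is a sketch under assumptions, not the Cogit implementation: `BranchCounters`, the meaning given to the two halves (times reached, times taken), the reset after a trip and the `on_hot_spot` call-back are all hypothetical.

```python
import struct


class BranchCounters:
    """Two 16-bit counters per branch, packed in one 32-bit word of a byte array."""

    def __init__(self, branch_count, threshold, on_hot_spot):
        assert 0 < threshold <= 0xFFFF, "counters are 16 bits wide"
        self.storage = bytearray(4 * branch_count)  # stands in for the pinned ByteArray
        self.threshold = threshold
        self.on_hot_spot = on_hot_spot              # stands in for the VM call-back

    def read(self, branch):
        """Return (reached, taken) for the given branch index."""
        return struct.unpack_from("<HH", self.storage, 4 * branch)

    def record(self, branch, taken):
        reached, taken_count = self.read(branch)
        reached += 1
        if taken:
            taken_count += 1
        if reached >= self.threshold:
            # Counter tripped: report the hot spot, then start counting again.
            self.on_hot_spot(branch, reached, taken_count)
            reached, taken_count = 0, 0
        struct.pack_into("<HH", self.storage, 4 * branch, reached, taken_count)
```

Because the counters live in data rather than in the instructions themselves, updating them in this model never touches the code, which mirrors the "No CPU i-cache flush" point above.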
What to optimise ?
Example>>exampleArrayLoop
    array do: [:item | item displayOnScreen ]
Array>>do: aClosure
    1 to: self size do: [:index |
        aClosure value: (self at: index) ]
[Diagram, stack growing down: Example>>exampleArrayLoop, then Array>>do:, then [:item | item displayOnScreen ]; slide labels: Hot spot detected, Hidden branch, Method to optimise]
Virtual function optimisation
[Diagram: non optimised v-functions (compiled code) go through decompilation to the Scorch IR, where speculative inlining, range optimisations, loop optimisations, ... are applied; generation then produces the optimised v-function (compiled code)]
Bytecode optimisation
Through a CFG SSA IR (similar to LLVM)
Lots of details and edge cases
Details in the thesis
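To make the speculative inlining step more concrete, here is a deliberately simplified Python sketch. It works on a flat list of instructions rather than on a CFG in SSA form, and every name in it (`inline_monomorphic_sends`, the tuple-based instruction format, `trap_unless_class`) is an assumption made for illustration, not Scorch's IR.

```python
def inline_monomorphic_sends(instructions, type_profile, methods):
    """Replace sends that saw a single receiver class by a guard plus the callee body.

    instructions: list of tuples such as ("send", selector, receiver_name)
    type_profile: instruction index -> set of receiver class names seen at runtime
    methods:      (class name, selector) -> list of instructions of the callee
    """
    optimised = []
    for index, instruction in enumerate(instructions):
        seen = type_profile.get(index, set())
        if instruction[0] == "send" and len(seen) == 1:
            _, selector, receiver = instruction
            (receiver_class,) = seen
            body = methods.get((receiver_class, selector))
            if body is not None:
                # Speculation: the receiver keeps the class seen so far.
                # The guard deoptimises (traps) if that stops being true.
                optimised.append(("trap_unless_class", receiver, receiver_class))
                optimised.extend(body)
                continue
        optimised.append(instruction)
    return optimised
```

A send with several receiver classes in its profile is left untouched, so the output stays correct without a guard in that case.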
Installation
Where to install ?
Copy down ?
[Diagram: class hierarchy of Collection, SequenceableCollection, HashedCollection, Array, OrderedCollection, Dictionary and Set; label: Copy down]
Dependency management
Optimisations such as inlining track dependencies
Register optimised method
Next call uses optimised virtual function
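Dependency tracking of this kind can be sketched as a small registry: each optimised function records the selectors it inlined, and a change to one of those methods discards every optimised function that relied on it. The Python below is a hypothetical model (`DependencyRegistry`, `register` and `method_changed` are names invented here), not the mechanism used by Scorch.

```python
from collections import defaultdict


class DependencyRegistry:
    """Maps inlined selectors to the optimised functions that depend on them."""

    def __init__(self):
        self._dependents = defaultdict(set)
        self.installed = set()

    def register(self, optimised_function, inlined_selectors):
        """Install an optimised function and remember what it inlined."""
        self.installed.add(optimised_function)
        for selector in inlined_selectors:
            self._dependents[selector].add(optimised_function)

    def method_changed(self, selector):
        """Discard and return every optimised function that inlined this selector."""
        discarded = self._dependents.pop(selector, set())
        self.installed -= discarded
        for dependents in self._dependents.values():
            dependents -= discarded
        return discarded
```

After a function is discarded, the next call would fall back to the non optimised virtual function until the hot spot is detected again.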
VM extensions
Execution of optimised virtual functions
Register Allocation
New operations
• unsafe array access
• arithmetic without overflow
• inlined allocation
• Efficient type-checks
• ….
A bytecode set for adaptive optimizations, IWST’14 (best paper award)
Deoptimisation
Debugger requested
Incorrect speculation during optimisation
[Diagram: a trap is tripped in Cogit (virtual machine); Scorch (Smalltalk image) reconstructs objects and edits the stack; execution then resumes]
Traps
Most trap branches are not taken
Traps should be off the CPU i-cache
Call-back to Scorch
[Figure: machine code of an optimised native function, with a type check and a trap call-back]
Object re-construction
[Diagram: object reconstruction turns the application frame requesting deoptimisation into deoptimised contexts, a reconstructed closure, a reconstructed temp vector and a reconstructed object]
Deoptimisation metadata: objects to reconstruct
Objects to reconstruct include contexts, closures, temp vectors
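One way to picture deoptimisation metadata is as a list of object descriptions whose fields point either at a slot of the optimised frame, at a constant, or at another object being rebuilt. The Python sketch below follows that reading; the descriptor format, the `reconstruct` function and the slot names are assumptions for illustration only.

```python
def reconstruct(metadata, frame_slots):
    """Rebuild the objects described by the metadata from one optimised frame.

    Each descriptor is {"kind": ..., "fields": {name: (source, value)}} where source is
    "slot" (read frame_slots[value]), "object" (reference to another rebuilt object,
    by position in the metadata) or "constant" (use value as is).
    """
    # First pass: allocate empty shells so that objects can reference each other.
    objects = [{"kind": d["kind"], "fields": {}} for d in metadata]
    # Second pass: fill in the fields.
    for obj, descriptor in zip(objects, metadata):
        for name, (source, value) in descriptor["fields"].items():
            if source == "slot":
                obj["fields"][name] = frame_slots[value]
            elif source == "object":
                obj["fields"][name] = objects[value]
            else:  # "constant"
                obj["fields"][name] = value
    return objects


example_metadata = [
    {"kind": "Context", "fields": {"method": ("constant", "Example>>exampleArrayLoop"),
                                   "receiver": ("slot", "r0")}},
    {"kind": "Context", "fields": {"method": ("constant", "Array>>do:"),
                                   "sender": ("object", 0),
                                   "index": ("slot", "r1"),
                                   "closure": ("object", 2)}},
    {"kind": "Closure", "fields": {"outerContext": ("object", 0)}},
]
rebuilt = reconstruct(example_metadata, {"r0": "anExample", "r1": 3})
```

The two-pass structure matters because contexts and closures refer to each other, so every shell has to exist before any field is filled in.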
Stack editing
[Diagram in three builds, stack growing down: (1) application frames, the last one requesting deoptimisation; (2) a call-back frame and Scorch deoptimiser frames are pushed below it; (3) the Scorch deoptimiser replaces the frame requesting deoptimisation with deoptimised contexts]
Working implementation
Language “features” were incompatible
Old-style Memory Manager
Literal mutability
…
Memory Manager
A Partial Read Barrier for Efficient Support of Live Object-oriented Programming, ISMM’15 (top conference)
Before: old-style memory representation, old-style GC
After: improved memory representation, efficient scavenger
Literal Mutability
Literals are not constants
Limited compiler optimisations
Read-only objects
Any object can change mutability state
Hook before object mutation
Literals are now read-only
Optimiser notified upon mutation
A low Overhead Per Object Write Barrier for the Cog VM, IWST’16
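The read-only mechanism can be illustrated with a toy model in which a write to a read-only literal first runs a hook, and the hook lets the optimiser discard code that assumed the literal was constant. Everything here is hypothetical (`LiteralArray`, `Optimiser`, and in particular the policy of making the literal writable again after notification); it sketches the idea on the slide rather than the Cog VM write barrier.

```python
class Optimiser:
    """Keeps track of optimised code that assumed a literal never changes."""

    def __init__(self):
        self.code_depending_on = {}  # id(literal) -> names of optimised functions
        self.discarded = []

    def assume_constant(self, literal, function_name):
        literal.read_only = True
        self.code_depending_on.setdefault(id(literal), []).append(function_name)

    def before_mutation(self, literal):
        # Notified before the write happens: drop code relying on the old value.
        self.discarded.extend(self.code_depending_on.pop(id(literal), []))
        literal.read_only = False  # assumed policy: allow the write afterwards


class LiteralArray:
    """A literal whose mutability state can change and whose writes are hooked."""

    def __init__(self, items, mutation_hook):
        self.items = list(items)
        self.read_only = False
        self.mutation_hook = mutation_hook

    def at_put(self, index, value):
        if self.read_only:
            self.mutation_hook(self)  # hook runs before the object is mutated
        self.items[index] = value
```

Writes to objects that are not read-only skip the hook entirely, which is what keeps the common case cheap in this model.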
Making it work
Tools needed
Debugging tools
VM simulator
Accurate VM profiler
Accurate VM profiler for the Cog VM, IWST’17 (best paper award)
Research problems
1. Architecture
Working Implementation
Development cost
2 people part time
No empirical study
Maintenance cost
A few open-source contributors
Shared back-end
2. Warm-up
Persistence of optimised virtual functions
Sista: Saving Optimized Code in Snapshots for Fast Start-Up, ManLang’17
2. Warm-up: Comparison
Many tiers architecture
We have 3 tiers
4+ tiers are difficult to maintain
Persistence of machine code (incl. Clones)
Arguably quicker start-up
Machine dependent
Difficult to implement & maintain
Security issue (Future work)
Persistence of runtime information
More compilation time
3. Metacircular
Scorch implemented in Smalltalk
Can optimise its own code under certain constraints
Similar to Graal-Truffle
Benchmarks
Cog: production VM
Cog+Counters: production VM + profiling counters
Sista (Cold): Sista runtime with non optimised v-functions
Sista (Warm): Sista runtime from optimised v-functions
[Figure: average ms per iteration (the smaller the better) for Cog, Cog + Counters, Sista (Cold) and Sista (Warm) on eleven benchmarks: (a) A*, (b) Binary tree, (c) JSON parsing, (d) Richards, (e) K-Nucleotide, (f) Thread ring, (g) N-body, (h) DeltaBlue, (i) Mandelbrot, (j) Spectral Norm, (k) Meteor]
Deoptimiser validation
Practical Validation of Bytecode to Bytecode JIT Compiler Dynamic Deoptimization, JOT’16
Good validation = Good symbolic execution = High engineering time
[Diagram: the Scorch deoptimiser turns the symbolic optimised stack into a symbolic deoptimised stack, which is checked against the symbolic non optimised stack by symbolic values comparison]
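The comparison step of this validation can be sketched as follows. The Python below assumes symbolic values are plain strings and stacks are lists of name-to-value mappings; `compare_stacks` and `toy_deoptimiser` are hypothetical helpers written for this illustration, and the real validation described in the JOT’16 paper relies on a full symbolic execution that is not modelled here.

```python
def toy_deoptimiser(optimised_frame, metadata):
    """Stand-in deoptimiser: one rebuilt frame per metadata entry (name -> slot)."""
    return [{name: optimised_frame[slot] for name, slot in description.items()}
            for description in metadata]


def compare_stacks(expected_stack, deoptimised_stack):
    """Return the list of differences between two symbolic stacks (empty if equal)."""
    mismatches = []
    if len(expected_stack) != len(deoptimised_stack):
        mismatches.append(("stack depth", len(expected_stack), len(deoptimised_stack)))
    for depth, (expected, actual) in enumerate(zip(expected_stack, deoptimised_stack)):
        for name in sorted(set(expected) | set(actual)):
            if expected.get(name) != actual.get(name):
                mismatches.append((f"frame {depth}: {name}",
                                   expected.get(name), actual.get(name)))
    return mismatches


# Symbolic stack obtained from the non optimised code (two frames).
non_optimised = [{"receiver": "self", "array": "self.array"},
                 {"receiver": "self.array", "index": "i"}]
# Single optimised frame after inlining, plus metadata to map it back.
optimised_frame = {"s0": "self", "s1": "self.array", "s2": "i"}
good_metadata = [{"receiver": "s0", "array": "s1"}, {"receiver": "s1", "index": "s2"}]
```

An empty result from `compare_stacks(non_optimised, toy_deoptimiser(optimised_frame, good_metadata))` means the deoptimised stack matches; a wrong slot in the metadata shows up as a named mismatch.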
Runtime information
Inferring Types by Mining Class Usage Frequency from Inline Caches, IWST’16
Mining Inline Cache Data to Order Inferred Types in Dynamic Languages, SCP’17
Collaboration with SCG (Oscar Nierstrasz)
Runtime information for type inference
Contributions & Publications
Contributions
1 Hot spot detection in Cogit
2 Support of extended instruction set
3 VM call-backs to trigger Scorch
4 Runtime information primitive
5 Scorch
6 Spur Memory Manager
7 New bytecode set
8 Register allocation
9 Read-only objects
10 Improved closure implementation
[Slide builds add a Prod column in which five of the contributions are marked In progress]
Publications
Conferences and Journals
1 A Partial Read Barrier for Efficient Support of Live Object-oriented Programming, ISMM’15
2 Practical Validation of Bytecode to Bytecode JIT Compiler Dynamic Deoptimization, JOT’16
3 Mining Inline Cache Data to Order Inferred Types in Dynamic Languages, SCP’17
4 Sista: Saving Optimized Code in Snapshots for Fast Start-Up, ManLang’17
Workshops
5 Towards a flexible Pharo Compiler, IWST’13
6 A bytecode set for adaptive optimizations, IWST’14
7 Inferring Types by Mining Class Usage Frequency from Inline Caches, IWST’16
8 A low Overhead Per Object Write Barrier for the Cog VM, IWST’16
9 Accurate VM profiler for the Cog VM, IWST’17
[Slide builds highlight: top conference (1), best paper awards (6 and 9), core of the thesis]
Smalltalk runtime: Scorch
non optimised v-functions to optimised v-functions
Smalltalk-specific optimisations
virtual functions (persisted across start-ups)
Virtual machine: Cogit
v-functions to n-functions
Machine-specific optimisations
native functions (discarded on shut-down)
[Diagram labels: Baseline JIT, Optimising JIT]
1. Split optimising JIT architecture
2. Persistence of optimised functions
3. Metacircular JIT optimising itself
A Partial Read Barrier for Efficient Support of Live Object-oriented Programming, ISMM’15
Sista: Saving Optimized Code in Snapshots for Fast Start-Up, ManLang’17
A bytecode set for adaptive optimizations, IWST’14, Best paper award
Accurate VM profiler for the Cog VM, IWST’17, Best paper award

Contenu connexe

Similaire à Sista: a Metacircular Architecture for Runtime Optimisations

Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AIQualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AIQualcomm Research
 
Kusto (Azure Data Explorer) Training for R&D - January 2019
Kusto (Azure Data Explorer) Training for R&D - January 2019 Kusto (Azure Data Explorer) Training for R&D - January 2019
Kusto (Azure Data Explorer) Training for R&D - January 2019 Tal Bar-Zvi
 
May2010 hex-core-opt
May2010 hex-core-optMay2010 hex-core-opt
May2010 hex-core-optJeff Larkin
 
02 intel v_tune_session_02
02 intel v_tune_session_0202 intel v_tune_session_02
02 intel v_tune_session_02Vivek chan
 
Metaprogramming, Metaobject Protocols, Gradual Type Checks: Optimizing the "U...
Metaprogramming, Metaobject Protocols, Gradual Type Checks: Optimizing the "U...Metaprogramming, Metaobject Protocols, Gradual Type Checks: Optimizing the "U...
Metaprogramming, Metaobject Protocols, Gradual Type Checks: Optimizing the "U...Stefan Marr
 
Sista: Improving Cog’s JIT performance
Sista: Improving Cog’s JIT performanceSista: Improving Cog’s JIT performance
Sista: Improving Cog’s JIT performanceESUG
 
Delivery at Scale
Delivery at ScaleDelivery at Scale
Delivery at ScaleAgilar
 
PVS-Studio. Static code analyzer. Windows/Linux, C/C++/C#. 2017
PVS-Studio. Static code analyzer. Windows/Linux, C/C++/C#. 2017PVS-Studio. Static code analyzer. Windows/Linux, C/C++/C#. 2017
PVS-Studio. Static code analyzer. Windows/Linux, C/C++/C#. 2017Andrey Karpov
 
Keeping Your CI/CD Pipeline as Fast as It Needs to Be
Keeping Your CI/CD Pipeline as Fast as It Needs to BeKeeping Your CI/CD Pipeline as Fast as It Needs to Be
Keeping Your CI/CD Pipeline as Fast as It Needs to BeAbraham Marin-Perez
 
Kostiantyn Yelisavenko "Mastering Macro Benchmarking in .NET"
Kostiantyn Yelisavenko "Mastering Macro Benchmarking in .NET"Kostiantyn Yelisavenko "Mastering Macro Benchmarking in .NET"
Kostiantyn Yelisavenko "Mastering Macro Benchmarking in .NET"LogeekNightUkraine
 
Squeezing Blood From a Stone V1.2
Squeezing Blood From a Stone V1.2Squeezing Blood From a Stone V1.2
Squeezing Blood From a Stone V1.2Jen Costillo
 
Constraint Programming - An Alternative Approach to Heuristics in Scheduling
Constraint Programming - An Alternative Approach to Heuristics in SchedulingConstraint Programming - An Alternative Approach to Heuristics in Scheduling
Constraint Programming - An Alternative Approach to Heuristics in SchedulingEray Cakici
 
Рахманов Александр "Что полезного в разборе дампов для .NET-разработчиков?"
Рахманов Александр "Что полезного в разборе дампов для .NET-разработчиков?"Рахманов Александр "Что полезного в разборе дампов для .NET-разработчиков?"
Рахманов Александр "Что полезного в разборе дампов для .NET-разработчиков?"Yulia Tsisyk
 
EKON 23 Code_review_checklist
EKON 23 Code_review_checklistEKON 23 Code_review_checklist
EKON 23 Code_review_checklistMax Kleiner
 
Bounded Model Checking for C Programs in an Enterprise Environment
Bounded Model Checking for C Programs in an Enterprise EnvironmentBounded Model Checking for C Programs in an Enterprise Environment
Bounded Model Checking for C Programs in an Enterprise EnvironmentAdaCore
 
GlobalLogic Test Automation Online TechTalk “Test Driven Development as a Per...
GlobalLogic Test Automation Online TechTalk “Test Driven Development as a Per...GlobalLogic Test Automation Online TechTalk “Test Driven Development as a Per...
GlobalLogic Test Automation Online TechTalk “Test Driven Development as a Per...GlobalLogic Ukraine
 
09.50 Ernst Vrolijks
09.50 Ernst Vrolijks09.50 Ernst Vrolijks
09.50 Ernst VrolijksThemadagen
 
Code instrumentation
Code instrumentationCode instrumentation
Code instrumentationBryan Reinero
 
JS Fest 2018. Никита Галкин. Микросервисная архитектура с переиспользуемыми к...
JS Fest 2018. Никита Галкин. Микросервисная архитектура с переиспользуемыми к...JS Fest 2018. Никита Галкин. Микросервисная архитектура с переиспользуемыми к...
JS Fest 2018. Никита Галкин. Микросервисная архитектура с переиспользуемыми к...JSFestUA
 

Similaire à Sista: a Metacircular Architecture for Runtime Optimisations (20)

Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AIQualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
 
Kusto (Azure Data Explorer) Training for R&D - January 2019
Kusto (Azure Data Explorer) Training for R&D - January 2019 Kusto (Azure Data Explorer) Training for R&D - January 2019
Kusto (Azure Data Explorer) Training for R&D - January 2019
 
May2010 hex-core-opt
May2010 hex-core-optMay2010 hex-core-opt
May2010 hex-core-opt
 
02 intel v_tune_session_02
02 intel v_tune_session_0202 intel v_tune_session_02
02 intel v_tune_session_02
 
Metaprogramming, Metaobject Protocols, Gradual Type Checks: Optimizing the "U...
Metaprogramming, Metaobject Protocols, Gradual Type Checks: Optimizing the "U...Metaprogramming, Metaobject Protocols, Gradual Type Checks: Optimizing the "U...
Metaprogramming, Metaobject Protocols, Gradual Type Checks: Optimizing the "U...
 
Sista: Improving Cog’s JIT performance
Sista: Improving Cog’s JIT performanceSista: Improving Cog’s JIT performance
Sista: Improving Cog’s JIT performance
 
Delivery at Scale
Delivery at ScaleDelivery at Scale
Delivery at Scale
 
Delivery at Scale
Delivery at ScaleDelivery at Scale
Delivery at Scale
 
PVS-Studio. Static code analyzer. Windows/Linux, C/C++/C#. 2017
PVS-Studio. Static code analyzer. Windows/Linux, C/C++/C#. 2017PVS-Studio. Static code analyzer. Windows/Linux, C/C++/C#. 2017
PVS-Studio. Static code analyzer. Windows/Linux, C/C++/C#. 2017
 
Keeping Your CI/CD Pipeline as Fast as It Needs to Be
Keeping Your CI/CD Pipeline as Fast as It Needs to BeKeeping Your CI/CD Pipeline as Fast as It Needs to Be
Keeping Your CI/CD Pipeline as Fast as It Needs to Be
 
Kostiantyn Yelisavenko "Mastering Macro Benchmarking in .NET"
Sista: a Metacircular Architecture for Runtime Optimisations

  • 1. Sista: a Metacircular Architecture for Runtime Optimisations Persistence. Clément Béra, Ph.D. defense.
  • 2. Ph.D. setup: international collaboration. Goal: transfer new knowledge to RMoD. France (32 months): RMoD researchers Stéphane Ducasse and Marcus Denker, Inria. California (4 months): domain expert Eliot Miranda, Stellect Systems LLC.
  • 5. Sista? Speculative Inlining Small-Talk Architecture.
  • 6. Sista? An optimising just-in-time (JIT) compiler for Smalltalk.
  • 8. Program readability. Technique name and program code:
      do: selector    array do: #yourself.
      do: block       array do: [ :elem | elem yourself ].
      to:do:          1 to: array size do: [ :i | (array at: i) yourself ].
  • 9. [Figure: millions of executions per second (0 to 120) against array size (0 to 100) for do: selector, do: block and to:do:.]
  • 10. Same figure. Array size = 0: to:do: is 5x faster than do: block.
  • 11. Same figure. Array size = 10: to:do: is 2.5x faster than do: block.
  • 12. The same three techniques (do: selector, do: block, to:do:), compared on readability and performance.
  • 15. Overall Smalltalk performance: faster than anything you can write. Optimisations on low-level behavior that cannot be expressed at Smalltalk level.
  • 16. Speculations. Speculative optimisations: the first runs are non-optimised, and the code is then optimised based on those first runs. Example: a branch that is unused in the first runs.
  • 17. Speculations. Speculative optimisations may be incorrect. Example: the unused branch is used. The code is then deoptimised at runtime and re-optimised differently.
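The speculate, deoptimise, re-optimise cycle of slides 16 and 17 can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name, the warm-up threshold and the idea of speculating on the argument type are illustrative choices, not details taken from the talk.

```python
class SpeculativeFunction:
    """Toy model: speculate on the argument type seen during the first runs."""

    WARMUP_RUNS = 3  # hypothetical threshold, not taken from the slides

    def __init__(self, generic):
        self.generic = generic        # always-correct, slower path
        self.fast_paths = {}          # hypothetical: type -> specialised code
        self.runs = 0
        self.speculated_type = None
        self.deopt_count = 0

    def add_fast_path(self, type_, fn):
        self.fast_paths[type_] = fn

    def __call__(self, arg):
        self.runs += 1
        if self.speculated_type is not None:
            if type(arg) is self.speculated_type:
                # Guard holds: run the specialised code.
                return self.fast_paths[self.speculated_type](arg)
            # Guard failed: the speculation was wrong, so deoptimise
            # and profile again before re-optimising.
            self.deopt_count += 1
            self.speculated_type = None
            self.runs = 0
            return self.generic(arg)
        result = self.generic(arg)
        if self.runs >= self.WARMUP_RUNS and type(arg) in self.fast_paths:
            # Optimise based on what the first runs looked like.
            self.speculated_type = type(arg)
        return result


double = SpeculativeFunction(lambda x: x + x)
double.add_fast_path(int, lambda x: x << 1)
double.add_fast_path(str, lambda s: s * 2)
```

Calling `double` with integers a few times installs the integer fast path; a later call with a string fails the guard, falls back to the generic path, and after a new warm-up the function is specialised for strings instead.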
  • 18. Main existing solution: three-tier execution model (Java HotSpot, JavaScript V8, …).
  • 19. Three-tier execution model, by run number:
      Run number     Tier              Compilation time    Execution time
      1 to 6         Interpreter                           Slow
      7              Baseline JIT      Low
      8 to 10,000    Baseline JIT                          Average
      10,000         Optimising JIT    High
      10,001+        Optimising JIT                        Fast
  • 20. Same table, annotated with the labels: Speculations, Always correct, Bytecode, Native.
  • 21. Existing: same table.
  • 22. Sista: same table.
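As a rough illustration of the table on slides 19 to 22, the mapping from run number to tier can be written as a small function. The two thresholds are copied from the slides; the function and constant names are hypothetical, and because the table lists run 10,000 twice, this sketch assumes it is the run on which the optimising JIT compiles.

```python
# Thresholds copied from the slides; the names are illustrative only.
BASELINE_COMPILE_RUN = 7
OPTIMISING_COMPILE_RUN = 10_000


def tier_for_run(run_number):
    """Return (tier, activity) for a 1-based run number of one function."""
    if run_number < 1:
        raise ValueError("run numbers start at 1")
    if run_number < BASELINE_COMPILE_RUN:
        return ("interpreter", "execute (slow)")
    if run_number == BASELINE_COMPILE_RUN:
        return ("baseline JIT", "compile (low cost)")
    if run_number < OPTIMISING_COMPILE_RUN:
        return ("baseline JIT", "execute (average)")
    if run_number == OPTIMISING_COMPILE_RUN:
        # Assumption: run 10,000 is the compilation run, not a baseline run.
        return ("optimising JIT", "compile (high cost)")
    return ("optimising JIT", "execute (fast)")
```

For example, `tier_for_run(3)` reports the interpreter, while `tier_for_run(10_001)` reports fast execution in the optimising tier.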
  • 23. Alternative architecture: meta-tracing JIT. Records optimised traces of loop bodies (PyPy, LuaJIT).
  • 24. Existing proposal: Adaptive Optimiser for Small-Talk Architecture, by Eliot Miranda, with contributions of Paolo Bonzini, Steve Dahl, David Griswold, Urs Hölzle, Ian Piumarta and David Simmons.
  • 25. Key ideas: build on top of the existing VM; split optimising JIT design.
  • 26. Build on top of the existing VM: market acceptance; lower risk and investment.
  • 27. Split optimising JIT design. [Diagram: bytecode to bytecode (1), then bytecode to native code (2).] Lowering maintenance cost: 1 is written in Smalltalk (more open-source contributors); 2 reuses the baseline JIT (one back-end to maintain).
  • 29. 1. Architecture: is the split optimising JIT design possible? [Diagram: bytecode to bytecode (1), then bytecode to native code (2).]
  • 30. 1. Architecture: can we build a maintainable optimising JIT with two people part-time? Practical problem: cannot be proven (empirical study).
  • 31. Existing solutions: reusing an existing adaptable JIT (Truffle, RPython toolchain).
  • 32. Time to reach peak performance. Non-optimising tiers are slower to execute code; runtime compilation takes time; so do deoptimisation and re-optimisation. [Same tier table as slide 19.]
  • 33. 2. Warm-up: how to reduce the time to reach peak performance?
  • 34. Existing solutions. Many-tier architectures: baseline JIT; 4 tiers in JavaScript WebKit. Cloneable JVMs: work of Kawachiya et al., cloning a running JVM including native code.
  • 35. Existing solutions. Persistence of runtime information: saving inlining decisions (Strongtalk); shared profiling information between VMs (Arnold et al.). Persistence of machine code: Azul, JRockit.
  • 36. Metacircular JITs: written in itself. Compiled AOT (RPython toolchain) or running in the same runtime (Maxine, Graal).
  • 37. 3. Metacircular: can the optimising JIT optimise its own code at runtime?
  • 38. Existing solution: Truffle-Graal. Built on top of the JVM; Graal is an optimising JIT written in Java.
  • 41. Terminology. Function: method or closure. Virtual function vs. native function.
  • 42. Architecture overview. The Smalltalk runtime holds the virtual functions; the virtual machine holds the native functions. Cogit translates v-functions to n-functions and performs machine-specific optimisations. Tiers shown: baseline JIT and optimising JIT.
  • 43. Same overview with Scorch added: Scorch turns non-optimised v-functions into optimised v-functions and performs Smalltalk-specific optimisations. Virtual functions are persisted across start-ups; native functions are discarded on shut-down.
  • 45. Same overview, marking which parts are "My work alone" and which are "My work in collaboration with Stellect Systems LLC".
  • 46. Optimising JIT. In the virtual machine, Cogit handles hot spot detection and produces the optimised native function. In the Smalltalk image, Scorch decides what to optimise, performs the optimisation and the installation, and produces the optimised virtual function.
  • 47. Hot spot detection. Profiling counters on branches. Counters are stored as a pinned ByteArray on the heap: no CPU i-cache flush, direct reference to the counter. Detection: VM call-back to Scorch. [Figure: a non-optimised native function (hex dump) referencing a pinned ByteArray of 32-bit words, each split into two halves (bits 0 to 15 and 16 to 31) holding Counter 1A and Counter 1B, then Counter 2A and Counter 2B.]
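The counter layout on slide 47 can be approximated as follows. This is a sketch under assumptions, not the Cogit implementation: it assumes the two 16-bit counters of a branch count "taken" and "not taken", that counters count upwards to a threshold, and that tripping simply invokes a Python call-back standing in for the VM call-back to Scorch. All names are hypothetical.

```python
import struct


class BranchCounters:
    """Toy model of per-branch profiling counters packed in one byte array."""

    def __init__(self, branch_count, threshold, on_hot_spot):
        # Two 16-bit counters per branch: 4 bytes per branch.
        self.data = bytearray(4 * branch_count)
        self.threshold = threshold        # hypothetical trip value (<= 65535)
        self.on_hot_spot = on_hot_spot    # stands in for the VM call-back

    def _offset(self, branch, taken):
        # Assumption: first half-word counts "taken", second "not taken".
        return 4 * branch + (0 if taken else 2)

    def read(self, branch, taken):
        return struct.unpack_from("<H", self.data, self._offset(branch, taken))[0]

    def record(self, branch, taken):
        offset = self._offset(branch, taken)
        value = self.read(branch, taken) + 1
        tripped = value >= self.threshold
        # Reset a tripped counter so that it can trip again later.
        struct.pack_into("<H", self.data, offset, 0 if tripped else value)
        if tripped:
            self.on_hot_spot(branch)
```

Keeping the counters in a separate data array, rather than inside the generated code, is what lets them be updated without touching the instruction stream.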
  • 48. What to optimise?
      Example>>exampleArrayLoop
          array do: [:item | item displayOnScreen ]
      Array>>do: aClosure
          1 to: self size do: [:index | aClosure value: (self at: index) ]
  • 49. Same code, with the stack (growing down): Example>>exampleArrayLoop, then Array>>do:, then [:item | item displayOnScreen ].
  • 50. Same code and stack, with the labels "Hot spot detected" and "Hidden branch".
  • 51. Same code and stack, with the labels "Method to optimise" and "Hot spot detected".
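One plausible reading of slides 48 to 51 is that the optimiser walks up the stack from the frame where the hot spot was detected to find a better function to optimise. The sketch below assumes the hot spot trips in the closure's frame and that the chosen function is the closure's home method, so that the closure and the call to `do:` can both be inlined into it. The transcript does not spell this policy out, and the `Frame` type and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    name: str
    home: Optional[str] = None   # for a closure: the method that created it


def function_to_optimise(stack):
    """stack[0] is the oldest frame; stack[-1] is where the hot spot tripped."""
    hot = stack[-1]
    if hot.home is None:
        return hot.name              # plain method: optimise it directly
    # Closure: walk up the stack to the frame of its home method.
    for frame in reversed(stack[:-1]):
        if frame.name == hot.home:
            return frame.name
    return hot.name                  # home method is no longer on the stack


stack = [
    Frame("Example>>exampleArrayLoop"),
    Frame("Array>>do:"),
    Frame("[:item | item displayOnScreen ]", home="Example>>exampleArrayLoop"),
]
```

With the stack from the slides, `function_to_optimise(stack)` selects `Example>>exampleArrayLoop`.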
  • 52. Virtual function optimisation. Non-optimised v-functions (compiled code) are decompiled into the Scorch IR, optimised (speculative inlining, range optimisations, loop optimisations, ...), and an optimised v-function (compiled code) is generated.
  • 53. Bytecode optimisation: through a CFG SSA IR (similar to LLVM). Lots of details and edge cases; details in the thesis.
  • 54. Installation. Where to install? Copy down? [Diagram: class hierarchy with Collection, Sequenceable Collection, Hashed Collection, Array, Ordered Collection, Dictionary and Set.]
  • 55. Installation: dependency management. Optimisations such as inlining track dependencies. Register the optimised method.
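The dependency tracking mentioned on slide 55 could look roughly like the registry below. This is a minimal sketch assuming dependencies are keyed by the selector of each inlined method; the class and method names are invented for illustration and do not come from Scorch.

```python
from collections import defaultdict


class DependencyRegistry:
    """Toy registry: which optimised methods depend on which selectors."""

    def __init__(self):
        # selector -> optimised methods that inlined a method with that selector
        self._dependents = defaultdict(set)

    def register(self, optimised_method, inlined_selectors):
        for selector in inlined_selectors:
            self._dependents[selector].add(optimised_method)

    def method_changed(self, selector):
        """Return the optimised methods to discard because `selector` changed."""
        stale = self._dependents.pop(selector, set())
        # A discarded method no longer depends on anything else either.
        for dependents in self._dependents.values():
            dependents -= stale
        return stale
```

When a method is redefined, the methods returned by `method_changed` are the ones whose inlining assumptions no longer hold.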
  • 56. The next call uses the optimised virtual function.
  • 57. VM extensions: execution of optimised virtual functions; register allocation; new operations.
  • 58. New operations: unsafe array access; arithmetic without overflow; inlined allocation; efficient type-checks; … (A bytecode set for adaptive optimizations, IWST’14.)
  • 59. Same slide, marked: best paper award.
  • 62. Traps. Most trap branches are not taken. Traps should be off the CPU i-cache. Call-back to Scorch. [Figure: an optimised native function (hex dump) with a type check and a trap call-back.]
  • 63. Object reconstruction. The application frame requesting deoptimisation is turned, through object reconstruction, into deoptimised contexts together with reconstructed closures, temp vectors and other objects. The deoptimisation metadata lists the objects to reconstruct, which include contexts, closures and temp vectors.
  • 64. Stack editing. Stack (growing down): application frame, application frame, application frame requesting deoptimisation.
  • 65. Stack editing. Below the frame requesting deoptimisation come the call-back frame and two Scorch deoptimiser frames.
  • 66. Stack editing. After the Scorch deoptimiser has run, the stack reads: application frame, application frame, deoptimised contexts, call-back frame, Scorch deoptimiser frames.
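The reconstruction step of slides 63 to 66 can be pictured as rebuilding one context per inlined function from the values held by the single optimised frame. The metadata format below (method name, program counter, and a map from temporary name to slot index) is purely hypothetical; it only illustrates the idea that the metadata says what to rebuild and where each value comes from.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    method: str
    pc: int
    temps: dict = field(default_factory=dict)
    sender: object = None   # the context that called this one


def deoptimise(optimised_frame_values, metadata):
    """Rebuild one context per metadata entry, outermost function first."""
    contexts = []
    sender = None
    for entry in metadata:
        temps = {name: optimised_frame_values[slot]
                 for name, slot in entry["temps"].items()}
        context = Context(entry["method"], entry["pc"], temps, sender)
        contexts.append(context)
        sender = context
    return contexts


# Hypothetical frame contents and metadata for the example of slide 48.
values = ["anArray", 3, "item3"]
metadata = [
    {"method": "Example>>exampleArrayLoop", "pc": 12, "temps": {}},
    {"method": "Array>>do:", "pc": 40, "temps": {"index": 1}},
    {"method": "[:item | item displayOnScreen ]", "pc": 5, "temps": {"item": 2}},
]
contexts = deoptimise(values, metadata)
```

Here one optimised frame yields three linked contexts, which would then replace the frame that requested deoptimisation.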
  • 67. Working implementation. Some language “features” were incompatible: old-style memory manager, literal mutability, …
  • 68. Memory manager. From an old-style memory representation and an old-style GC to an improved memory representation and an efficient scavenger. (A Partial Read Barrier for Efficient Support of Live Object-oriented Programming, ISMM’15.)
  • 69. Same slide, marked: top conference.
  • 70. Literal mutability. Literals are not constants, which limits compiler optimisations.
  • 71. Literal mutability: read-only objects. Any object can change mutability state, with a hook before object mutation. Literals are now read-only and the optimiser is notified upon mutation. (A low Overhead Per Object Write Barrier for the Cog VM, IWST’16.)
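The read-only mechanism of slide 71 can be mimicked with a per-object flag and a hook that runs before a mutation goes through. This is a sketch of the idea only: the `Cell` class, the exception and the hook signature are assumptions made for illustration, not the Cog VM interface.

```python
class ReadOnlyError(Exception):
    """Raised when a read-only object is mutated and no hook allows it."""


class Cell:
    """Toy object with a per-object read-only flag and a pre-mutation hook."""

    def __init__(self, value, on_mutation_attempt=None):
        self._value = value
        self.read_only = False
        self.on_mutation_attempt = on_mutation_attempt

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        if self.read_only:
            if self.on_mutation_attempt is not None:
                # Hook before mutation: it may notify the optimiser and
                # make the object writable again.
                self.on_mutation_attempt(self)
            if self.read_only:
                raise ReadOnlyError("object is read-only")
        self._value = new_value


notified = []


def optimiser_hook(cell):
    # Stand-in for "optimiser notified upon mutation".
    notified.append(cell)
    cell.read_only = False


literal = Cell(42, optimiser_hook)
literal.read_only = True
```

Assigning `literal.value = 43` first notifies the hook, which is where an optimiser could discard code that treated the literal as a constant, and only then performs the write.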
  • 72. Making it work. Tools needed: debugging tools, VM simulator, accurate VM profiler. (Accurate VM profiler for the Cog VM, IWST’17.)
  • 73. Same slide, marked: best paper award.
  • 76. 1. Architecture: development cost. Two people part-time. No empirical study. [Diagram: split optimising JIT design.]
  • 77. 1. Architecture: maintenance cost. A few open-source contributors. Shared back-end. [Diagram: split optimising JIT design.]
  • 78. 2. Warm-up: persistence of optimised virtual functions. [Architecture overview of slide 43: virtual functions are persisted across start-ups, native functions are discarded on shut-down.]
  • 79. Same slide, citing: Sista: Saving Optimized Code in Snapshots for Fast Start-Up, ManLang’17.
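The persistence policy of slides 78 and 79 (virtual functions survive a restart, native functions do not) can be illustrated with a toy runtime that serialises only its virtual functions. Using `pickle` as the snapshot format and the `Runtime` class itself are assumptions for the sake of the example; the real system saves functions in the Smalltalk snapshot.

```python
import pickle


class Runtime:
    """Toy runtime: only virtual functions are written to the snapshot."""

    def __init__(self):
        self.virtual_functions = {}   # name -> (bytecode, is_optimised); persisted
        self.native_functions = {}    # name -> machine code; never persisted

    def snapshot(self):
        # Native code is machine dependent, so it is left out on purpose.
        return pickle.dumps(self.virtual_functions)

    @classmethod
    def from_snapshot(cls, data):
        runtime = cls()
        runtime.virtual_functions = pickle.loads(data)
        return runtime


cold = Runtime()
cold.virtual_functions["Example>>exampleArrayLoop"] = ("optimised bytecode", True)
cold.native_functions["Example>>exampleArrayLoop"] = b"\x90\x90"
warm = Runtime.from_snapshot(cold.snapshot())
```

After the restart, `warm` already holds the optimised virtual function and only has to regenerate native code from it, which is the warm-up saving the slides argue for.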
  • 80. 2. Warm-up, comparison with many-tier architectures: we have 3 tiers; 4+ tiers are difficult to maintain.
  • 81. 2. Warm-up, comparison with persistence of machine code (incl. clones): arguably quicker start-up, but machine dependent, difficult to implement and maintain, and a security issue (future work).
  • 82. 2. Warm-up, comparison with persistence of runtime information: more compilation time.
  • 83. 3. Metacircular: Scorch is implemented in Smalltalk and can optimise its own code under certain constraints. Similar to Graal-Truffle.
  • 85. Benchmarks, four configurations. Cog: production VM. Cog+Counters: production VM + profiling counters. Sista (Cold): Sista runtime with non-optimised v-functions. Sista (Warm): Sista runtime started from optimised v-functions.
  • 86. [Figure: average ms per iteration for Cog, Cog + Counters, Sista (Cold) and Sista (Warm) on eleven benchmarks: (a) A*, (b) Binary tree, (c) JSON parsing, (d) Richards, (e) K-Nucleotide, (f) Thread ring, (g) N-body, (h) DeltaBlue, (i) Mandelbrot, (j) Spectral Norm, (k) Meteor.]
  • 87. Same figure. Average ms per iteration: the smaller the better.
  • 92. Deoptimiser validation. The Scorch deoptimiser turns the symbolic optimised stack into a symbolic deoptimised stack, which is checked against the symbolic non-optimised stack by symbolic value comparison. Good validation = good symbolic execution = high engineering time. (Practical Validation of Bytecode to Bytecode JIT Compiler Dynamic Deoptimization, JOT’16.)
  • 93. Runtime information for type inference. Collaboration with SCG (Oscar Nierstrasz). (Inferring Types by Mining Class Usage Frequency from Inline Caches, IWST’16; Mining Inline Cache Data to Order Inferred Types in Dynamic Languages, SCP’17.)
  • 95. Contributions:
      1. Hot spot detection in Cogit
      2. Support of extended instruction set
      3. VM call-backs to trigger Scorch
      4. Runtime information primitive
      5. Scorch
      6. Spur Memory Manager
      7. New bytecode set
      8. Register allocation
      9. Read-only objects
      10. Improved closure implementation
  • 96. Same list with a "Prod" column; five entries are marked "In progress".
  • 99. Publications.
      Conferences and journals:
      1. A Partial Read Barrier for Efficient Support of Live Object-oriented Programming, ISMM’15
      2. Practical Validation of Bytecode to Bytecode JIT Compiler Dynamic Deoptimization, JOT’16
      3. Mining Inline Cache Data to Order Inferred Types in Dynamic Languages, SCP’17
      4. Sista: Saving Optimized Code in Snapshots for Fast Start-Up, ManLang’17
      Workshops:
      5. Towards a flexible Pharo Compiler, IWST’13
      6. A bytecode set for adaptive optimizations, IWST’14
      7. Inferring Types by Mining Class Usage Frequency from Inline Caches, IWST’16
      8. A low Overhead Per Object Write Barrier for the Cog VM, IWST’16
      9. Accurate VM profiler for the Cog VM, IWST’17
  • 100. Same list, highlighting: top conference.
  • 101. Same list, highlighting: core of the thesis.
  • 102. Same list, highlighting: best paper awards.
  • 103. Summary. [Architecture overview of slide 43.] 1. Split optimising JIT architecture. 2. Persistence of optimised functions. 3. Metacircular JIT optimising itself. Publications shown: A Partial Read Barrier for Efficient Support of Live Object-oriented Programming, ISMM’15; Sista: Saving Optimized Code in Snapshots for Fast Start-Up, ManLang’17; A bytecode set for adaptive optimizations, IWST’14 (best paper award); Accurate VM profiler for the Cog VM, IWST’17 (best paper award).