These are my slides from COSCUP 2015 in Taipei, Taiwan.
The material is an overview of the engine implementation, written by a contributor with three months of experience working on mentored bugs.
33. Performance Disadvantage
• Immediate execution without proper redundancy
elimination and task-specialized optimization
Example
Object Property Access
Obj.Prop
34. JS Object
var People = {
  Name: "Me",
  Age: 1,
  Gender: "M"
};
Property Value
People.Name
People.Age
People.Gender
Property Access
35. Object Internal
• A list of shapes each of which
• Represents a named property
• A vector of slots each of which
• Stores the value of the mapped property
• A shape to describe its overall attributes
[Figure: an Object with an attribute shape, a shape list (Name, Age, Gender), and a slot vector holding "Me", 1, "M"]
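The layout described on this slide can be sketched in a few lines of JavaScript. This is only a toy model under my own assumptions: `makeObject`, `shapeList`, `slots` and `objectShape` are illustrative names, not SpiderMonkey's real data structures.

```javascript
// Hypothetical model of the slide's layout: a shape list, a slot vector,
// and one object shape that describes the overall attributes.
function makeObject(literal) {
  const shapeList = []; // one shape per named property
  const slots = [];     // slot vector holding the property values
  for (const [name, value] of Object.entries(literal)) {
    shapeList.push({ name, slot: slots.length }); // shape maps a name to a slot index
    slots.push(value);
  }
  // Here the object shape is reduced to the ordered list of property names.
  const objectShape = { propertyNames: shapeList.map((s) => s.name) };
  return { objectShape, shapeList, slots };
}

const people = makeObject({ Name: "Me", Age: 1, Gender: "M" });
console.log(people.shapeList);
console.log(people.slots);
```

Each shape only records where a value lives, so the values themselves stay in the slot vector.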
37. Object Property Access
• Object layout traversal
1. Search shape list to locate
the target property shape
2. Access slot vector with the
index found in the shape
• To speed up traversal
• Attach hash tables with some
shapes for table indexing
[Figure: an Object whose shape list runs P1 ... Pi ... Pj ... Pn, with a hash table indexing shapes Pi and Pj]
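The two traversal steps, and the hash table shortcut, might look like the sketch below. It is a simplified model: the `ShapeList` class, its lazily built `Map`, and the `getProp` helper are assumptions made for illustration and do not mirror the engine's actual code.

```javascript
// Illustrative only: a shape list with an optional hash table for indexing.
class ShapeList {
  constructor() {
    this.shapes = [];  // [{ name, slot }]
    this.table = null; // Map from name to shape, built lazily
  }
  add(name) {
    const shape = { name, slot: this.shapes.length };
    this.shapes.push(shape);
    if (this.table) this.table.set(name, shape);
    return shape;
  }
  // Step 1, slow path: walk the list until the property shape is found.
  searchLinear(name) {
    for (const shape of this.shapes) {
      if (shape.name === name) return shape;
    }
    return undefined;
  }
  // Step 1, fast path: index the attached hash table instead.
  searchHashed(name) {
    if (!this.table) this.table = new Map(this.shapes.map((s) => [s.name, s]));
    return this.table.get(name);
  }
}

function getProp(obj, name, useTable) {
  const shape = useTable
    ? obj.shapeList.searchHashed(name)
    : obj.shapeList.searchLinear(name);
  // Step 2: read the slot vector at the index stored in the shape.
  return shape === undefined ? undefined : obj.slots[shape.slot];
}

const shapeList = new ShapeList();
["P1", "Pi", "Pj", "Pn"].forEach((n) => shapeList.add(n));
const obj = { shapeList, slots: [10, 20, 30, 40] };
console.log(getProp(obj, "Pj", false), getProp(obj, "Pj", true));
```

Both paths return the same slot. The table only replaces the linear walk in step 1.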
38. Performance Gap
AoT Compilation: direct access
struct Object {
  int Prop1;
  int Prop2;
};
int prop = obj->Prop2;
lea eax, obj
mov ebx, [eax + 4]
VS
Interpretation: slow object layout traversal
var obj = {
  Prop1: 1,
  Prop2: 2,
};
var prop = obj.Prop2;
GetName obj
GetProp Prop2
40. Can we improve the performance?
Besides object property access,
there are still many other issues…
Interpretation
JIT Compilation
41. JIT Compilation
• Generate extremely fast native code
• Baseline for hot methods
• Inline cache to speed up dynamic property lookup
• IonMonkey for very hot methods
• Comprehensive optimization to remove redundancy
42. Inline Cache
• Objective
• Mitigate the overhead of object layout traversal
for each single property access
• Idea
• Cache the resolved value after dynamic lookup
• Emit a piece of direct access code for that value
44. Inline Cache
var res = obj.prop;
GetName “obj”
GetProp “prop”
Dynamic lookup logic
45. Inline Cache
• Efficient code for direct access
• But if obj is modified, the code will be unsafe
var res = obj.prop;
GetName "obj"
GetProp "prop"
mov eax, obj
mov eax, [eax + OfstSlot]
46. Direct Access Guard
• If an object is modified by property insertion or
deletion, its layout also changes
• Executing the cached code may then cause an invalid access
• Need a guard to check for object modification
• Object remains the same: enter the cached code
• Otherwise, fall back to dynamic lookup and reoptimize
48. Direct Access Guard
• Benefit from object shape
• Object has a shape to describe its overall attributes
• The object shape is kept in sync with its layout
• Apply the object shape to guard the cached code
mov eax, obj
cmp [eax + ShapeOfst], CachedShape
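To see why a shape comparison is enough as a guard, here is a small sketch. It assumes a toy shape model in which objects built with the same property sequence share one shape and any insertion moves the object to another shape. `rootShape`, `transition`, `addProp` and `guard` are hypothetical names.

```javascript
// Hypothetical shape model: same property sequence => same shape object.
const rootShape = { names: [], transitions: new Map() };

function transition(shape, name) {
  if (!shape.transitions.has(name)) {
    shape.transitions.set(name, {
      names: [...shape.names, name],
      transitions: new Map(),
    });
  }
  return shape.transitions.get(name);
}

function createObj() {
  return { shape: rootShape, slots: [] };
}

function addProp(obj, name, value) {
  obj.shape = transition(obj.shape, name); // layout change implies shape change
  obj.slots.push(value);
}

// The guard is one identity comparison, like `cmp [eax + ShapeOfst], CachedShape`.
function guard(obj, cachedShape) {
  return obj.shape === cachedShape;
}

const a = createObj();
addProp(a, "prop", 1);
const b = createObj();
addProp(b, "prop", 2);

const cachedShape = a.shape;
const before = [guard(a, cachedShape), guard(b, cachedShape)];
addProp(a, "extra", 3); // modifying `a` invalidates the guard for it
const after = guard(a, cachedShape);
console.log(before, after);
```

Because the shape stays in sync with the layout, a passed guard means the cached slot offset is still valid for that object.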
54. Inline Cache Instance
Prologue
mov eax, obj
call VM_CallBack (patched to: call Cached_Code)
Cached code
cmp [eax+ShapeOfst], CachedShape
jne MISS
mov eax, [eax+CachedSlotOfst]
jmp EXIT
MISS:
call VM_CallBack
EXIT:
Interpreter Callback
1. Resolve designated property
2. Generate direct access code
3. Modify original call site
4. Jump to cached code
After code linking, the access is direct
as long as the shape has not changed
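The four callback steps can be imitated in JavaScript with closures standing in for generated machine code. This is a rough sketch under stated assumptions: `makeCallSite`, `vmCallback`, `lookupSlot` and the `slowLookups` counter are hypothetical, and a patched function reference plays the role of the rewritten `call` instruction.

```javascript
// Illustrative model of a monomorphic inline cache with call-site patching.
let slowLookups = 0;

function lookupSlot(obj, name) {
  slowLookups++; // dynamic lookup: object layout traversal
  return obj.shape.names.indexOf(name);
}

function makeCallSite(name) {
  const site = { target: null };

  function vmCallback(obj) {
    const slot = lookupSlot(obj, name);           // 1. resolve the property
    if (slot < 0) return undefined;
    const cachedShape = obj.shape;
    const cachedCode = (o) =>                     // 2. generate direct access code
      o.shape === cachedShape
        ? o.slots[slot]                           //    guard hit: direct access
        : vmCallback(o);                          //    MISS: back to the VM, re-optimize
    site.target = cachedCode;                     // 3. modify the original call site
    return cachedCode(obj);                       // 4. jump to the cached code
  }

  site.target = vmCallback; // before linking, the site calls into the VM
  return (obj) => site.target(obj);
}

const shapeA = { names: ["prop"] };
const getProp = makeCallSite("prop");
const o1 = { shape: shapeA, slots: [42] };
const o2 = { shape: shapeA, slots: [7] };
const results = [getProp(o1), getProp(o2), getProp(o1)];
const lookupsAfterSameShape = slowLookups;
console.log(results, lookupsAfterSameShape);
```

Only the first call pays for the dynamic lookup. An object with a different shape misses the guard and goes through the callback again, which re-caches for the new shape.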
55. What If ...
var dog = {
  Name: "dog",
  Bow: function () {},
};
var cat = {
  Name: "cat",
  Meow: function () {},
};
for (var i = 0; i < 100; i++) {
  WhoAmI(dog);
  WhoAmI(cat);
}
function WhoAmI(obj) {
  return obj.Name;
}
dog, cat, dog, cat, . . .
Expensive cache and flush
56. Polymorphic IC
• Cache multiple sets of object shapes and the
resolved values
cmp [eax+ShapeOfst], CachedShape1
jne SHAPE2
mov eax, [eax+CachedSlotOfst1]
jmp EXIT
SHAPE2:
cmp [eax+ShapeOfst], CachedShape2
jne SHAPE3
mov eax, [eax+CachedSlotOfst2]
jmp EXIT
………
MISS:
call VM_CallBack
EXIT:
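The guard chain above can be mimicked with a short list of (shape, slot) entries. This is a sketch under my own assumptions, reusing the dog/cat scenario from the previous slide. `makePolymorphicGetter`, the `misses` counter, and the `MAX_ENTRIES` limit are invented for illustration and are not engine constants.

```javascript
// Illustrative polymorphic inline cache: several cached shapes per call site.
const MAX_ENTRIES = 4; // made-up limit on the chain length
let misses = 0;

function makePolymorphicGetter(name) {
  const entries = []; // [{ shape, slot }], checked in order like the cmp/jne chain
  return function get(obj) {
    for (const e of entries) {
      if (obj.shape === e.shape) return obj.slots[e.slot]; // guard hit
    }
    misses++;                                   // MISS: call into the VM
    const slot = obj.shape.names.indexOf(name); // dynamic lookup
    if (slot < 0) return undefined;
    if (entries.length < MAX_ENTRIES) entries.push({ shape: obj.shape, slot });
    return obj.slots[slot];
  };
}

const dogShape = { names: ["Name", "Bow"] };
const catShape = { names: ["Name", "Meow"] };
const dog = { shape: dogShape, slots: ["dog", function () {}] };
const cat = { shape: catShape, slots: ["cat", function () {}] };

const whoAmI = makePolymorphicGetter("Name");
const seen = [];
for (let i = 0; i < 100; i++) {
  seen.push(whoAmI(dog));
  seen.push(whoAmI(cat));
}
console.log(misses);
```

With both shapes cached, the alternating dog/cat calls no longer flush and rebuild the cache on every iteration.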
57. IonMonkey
• Translate bytecode to static single assignment
form (SSA) and build the control flow graph
• Apply optimizations that combine data flow and control flow analysis
• Translate the optimized SSA form to native code
59. Static Single Assignment
• Each expression has at most 3 operands
• Each target operand has a unique assignment
Original Code
X = 1
X = 2
Y = X + 1
Z = 3
Y = X + 2
SSA Form
X1 = 1
X2 = 2
Y1 = X2 + 1
Z1 = 3
Y2 = X2 + 2
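For straight-line code like the example above, the renaming can be done with a single version counter per variable. The sketch below assumes whitespace-separated tokens and no branches, so no phi functions are needed. `toSSA` is a hypothetical helper, not part of IonMonkey.

```javascript
// Rename straight-line assignments into SSA form (no control flow handled).
function toSSA(lines) {
  const version = new Map(); // variable -> latest version number
  // Constants, operators and never-assigned names are left untouched.
  const rename = (tok) => (version.has(tok) ? tok + version.get(tok) : tok);
  return lines.map((line) => {
    const [target, expr] = line.split("=").map((s) => s.trim());
    const rhs = expr.split(/\s+/).map(rename).join(" ");  // uses see the latest definition
    version.set(target, (version.get(target) || 0) + 1);  // each assignment gets a fresh name
    return `${target}${version.get(target)} = ${rhs}`;
  });
}

const ssa = toSSA(["X = 1", "X = 2", "Y = X + 1", "Z = 3", "Y = X + 2"]);
console.log(ssa.join("\n"));
```

The right-hand side is renamed before the target's version is bumped, which is why both uses of X resolve to X2.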
60. Control Flow Graph
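• The control flow relation
among basic blocks
• Basic block
• Consecutive instructions with the
last one as a control transfer
[Figure: control flow graph of basic blocks B1 to B5 with Goto and Cond edges. B1 ends in a Cond branch (T/F) to B2 and B3, and a second T/F branch leads to B4 and B5. Instructions shown: X1 = 3, Y1 = A1+B1, Z1 = X1+3, V1 = A1+B1, W1 = B1-3, U1 = B1-3]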
62. Value Numbering
• Eliminate redundant expressions
Before
X1 = A1 + B1
Y1 = 1
Z1 = A1 + B1
After
X1 = A1 + B1
Y1 = 1
Z1 = X1
• Often combined with other optimizations
• Constant folding and propagation
• Expression simplification
• Unreachable code elimination
63. Value Numbering
• Assign a hash value to each expression
• An expression with the same value as an
earlier expression can be reduced
• Same set of source values
• Same operator, allowing for algebraic commutativity
X1 = A1 + B1
Z1 = B1 + A1
Hash key (+,V1,V2) maps to value V3
Z1 = X1
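A hash-based pass over one basic block could look roughly like this. It is a minimal sketch assuming SSA input (no operand is redefined) and treating only `+` and `*` as commutative. `valueNumber` and its instruction format are made up for the example.

```javascript
// Local value numbering over one basic block in SSA form.
function valueNumber(block) {
  const valueOf = new Map();   // operand name -> value number
  const exprTable = new Map(); // hash key "(op,Va,Vb)" -> { value, holder }
  let next = 1;
  const numberOf = (name) => {
    if (!valueOf.has(name)) valueOf.set(name, next++);
    return valueOf.get(name);
  };
  return block.map(({ dst, op, lhs, rhs }) => {
    let a = numberOf(lhs);
    let b = numberOf(rhs);
    // Commutative operators: normalise operand order so A+B and B+A share a key.
    if ((op === "+" || op === "*") && a > b) {
      const tmp = a;
      a = b;
      b = tmp;
    }
    const key = `(${op},V${a},V${b})`;
    const hit = exprTable.get(key);
    if (hit) {
      valueOf.set(dst, hit.value);
      return `${dst} = ${hit.holder}`; // redundant: reuse the earlier result
    }
    const value = next++;
    exprTable.set(key, { value, holder: dst });
    valueOf.set(dst, value);
    return `${dst} = ${lhs} ${op} ${rhs}`;
  });
}

const out = valueNumber([
  { dst: "X1", op: "+", lhs: "A1", rhs: "B1" },
  { dst: "Z1", op: "+", lhs: "B1", rhs: "A1" },
]);
console.log(out.join("\n"));
```

Here A1 and B1 receive value numbers 1 and 2, the first expression is stored under the key (+,V1,V2) with value 3, and the second expression hits the same key after operand reordering.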
72. Extend to Global Scope
• Requires analysis of the dominance relation in the CFG
• For exprs e1 and e2, e2 can be reduced if
• e2 has the same value as e1
• e1 dominates e2 in the CFG, that is, all paths from the entry
point to e2 must go through e1
• Examine basic blocks in reverse post order
• Guarantees that dominating exprs are handled first
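Reverse post order and dominator sets can both be computed with a few lines of graph code. The sketch below uses the five-block graph from this deck, with edges as I read them from the figure (B1 branches to B2 and B3, B2 branches to B4 and B5). The simple iterative set algorithm is my choice for illustration and is not necessarily what the engine uses.

```javascript
// Assumed edges: true target first, then false target.
const succ = { B1: ["B2", "B3"], B2: ["B4", "B5"], B3: [], B4: [], B5: [] };

function reversePostOrder(entry) {
  const seen = new Set();
  const post = [];
  function dfs(b) {
    seen.add(b);
    for (const s of succ[b]) {
      if (!seen.has(s)) dfs(s);
    }
    post.push(b); // a block is emitted after all of its successors
  }
  dfs(entry);
  return post.reverse();
}

// Iterative dominator sets: dom(b) = {b} plus the intersection of dom(p) over predecessors p.
function dominators(entry) {
  const order = reversePostOrder(entry);
  const preds = Object.fromEntries(order.map((b) => [b, []]));
  for (const b of order) {
    for (const s of succ[b]) preds[s].push(b);
  }
  const dom = Object.fromEntries(order.map((b) => [b, new Set(order)]));
  dom[entry] = new Set([entry]);
  let changed = true;
  while (changed) {
    changed = false;
    for (const b of order.slice(1)) {
      let next = null;
      for (const p of preds[b]) {
        next = next === null
          ? new Set(dom[p])
          : new Set([...next].filter((x) => dom[p].has(x)));
      }
      next = next || new Set();
      next.add(b);
      // Sets only ever shrink, so a size change is enough to detect an update.
      if (next.size !== dom[b].size) {
        dom[b] = next;
        changed = true;
      }
    }
  }
  return dom;
}

const rpo = reversePostOrder("B1");
const dom = dominators("B1");
console.log(rpo.join(", "));
```

Visiting blocks in this order means every block is processed after the blocks that dominate it, which is the property value numbering relies on.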
73. Global Scope
[Figure: CFG of B1 to B5. B1 ends with the branch Z1 > 3 (T to B2, F to B3); B2 branches (T/F) to B4 and B5. Instructions shown: X1 = 3, Y1 = A1+B1, Z1 = X1+3, T1 = A1 - B1, V1 = A1+B1, W1 = B1-3, U1 = B1-3]
• Dominating relation
• B1 dominates B2,B3,B4,B5
• Reverse post order
• B1, B3, B2, B5, B4
• In B1
• In B4
79. Global Scope
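[Figure: the CFG with B3 and the F edge out of B1 removed. B1 (T) leads to B2, which branches (T/F) to B4 and B5. Instructions shown: X1 = 3, Y1 = A1+B1, Z1 = X1+3, T1 = A1 - B1, V1 = A1+B1, W1 = B1-3, U1 = B1-3]
• Dominating relation
• B1 dominates B2,B3,B4,B5
• Reverse post order
• B1, B3, B2, B5, B4
• In B1
• Z1 = 6
• B3 is removed via unreachable code elimination (UCE)
• In B4
• V1 = Y1
• W1 cannot be simplified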
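The walk-through above can be reproduced with a small dominator-scoped pass. Several things here are assumptions of mine: the placement of instructions in blocks follows my reading of the figure, T1 is omitted because the text does not say which block holds it, and the pass relies on this CFG being a tree, so a block's successors are exactly the blocks it immediately dominates. A general CFG would need a real dominator tree, and commutative reordering is left out for brevity.

```javascript
// Assumed block contents; succ = [true target, false target].
const cfg = {
  B1: { code: [["X1", "=", 3], ["Y1", "+", "A1", "B1"], ["Z1", "+", "X1", 3]],
        cond: [">", "Z1", 3], succ: ["B2", "B3"] },
  B2: { code: [], succ: ["B4", "B5"] },
  B3: { code: [], succ: [] },
  B4: { code: [["V1", "+", "A1", "B1"], ["W1", "-", "B1", 3]], succ: [] },
  B5: { code: [["U1", "-", "B1", 3]], succ: [] },
};

function globalValueNumbering(graph, entry) {
  const log = [];
  const consts = new Map(); // SSA name -> folded constant
  const val = (x) => (consts.has(x) ? consts.get(x) : x);

  function visit(name, inherited) {
    const block = graph[name];
    const table = new Map(inherited); // expressions available from dominating blocks
    for (const [dst, op, a, b] of block.code) {
      const x = val(a);
      const y = val(b);
      if (op === "=") { consts.set(dst, x); continue; }
      if (typeof x === "number" && typeof y === "number") {
        consts.set(dst, op === "+" ? x + y : x - y); // constant folding
        log.push(`${name}: ${dst} = ${consts.get(dst)}`);
        continue;
      }
      const key = `${op},${x},${y}`;
      if (table.has(key)) log.push(`${name}: ${dst} = ${table.get(key)}`);
      else table.set(key, dst); // first occurrence; dominated blocks may reuse it
    }
    let targets = block.succ;
    if (block.cond) { // only ">" conditions are modelled in this sketch
      const l = val(block.cond[1]);
      const r = val(block.cond[2]);
      if (typeof l === "number" && typeof r === "number") {
        const taken = l > r ? 0 : 1;
        log.push(`${name}: ${block.succ[1 - taken]} is unreachable`);
        targets = [block.succ[taken]];
      }
    }
    for (const t of targets) visit(t, table); // children see only dominating expressions
  }

  visit(entry, new Map());
  return log;
}

const log = globalValueNumbering(cfg, "B1");
console.log(log.join("\n"));
```

W1 is left alone because the only other B1 - 3 lives in B5, and B5 does not dominate B4, so its table is never visible there.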
80. Loop Invariant Code Motion
• Hoist loop-invariant exprs out of the loop
• For a loop-invariant expression x = y + z
• y and z must not depend on operands defined
in the loop
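A before/after pair makes the transformation concrete. The two functions below are hand-written illustrations of what the optimizer would do to `x = y + z`. The function names are hypothetical, and the engine performs this on its SSA graph rather than on source code.

```javascript
// Before: y + z is recomputed on every iteration although
// neither operand is defined inside the loop.
function sumBefore(n, y, z) {
  let total = 0;
  for (let i = 0; i < n; i++) {
    const x = y + z; // loop invariant
    total += x * i;
  }
  return total;
}

// After: the invariant expression is hoisted in front of the loop.
function sumAfter(n, y, z) {
  let total = 0;
  const x = y + z; // computed once
  for (let i = 0; i < n; i++) {
    total += x * i;
  }
  return total;
}

console.log(sumBefore(10, 2, 3) === sumAfter(10, 2, 3));
```

The hoist is safe here because y and z are parameters that the loop body never writes.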
83. More Optimizations
• SSA and control flow optimizations
• Dead code elimination
• Value range analysis
• Loop unrolling
• And more . . .
• Native code generation
• Linear scan register allocation
• And more . . .
84. Conclusion
• Under the hood of SpiderMonkey
• General but slow bytecode interpretation
• Two-level JIT optimization for hot code
85. About Me
Security Researcher from
DSNS Lab @ NCTU
• Interests
• Virtual Machine
• Binary Translation
• Current Works
• Android Code Obfuscation
• App Protection