Tracing versus Partial Evaluation
Which Meta-Compilation Approach is
Better for Self-Optimizing
Interpreters?
Stefan Marr, Stéphane Ducasse
OOPSLA, October 28, 2015
Work Done At
Disclaimer
I am currently funded by Oracle Labs.
* Würthinger, T.; Wimmer, C.; Wöß, A.; Stadler, L.; Duboscq, G.; Humer, C.; Richards, G.; Simon, D. & Wolczko, M., One VM to Rule Them All, in Proceedings of the 2013 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software, ACM.
Compare Concrete Systems
Truffle + Graal
with Partial Evaluation
RPython
with Meta-Tracing
[3] Würthinger et al., One VM to Rule Them All, Onward!
2013, ACM, pp. 187-204.
[2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT
Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.
Oracle Labs
Selecting A Case Study
On both Systems
Self-Optimizing AST Interpreter
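As a rough illustration of what "self-optimizing" means here, the sketch below shows an AST node that rewrites itself after observing the types of its operands. It is a minimal sketch in plain Python, not code from either SOM implementation; all class names (`Node`, `Root`, `UninitializedAdd`, `IntAdd`, `GenericAdd`) are hypothetical.

```python
class Node:
    """Base class for AST nodes; `parent` lets a node replace itself."""
    parent = None

    def adopt(self, child):
        child.parent = self
        return child

    def replace(self, new_node):
        # Swap this node for `new_node` in whichever parent slot holds it.
        for name, value in list(vars(self.parent).items()):
            if value is self:
                setattr(self.parent, name, new_node)
        new_node.parent = self.parent
        return new_node


class Root(Node):
    def __init__(self, body):
        self.body = self.adopt(body)

    def execute(self, frame):
        return self.body.execute(frame)


class Literal(Node):
    def __init__(self, value):
        self.value = value

    def execute(self, frame):
        return self.value


class ReadVar(Node):
    def __init__(self, name):
        self.name = name

    def execute(self, frame):
        return frame[self.name]


class BinaryNode(Node):
    def __init__(self, left, right):
        self.left = self.adopt(left)
        self.right = self.adopt(right)


class GenericAdd(BinaryNode):
    """Fallback that works for any operand types."""
    def execute(self, frame):
        return self.left.execute(frame) + self.right.execute(frame)


class IntAdd(BinaryNode):
    """Specialized for integers; falls back to GenericAdd if the guess fails."""
    def execute(self, frame):
        l = self.left.execute(frame)
        r = self.right.execute(frame)
        if type(l) is int and type(r) is int:
            return l + r
        self.replace(GenericAdd(self.left, self.right))
        return l + r


class UninitializedAdd(BinaryNode):
    """First execution observes the operand types and rewrites the tree."""
    def execute(self, frame):
        l = self.left.execute(frame)
        r = self.right.execute(frame)
        if type(l) is int and type(r) is int:
            self.replace(IntAdd(self.left, self.right))
        else:
            self.replace(GenericAdd(self.left, self.right))
        return l + r


root = Root(UninitializedAdd(ReadVar("cnt"), Literal(1)))
first = root.execute({"cnt": 0})       # node specializes to IntAdd
second = root.execute({"cnt": 1.5})    # guess fails, node becomes GenericAdd
```

After the first call the tree holds an `IntAdd` node, and after the second a `GenericAdd` node. A meta-compiler can then treat the current shape of the tree as a speculation about the program's behavior.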
Represents Large Group of
Dynamic Languages
Dynamically Typed (Smalltalk)
Classes
(and everything is an Object)
Closures (lambdas)
Non-local Returns
(almost exceptions)
Set of Benchmarks
http://som-st.github.io
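The remark that non-local returns are "almost exceptions" can be made concrete with a small sketch. It assumes an interpreter that models a return from inside a closure by raising an exception tagged with the home method activation; the names `NonLocalReturn`, `for_each` and `find_first` are hypothetical and not taken from SOM.

```python
class NonLocalReturn(Exception):
    """Carries a value from a closure back to its home method activation."""
    def __init__(self, target, value):
        super().__init__()
        self.target = target
        self.value = value


def for_each(items, block):
    # A library-level control structure that knows nothing about returns.
    for item in items:
        block(item)


def find_first(items, predicate):
    home = object()  # unique marker for this activation

    def block(item):
        if predicate(item):
            # Equivalent of `^ item` inside a Smalltalk block.
            raise NonLocalReturn(home, item)

    try:
        for_each(items, block)
    except NonLocalReturn as ret:
        if ret.target is home:
            return ret.value
        raise  # belongs to an outer activation, keep unwinding
    return None


result = find_first([1, 4, 9], lambda x: x > 3)
```

The marker check matters when such methods nest: each activation only catches returns aimed at itself and lets the others propagate.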
SOMMT versus SOMPE
Meta-Tracing | Partial Evaluation
[Figure: an example AST built from the nodes if, cnt:=, 0, cnt:=, +, cnt, 1, drawn once for each approach.]
WHICH APPROACH IS FASTER FAST?
minimal amount of engineering to get good performance
Peak Performance of Basic Interpreters
[Figure: runtime normalized to Java 8 (compiled or interpreted; lower is better), per benchmark, for compiled SOM[MT] (SOMMT on RPython) and compiled SOM[PE] (SOMPE on Truffle). Benchmarks: Bounce, BubbleSort, DeltaBlue, Fannkuch, GraphSearch, Json, Mandelbrot, NBody, PageRank, Permute, Queens, QuickSort, Richards, Sieve, Storage, Towers.]
Minimal SOMMT: 5.5x slower (min. 1.6x, max. 14x)
Minimal SOMPE: 170x slower (min. 60x, max. 600x)
WHICH APPROACH IS THE FASTEST?
best peak performance
Which Self-Optimizations Should a
Language Implementer Add?
• Type-specialize variables
• Type-specialize object fields
• Type-specialize collection storage
• Lower control structures from library
• Lower common library operations
• Inline caching
• Inline primitive operations
• Cache globals
• …
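To illustrate one item from this list, here is a minimal sketch of inline caching at a message-send site. It assumes a monomorphic cache keyed on the receiver's class; `SendSite`, `lookup` and the `misses` counter are hypothetical names for this example, and Python classes stand in for SOM classes.

```python
def lookup(cls, selector):
    """Slow path: walk the class hierarchy to find the method."""
    for klass in cls.__mro__:
        if selector in vars(klass):
            return vars(klass)[selector]
    raise AttributeError(selector)


class SendSite:
    """A message-send node with a one-entry (monomorphic) inline cache."""
    def __init__(self, selector):
        self.selector = selector
        self.cached_class = None
        self.cached_method = None
        self.misses = 0

    def send(self, receiver, *args):
        cls = type(receiver)
        if cls is not self.cached_class:
            # Cache miss: do the full lookup once and remember the result.
            self.misses += 1
            self.cached_method = lookup(cls, self.selector)
            self.cached_class = cls
        return self.cached_method(receiver, *args)


class Point:
    def __init__(self, x):
        self.x = x

    def double(self):
        return self.x * 2


site = SendSite("double")
values = [site.send(Point(n)) for n in range(3)]
```

Only the first send pays for the lookup; later sends with the same receiver class reduce to a class check plus a direct call, which is the kind of guard a meta-compiler can optimize well.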
Peak Performance of Optimized Interpreter
[Figure: runtime normalized to Java 8 (compiled or interpreted; lower is better), per benchmark, for compiled SOM[MT] (SOMMT on RPython) and compiled SOM[PE] (SOMPE on Truffle). Benchmarks: Bounce, BubbleSort, DeltaBlue, Fannkuch, GraphSearch, Json, Mandelbrot, NBody, PageRank, Permute, Queens, QuickSort, Richards, Sieve, Storage, Towers.]
Optimized SOMMT: 3x slower (min. 1.5x, max. 11x)
Optimized SOMPE: 2.3x slower (min. 4%, max. 4.9x)
Speedup annotations: 2.4x speedup, 80x speedup
Optimization Impact on SOMPE
[Figure: speedup factor (higher is better, logarithmic scale, axis from 0.85 to 12.00) for each optimization: lower control structures, inline caching, cache globals, typed fields, lower common ops, array strategies, inline basic ops., typed vars, opt. local vars, baseline, min. escaping closures, typed args, catch-return nodes.]
Implementation Sizes
RPython
From Minimal to Optimized
+57% LOC
From 3,455 LOC to 5,414 LOC
Truffle
From Minimal to Optimized
+103% LOC
From 5,424 LOC to 11,037 LOC
The Way I write
Python
The Way I write
Java
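The growth percentages on this slide follow directly from the quoted line counts. The short check below recomputes them; the helper name `loc_growth_percent` is made up for this example.

```python
def loc_growth_percent(minimal_loc, optimized_loc):
    """Relative growth from the minimal to the optimized interpreter, in percent."""
    return 100.0 * (optimized_loc - minimal_loc) / minimal_loc


rpython_growth = loc_growth_percent(3455, 5414)
truffle_growth = loc_growth_percent(5424, 11037)

print(f"RPython: +{rpython_growth:.0f}% LOC")  # prints "RPython: +57% LOC"
print(f"Truffle: +{truffle_growth:.0f}% LOC")  # prints "Truffle: +103% LOC"
```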
WHICH APPROACH GIVES BETTER
STARTUP PERFORMANCE?
Considering the User-Perceived System Performance
Measuring “Whole Program” Runtime
[Figure: Wall-Clock Behavior for Various Run Lengths: Aggregation over all Benchmarks. X axis: iterations of benchmark in same process (0 to 600). Y axis: factor over Java for x iterations, computed as the geometric mean of the wall-clock time for x iterations divided by the corresponding Java result (ticks 4 to 16). VMs: Java, RTruffleSOM−jit−ex, TruffleSOM−graal−n. Annotations: 8sec, 25sec, 46sec.]
• Process Start to Finish
• Overall Wall-clock time
• Normalized to Java
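The aggregation named on this slide, a geometric mean of per-benchmark wall-clock times divided by the corresponding Java result, can be sketched as follows. The function names and the timing values are placeholders invented for this example; they are not measurements from the talk.

```python
import math


def geomean(values):
    """Geometric mean, computed in log space to avoid overflow."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def factor_over_java(vm_times, java_times):
    """Geometric mean of per-benchmark wall-clock ratios against Java."""
    ratios = [vm_times[name] / java_times[name] for name in java_times]
    return geomean(ratios)


# Placeholder wall-clock times in seconds for one run length (hypothetical).
java_times = {"Bounce": 1.0, "Queens": 2.0}
vm_times = {"Bounce": 4.0, "Queens": 18.0}

factor = factor_over_java(vm_times, java_times)
```

With these placeholder inputs the per-benchmark ratios are 4 and 9, so the aggregated factor is 6. A geometric mean is the usual choice for ratios because it treats a 2x slowdown and a 2x speedup symmetrically.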
CONCLUSIONS
Tracing vs. Partial Evaluation
• Peak performance seems similar
– No indications of conceptual limitations
• Startup Performance
– Unclear, tiered compilation?
• But, tracing is faster fast!
– Requires fewer optimizations
– Better ‘prototype’ performance

Editor's Notes

  1. It is about how to determine the compilation unit. Remember, the interpreter is implemented in one language, and the compilation works on the meta-level. The main idea is that we want to take the implementation, add information from the execution context, and use that to do very aggressive and speculative optimizations on the interpreter implementation. This avoids the need to write custom JIT compilers.
  2.    VM       type      BenchRatio.geomean  BenchRatio.min  BenchRatio.max
     1  Java     Compiled            1.000000        1.000000         1.00000
     2  SOM[MT]  Compiled            5.528967        1.565665        13.90805
     3  SOM[PE]  Compiled          176.488620       63.952457       606.62440
  3. Type-specialize function arguments; min. escaping closures; catch-return nodes; opt. local vars; min. escaping vars
  4.     Cores     time.ms      time.s      time.m
     1       1    2428.125    2.428125  0.04046875
     2       5    3617.917    3.617917  0.06029861
     3      10    4930.000    4.930000  0.08216667
     4      50   13810.625   13.810625  0.23017708
     5     100   24861.250   24.861250  0.41435417
     6     200   46516.250   46.516250  0.77527083
     7     400   89221.875   89.221875  1.48703125
     8     500  110605.417  110.605417  1.84342361
     9     750  164434.583  164.434583  2.74057639
    10    1000  217541.875  217.541875  3.62569792
    11    1250  270658.750  270.658750  4.51097917
    12    1500  325657.917  325.657917  5.42763194