SlideShare a Scribd company logo
1 of 31
CODE GENERATION IN
SERIALIZERS AND
COMPARATORS OF APACHE
FLINK
GÁBOR HORVÁTH
PARADIGM SHIFT IN BIG DATA PLATFORMS
•Applications used to be I/O bound (Network,
Disk)
•InfiniBand, SSDs reduced I/O overhead
significantly
•CPU increasingly became a bottleneck
•Even in I/O bound applications, reduced CPU
usage might mean reduced electricity costs
SERIALIZATION IN FLINK
•Several methods: Avro, Kryo, Flink
•Flink serialization is more efficient than Kryo
• Not to mention the default Java serialization
•Crucial, not just for I/O, operating on serialized
data
•Still some room for improvements
SERIALIZATION IN FLINK
INEFFICIENCIES OF CURRENT FLINK
SERIALIZERS
• Fields accessed using reflection
• Each iteration might dispatch to a different method, inhibits
inlining
• Null checks and null and subclass flags
• Extra code to deal with subclasses
• Hard to unroll the loop, upper bound is not a compile time
for (int i = 0; i < numFields; i++) {
Object o = fields[i].get(value);
if (o == null) {
target.writeBoolean(true);
} else {
target.writeBoolean(false);
fieldSerializers[i].serialize(o, target);
}
}
SEVERAL SERIALIZER RELATED
INNOVATIONS IN APACHE FLINK
•Object reusing overloads
•Delicate type system
•Code generation (not mainline yet, this talk’s
topic)
• Fix the inefficiencies of Flink serializers
RUNTIME CODE GENERATION
• Focus on POJOs (Plain Old Java Objects)
• Best ROI due to eliminating reflection
• Specialization
• No reflection for serialization (direct field access code
generated)
• No null checks, subclass handling for primitive types
• No subclass handling for final types
• Unrolled loops, better for inlining
QUESTIONNAIRE
• Who has written a custom serializer to improve
performance?
• Who has written a custom comparator to improve
performance?
• Who used Tuples instead of POJOs only to improve
performance?
Who wants performance close to Tuples with null
value support?
LET’S SEE THE NUMBERS!
6X PERFORMANCE
IMPROVEMENT
Rest of Flink Job
Serializers/Compara
tors
NINE MEN’S MORRIS BENCHMARK
•Calculates game-theoretical values of game
states
•Iterative job
•Group by, reduce, outer joins, flat maps, and
filter
•Heavy use of POJOs
•Real world complexity
LET’S SEE THE NUMBERS!
•Measured on ReducePerformance,
WordCountPojo and Nine Men’s Morris on local
machine
•Measured ReducePerformance and Nine Men’s
Morris on a cluster
•The results were consistent
LET’S SEE THE NUMBERS! (LOCAL MACHINE)
0
10
20
30
40
50
60
Serializer: Flink Handwritten Generated Handwritten
Comparator: Flink Flink Generated Generated
CLOSE TO HAND WRITTEN SERIALIZERS
•About 20% speedup compared to Flink
serializers
•Some gap left to handwritten
• Smarter getLength
• Flattening
• Null and subclass flags
• Better handling of primitives (less
boxing/unboxing, inlining)
HOW DOES THIS WORK?
HIGH LEVEL OVERVIEW: THE TRADITIONAL
WAY
POJO
Object
Serialize
d
POJO
TypeInfo
Serialize
rPOJO
Class
Instantiate
HIGH LEVEL OVERVIEW: THE NEW WAY
POJO
Object
Generate
d
Serialize
r
Serialize
d
POJO
TypeInfo
FreeMark
er
Template
Janino
Serialize
r
Generato
r
POJO
Class
ClassLoad
er
HOW TO LOAD GENERATED CODE?
•We need to serialize serializers
•First step of deserialization: load the class
•Which ClassLoader to use?
•Custom ClassLoader to the rescue!
Sourc
e
Code
Class
Loader
MULTIPLE NODES/JVMS?
JVM
A
JVM
B
Serializer ?Serializ
er
MULTIPLE NODES/JVMS?
JVM
A
JVM
B
Wrapper
Serializ
er
Serializ
er
LET’S TRY IT OUT!
Class cast exception:
SerializerA cannot be
cast to SerializerA.
LETS CACHE AND TRY IT OUT!
Class cast exception:
UserObjectA cannot be
cast to UserObjcetA.
LETS CACHE AND INVALIDATE AND TRY IT
OUT!
ACTUALLY... THERE ARE COUPLE OF MORE
•Janino bugs
•Compatibility with Scala POJO like classes
•Generated code harder to debug
•…
WHAT’S NEXT?
• Versioning serialization format
• Replace reflection where performance matters
• d.sortPartition("f0.author", Order.DESCENDING);
• Better utilization of getLength information
• Eliminate redundant null/subclass flags
• Beating Tuples!
DISTANT FUTURE
•Vision: more JVM independent optimizations!
•Columnar serialization format (end to end
optimization)
• Final goal: Faster than naive handwritten serializers!
•Customized NormalizedKeySorter
•Lots of opportunities due to the delicate type
system
CONCLUSION
•Significant performance improvement
•Ground work for lots of possible performance
improvements
•ClassLoader issues are not newcommer
friendly
•Not part of mainline Flink yet, happy to receive
reviews 
ACKNOWLEDGEMENT
•Huge thanks to GSoC:
•Márton Balassi
•Gábor Gévay
•Thanks to data
Artisans for
brainstorming
•Thanks for your

More Related Content

Viewers also liked

Viewers also liked (20)

Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.
 
Eron Wright - Introducing Flink on Mesos
Eron Wright - Introducing Flink on MesosEron Wright - Introducing Flink on Mesos
Eron Wright - Introducing Flink on Mesos
 
Trevor Grant - Apache Zeppelin - A friendlier way to Flink
Trevor Grant - Apache Zeppelin - A friendlier way to FlinkTrevor Grant - Apache Zeppelin - A friendlier way to Flink
Trevor Grant - Apache Zeppelin - A friendlier way to Flink
 
Alexander Kolb - Flinkspector – Taming the squirrel
Alexander Kolb - Flinkspector – Taming the squirrelAlexander Kolb - Flinkspector – Taming the squirrel
Alexander Kolb - Flinkspector – Taming the squirrel
 
Ana M Martinez - AMIDST Toolbox- Scalable probabilistic machine learning with...
Ana M Martinez - AMIDST Toolbox- Scalable probabilistic machine learning with...Ana M Martinez - AMIDST Toolbox- Scalable probabilistic machine learning with...
Ana M Martinez - AMIDST Toolbox- Scalable probabilistic machine learning with...
 
Maxim Fateev - Beyond the Watermark- On-Demand Backfilling in Flink
Maxim Fateev - Beyond the Watermark- On-Demand Backfilling in FlinkMaxim Fateev - Beyond the Watermark- On-Demand Backfilling in Flink
Maxim Fateev - Beyond the Watermark- On-Demand Backfilling in Flink
 
Ted Dunning-Faster and Furiouser- Flink Drift
Ted Dunning-Faster and Furiouser- Flink DriftTed Dunning-Faster and Furiouser- Flink Drift
Ted Dunning-Faster and Furiouser- Flink Drift
 
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
 
Julian Hyde - Streaming SQL
Julian Hyde - Streaming SQLJulian Hyde - Streaming SQL
Julian Hyde - Streaming SQL
 
Ted Dunning - Keynote: How Can We Take Flink Forward?
Ted Dunning -  Keynote: How Can We Take Flink Forward?Ted Dunning -  Keynote: How Can We Take Flink Forward?
Ted Dunning - Keynote: How Can We Take Flink Forward?
 
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with FlinkSanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
 
Zoltán Zvara - Advanced visualization of Flink and Spark jobs

Zoltán Zvara - Advanced visualization of Flink and Spark jobs
Zoltán Zvara - Advanced visualization of Flink and Spark jobs

Zoltán Zvara - Advanced visualization of Flink and Spark jobs

 
Aljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache FlinkAljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache Flink
 
Jamie Grier - Robust Stream Processing with Apache Flink
Jamie Grier - Robust Stream Processing with Apache FlinkJamie Grier - Robust Stream Processing with Apache Flink
Jamie Grier - Robust Stream Processing with Apache Flink
 
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
 
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
 
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
 
Stephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink EverywhereStephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink Everywhere
 
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
 
Flink Case Study: Amadeus
Flink Case Study: AmadeusFlink Case Study: Amadeus
Flink Case Study: Amadeus
 

Similar to Gábor Horváth - Code Generation in Serializers and Comparators of Apache Flink

VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012
Eonblast
 
FP Days: Down the Clojure Rabbit Hole
FP Days: Down the Clojure Rabbit HoleFP Days: Down the Clojure Rabbit Hole
FP Days: Down the Clojure Rabbit Hole
Christophe Grand
 
Exploitation and State Machines
Exploitation and State MachinesExploitation and State Machines
Exploitation and State Machines
Michael Scovetta
 
Messaging, interoperability and log aggregation - a new framework
Messaging, interoperability and log aggregation - a new frameworkMessaging, interoperability and log aggregation - a new framework
Messaging, interoperability and log aggregation - a new framework
Tomas Doran
 
Make static instrumentation great again, High performance fuzzing for Windows...
Make static instrumentation great again, High performance fuzzing for Windows...Make static instrumentation great again, High performance fuzzing for Windows...
Make static instrumentation great again, High performance fuzzing for Windows...
Lucas Leong
 
Serving Deep Learning Models At Scale With RedisAI: Luca Antiga
Serving Deep Learning Models At Scale With RedisAI: Luca AntigaServing Deep Learning Models At Scale With RedisAI: Luca Antiga
Serving Deep Learning Models At Scale With RedisAI: Luca Antiga
Redis Labs
 

Similar to Gábor Horváth - Code Generation in Serializers and Comparators of Apache Flink (20)

VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012
 
FP Days: Down the Clojure Rabbit Hole
FP Days: Down the Clojure Rabbit HoleFP Days: Down the Clojure Rabbit Hole
FP Days: Down the Clojure Rabbit Hole
 
Exploitation and State Machines
Exploitation and State MachinesExploitation and State Machines
Exploitation and State Machines
 
Messaging, interoperability and log aggregation - a new framework
Messaging, interoperability and log aggregation - a new frameworkMessaging, interoperability and log aggregation - a new framework
Messaging, interoperability and log aggregation - a new framework
 
CONDOR @ NGCLE@e-Novia 15.11.2017
CONDOR @ NGCLE@e-Novia 15.11.2017CONDOR @ NGCLE@e-Novia 15.11.2017
CONDOR @ NGCLE@e-Novia 15.11.2017
 
Make static instrumentation great again, High performance fuzzing for Windows...
Make static instrumentation great again, High performance fuzzing for Windows...Make static instrumentation great again, High performance fuzzing for Windows...
Make static instrumentation great again, High performance fuzzing for Windows...
 
Bluffers guide to elitist jargon - Martijn Verburg, Richard Warburton, James ...
Bluffers guide to elitist jargon - Martijn Verburg, Richard Warburton, James ...Bluffers guide to elitist jargon - Martijn Verburg, Richard Warburton, James ...
Bluffers guide to elitist jargon - Martijn Verburg, Richard Warburton, James ...
 
Smashing the stack with Hydra
Smashing the stack with HydraSmashing the stack with Hydra
Smashing the stack with Hydra
 
Know thy cost (or where performance problems lurk)
Know thy cost (or where performance problems lurk)Know thy cost (or where performance problems lurk)
Know thy cost (or where performance problems lurk)
 
When DevOps and Networking Intersect by Brent Salisbury of socketplane.io
When DevOps and Networking Intersect by Brent Salisbury of socketplane.ioWhen DevOps and Networking Intersect by Brent Salisbury of socketplane.io
When DevOps and Networking Intersect by Brent Salisbury of socketplane.io
 
DConf2015 - Using D for Development of Large Scale Primary Storage
DConf2015 - Using D for Development  of Large Scale Primary StorageDConf2015 - Using D for Development  of Large Scale Primary Storage
DConf2015 - Using D for Development of Large Scale Primary Storage
 
Serving Deep Learning Models At Scale With RedisAI: Luca Antiga
Serving Deep Learning Models At Scale With RedisAI: Luca AntigaServing Deep Learning Models At Scale With RedisAI: Luca Antiga
Serving Deep Learning Models At Scale With RedisAI: Luca Antiga
 
Realtime web2012
Realtime web2012Realtime web2012
Realtime web2012
 
The_Final_Presentation
The_Final_PresentationThe_Final_Presentation
The_Final_Presentation
 
pps Matters
pps Matterspps Matters
pps Matters
 
Redis Day Keynote Salvatore Sanfillipo Redis Labs
Redis Day Keynote Salvatore Sanfillipo Redis LabsRedis Day Keynote Salvatore Sanfillipo Redis Labs
Redis Day Keynote Salvatore Sanfillipo Redis Labs
 
Ratpack for Real
Ratpack for RealRatpack for Real
Ratpack for Real
 
Scaling
ScalingScaling
Scaling
 
To Infiniband and Beyond
To Infiniband and BeyondTo Infiniband and Beyond
To Infiniband and Beyond
 
Security research over Windows #defcon china
Security research over Windows #defcon chinaSecurity research over Windows #defcon china
Security research over Windows #defcon china
 

More from Flink Forward

More from Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 

Recently uploaded

Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
karishmasinghjnh
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 

Recently uploaded (20)

Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 

Gábor Horváth - Code Generation in Serializers and Comparators of Apache Flink

  • 1. CODE GENERATION IN SERIALIZERS AND COMPARATORS OF APACHE FLINK GÁBOR HORVÁTH
  • 2. PARADIGM SHIFT IN BIG DATA PLATFORMS •Applications used to be I/O bound (Network, Disk) •InfiniBand, SSDs reduced I/O overhead significantly •CPU increasingly became a bottleneck •Even in I/O bound applications, reduced CPU usage might mean reduced electricity costs
  • 3. SERIALIZATION IN FLINK •Several methods: Avro, Kryo, Flink •Flink serialization is more efficient than Kryo • Not to mention the default Java serialization •Crucial, not just for I/O, operating on serialized data •Still some room for improvements
  • 5. INEFFICIENCIES OF CURRENT FLINK SERIALIZERS • Fields accessed using reflection • Each iteration might dispatch to a different method, inhibits inlining • Null checks and null and subclass flags • Extra code to deal with subclasses • Hard to unroll the loop, upper bound is not a compile time for (int i = 0; i < numFields; i++) { Object o = fields[i].get(value); if (o == null) { target.writeBoolean(true); } else { target.writeBoolean(false); fieldSerializers[i].serialize(o, target); } }
  • 6. SEVERAL SERIALIZER RELATED INNOVATIONS IN APACHE FLINK •Object reusing overloads •Delicate type system •Code generation (not mainline yet, this talk’s topic) • Fix the inefficiencies of Flink serializers
  • 7. RUNTIME CODE GENERATION • Focus on POJOs (Plain Old Java Objects) • Best ROI due to eliminating reflection • Specialization • No reflection for serialization (direct field access code generated) • No null checks, subclass handling for primitive types • No subclass handling for final types • Unrolled loops, better for inlining
  • 8. QUESTIONNAIRE • Who has written a custom serializer to improve performance? • Who has written a custom comparator to improve performance? • Who used Tuples instead of POJOs only to improve performance? Who wants performance close to Tuples with null value support?
  • 9. LET’S SEE THE NUMBERS! 6X PERFORMANCE IMPROVEMENT Rest of Flink Job Serializers/Compara tors
  • 10. NINE MEN’S MORRIS BENCHMARK •Calculates game-theoretical values of game states •Iterative job •Group by, reduce, outer joins, flat maps, and filter •Heavy use of POJOs •Real world complexity
  • 11. LET’S SEE THE NUMBERS! •Measured on ReducePerformance, WordCountPojo and Nine Men’s Morris on local machine •Measured ReducePerformance and Nine Men’s Morris on a cluster •The results were consistent
  • 12. LET’S SEE THE NUMBERS! (LOCAL MACHINE) 0 10 20 30 40 50 60 Serializer: Flink Handwritten Generated Handwritten Comparator: Flink Flink Generated Generated
  • 13. CLOSE TO HAND WRITTEN SERIALIZERS •About 20% speedup compared to Flink serializers •Some gap left to handwritten • Smarter getLength • Flattening • Null and subclass flags • Better handling of primitives (less boxing/unboxing, inlining)
  • 14. HOW DOES THIS WORK?
  • 15. HIGH LEVEL OVERVIEW: THE TRADITIONAL WAY POJO Object Serialize d POJO TypeInfo Serialize rPOJO Class Instantiate
  • 16. HIGH LEVEL OVERVIEW: THE NEW WAY POJO Object Generate d Serialize r Serialize d POJO TypeInfo FreeMark er Template Janino Serialize r Generato r POJO Class ClassLoad er
  • 17.
  • 18. HOW TO LOAD GENERATED CODE? •We need to serialize serializers •First step of deserialization: load the class •Which ClassLoader to use? •Custom ClassLoader to the rescue! Sourc e Code Class Loader
  • 21. LET’S TRY IT OUT! Class cast exception: SerializerA cannot be cast to SerializerA.
  • 22.
  • 23.
  • 24. LETS CACHE AND TRY IT OUT! Class cast exception: UserObjectA cannot be cast to UserObjcetA.
  • 25.
  • 26. LETS CACHE AND INVALIDATE AND TRY IT OUT!
  • 27. ACTUALLY... THERE ARE COUPLE OF MORE •Janino bugs •Compatibility with Scala POJO like classes •Generated code harder to debug •…
  • 28. WHAT’S NEXT? • Versioning serialization format • Replace reflection where performance matters • d.sortPartition("f0.author", Order.DESCENDING); • Better utilization of getLength information • Eliminate redundant null/subclass flags • Beating Tuples!
  • 29. DISTANT FUTURE •Vision: more JVM independent optimizations! •Columnar serialization format (end to end optimization) • Final goal: Faster than naive handwritten serializers! •Customized NormalizedKeySorter •Lots of opportunities due to the delicate type system
  • 30. CONCLUSION •Significant performance improvement •Ground work for lots of possible performance improvements •ClassLoader issues are not newcommer friendly •Not part of mainline Flink yet, happy to receive reviews 
  • 31. ACKNOWLEDGEMENT •Huge thanks to GSoC: •Márton Balassi •Gábor Gévay •Thanks to data Artisans for brainstorming •Thanks for your