SlideShare a Scribd company logo
1 of 24
Download to read offline
Presto
chenchun@meituan.com
Thursday, 24 April, 14
Content
• Background
• Architecture
• Key points for low query latency
• TPCH benchmark test
• What we do
• Reference
Thursday, 24 April, 14
Background
• 300+PB data stored in Hadoop/HDFS-based clusters
• More queries and get results faster improves
analysts, data scientists, and engineers productivity
• MapReduce and Hive are designed for large-scale,
reliable computation
• External projects too nascent or did not meet our
requirements for flexibility and scale
Thursday, 24 April, 14
Architecture
Thursday, 24 April, 14
Key points for low latency
• In memory parallel computing
• Pipeline
• Data local computation
• Data cache
• Dynamic compile part of the plan to byte code
• Careful use of memory and data structure
• BlinkDB liked approximate queries
• Traditional SQL optimize
• GC control
Thursday, 24 April, 14
Compile flow
Thursday, 24 April, 14
In memory parallel computing
select c1.rank, count(*) from dim.city c1 join dim.city c2 on c1.id = c2.id
where c1.id > 10 group by c1.rank limit 10;
Thursday, 24 April, 14
In memory parallel computing
Thursday, 24 April, 14
In memory parallel computing
Thursday, 24 April, 14
In memory parallel computing
• PlanDistribution=Source
– InputSplit[] splits =
inputFormat.getSplits(jobConf
, 0);
• PlanDistribution=Hash
– Hash Shuffle
– Fixed Workers
– query.initial-hash-partitions
Thursday, 24 April, 14
SplitRunner thread number
task.shard.max-threads=availableProcessors() * 4
Pipeline - TaskExecutor
Thursday, 24 April, 14
Pipeline - Operator process flow
Page(max page size: 1MB, max rows: 16 * 1024 )
Thursday, 24 April, 14
Pipeline - ExchangeOperator
Thursday, 24 April, 14
Data local computation
• Select acceptable nodes (as least 10 nodes by
default)
– Nodes has the same address
– If not enough, add nodes in the same rack
– If not enough, randomly select nodes in other racks
• Select the node with the smallest number of
assignments (pending tasks)
Thursday, 24 April, 14
Data cache
• Google Guava LoadingCache
• Cached Objects
– HiveMeta database table partition
– Byte Code Class
FilterAndProjectOperatorFactoryFactory,
ScanFilterAndProjectOperatorFactoryFactory
– functions
Thursday, 24 April, 14
Dynamic compile plan to byte code
• Presto dynamic compile FilterAndProjectOperator
and ScanFilterAndProjectOperator to byte code
which lets the JIT optimize and generate native
machine code
• How much does it speed up ?
• ScanFilterAndProjectOperator
Thursday, 24 April, 14
Careful use mem & data structure
• Slice
– Unsafe#copyMemory
– 20% ~ 30% speed up for ORCFile write performance
• ThreadLocalRandom
– ThreadLocal seed instead of AtomicLong
– 100% speed up
• ListenableFuture
– Async Callback
Thursday, 24 April, 14
Approximate queries
• approx_avg, approx_distinct, approx_percentile
• +50% speed up
Thursday, 24 April, 14
Traditional SQL optimize
• ImplementSampleAsFilter
• LimitPushDown
• MaterializeSamplePullUp
• MergeProjections
• PredicatePushDown
• PruneRedundantProjections
• PruneUnreferencedOutputs
• SetFlatteningOptimizer
• SimplifyExpressions
• UnaliasSymbolReferences
Thursday, 24 April, 14
GC control
• A JDK 1.7 BUG
• When code cache fills up, there is a chance that JIT
might stop compile byte code to native code.
• By forcing classes to unload from the perm gen,
we let the code cache evictor make room before
the cache fills up.
• System.gc()
Thursday, 24 April, 14
TPCH benchmark test
• Run presto-main/src/test/java/com/facebook/
presto/benchmark/BenchmarkSuite.java
• A part of the result as below
Thursday, 24 April, 14
What we do
• Support kerberos authentication
• Implicit type coercion
• Support reading lzo compressed tables
• Implement useful functions
• Fix planning issue when using DISTICT aggregations
in HAVING clause
• https://github.com/MTDATA/presto/commits/
mt-0.60
Thursday, 24 April, 14
Reference
• http://prestodb.io/
• https://www.facebook.com/notes/facebook-
engineering/presto-interacting-with-petabytes-of-
data-at-facebook/10151786197628920
• http://www.slideshare.net/zhusx/presto-overview?
from_search=1
• http://www.slideshare.net/frsyuki/hadoop-source-
code-reading-15-in-japan-presto
Thursday, 24 April, 14
Thanks
Thursday, 24 April, 14

More Related Content

What's hot

Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Taro L. Saito
 
Boston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the EnterpriseBoston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the EnterpriseMatt Fuller
 
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...viirya
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure DataTaro L. Saito
 
Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookTreasure Data, Inc.
 
Presto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupPresto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupWojciech Biela
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentationCyanny LIANG
 
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)Martin Traverso
 
Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineDataWorks Summit
 
Presto updates to 0.178
Presto updates to 0.178Presto updates to 0.178
Presto updates to 0.178Kai Sasaki
 
20140120 presto meetup_en
20140120 presto meetup_en20140120 presto meetup_en
20140120 presto meetup_enOgibayashi
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case Kai Sasaki
 
Presto in my_use_case
Presto in my_use_casePresto in my_use_case
Presto in my_use_casewyukawa
 
Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15Zhenxiao Luo
 
Presto: Distributed sql query engine
Presto: Distributed sql query engine Presto: Distributed sql query engine
Presto: Distributed sql query engine kiran palaka
 
Presto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringPresto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringTaro L. Saito
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Sadayuki Furuhashi
 

What's hot (20)

Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015
 
Boston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the EnterpriseBoston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the Enterprise
 
Presto - SQL on anything
Presto  - SQL on anythingPresto  - SQL on anything
Presto - SQL on anything
 
Presto
PrestoPresto
Presto
 
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure Data
 
Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @Facebook
 
Presto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupPresto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop Meetup
 
Internals of Presto Service
Internals of Presto ServiceInternals of Presto Service
Internals of Presto Service
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentation
 
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
 
Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query Engine
 
Presto updates to 0.178
Presto updates to 0.178Presto updates to 0.178
Presto updates to 0.178
 
20140120 presto meetup_en
20140120 presto meetup_en20140120 presto meetup_en
20140120 presto meetup_en
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case
 
Presto in my_use_case
Presto in my_use_casePresto in my_use_case
Presto in my_use_case
 
Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15
 
Presto: Distributed sql query engine
Presto: Distributed sql query engine Presto: Distributed sql query engine
Presto: Distributed sql query engine
 
Presto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringPresto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoring
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
 

Similar to Presto: High Performance SQL Query Engine Architecture

Conquering Hadoop and Apache Spark with Operational Intelligence with Akshay Rai
Conquering Hadoop and Apache Spark with Operational Intelligence with Akshay RaiConquering Hadoop and Apache Spark with Operational Intelligence with Akshay Rai
Conquering Hadoop and Apache Spark with Operational Intelligence with Akshay RaiDatabricks
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMichael Hiskey
 
Jethro data meetup index base sql on hadoop - oct-2014
Jethro data meetup    index base sql on hadoop - oct-2014Jethro data meetup    index base sql on hadoop - oct-2014
Jethro data meetup index base sql on hadoop - oct-2014Eli Singer
 
Teradata Partners Conference Oct 2014 Big Data Anti-Patterns
Teradata Partners Conference Oct 2014   Big Data Anti-PatternsTeradata Partners Conference Oct 2014   Big Data Anti-Patterns
Teradata Partners Conference Oct 2014 Big Data Anti-PatternsDouglas Moore
 
Taming the resource tiger
Taming the resource tigerTaming the resource tiger
Taming the resource tigerElizabeth Smith
 
Taming the resource tiger
Taming the resource tigerTaming the resource tiger
Taming the resource tigerElizabeth Smith
 
Data Ingestion Engine
Data Ingestion EngineData Ingestion Engine
Data Ingestion EngineAdam Doyle
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics PlatformN Masahiro
 
Cloud-native persistence in a serverless world
Cloud-native persistence in a serverless worldCloud-native persistence in a serverless world
Cloud-native persistence in a serverless worldNick Do
 
Beyond EXPLAIN: Query Optimization From Theory To Code
Beyond EXPLAIN: Query Optimization From Theory To CodeBeyond EXPLAIN: Query Optimization From Theory To Code
Beyond EXPLAIN: Query Optimization From Theory To CodeYuto Hayamizu
 
Adding Support for Networking and Web Technologies to an Embedded System
Adding Support for Networking and Web Technologies to an Embedded SystemAdding Support for Networking and Web Technologies to an Embedded System
Adding Support for Networking and Web Technologies to an Embedded SystemJohn Efstathiades
 
How and why you need to build a big data lab
How and why you need to build a big data labHow and why you need to build a big data lab
How and why you need to build a big data labChris Kernaghan
 
DrupalCamp LA 2014 - A Perfect Launch, Every Time
DrupalCamp LA 2014 - A Perfect Launch, Every TimeDrupalCamp LA 2014 - A Perfect Launch, Every Time
DrupalCamp LA 2014 - A Perfect Launch, Every TimeSuzanne Aldrich
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcachedJurriaan Persyn
 
Sabre presentation for MySQL user conference 2004
Sabre presentation for MySQL user conference 2004Sabre presentation for MySQL user conference 2004
Sabre presentation for MySQL user conference 2004Alan Walker
 

Similar to Presto: High Performance SQL Query Engine Architecture (20)

Conquering Hadoop and Apache Spark with Operational Intelligence with Akshay Rai
Conquering Hadoop and Apache Spark with Operational Intelligence with Akshay RaiConquering Hadoop and Apache Spark with Operational Intelligence with Akshay Rai
Conquering Hadoop and Apache Spark with Operational Intelligence with Akshay Rai
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
Jethro data meetup index base sql on hadoop - oct-2014
Jethro data meetup    index base sql on hadoop - oct-2014Jethro data meetup    index base sql on hadoop - oct-2014
Jethro data meetup index base sql on hadoop - oct-2014
 
Teradata Partners Conference Oct 2014 Big Data Anti-Patterns
Teradata Partners Conference Oct 2014   Big Data Anti-PatternsTeradata Partners Conference Oct 2014   Big Data Anti-Patterns
Teradata Partners Conference Oct 2014 Big Data Anti-Patterns
 
Taming the resource tiger
Taming the resource tigerTaming the resource tiger
Taming the resource tiger
 
Taming the resource tiger
Taming the resource tigerTaming the resource tiger
Taming the resource tiger
 
Data Ingestion Engine
Data Ingestion EngineData Ingestion Engine
Data Ingestion Engine
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
Cloud-native persistence in a serverless world
Cloud-native persistence in a serverless worldCloud-native persistence in a serverless world
Cloud-native persistence in a serverless world
 
Hadoop ppt1
Hadoop ppt1Hadoop ppt1
Hadoop ppt1
 
Beyond EXPLAIN: Query Optimization From Theory To Code
Beyond EXPLAIN: Query Optimization From Theory To CodeBeyond EXPLAIN: Query Optimization From Theory To Code
Beyond EXPLAIN: Query Optimization From Theory To Code
 
Adding Support for Networking and Web Technologies to an Embedded System
Adding Support for Networking and Web Technologies to an Embedded SystemAdding Support for Networking and Web Technologies to an Embedded System
Adding Support for Networking and Web Technologies to an Embedded System
 
Background processing with hangfire
Background processing with hangfireBackground processing with hangfire
Background processing with hangfire
 
How and why you need to build a big data lab
How and why you need to build a big data labHow and why you need to build a big data lab
How and why you need to build a big data lab
 
Intro to Big Data
Intro to Big DataIntro to Big Data
Intro to Big Data
 
DrupalCamp LA 2014 - A Perfect Launch, Every Time
DrupalCamp LA 2014 - A Perfect Launch, Every TimeDrupalCamp LA 2014 - A Perfect Launch, Every Time
DrupalCamp LA 2014 - A Perfect Launch, Every Time
 
Apache drill
Apache drillApache drill
Apache drill
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
Sabre presentation for MySQL user conference 2004
Sabre presentation for MySQL user conference 2004Sabre presentation for MySQL user conference 2004
Sabre presentation for MySQL user conference 2004
 

Recently uploaded

Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
How To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROHow To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROmotivationalword821
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfYashikaSharma391629
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 

Recently uploaded (20)

Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
How To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROHow To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTRO
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 

Presto: High Performance SQL Query Engine Architecture

  • 2. Content • Background • Architecture • Key points for low query latency • TPCH benchmark test • What we do • Reference Thursday, 24 April, 14
  • 3. Background • 300+PB data stored in Hadoop/HDFS-based clusters • More queries and get results faster improves analysts, data scientists, and engineers productivity • MapReduce and Hive are designed for large-scale, reliable computation • External projects too nascent or did not meet our requirements for flexibility and scale Thursday, 24 April, 14
  • 5. Key points for low latency • In memory parallel computing • Pipeline • Data local computation • Data cache • Dynamic compile part of the plan to byte code • Careful use of memory and data structure • BlinkDB liked approximate queries • Traditional SQL optimize • GC control Thursday, 24 April, 14
  • 7. In memory parallel computing select c1.rank, count(*) from dim.city c1 join dim.city c2 on c1.id = c2.id where c1.id > 10 group by c1.rank limit 10; Thursday, 24 April, 14
  • 8. In memory parallel computing Thursday, 24 April, 14
  • 9. In memory parallel computing Thursday, 24 April, 14
  • 10. In memory parallel computing • PlanDistribution=Source – InputSplit[] splits = inputFormat.getSplits(jobConf , 0); • PlanDistribution=Hash – Hash Shuffle – Fixed Workers – query.initial-hash-partitions Thursday, 24 April, 14
  • 11. SplitRunner thread number task.shard.max-threads=availableProcessors() * 4 Pipeline - TaskExecutor Thursday, 24 April, 14
  • 12. Pipeline - Operator process flow Page(max page size: 1MB, max rows: 16 * 1024 ) Thursday, 24 April, 14
  • 14. Data local computation • Select acceptable nodes (as least 10 nodes by default) – Nodes has the same address – If not enough, add nodes in the same rack – If not enough, randomly select nodes in other racks • Select the node with the smallest number of assignments (pending tasks) Thursday, 24 April, 14
  • 15. Data cache • Google Guava LoadingCache • Cached Objects – HiveMeta database table partition – Byte Code Class FilterAndProjectOperatorFactoryFactory, ScanFilterAndProjectOperatorFactoryFactory – functions Thursday, 24 April, 14
  • 16. Dynamic compile plan to byte code • Presto dynamic compile FilterAndProjectOperator and ScanFilterAndProjectOperator to byte code which lets the JIT optimize and generate native machine code • How much does it speed up ? • ScanFilterAndProjectOperator Thursday, 24 April, 14
  • 17. Careful use mem & data structure • Slice – Unsafe#copyMemory – 20% ~ 30% speed up for ORCFile write performance • ThreadLocalRandom – ThreadLocal seed instead of AtomicLong – 100% speed up • ListenableFuture – Async Callback Thursday, 24 April, 14
  • 18. Approximate queries • approx_avg, approx_distinct, approx_percentile • +50% speed up Thursday, 24 April, 14
  • 19. Traditional SQL optimize • ImplementSampleAsFilter • LimitPushDown • MaterializeSamplePullUp • MergeProjections • PredicatePushDown • PruneRedundantProjections • PruneUnreferencedOutputs • SetFlatteningOptimizer • SimplifyExpressions • UnaliasSymbolReferences Thursday, 24 April, 14
  • 20. GC control • A JDK 1.7 BUG • When code cache fills up, there is a chance that JIT might stop compile byte code to native code. • By forcing classes to unload from the perm gen, we let the code cache evictor make room before the cache fills up. • System.gc() Thursday, 24 April, 14
  • 21. TPCH benchmark test • Run presto-main/src/test/java/com/facebook/ presto/benchmark/BenchmarkSuite.java • A part of the result as below Thursday, 24 April, 14
  • 22. What we do • Support kerberos authentication • Implicit type coercion • Support reading lzo compressed tables • Implement useful functions • Fix planning issue when using DISTICT aggregations in HAVING clause • https://github.com/MTDATA/presto/commits/ mt-0.60 Thursday, 24 April, 14
  • 23. Reference • http://prestodb.io/ • https://www.facebook.com/notes/facebook- engineering/presto-interacting-with-petabytes-of- data-at-facebook/10151786197628920 • http://www.slideshare.net/zhusx/presto-overview? from_search=1 • http://www.slideshare.net/frsyuki/hadoop-source- code-reading-15-in-japan-presto Thursday, 24 April, 14