SlideShare a Scribd company logo
1 of 19
Hive, Presto, and Spark
on
TPC-DS benchmark
Dongwon Kim, PhD
SK Telecom
Contents
• Experimental setup
• Experimental results
[Experimental setup] TPC-DS dataset and query
• Hive
• Entirely depend on github.com/hortonworks/hive-testbench
• Distributed data generator
• A small dataset (100GB)
• A large dataset (1TB)
• DDLs
• External table declaration
• Partitioned table declaration (ORC)
• 66 queries provided (out of 99 TPC-DS queries)
• Presto
• Use hive-hadoop2 connector to read the same partitioned table
• Use the same query
• Spark
• Connected to Hive MetaStore to read the same partitioned table
• Use the same query
[Experimental setup] Cluster setup
• A single master node with 5 slave nodes
• Two 12-core processes with total 48 hyper threads
• 128GB main memory
• 10 HDDs
• Hadoop 2.7.3
• Hive 2.1.1 + Tez 0.8.4
• A LLAP worker on a node uses 192 cores and 80GB
• Presto 0.162
• A Presto worker on a node uses 192 cores and 80GB
• distributed-joins-enabled = false
• Spark 2.0.2
• 4 Spark executors on a node uses 192 cores and 80GB
[Experimental setup] Performance monitoring tool
• github.com/eastcirclek/swimlane-graphs
• Hive/Presto/Spark task swimlane graph + Ganglia resource utilization graph
• To observe the main cause of performance bottleneck
[Experimental results] Characteristics of each engine
• Hive
• Improve significantly through LLAP
• Good for both small and large workload
• Especially good for IO-bound workloads
• Spark
• Improve CPU performance through Whole Stage Code Generation
• Especially good for CPU-bound workloads
• Does not outperform Hive and Presto for IO-bound workloads
• Presto
• Pipelined execution to reduce unnecessary disk IOs
• Good for simple queries
• Works okay only when data is fit into memory
[Experimental results] Query execution time (100GB)
with query72 without query72
Pairwise comparison
reduction in
sum of running times
Pairwise comparison
reduction in
sum of running times
Spark > Hive 26.3 %
(1668s  1229s)
Hive > Spark 19.8 %
(1143s  916s)
Hive > Presto 55.6 %
(2797s  1241s)
Hive > Presto 50.2 %
(982s  489s)
Spark > Presto 62.0 %
(2932s  1114s)
Spark > Presto 5.2%
(1116s  1057s)
Spark > Hive >>> Presto Hive > Spark >= Presto
Reversed
Gap reduced
significantly
Hive with LLAP is good even for small workload
* When comparing each pair of engines, I count queries that are completed by both of two engines.
[Experimental results] Query execution time (1TB)
with query72 without query72
Pairwise comparison
reduction in
sum of running times
Pairwise comparison
reduction in
sum of running times
Hive > Spark 28.2 %
(6445s  4625s)
Hive > Spark 41.3 %
(6165s  3629s)
Hive > Presto 56.4 %
(5567s  2426s)
Hive > Presto 25.5 %
(1460s  1087s)
Spark > Presto 29.2 %
(5685s  4026s)
Presto > Spark 58.6%
(3812s  1578s)
Hive > Spark >>> Presto Hive > Presto > Spark
Reversed
* When comparing each pair of engines, I count queries that are completed by both of two engines.
Hive with LLAP is good for large, I/O-bound workload
Presto works okay if data fit into memory
[Experimental results] Curse of Query72 on Hive and Presto
0
500
1000
1500
2000
72 89 43 63 19 3 51 52 42 55 82 29 17 25 49 39 91 40 21 13 12 73 96 48 20 85 34 84 79 32 7 26 45 27 46 88 68 15 97 92 93 66 87 24 28 56 71 83 60 76 31 64 74
Presto Spark
100GB dataset
0
500
1000
1500
2000
72 89 43 63 19 3 42 52 55 25 49 29 17 75 82 40 88 31 13 71 91 56 85 60 68 28 46 48 26 7 51 66 21 34 27 20 45 12 87 73 79 15 32 96 84 76 39 93 92 97
Presto Hive
0
500
1000
1500
2000
72 22 67 51 97 92 95 82 39 93 21 94 96 84 73 12 18 79 32 3 91 20 43 98 52 42 89 63 45 15 55 34 48 27 7 26 90 46 68 13 19 85 87 40 66 76 28 88 50 60 29 17 56 58 71 54 25 49 31 80
Spark Hive
0
1000
2000
3000
4000
72 75 91 49 13 26 88 71 40 66 56 60 68 7 31 27 48 79 87 46 45 73 15 34 84 20 21 12 39 32 96 76 28 51 92 85 93
Presto Hive
0
1000
2000
3000
4000
72 67 82 92 97 95 22 51 21 12 20 15 26 19 96 39 27 84 3 13 18 28 43 45 52 7 34 48 42 46 55 32 73 89 87 63 68 90 79 91 76 66 40 71 93 50 94 56 85 54 60 58 25 17 31 29 88 49 74 80
Spark Hive
0
1000
2000
3000
4000
72 26 13 91 21 15 20 12 27 7 84 39 45 96 48 46 34 40 71 68 66 73 87 92 97 79 32 51 28 76 85 56 93 83 60 24 49 88 31 74
Presto Spark
[Experimental results] Curse of Query72 on Hive and Presto 1TB dataset
[Experimental results] Curse of Query72 on Presto and Hive
Presto SparkHive
High CPU utilization
for a long time
(looks like “plateau”)
No plateau observed
thanks to
WholeStageCodeGen
plateau
w/ WholeStageCodeGen w/o WholeStageCodeGen
CPU
Network
Disk
[Experimental results] Whole Stage Code Generation
Presto Hive
Without WholeStageCodeGeneration,
“plateau” is observed like Presto and Hive
 10th stage takes much longer
without WholeStageCodeGeneration
plateau
[Experimental results] Whole Stage Code Generation
1
10
100
1000
75 71 64 83 73 40 56 55 43 85 89 63 84 27 92 82 28 93 91 12 96 32 7 66 20 45 98 42 68 52 87 90 94 79 3 48 26 60 46 34 67 51 15 29 65 76 50 31 49 80 19 18 21 58 17 13 97 22 95 54 25 39 88 24 72
w/ WSCG w/o WSCG
100GB dataset
[Experimental results] Performance of Hive w/ and w/o LLAP
1
10
100
1000
10000
67 51 70 97 92 68 46 42 71 73 90 34 39 48 84 52 43 63 98 13 3 7 79 66 20 26 27 96 21 45 89 80 12 56 15 32 40 18 49 19 55 60 54 87 31 76 75 82 22 88 28 58 94 93 95 72
llap container
LLAP shows improvement over container-based Tez for most queries
100GB
dataset
1TB
dataset
1
10
100
1000
10000
80 84 96 79 94 51 91 31 56 67 27 32 95 46 72 39 19 48 90 52 21 45 15 3 34 55 42 20 12 97 43 60 63 22 26 76 75 83 88 68 70 89 65 28 13 87 40 58 18 85 25 17 29 93 50 92 66 73 71
llap container
All queries : 78.3% reduction
All except 72 : 36.1% reduction
All queries : 44.9% reduction
All except 72 : 27.9% reduction
[Experimental results][Query 75] without and with LLAP
without LLAP without LLAP
with LLAP
with LLAP
[Experimental results][Query 93] without and with LLAP
without LLAP without LLAP with LLAP
with LLAP
[Experimental results][Query 94] without and with LLAP
without LLAP without LLAP with LLAP
with LLAP
[Experimental results][Query 93] Difference pattern of resource utilization
Network  CPU
Presto Hive
CPU  Network
Spark
The end

More Related Content

What's hot

Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsJonas Bonér
 
PLPgSqL- Datatypes, Language structure.pptx
PLPgSqL- Datatypes, Language structure.pptxPLPgSqL- Datatypes, Language structure.pptx
PLPgSqL- Datatypes, Language structure.pptxjohnwick814916
 
Spark overview
Spark overviewSpark overview
Spark overviewLisa Hua
 
Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)
Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)
Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)NTT DATA Technology & Innovation
 
Why your Spark Job is Failing
Why your Spark Job is FailingWhy your Spark Job is Failing
Why your Spark Job is FailingDataWorks Summit
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOAltinity Ltd
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Josef A. Habdank
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangDatabricks
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 
Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...HostedbyConfluent
 
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재PgDay.Seoul
 
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTOClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTOAltinity Ltd
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeDatabricks
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupDatabricks
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesNishith Agarwal
 

What's hot (20)

Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
 
PLPgSqL- Datatypes, Language structure.pptx
PLPgSqL- Datatypes, Language structure.pptxPLPgSqL- Datatypes, Language structure.pptx
PLPgSqL- Datatypes, Language structure.pptx
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Spark overview
Spark overviewSpark overview
Spark overview
 
Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)
Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)
Apache Bigtop3.2 (仮)(Open Source Conference 2022 Online/Hiroshima 発表資料)
 
Why your Spark Job is Failing
Why your Spark Job is FailingWhy your Spark Job is Failing
Why your Spark Job is Failing
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...
 
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
 
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTOClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
 

Viewers also liked

A Comparative Performance Evaluation of Apache Flink
A Comparative Performance Evaluation of Apache FlinkA Comparative Performance Evaluation of Apache Flink
A Comparative Performance Evaluation of Apache FlinkDongwon Kim
 
Docker and kubernetes
Docker and kubernetesDocker and kubernetes
Docker and kubernetesDongwon Kim
 
Kubernetes introduction
Kubernetes introductionKubernetes introduction
Kubernetes introductionDongwon Kim
 
Performance of Spark vs MapReduce
Performance of Spark vs MapReducePerformance of Spark vs MapReduce
Performance of Spark vs MapReduceEdureka!
 
Predictive Maintenance with Deep Learning and Apache Flink
Predictive Maintenance with Deep Learning and Apache FlinkPredictive Maintenance with Deep Learning and Apache Flink
Predictive Maintenance with Deep Learning and Apache FlinkDongwon Kim
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Cloudera, Inc.
 
Kylin and Druid Presentation
Kylin and Druid PresentationKylin and Druid Presentation
Kylin and Druid Presentationargonauts007
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentationCyanny LIANG
 
Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016kbajda
 
Presto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and FuturePresto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and FutureDataWorks Summit
 
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CAPresto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CAkbajda
 
Presto: Distributed sql query engine
Presto: Distributed sql query engine Presto: Distributed sql query engine
Presto: Distributed sql query engine kiran palaka
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case Kai Sasaki
 
Optimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud StorageOptimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud StorageKai Sasaki
 

Viewers also liked (17)

A Comparative Performance Evaluation of Apache Flink
A Comparative Performance Evaluation of Apache FlinkA Comparative Performance Evaluation of Apache Flink
A Comparative Performance Evaluation of Apache Flink
 
Docker and kubernetes
Docker and kubernetesDocker and kubernetes
Docker and kubernetes
 
Kubernetes introduction
Kubernetes introductionKubernetes introduction
Kubernetes introduction
 
Performance of Spark vs MapReduce
Performance of Spark vs MapReducePerformance of Spark vs MapReduce
Performance of Spark vs MapReduce
 
Predictive Maintenance with Deep Learning and Apache Flink
Predictive Maintenance with Deep Learning and Apache FlinkPredictive Maintenance with Deep Learning and Apache Flink
Predictive Maintenance with Deep Learning and Apache Flink
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
 
Kylin and Druid Presentation
Kylin and Druid PresentationKylin and Druid Presentation
Kylin and Druid Presentation
 
Presto - SQL on anything
Presto  - SQL on anythingPresto  - SQL on anything
Presto - SQL on anything
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentation
 
Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016
 
Presto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and FuturePresto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and Future
 
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CAPresto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CA
 
Presto: SQL-on-anything
Presto: SQL-on-anythingPresto: SQL-on-anything
Presto: SQL-on-anything
 
Presto: Distributed sql query engine
Presto: Distributed sql query engine Presto: Distributed sql query engine
Presto: Distributed sql query engine
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case
 
Optimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud StorageOptimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud Storage
 
Presto
PrestoPresto
Presto
 

Similar to Hive, Presto & Spark on TPC-DS: A Comparison of Query Performance

Hadoop Query Performance Smackdown
Hadoop Query Performance SmackdownHadoop Query Performance Smackdown
Hadoop Query Performance SmackdownDataWorks Summit
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudNicolas Poggi
 
Scaling sql server 2014 parallel insert
Scaling sql server 2014 parallel insertScaling sql server 2014 parallel insert
Scaling sql server 2014 parallel insertChris Adkin
 
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffDatabases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffTimescale
 
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New FeaturesAmazon Web Services
 
Using Derivation-Free Optimization Methods in the Hadoop Cluster with Terasort
Using Derivation-Free Optimization Methods in the Hadoop Cluster with TerasortUsing Derivation-Free Optimization Methods in the Hadoop Cluster with Terasort
Using Derivation-Free Optimization Methods in the Hadoop Cluster with TerasortAnhanguera Educacional S/A
 
Cassandra Performance Benchmark
Cassandra Performance BenchmarkCassandra Performance Benchmark
Cassandra Performance BenchmarkBigstep
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceBrendan Gregg
 
Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)Nicolas Poggi
 
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu YongUnlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu YongCeph Community
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scalethelabdude
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightDataStax Academy
 
Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Tier1 App
 
USE_OF_PACKET_CAPTURE.pptx
USE_OF_PACKET_CAPTURE.pptxUSE_OF_PACKET_CAPTURE.pptx
USE_OF_PACKET_CAPTURE.pptxrajaguru91
 
ELK: Moose-ively scaling your log system
ELK: Moose-ively scaling your log systemELK: Moose-ively scaling your log system
ELK: Moose-ively scaling your log systemAvleen Vig
 
Cloud Performance Benchmarking
Cloud Performance BenchmarkingCloud Performance Benchmarking
Cloud Performance BenchmarkingSantanu Dey
 
ClusterPresentation
ClusterPresentationClusterPresentation
ClusterPresentationWill Dixon
 
In Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneIn Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneEnkitec
 

Similar to Hive, Presto & Spark on TPC-DS: A Comparison of Query Performance (20)

Hadoop Query Performance Smackdown
Hadoop Query Performance SmackdownHadoop Query Performance Smackdown
Hadoop Query Performance Smackdown
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the Cloud
 
Scaling sql server 2014 parallel insert
Scaling sql server 2014 parallel insertScaling sql server 2014 parallel insert
Scaling sql server 2014 parallel insert
 
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffDatabases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
 
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
 
Using Derivation-Free Optimization Methods in the Hadoop Cluster with Terasort
Using Derivation-Free Optimization Methods in the Hadoop Cluster with TerasortUsing Derivation-Free Optimization Methods in the Hadoop Cluster with Terasort
Using Derivation-Free Optimization Methods in the Hadoop Cluster with Terasort
 
QCon London.pdf
QCon London.pdfQCon London.pdf
QCon London.pdf
 
Cassandra Performance Benchmark
Cassandra Performance BenchmarkCassandra Performance Benchmark
Cassandra Performance Benchmark
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems Performance
 
Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)
 
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu YongUnlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-Flight
 
Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Am I reading GC logs Correctly?
Am I reading GC logs Correctly?
 
USE_OF_PACKET_CAPTURE.pptx
USE_OF_PACKET_CAPTURE.pptxUSE_OF_PACKET_CAPTURE.pptx
USE_OF_PACKET_CAPTURE.pptx
 
ELK: Moose-ively scaling your log system
ELK: Moose-ively scaling your log systemELK: Moose-ively scaling your log system
ELK: Moose-ively scaling your log system
 
Silent stores
Silent storesSilent stores
Silent stores
 
Cloud Performance Benchmarking
Cloud Performance BenchmarkingCloud Performance Benchmarking
Cloud Performance Benchmarking
 
ClusterPresentation
ClusterPresentationClusterPresentation
ClusterPresentation
 
In Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneIn Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry Osborne
 

Recently uploaded

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Recently uploaded (20)

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Hive, Presto & Spark on TPC-DS: A Comparison of Query Performance

  • 1. Hive, Presto, and Spark on TPC-DS benchmark Dongwon Kim, PhD SK Telecom
  • 2. Contents • Experimental setup • Experimental results
  • 3. [Experimental setup] TPC-DS dataset and query • Hive • Entirely depend on github.com/hortonworks/hive-testbench • Distributed data generator • A small dataset (100GB) • A large dataset (1TB) • DDLs • External table declaration • Partitioned table declaration (ORC) • 66 queries provided (out of 99 TPC-DS queries) • Presto • Use hive-hadoop2 connector to read the same partitioned table • Use the same query • Spark • Connected to Hive MetaStore to read the same partitioned table • Use the same query
  • 4. [Experimental setup] Cluster setup • A single master node with 5 slave nodes • Two 12-core processes with total 48 hyper threads • 128GB main memory • 10 HDDs • Hadoop 2.7.3 • Hive 2.1.1 + Tez 0.8.4 • A LLAP worker on a node uses 192 cores and 80GB • Presto 0.162 • A Presto worker on a node uses 192 cores and 80GB • distributed-joins-enabled = false • Spark 2.0.2 • 4 Spark executors on a node uses 192 cores and 80GB
  • 5. [Experimental setup] Performance monitoring tool • github.com/eastcirclek/swimlane-graphs • Hive/Presto/Spark task swimlane graph + Ganglia resource utilization graph • To observe the main cause of performance bottleneck
  • 6. [Experimental results] Characteristics of each engine • Hive • Improve significantly through LLAP • Good for both small and large workload • Especially good for IO-bound workloads • Spark • Improve CPU performance through Whole Stage Code Generation • Especially good for CPU-bound workloads • Does not outperform Hive and Presto for IO-bound workloads • Presto • Pipelined execution to reduce unnecessary disk IOs • Good for simple queries • Works okay only when data is fit into memory
  • 7. [Experimental results] Query execution time (100GB) with query72 without query72 Pairwise comparison reduction in sum of running times Pairwise comparison reduction in sum of running times Spark > Hive 26.3 % (1668s  1229s) Hive > Spark 19.8 % (1143s  916s) Hive > Presto 55.6 % (2797s  1241s) Hive > Presto 50.2 % (982s  489s) Spark > Presto 62.0 % (2932s  1114s) Spark > Presto 5.2% (1116s  1057s) Spark > Hive >>> Presto Hive > Spark >= Presto Reversed Gap reduced significantly Hive with LLAP is good even for small workload * When comparing each pair of engines, I count queries that are completed by both of two engines.
  • 8. [Experimental results] Query execution time (1TB) with query72 without query72 Pairwise comparison reduction in sum of running times Pairwise comparison reduction in sum of running times Hive > Spark 28.2 % (6445s  4625s) Hive > Spark 41.3 % (6165s  3629s) Hive > Presto 56.4 % (5567s  2426s) Hive > Presto 25.5 % (1460s  1087s) Spark > Presto 29.2 % (5685s  4026s) Presto > Spark 58.6% (3812s  1578s) Hive > Spark >>> Presto Hive > Presto > Spark Reversed * When comparing each pair of engines, I count queries that are completed by both of two engines. Hive with LLAP is good for large, I/O-bound workload Presto works okay if data fit into memory
  • 9. [Experimental results] Curse of Query72 on Hive and Presto 0 500 1000 1500 2000 72 89 43 63 19 3 51 52 42 55 82 29 17 25 49 39 91 40 21 13 12 73 96 48 20 85 34 84 79 32 7 26 45 27 46 88 68 15 97 92 93 66 87 24 28 56 71 83 60 76 31 64 74 Presto Spark 100GB dataset 0 500 1000 1500 2000 72 89 43 63 19 3 42 52 55 25 49 29 17 75 82 40 88 31 13 71 91 56 85 60 68 28 46 48 26 7 51 66 21 34 27 20 45 12 87 73 79 15 32 96 84 76 39 93 92 97 Presto Hive 0 500 1000 1500 2000 72 22 67 51 97 92 95 82 39 93 21 94 96 84 73 12 18 79 32 3 91 20 43 98 52 42 89 63 45 15 55 34 48 27 7 26 90 46 68 13 19 85 87 40 66 76 28 88 50 60 29 17 56 58 71 54 25 49 31 80 Spark Hive
  • 10. 0 1000 2000 3000 4000 72 75 91 49 13 26 88 71 40 66 56 60 68 7 31 27 48 79 87 46 45 73 15 34 84 20 21 12 39 32 96 76 28 51 92 85 93 Presto Hive 0 1000 2000 3000 4000 72 67 82 92 97 95 22 51 21 12 20 15 26 19 96 39 27 84 3 13 18 28 43 45 52 7 34 48 42 46 55 32 73 89 87 63 68 90 79 91 76 66 40 71 93 50 94 56 85 54 60 58 25 17 31 29 88 49 74 80 Spark Hive 0 1000 2000 3000 4000 72 26 13 91 21 15 20 12 27 7 84 39 45 96 48 46 34 40 71 68 66 73 87 92 97 79 32 51 28 76 85 56 93 83 60 24 49 88 31 74 Presto Spark [Experimental results] Curse of Query72 on Hive and Presto 1TB dataset
  • 11. [Experimental results] Curse of Query72 on Presto and Hive Presto SparkHive High CPU utilization for a long time (looks like “plateau”) No plateau observed thanks to WholeStageCodeGen plateau
  • 12. w/ WholeStageCodeGen w/o WholeStageCodeGen CPU Network Disk [Experimental results] Whole Stage Code Generation Presto Hive Without WholeStageCodeGeneration, “plateau” is observed like Presto and Hive  10th stage takes much longer without WholeStageCodeGeneration plateau
  • 13. [Experimental results] Whole Stage Code Generation 1 10 100 1000 75 71 64 83 73 40 56 55 43 85 89 63 84 27 92 82 28 93 91 12 96 32 7 66 20 45 98 42 68 52 87 90 94 79 3 48 26 60 46 34 67 51 15 29 65 76 50 31 49 80 19 18 21 58 17 13 97 22 95 54 25 39 88 24 72 w/ WSCG w/o WSCG 100GB dataset
  • 14. [Experimental results] Performance of Hive w/ and w/o LLAP 1 10 100 1000 10000 67 51 70 97 92 68 46 42 71 73 90 34 39 48 84 52 43 63 98 13 3 7 79 66 20 26 27 96 21 45 89 80 12 56 15 32 40 18 49 19 55 60 54 87 31 76 75 82 22 88 28 58 94 93 95 72 llap container LLAP shows improvement over container-based Tez for most queries 100GB dataset 1TB dataset 1 10 100 1000 10000 80 84 96 79 94 51 91 31 56 67 27 32 95 46 72 39 19 48 90 52 21 45 15 3 34 55 42 20 12 97 43 60 63 22 26 76 75 83 88 68 70 89 65 28 13 87 40 58 18 85 25 17 29 93 50 92 66 73 71 llap container All queries : 78.3% reduction All except 72 : 36.1% reduction All queries : 44.9% reduction All except 72 : 27.9% reduction
  • 15. [Experimental results][Query 75] without and with LLAP without LLAP without LLAP with LLAP with LLAP
  • 16. [Experimental results][Query 93] without and with LLAP without LLAP without LLAP with LLAP with LLAP
  • 17. [Experimental results][Query 94] without and with LLAP without LLAP without LLAP with LLAP with LLAP
  • 18. [Experimental results][Query 93] Difference pattern of resource utilization Network  CPU Presto Hive CPU  Network Spark