Query Compilation in Impala
Alexander Behm | Software Engineer
May 2014 @ Impala User Group
Flow of a SQL Query
Client → SQL Text → Impala Frontend (Java): Compile Query → Executable Plan → Impala Backend (C++): Execute Query → Query Results → Client
Focus of this talk: Compile Query (Impala Frontend)
Query Compilation
Client → SQL Text → Query Compiler → Executable Plan
Query Compiler: SQL Parsing → Parse Tree → Semantic Analysis → Parse Tree + Analyzer → Query Planning
Query Parsing
SELECT c1, SUM(c2)
FROM t1 JOIN t2 USING(id)
WHERE c3 > 10 GROUP BY c1
SelectStmt
  SelectList: ColRef, AggExpr (ColRef)
  TableRefs: TableRef, TableRef with UsingClause (ColRef)
  WhereClause: BinaryPredicate (ColRef, IntLiteral)
  GroupByClause: ColRef
• Applies SQL grammar, reports syntax errors
• Produces parse tree capturing syntactic structure of query
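As a rough illustration of what "capturing syntactic structure" means, the sketch below builds the parse tree for the example query by hand. The Python dataclasses are hypothetical and only borrow their names from the node labels on the slide; Impala's actual parser is part of the Java frontend and is not shown here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class ColRef:
    name: str


@dataclass
class IntLiteral:
    value: int


@dataclass
class AggExpr:
    func: str
    arg: ColRef


@dataclass
class BinaryPredicate:
    op: str
    left: ColRef
    right: IntLiteral


@dataclass
class TableRef:
    table: str
    using: List[ColRef] = field(default_factory=list)  # USING clause, if any


@dataclass
class SelectStmt:
    select_list: List[Union[ColRef, AggExpr]]
    table_refs: List[TableRef]
    where: Optional[BinaryPredicate] = None
    group_by: List[ColRef] = field(default_factory=list)


# SELECT c1, SUM(c2) FROM t1 JOIN t2 USING(id) WHERE c3 > 10 GROUP BY c1
parse_tree = SelectStmt(
    select_list=[ColRef("c1"), AggExpr("SUM", ColRef("c2"))],
    table_refs=[TableRef("t1"), TableRef("t2", using=[ColRef("id")])],
    where=BinaryPredicate(">", ColRef("c3"), IntLiteral(10)),
    group_by=[ColRef("c1")],
)
```

Nothing in this tree is resolved yet: the parser does not know which table c1 belongs to or what type c3 has. That is the job of semantic analysis.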
Semantic Analysis…
• Precondition: Query is syntactically valid. Analysis operates on parse tree.
• Consults table metadata
• Do t1 and t2 exist? Does c1 exist in t1 or t2 (or both → error)? Does id exist in t1 and t2?
• Does the user have privileges to SELECT from t1?
• Checks type compatibility of expressions, adds implicit casts
• c3 > 10 → c3 > cast(10 as bigint)
• SQL rules (semantic, not syntactic)
• Does c1 appear in the GROUP BY clause?
SELECT c1, SUM(c2)
FROM t1 JOIN t2 USING(id)
WHERE c3 > 10 GROUP BY c1
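The checks on this slide can be sketched roughly as follows. The catalog dictionary, the column types, and every function name are assumptions made for illustration; they are not Impala's metadata API. The sketch resolves an unqualified column against the tables in the FROM clause, rejects unknown or ambiguous names, and adds the implicit cast from the example.

```python
# Hypothetical catalog: table name -> {column name: type}. Not Impala's metadata API.
CATALOG = {
    "t1": {"id": "INT", "c1": "STRING", "c3": "BIGINT"},
    "t2": {"id": "INT", "c2": "DOUBLE"},
}


class AnalysisError(Exception):
    """Raised when a syntactically valid query is semantically invalid."""


def resolve_column(col, tables):
    """Return (table, type) for an unqualified column; reject unknown or ambiguous names."""
    matches = [t for t in tables if col in CATALOG[t]]
    if not matches:
        raise AnalysisError(f"unknown column: {col}")
    if len(matches) > 1:
        raise AnalysisError(f"ambiguous column: {col}")
    return matches[0], CATALOG[matches[0]][col]


def cast_literal(col_type, literal):
    """Render an integer literal, adding an implicit cast when the column is BIGINT."""
    # Assumption: a bare integer literal is typed narrower than BIGINT.
    if col_type == "BIGINT":
        return f"cast({literal} as bigint)"
    return str(literal)


def analyze_predicate(col, op, literal, tables):
    """Check that the tables exist, resolve the column, and rewrite the predicate."""
    for t in tables:
        if t not in CATALOG:
            raise AnalysisError(f"unknown table: {t}")
    _, col_type = resolve_column(col, tables)
    return f"{col} {op} {cast_literal(col_type, literal)}"


rewritten = analyze_predicate("c3", ">", 10, ["t1", "t2"])
# rewritten == "c3 > cast(10 as bigint)"
```

In this sketch, resolving the unqualified name id against both tables raises AnalysisError, which mirrors the "or both → error" rule above.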
… Semantic Analysis
• Expression substitution for views
• Resolve column references against base tables
• Preparation for Planning
• Register state in analyzer for correct predicate assignment during planning
• Register predicates (WHERE, HAVING, ON, USING, etc.)
• Register outer-joined tables
• Compute value-transfer graph and equivalence classes for predicate inference
• (…)
• Postcondition: Query is valid. An executable plan can be produced.
SELECT c1, SUM(c2)
FROM (SELECT dept AS c1, revenue AS c2,
month AS c3 FROM t1) AS v
WHERE c3 > 10 GROUP BY c1
SELECT dept, SUM(revenue)
FROM t1
WHERE month > 10
GROUP BY dept
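A minimal sketch of the view substitution shown in the two queries above. It works on strings with a regular expression purely to keep the example short; an analyzer would substitute on expression trees instead. The dictionary layout and the function name are hypothetical.

```python
import re

# Alias -> base-table expression, taken from the inline view v in the example.
view_aliases = {"c1": "dept", "c2": "revenue", "c3": "month"}


def substitute(expr, aliases):
    """Replace each whole-word alias in expr with its base-table expression."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, aliases)) + r")\b")
    return pattern.sub(lambda m: aliases[m.group(1)], expr)


outer = {
    "select": ["c1", "SUM(c2)"],
    "where": "c3 > 10",
    "group_by": ["c1"],
}

resolved = {
    "select": [substitute(e, view_aliases) for e in outer["select"]],
    "where": substitute(outer["where"], view_aliases),
    "group_by": [substitute(e, view_aliases) for e in outer["group_by"]],
}
# resolved["select"] == ["dept", "SUM(revenue)"]
# resolved["where"] == "month > 10"
```

After substitution every column reference points at the base table t1, so predicates such as month > 10 can later be assigned directly to the scan of t1.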
Query Planning: Goals
• Generate executable plan (“tree” of operators)
• Maximize scan locality using DN block metadata
• Minimize data movement
• Full distribution of operators
• Query operators
• Scan, HashJoin, HashAggregation, Union, TopN, Exchange
Query Planning: Overview
Semantic Analysis → Parse Tree + Analyzer → Query Planner → Executable Plan
Query Planner: Walk Parse Tree → Single-node Plan → Parallelize & Fragment
Query Planning: Single-Node Plan
• Four major functions:
1. Parse Tree → Plan Tree
2. Assigns predicates to lowest plan node
3. Optimizes join order
4. Prunes irrelevant columns
Parse Tree → Single-Node Plan Tree
[Figure: single-node plan tree with TopN, Agg, two HashJoin nodes, and scans of t1, t2, and t3]
SELECT t1.dept, SUM(t2.revenue)
FROM LargeHdfsTable t1
JOIN HugeHdfsTable t2 ON (t1.id1 = t2.id)
JOIN SmallHbaseTable t3 ON (t1.id2 = t3.id)
WHERE t3.category = 'Online' AND t1.id > 10
GROUP BY t1.dept
HAVING COUNT(t2.revenue) > 10
ORDER BY revenue LIMIT 10
SELECT t1.dept, SUM(t2.revenue)
FROM LargeHdfsTable t1
JOIN HugeHdfsTable t2 ON (t1.id1 = t2.id)
JOIN SmallHbaseTable t3 ON (t1.id2 = t3.id)
WHERE t3.category = 'Online' AND t1.id > 10
GROUP BY t1.dept
HAVING COUNT(t2.revenue) > 10
ORDER BY revenue LIMIT 10
Predicate Assignment & Inference
[Figure: plan tree (TopN, Agg, two HashJoin nodes, scans of t1, t2, and t3) annotated with the predicates COUNT(t2.revenue) > 10, t1.id2 = t3.id, t1.id1 = t2.id, id1 > 10, category = 'Online', and id > 10; one annotation is labeled "Inferred Predicate"]
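One common way to infer predicates is to group columns into equivalence classes from the equi-join predicates and then copy single-column filters across each class. The sketch below does this with a small union-find. It uses the join predicates from the query above, but the filter on t1.id1 is chosen for illustration only, and the function names are hypothetical. It also assumes inner joins; outer joins restrict which direction values may transfer, which this sketch ignores.

```python
from collections import defaultdict


def equivalence_classes(equalities):
    """Group columns connected by equality predicates (simple union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in equalities:
        parent[find(a)] = find(b)

    classes = defaultdict(set)
    for col in list(parent):
        classes[find(col)].add(col)
    return list(classes.values())


def infer_predicates(equalities, filters):
    """Copy each single-column filter to every column equivalent to its column."""
    inferred = []
    for cls in equivalence_classes(equalities):
        for col, op, value in filters:
            if col in cls:
                inferred.extend((other, op, value) for other in sorted(cls - {col}))
    return inferred


joins = [("t1.id1", "t2.id"), ("t1.id2", "t3.id")]
filters = [("t1.id1", ">", 10)]  # illustrative filter, not the slide's WHERE clause
inferred = infer_predicates(joins, filters)
# inferred == [("t2.id", ">", 10)]
```

The inferred filter can then be assigned to the scan of t2, so fewer rows reach the join.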
Join-Order Optimization
• Inner joins are commutative and associative
• Query results correct independent of execution order
• Query execution costs vary dramatically!
• Hash table sizes, network transfers, #hash lookups
• Join-order optimization
• Impala only considers left-deep join trees
• (Right join input is a table, not another join)
• Find cheapest valid join order
• Relies heavily on table and column statistics
• Limitation: Choice of join order independent of join strategy
Invalid Join Orders
SELECT t1.dept, SUM(t2.revenue)
FROM LargeHdfsTable t1
JOIN HugeHdfsTable t2 ON (t1.id1 = t2.id)
JOIN SmallHbaseTable t3 ON (t1.id2 = t3.id)
WHERE t3.category = 'Online' AND t1.id > 10
GROUP BY t1.dept
HAVING COUNT(t2.revenue) > 10
ORDER BY revenue LIMIT 10
No explicit or implicit
predicate between t2 and t3
Join-Order Optimization
[Figure: six candidate left-deep plans, each with two HashJoin nodes over scans of t1, t2, and t3]
Order: t1, t2, t3
Order: t1, t3, t2
Order: t2, t1, t3
Order: t2, t3, t1
Order: t3, t1, t2
Order: t3, t2, t1
Join-Order Optimization
• Impala’s Implementation:
1. Heuristic
• Order tables descending by size
• Best plan typically has largest table on the left (if valid)
2. Plan enumeration & costing
• Generate all possible join orders starting from a given left-most table (starting with largest one)
• Ignore invalid join orders
• Estimate intermediate result sizes (key!)
• Choose plan that minimizes intermediate result sizes
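The two steps above can be sketched as follows. This is a simplified model, not Impala's planner code: the row counts and join selectivities are made-up numbers standing in for table and column statistics, the size estimate is a plain product of inputs and selectivities, and all names are hypothetical.

```python
from itertools import permutations

# Hypothetical row counts; real values come from table/column stats.
rows = {"t1": 1_000_000, "t2": 100_000_000, "t3": 1_000}
# Join predicates as undirected edges, each with an assumed selectivity.
join_sel = {frozenset(("t1", "t2")): 1e-6, frozenset(("t1", "t3")): 1e-4}


def cost_of(order):
    """Sum of estimated intermediate result sizes for a left-deep order, or None if invalid."""
    joined = {order[0]}
    size = rows[order[0]]
    total = 0.0
    for t in order[1:]:
        # Predicates that connect t to tables already on the left side.
        sels = [s for pair, s in join_sel.items() if t in pair and pair - {t} <= joined]
        if not sels:
            return None  # no predicate links t to the tables joined so far
        size = size * rows[t]
        for s in sels:
            size *= s
        total += size
        joined.add(t)
    return total


def best_order(leftmost):
    """Enumerate left-deep orders starting at leftmost; return the cheapest valid (order, cost)."""
    rest = [t for t in rows if t != leftmost]
    candidates = []
    for perm in permutations(rest):
        order = (leftmost,) + perm
        cost = cost_of(order)
        if cost is not None:
            candidates.append((order, cost))
    return min(candidates, key=lambda c: c[1]) if candidates else None


def plan():
    """Heuristic: try left-most tables in descending size order, keep the first valid result."""
    for leftmost in sorted(rows, key=rows.get, reverse=True):
        best = best_order(leftmost)
        if best is not None:
            return best
    return None


chosen = plan()
# chosen[0] == ("t2", "t1", "t3")
```

With these numbers the largest table t2 ends up on the left, and the order t2, t3, t1 is rejected because no predicate connects t2 and t3.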
Query Planning: Distributed Plans
• Distributed Aggregation
• Pre-aggregation where data is first materialized
• Merge-aggregation partitioned by grouping columns
• Distinct aggregation: additional level of pre- and merge aggregation
• Distributed Top-N
• Initial Top-N where data is first materialized
• Final Top-N at coordinator
• Distributed Union
• Pre-aggregation/top-n placed into plans of each union operand
• Union-operand plans executed in parallel, merged via exchange
• Above strategies are currently fixed in Impala
• Independent of column/table stats
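The aggregation and top-n strategies above can be imitated on a single machine to show the data flow. The sketch is an illustration under stated assumptions: lists of rows stand in for scan nodes, Python's built-in hash stands in for the exchange's partitioning function, and SUM is the only aggregate. All function names are hypothetical.

```python
import heapq
from collections import defaultdict


def pre_aggregate(rows):
    """Partial SUM per group, computed where the rows are first materialized."""
    partial = defaultdict(int)
    for key, value in rows:
        partial[key] += value
    return dict(partial)


def hash_partition(partials, num_partitions):
    """Route each group's partial sums to one merge node by hashing the grouping key."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for partial in partials:
        for key, value in partial.items():
            partitions[hash(key) % num_partitions][key].append(value)
    return partitions


def merge_aggregate(partition):
    """Final SUM per group from the partial sums routed to this node."""
    return {key: sum(values) for key, values in partition.items()}


def distributed_top_n(per_node_results, n):
    """Initial top-n on each node, then a final top-n at the coordinator."""
    local = [heapq.nlargest(n, r.items(), key=lambda kv: kv[1]) for r in per_node_results]
    return heapq.nlargest(n, (kv for part in local for kv in part), key=lambda kv: kv[1])


# Three hypothetical scan nodes, each holding (dept, revenue) rows.
node_rows = [
    [("toys", 5), ("books", 7), ("toys", 1)],
    [("books", 2), ("games", 9)],
    [("toys", 4), ("games", 2)],
]
partials = [pre_aggregate(r) for r in node_rows]
merged = [merge_aggregate(p) for p in hash_partition(partials, num_partitions=2)]
top2 = distributed_top_n(merged, n=2)
# top2 == [("games", 11), ("toys", 10)]
```

Because every group is routed to exactly one merge node, the per-node top-n lists are sufficient for the coordinator to compute the global top-n.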
Query Planning: Distributed Joins
• Broadcast Join
• Join is co-located with left input
• Broadcast right input to all nodes executing join
• Build hash table on right input, streaming probe from left input
• → Preferred for small right side (relative to left side)
• Partitioned Join
• Both tables hash-partitioned on join columns
• Same build/probe procedure as above
• → Preferred for joins where both left and right side are large
• Cost-based decision based on table/column stats
• Minimize required network transfer
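A minimal sketch of a network-transfer comparison between the two strategies. The cost formulas are a simplifying assumption for illustration (broadcast ships the right input to every node, partitioning ships both inputs once); the function name and the sizes in the usage lines are hypothetical, and Impala's actual cost model may differ.

```python
def choose_join_strategy(left_bytes, right_bytes, num_nodes):
    """Pick the strategy with the smaller estimated network transfer."""
    # Broadcast: the right input is sent to every node executing the join.
    broadcast_cost = right_bytes * num_nodes
    # Partitioned: both inputs are hash-partitioned on the join columns and shipped once.
    partitioned_cost = left_bytes + right_bytes
    if broadcast_cost <= partitioned_cost:
        return "BROADCAST", broadcast_cost
    return "PARTITIONED", partitioned_cost


# Small right side relative to the left side: broadcast wins.
small_right = choose_join_strategy(left_bytes=10**12, right_bytes=464, num_nodes=10)
# small_right == ("BROADCAST", 4640)

# Both sides large: partitioning wins.
both_large = choose_join_strategy(left_bytes=10**12, right_bytes=5 * 10**11, num_nodes=10)
# both_large == ("PARTITIONED", 1500000000000)
```

Since the input sizes come from table and column statistics, missing or stale stats can push this decision the wrong way.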
Query Planning: Distributed Plans
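[Figure: the single-node plan (TopN, Agg, two HashJoin nodes, scans of t1, t2, and t3) shown next to its distributed plan. Distributed plan labels: Scan: t1, Scan: t2, Scan: t3; two HashJoin nodes; exchanges labeled hash t2.id, hash t1.id1, Broadcast, and hash t1.custid; Pre-Agg and MergeAgg; TopN, Merge, TopN; placement labels: at HDFS DN, at HBase RS, at coordinator]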
Explain Example: TPCDS Q42
SELECT d.d_year, i.i_category_id, i.i_category, SUM(ss_ext_sales_price) AS total_sales
FROM store_sales ss
JOIN date_dim d
ON (ss.ss_sold_date_sk = d.d_date_sk)
JOIN item i
ON (ss.ss_item_sk = i.i_item_sk)
WHERE i.i_manager_id = 1 AND d.d_moy = 12 AND d.d_year = 1998
GROUP BY d.d_year, i.i_category_id, i.i_category
ORDER BY total_sales DESC, d_year, i_category_id, i_category
LIMIT 100
Explain Example: TPCDS Q42
+-----------------------------------------------------+
| Explain String |
+-----------------------------------------------------+
| Estimated Per-Host Requirements: Memory=0B VCores=0 |
| |
| 06:TOP-N [LIMIT=100] |
| 05:AGGREGATE [FINALIZE] |
| 04:HASH JOIN [INNER JOIN] |
| |--02:SCAN HDFS [tpcds1000gb.item i] |
| 03:HASH JOIN [INNER JOIN] |
| |--01:SCAN HDFS [tpcds1000gb.date_dim d] |
| 00:SCAN HDFS [tpcds1000gb.store_sales ss] |
+-----------------------------------------------------+
set explain_level=0;
set num_nodes=1;
Explain Example: TPCDS Q42
+---------------------------------------------------------------------+
| Explain String |
+---------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=3.76GB VCores=3 |
| |
| 12:TOP-N [LIMIT=100] |
| 11:EXCHANGE [PARTITION=UNPARTITIONED] |
| 06:TOP-N [LIMIT=100] |
| 10:AGGREGATE [MERGE FINALIZE] |
| 09:EXCHANGE [PARTITION=HASH(d.d_year,i.i_category_id,i.i_category)] |
| 05:AGGREGATE |
| 04:HASH JOIN [INNER JOIN, BROADCAST] |
| |--08:EXCHANGE [BROADCAST] |
| | 02:SCAN HDFS [tpcds1000gb.item i] |
| 03:HASH JOIN [INNER JOIN, BROADCAST] |
| |--07:EXCHANGE [BROADCAST] |
| | 01:SCAN HDFS [tpcds1000gb.date_dim d] |
| 00:SCAN HDFS [tpcds1000gb.store_sales ss] |
+---------------------------------------------------------------------+
set explain_level=0;
set num_nodes=0;
Explain Example: TPCDS Q42
| …
| 03:HASH JOIN [INNER JOIN, BROADCAST] |
| | hash predicates: ss.ss_sold_date_sk = d.d_date_sk |
| | hosts=10 per-host-mem=511B |
| | tuple-ids=0,1 row-size=40B cardinality=8251124389 |
| | |
| |--07:EXCHANGE [BROADCAST] |
| | | hosts=3 per-host-mem=0B |
| | | tuple-ids=1 row-size=16B cardinality=29 |
| | | |
| | 01:SCAN HDFS [tpcds1000gb.date_dim d, PARTITION=RANDOM] |
| | partitions=1/1 size=9.77MB |
| | predicates: d.d_moy = 12, d.d_year = 1998 |
| | table stats: 73049 rows total |
| | column stats: all |
| | hosts=3 per-host-mem=48.00MB |
| | tuple-ids=1 row-size=16B cardinality=29 |
| | |
| 00:SCAN HDFS [tpcds1000gb.store_sales ss, PARTITION=RANDOM] |
| partitions=1823/1823 size=1.10TB |
| table stats: 8251124389 rows total |
| column stats: all |
| hosts=10 per-host-mem=3.75GB |
| tuple-ids=0 row-size=24B cardinality=8251124389 |
+--------------------------------------------------------------+
set explain_level=2;
set num_nodes=0;
Conclusion
• Cost-based choice of join order and strategy
• Critical for performance
• Relies on table and column stats
• Other plan optimizations currently independent of stats
• Likely to expand plan choices in the future
• Likely to increase reliance on stats
• Helpful Impala commands
• compute stats
• show table/column stats
• explain query/insert stmt
• set explain_level=[0-3]
• set num_nodes=1 → show single-node plan
Try It Out!
• Questions/comments?
• Download: cloudera.com/impala
• Email: impala-user@cloudera.org
• Join: groups.cloudera.org
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Dernier

%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsBert Jan Schrijver
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburgmasabamasaba
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfkalichargn70th171
 
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...masabamasaba
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdfPearlKirahMaeRagusta1
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfproinshot.com
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...Nitya salvi
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 

Dernier (20)

%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 

Query Compilation in Impala

  • 1. Query Compilation in Impala. Alexander Behm | Software Engineer. May 2014 @ Impala User Group
  • 2. Flow of a SQL Query
      [Figure: Client → SQL Text → Impala Frontend (Java): Compile Query → Executable Plan → Impala Backend (C++): Execute Query → Query Results → Client. The compile step is marked "Focus of this talk".]
  • 3. Query Compilation
      [Figure: Client → SQL Text → Query Compiler (SQL Parsing → Parse Tree → Semantic Analysis → Parse Tree + Analyzer → Query Planning) → Executable Plan]
  • 4. Query Parsing
      SELECT c1, SUM(c2)
      FROM t1 JOIN t2 USING(id)
      WHERE c3 > 10 GROUP BY c1
      [Figure: parse tree. SelectStmt has four children: SelectList (ColRef, AggExpr(ColRef)); TableRefs (TableRef, TableRef, UsingClause(ColRef)); WhereClause (BinaryPredicate(ColRef, IntLiteral)); GroupByClause (ColRef)]
      • Applies SQL grammar, reports syntax errors
      • Produces parse tree capturing syntactic structure of query
  • 5. Semantic Analysis…
      • Precondition: Query is syntactically valid. Analysis operates on parse tree.
      • Consults table metadata
          • Do t1 and t2 exist? Does c1 exist in t1 or t2 (or both → error)? Does id exist in t1 and t2?
          • Does the user have privileges to SELECT from t1?
      • Checks type compatibility of expressions, adds implicit casts
          • c3 > 10 → c3 > cast(10 as bigint)
      • SQL rules (semantic, not syntactic)
          • Does c1 appear in the GROUP BY clause?
      SELECT c1, SUM(c2)
      FROM t1 JOIN t2 USING(id)
      WHERE c3 > 10 GROUP BY c1
  • 6. … Semantic Analysis
      • Expression substitution for views
          • Resolve column references against base tables
      • Preparation for Planning
          • Register state in analyzer for correct predicate assignment during planning
          • Register predicates (WHERE, HAVING, ON, USING, etc.)
          • Register outer-joined tables
          • Compute value-transfer graph and equivalence classes for predicate inference
          • (…)
      • Postcondition: Query is valid. An executable plan can be produced.
      Example (view substitution):
        SELECT c1, SUM(c2)
        FROM (SELECT dept AS c1, revenue AS c2, month AS c3 FROM t1) AS v
        WHERE c3 > 10 GROUP BY c1
      →
        SELECT dept, SUM(revenue)
        FROM t1
        WHERE month > 10 GROUP BY dept
  • 7. Query Planning: Goals
      • Generate executable plan (“tree” of operators)
      • Maximize scan locality using HDFS DataNode (DN) block metadata
      • Minimize data movement
      • Full distribution of operators
      • Query operators
          • Scan, HashJoin, HashAggregation, Union, TopN, Exchange
  • 8. Query Planning: Overview
      [Figure: Semantic Analysis → Parse Tree + Analyzer → Query Planner (Walk Parse Tree → Single-node Plan → Parallelize & Fragment) → Executable Plan]
  • 9. Query Planning: Single-Node Plan
      • Four major functions:
          1. Parse Tree → Plan Tree
          2. Assigns predicates to lowest plan node
          3. Optimizes join order
          4. Prunes irrelevant columns
  • 10. Parse Tree → Single-Node Plan Tree
      SELECT t1.dept, SUM(t2.revenue)
      FROM LargeHdfsTable t1
      JOIN HugeHdfsTable t2 ON (t1.id1 = t2.id)
      JOIN SmallHbaseTable t3 ON (t1.id2 = t3.id)
      WHERE t3.category = 'Online' AND t1.id > 10
      GROUP BY t1.dept
      HAVING COUNT(t2.revenue) > 10
      ORDER BY revenue LIMIT 10
      [Figure: single-node plan tree: TopN over Agg over HashJoin(HashJoin(Scan: t1, Scan: t2), Scan: t3)]
  • 11. Predicate Assignment & Inference
      Query: same as on slide 10.
      [Figure: the slide 10 plan tree (TopN, Agg, two HashJoins, Scan: t1, Scan: t2, Scan: t3) annotated with the predicates COUNT(t2.revenue) > 10; t1.id2 = t3.id; t1.id1 = t2.id; id1 > 10; category = 'Online'; id > 10. One predicate is labeled "Inferred Predicate".]
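The inference step on this slide can be illustrated with a small sketch. It is not Impala's implementation: it assumes inner joins only, treats every join equality as symmetric (the value-transfer graph mentioned on slide 6 is directional once outer joins are involved), and uses the hypothetical function name `infer_predicates`. Columns connected by equality predicates are grouped into equivalence classes with a union-find structure, and each single-column filter is copied to the other members of its class.

```python
from collections import defaultdict


def infer_predicates(equalities, filters):
    """Infer single-column predicates through join equalities.

    equalities: list of (col_a, col_b) pairs taken from join conditions.
    filters:    list of (col, op, literal) single-column predicates.
    Returns the set of new (col, op, literal) predicates.
    Simplifying assumption: inner joins only, so equality is symmetric.
    """
    parent = {}

    def find(col):
        # Union-find lookup with path halving.
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]
            col = parent[col]
        return col

    for a, b in equalities:
        parent[find(a)] = find(b)

    # Group columns by the representative of their equivalence class.
    classes = defaultdict(set)
    for col in list(parent):
        classes[find(col)].add(col)

    given = set(filters)
    inferred = set()
    for col, op, literal in filters:
        if col not in parent:
            continue  # column takes part in no equality, nothing to infer
        for other in classes[find(col)]:
            candidate = (other, op, literal)
            if candidate not in given:
                inferred.add(candidate)
    return inferred


if __name__ == "__main__":
    # Illustrative input, assuming the filter is on t1.id1 (the join column).
    eqs = [("t1.id1", "t2.id"), ("t1.id2", "t3.id")]
    flt = [("t1.id1", ">", 10), ("t3.category", "=", "Online")]
    print(infer_predicates(eqs, flt))
```

With the illustrative input in the `__main__` block, the only new predicate is `t2.id > 10`, which a planner could then push down to the scan of t2.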
  • 12. Join-Order Optimization
      • Inner joins are commutative and associative
          • Query results correct independent of execution order
      • Query execution costs vary dramatically!
          • Hash table sizes, network transfers, #hash lookups
      • Join-order optimization
          • Impala only considers left-deep join trees
          • (Right join input is a table, not another join)
          • Find cheapest valid join order
          • Relies heavily on table and column statistics
      • Limitation: Choice of join order independent of join strategy
  • 13. Invalid Join Orders
      Query: same as on slide 10.
      • No explicit or implicit predicate between t2 and t3
  • 14. Join-Order Optimization
      [Figure: the six left-deep join trees over t1, t2, t3. Order a, b, c denotes HashJoin(HashJoin(Scan: a, Scan: b), Scan: c).]
      • Order: t1, t2, t3
      • Order: t1, t3, t2
      • Order: t2, t1, t3
      • Order: t2, t3, t1
      • Order: t3, t1, t2
      • Order: t3, t2, t1
  • 15. Join-Order Optimization
      • Impala’s Implementation:
          1. Heuristic
              • Order tables descending by size
              • Best plan typically has largest table on the left (if valid)
          2. Plan enumeration & costing
              • Generate all possible join orders starting from a given left-most table (starting with largest one)
              • Ignore invalid join orders
              • Estimate intermediate result sizes (key!)
              • Choose plan that minimizes intermediate result sizes
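The two steps above can be sketched as follows. This is a toy model rather than Impala's planner: the function names, the row counts, and the cardinality estimate (|L ⋈ R| ≈ |L| · |R| · selectivity) are assumptions made for illustration. The sketch tries left-most candidates from largest to smallest, enumerates the remaining tables in every order, rejects orders in which a table has no join predicate to anything already joined, and keeps the order with the smallest sum of intermediate result sizes.

```python
from itertools import permutations


def plan_cost(leftmost, rest, rows, join_sel):
    """Sum of estimated intermediate result sizes for one left-deep order.

    Returns None when the order is invalid, i.e. some table has no join
    predicate connecting it to the tables joined so far.
    """
    joined = {leftmost}
    cardinality = rows[leftmost]
    total = 0.0
    for table in rest:
        selectivity = 1.0
        connected = False
        for other in joined:
            pair = frozenset((table, other))
            if pair in join_sel:
                selectivity *= join_sel[pair]
                connected = True
        if not connected:
            return None
        # Assumed estimate: |L join R| = |L| * |R| * selectivity.
        cardinality = cardinality * rows[table] * selectivity
        total += cardinality
        joined.add(table)
    return total


def best_left_deep_order(rows, join_sel):
    """Return (order, cost) for the cheapest valid left-deep join order.

    rows:     dict table -> estimated row count (hypothetical stats).
    join_sel: dict frozenset({a, b}) -> assumed join selectivity.
    """
    by_size = sorted(rows, key=rows.get, reverse=True)
    for leftmost in by_size:  # heuristic: largest table first
        rest = [t for t in by_size if t != leftmost]
        best = None
        for perm in permutations(rest):
            cost = plan_cost(leftmost, perm, rows, join_sel)
            if cost is not None and (best is None or cost < best[1]):
                best = ((leftmost,) + perm, cost)
        if best is not None:
            return best
    return None


if __name__ == "__main__":
    rows = {"t1": 1_000, "t2": 100_000, "t3": 10}  # made-up statistics
    join_sel = {frozenset(("t1", "t2")): 1e-5, frozenset(("t1", "t3")): 0.1}
    print(best_left_deep_order(rows, join_sel))
```

In the example, t2 is the largest table and becomes the left-most input. The order t2, t3, t1 is rejected because there is no predicate between t2 and t3, which mirrors the invalid-order case on slide 13.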
  • 16. Query Planning: Overview
      [Figure: Semantic Analysis → Parse Tree + Analyzer → Query Planner (Walk Parse Tree → Single-node Plan → Parallelize & Fragment) → Executable Plan]
  • 17. Query Planning: Distributed Plans
      • Distributed Aggregation
          • Pre-aggregation where data is first materialized
          • Merge-aggregation partitioned by grouping columns
          • Distinct aggregation: additional level of pre- and merge aggregation
      • Distributed Top-N
          • Initial Top-N where data is first materialized
          • Final Top-N at coordinator
      • Distributed Union
          • Pre-aggregation/top-n placed into plans of each union operand
          • Union-operand plans executed in parallel, merged via exchange
      • Above strategies are currently fixed in Impala
          • Independent of column/table stats
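The distributed aggregation pattern above (pre-aggregate locally, exchange by grouping key, then merge) can be mimicked in a single process. The sketch below is only an illustration for a SUM aggregate; the function names are hypothetical and the "nodes" are plain Python lists, not Impala plan fragments.

```python
from collections import defaultdict


def pre_aggregate(rows):
    """Local pre-aggregation on one node: partial SUM per grouping key."""
    partial = defaultdict(int)
    for key, value in rows:
        partial[key] += value
    return dict(partial)


def hash_exchange(partials, num_mergers):
    """Route each partial result to a merge node chosen by hashing the key."""
    buckets = [defaultdict(list) for _ in range(num_mergers)]
    for partial in partials:
        for key, value in partial.items():
            buckets[hash(key) % num_mergers][key].append(value)
    return buckets


def merge_aggregate(bucket):
    """Merge aggregation: combine the partial sums that share a key."""
    return {key: sum(values) for key, values in bucket.items()}


def distributed_sum(node_rows, num_mergers=2):
    """Simulate SELECT key, SUM(value) ... GROUP BY key across nodes."""
    partials = [pre_aggregate(rows) for rows in node_rows]
    result = {}
    for bucket in hash_exchange(partials, num_mergers):
        # Each key lands in exactly one bucket, so update() never overwrites.
        result.update(merge_aggregate(bucket))
    return result


if __name__ == "__main__":
    node_rows = [
        [("a", 1), ("b", 2), ("a", 3)],  # rows scanned on node 0
        [("b", 4), ("c", 5)],            # rows scanned on node 1
    ]
    print(distributed_sum(node_rows))
```

Pre-aggregating before the exchange matters because only one partial row per group and node crosses the network, instead of every scanned row.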
  • 18. Query Planning: Distributed Joins
      • Broadcast Join
          • Join is co-located with left input
          • Broadcast right input to all nodes executing join
          • Build hash table on right input, streaming probe from left input
          • → Preferred for small right side (relative to left side)
      • Partitioned Join
          • Both tables hash-partitioned on join columns
          • Same build/probe procedure as above
          • → Preferred for joins where both left and right side are large
      • Cost-based decision based on table/column stats
          • Minimize required network transfer
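The cost-based choice between the two strategies can be approximated by comparing the bytes each one sends over the network. The model below is an assumption for illustration, not Impala's exact formula: a broadcast join ships the right input to every node executing the join, while a partitioned join ships both inputs once. The function name and the byte counts are hypothetical.

```python
def choose_join_strategy(left_bytes, right_bytes, num_nodes):
    """Pick the join strategy with the smaller estimated network transfer.

    Assumed cost model:
      broadcast   = right_bytes * num_nodes  (right side copied to each node)
      partitioned = left_bytes + right_bytes (both sides repartitioned once)
    Returns (strategy_name, estimated_bytes_transferred).
    """
    broadcast_cost = right_bytes * num_nodes
    partition_cost = left_bytes + right_bytes
    if broadcast_cost <= partition_cost:
        return "BROADCAST", broadcast_cost
    return "PARTITIONED", partition_cost


if __name__ == "__main__":
    # Small dimension table joined to a large fact table on 10 nodes.
    print(choose_join_strategy(10**12, 10**7, 10))
    # Two large inputs on 10 nodes.
    print(choose_join_strategy(10**12, 5 * 10**11, 10))
```

Under this model, a small right side favors broadcast and two large sides favor partitioning, which matches the preferences stated on the slide.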
  • 19. Query Planning: Distributed Plans
      [Figure: the single-node plan (TopN, Agg, two HashJoins over Scan: t1, Scan: t2, Scan: t3) shown next to its distributed plan (two HashJoins, Pre-Agg, MergeAgg, TopN, TopN). Exchange labels: Broadcast, Merge, hash t2.id, hash t1.id1, hash t1.custid. Placement labels: at HDFS DN, at HBase RS, at coordinator.]
  • 20. Explain Example: TPCDS Q42
      SELECT d.d_year, i.i_category_id, i.i_category,
             SUM(ss_ext_sales_price) AS total_sales
      FROM store_sales ss
      JOIN date_dim d ON (ss.ss_sold_date_sk = d.d_date_sk)
      JOIN item i ON (ss.ss_item_sk = i.i_item_sk)
      WHERE i.i_manager_id = 1 AND d.d_moy = 12 AND d.d_year = 1998
      GROUP BY d.d_year, i.i_category_id, i.i_category
      ORDER BY total_sales DESC, d_year, i_category_id, i_category
      LIMIT 100
  • 21. Explain Example: TPCDS Q42
      set explain_level=0; set num_nodes=1;
      Explain String:
        Estimated Per-Host Requirements: Memory=0B VCores=0
        06:TOP-N [LIMIT=100]
        05:AGGREGATE [FINALIZE]
        04:HASH JOIN [INNER JOIN]
        |--02:SCAN HDFS [tpcds1000gb.item i]
        03:HASH JOIN [INNER JOIN]
        |--01:SCAN HDFS [tpcds1000gb.date_dim d]
        00:SCAN HDFS [tpcds1000gb.store_sales ss]
  • 22. Explain Example: TPCDS Q42
      set explain_level=0; set num_nodes=0;
      Explain String:
        Estimated Per-Host Requirements: Memory=3.76GB VCores=3
        12:TOP-N [LIMIT=100]
        11:EXCHANGE [PARTITION=UNPARTITIONED]
        06:TOP-N [LIMIT=100]
        10:AGGREGATE [MERGE FINALIZE]
        09:EXCHANGE [PARTITION=HASH(d.d_year,i.i_category_id,i.i_category)]
        05:AGGREGATE
        04:HASH JOIN [INNER JOIN, BROADCAST]
        |--08:EXCHANGE [BROADCAST]
        |  02:SCAN HDFS [tpcds1000gb.item i]
        03:HASH JOIN [INNER JOIN, BROADCAST]
        |--07:EXCHANGE [BROADCAST]
        |  01:SCAN HDFS [tpcds1000gb.date_dim d]
        00:SCAN HDFS [tpcds1000gb.store_sales ss]
  • 23. Explain Example: TPCDS Q42
      set explain_level=2; set num_nodes=0;
      Explain String (excerpt):
        …
        03:HASH JOIN [INNER JOIN, BROADCAST]
        |  hash predicates: ss.ss_sold_date_sk = d.d_date_sk
        |  hosts=10 per-host-mem=511B
        |  tuple-ids=0,1 row-size=40B cardinality=8251124389
        |
        |--07:EXCHANGE [BROADCAST]
        |  |  hosts=3 per-host-mem=0B
        |  |  tuple-ids=1 row-size=16B cardinality=29
        |  |
        |  01:SCAN HDFS [tpcds1000gb.date_dim d, PARTITION=RANDOM]
        |     partitions=1/1 size=9.77MB
        |     predicates: d.d_moy = 12, d.d_year = 1998
        |     table stats: 73049 rows total
        |     column stats: all
        |     hosts=3 per-host-mem=48.00MB
        |     tuple-ids=1 row-size=16B cardinality=29
        |
        00:SCAN HDFS [tpcds1000gb.store_sales ss, PARTITION=RANDOM]
           partitions=1823/1823 size=1.10TB
           table stats: 8251124389 rows total
           column stats: all
           hosts=10 per-host-mem=3.75GB
           tuple-ids=0 row-size=24B cardinality=8251124389
  • 24. Conclusion
      • Cost-based choice of join order and strategy
          • Critical for performance
          • Relies on table and column stats
      • Other plan optimizations currently independent of stats
          • Likely to expand plan choices in the future
          • Likely to increase reliance on stats
      • Helpful Impala commands
          • compute stats
          • show table/column stats
          • explain query/insert stmt
          • set explain_level=[0-3]
          • set num_nodes=1 → show single-node plan
  • 25. Try It Out!
      • Questions/comments?
      • Download: cloudera.com/impala
      • Email: impala-user@cloudera.org
      • Join: groups.cloudera.org