SlideShare une entreprise Scribd logo
1  sur  109
Télécharger pour lire hors ligne
><
INTRODUCTIONTO
INTRODUCTION TO APACHE CALCITE
APACHE CALCITE
JORDAN HALTERMAN
1
WHAT IS APACHE
CALCITE?
next 2
><INTRODUCTION TO APACHE CALCITE 3
What is Apache Calcite?
• A framework for building SQL databases
• Developed over more than ten years
• Written in Java
• Previously known as Optiq
• Previously known as Farrago
• Became an Apache project in 2013
• Led by Julian Hyde at Hortonworks
><INTRODUCTION TO APACHE CALCITE 4
Projects using Calcite
• Apache Hive
• Apache Drill
• Apache Flink
• Apache Phoenix
• Apache Samza
• Apache Storm
• Apache everything…
><INTRODUCTION TO APACHE CALCITE 5
What is Apache Calcite?
• SQL parser
• SQL validation
• Query optimizer
• SQL generator
• Data federator
><INTRODUCTION TO APACHE CALCITE
Parse
Queries are parsed using
a JavaCC generated
parser
Validate
Queries are validated
against known database
metadata
Optimize
Logical plans are optimized
and converted into physical
expressions
Execute
P h y s i c a l p l a n s a r e
converted into application-
specific executions
01 02 03 04
Stages of query execution
6
COMPONENTS next 7
><INTRODUCTION TO APACHE CALCITE 8
Components of Calcite
• Catalog - Defines metadata and namespaces
that can be accessed in SQL queries
• SQL parser - Parses valid SQL queries into an
abstract syntax tree (AST)
• SQL validator - Validates abstract syntax trees
against metadata provided by the catalog
• Query optimizer - Converts AST into logical
plans, optimizes logical plans, and converts
logical expressions into physical plans
• SQL generator - Converts physical plans to
SQL
CATALOG next 9
><INTRODUCTION TO APACHE CALCITE 10
Calcite Catalog
• Defines namespaces that can be accessed in Calcite
queries
• Schema
• A collection of schemas and tables
• Can be arbitrarily nested
• Table
• Represents a single data set
• Fields defined by a RelDataType
• RelDataType
• Represents fields in a data set
• Supports all SQL data types, including structs and
><INTRODUCTION TO APACHE CALCITE 11
Schema
• A collection of schemas and tables
• Schemas can be arbitrarily nested
><INTRODUCTION TO APACHE CALCITE 12
Schema
• A collection of schemas and tables
• Schemas can be arbitrarily nested
><INTRODUCTION TO APACHE CALCITE 13
Table
• Represents a single data set
• Fields are defined by a RelDataType
><INTRODUCTION TO APACHE CALCITE 14
Table
• Represents a single data set
• Fields are defined by a RelDataType
><INTRODUCTION TO APACHE CALCITE 15
RelDataType
• Represents the data type of an object
• Supports all SQL data types, including
structs and arrays
• Similar to Spark’s DataType
><INTRODUCTION TO APACHE CALCITE 16
RelDataType
><INTRODUCTION TO APACHE CALCITE 17
RelDataType
data type enum
><INTRODUCTION TO APACHE CALCITE 18
Statistic
• Provide table statistics used in optimization
><INTRODUCTION TO APACHE CALCITE 19
Statistic
• Provide table statistics used in optimization
><INTRODUCTION TO APACHE CALCITE 20
Usage of the Calcite catalog
><INTRODUCTION TO APACHE CALCITE 21
Usage of the Calcite catalog
schema
><INTRODUCTION TO APACHE CALCITE 22
Usage of the Calcite catalog
schema table
><INTRODUCTION TO APACHE CALCITE 23
Usage of the Calcite catalog
schema table
data type
><INTRODUCTION TO APACHE CALCITE 24
Usage of the Calcite catalog
schema table
data typedata type field
SQL PARSER next 25
><INTRODUCTION TO APACHE CALCITE 26
Calcite SQL parser
• LL(k) parser written in JavaCC
• Input queries are parsed into an abstract
syntax tree (AST)
• Tokens are represented in Calcite by
SqlNode
• SqlNode can also be converted back to a
SQL string via the unparse method
><INTRODUCTION TO APACHE CALCITE 27
JavaCC
• Java Compiler Compiler
• Created in 1996 at Sun Microsystems
• Generates Java code from a domain-
specific language
• ANTLR is the modern alternative used in
projects like Hive and Drill
• JavaCC has sparse documentation
><INTRODUCTION TO APACHE CALCITE 28
JavaCC
><INTRODUCTION TO APACHE CALCITE 29
JavaCC
><INTRODUCTION TO APACHE CALCITE 30
JavaCC
tokens
><INTRODUCTION TO APACHE CALCITE 31
JavaCC
tokens
or
><INTRODUCTION TO APACHE CALCITE 32
JavaCC
tokens
or
function call
><INTRODUCTION TO APACHE CALCITE 33
JavaCC
tokens
Java code
or
function call
><INTRODUCTION TO APACHE CALCITE 34
SqlNode
• SqlNode represents an element in an
abstract syntax tree
><INTRODUCTION TO APACHE CALCITE 35
SqlNode
• SqlNode represents an element in an
abstract syntax tree
select
><INTRODUCTION TO APACHE CALCITE 36
SqlNode
• SqlNode represents an element in an
abstract syntax tree
identifiersselect
><INTRODUCTION TO APACHE CALCITE 37
SqlNode
• SqlNode represents an element in an
abstract syntax tree
identifiersselect operator
><INTRODUCTION TO APACHE CALCITE 38
SqlNode
• SqlNode represents an element in an
abstract syntax tree
identifiersselect operator identifier
><INTRODUCTION TO APACHE CALCITE 39
SqlNode
• SqlNode represents an element in an
abstract syntax tree
identifiersselect operator identifier
data type
><INTRODUCTION TO APACHE CALCITE 40
SqlNode
• SqlNode represents an element in an
abstract syntax tree
identifiersselect operator identifier
data type
identifier
><INTRODUCTION TO APACHE CALCITE 41
SqlNode
• SqlNode’s unparse method converts a
SQL element back into a string
><INTRODUCTION TO APACHE CALCITE 42
SqlNode
• SqlNode’s unparse method converts a
SQL element back into a string
><INTRODUCTION TO APACHE CALCITE 43
SqlNode
><INTRODUCTION TO APACHE CALCITE 44
SqlNode
• SqlDialect indicates the capitalization
and quoting rules of specific databases
><INTRODUCTION TO APACHE CALCITE 45
SqlNode
• SqlDialect indicates the capitalization
and quoting rules of specific databases
QUERY OPTIMIZER next 46
><INTRODUCTION TO APACHE CALCITE 47
Query Plans
• Query plans represent the steps necessary
to execute a query
><INTRODUCTION TO APACHE CALCITE 48
Query Plans
• Query plans represent the steps necessary
to execute a query
><INTRODUCTION TO APACHE CALCITE 49
Query Plans
• Query plans represent the steps necessary
to execute a query
table scan
table
scan
><INTRODUCTION TO APACHE CALCITE 50
Query Plans
• Query plans represent the steps necessary
to execute a query
inner join table scan
table
scan
><INTRODUCTION TO APACHE CALCITE 51
Query Plans
• Query plans represent the steps necessary
to execute a query
filter
inner join table scan
table
scan
><INTRODUCTION TO APACHE CALCITE 52
Query Plans
• Query plans represent the steps necessary
to execute a query
filter
inner join
project
table scan
table
scan
><INTRODUCTION TO APACHE CALCITE 53
Query Plans
• Query plans represent the steps necessary
to execute a query
filter
inner join
project
table scan
table
scan
><INTRODUCTION TO APACHE CALCITE 54
Query Optimization
• Optimize logical plan
• Goal is typically to try to reduce the amount
of data that must be processed early in the
plan
• Convert logical plan into a physical plan
• Physical plan is engine specific and
represents the physical execution stages
><INTRODUCTION TO APACHE CALCITE 55
Query Optimization
• Prune unused fields
• Merge projections
• Convert subqueries to joins
• Reorder joins
• Push down projections
• Push down filters
><INTRODUCTION TO APACHE CALCITE 56
Query Optimization
><INTRODUCTION TO APACHE CALCITE 57
Query Optimization
><INTRODUCTION TO APACHE CALCITE 58
Query Optimization
><INTRODUCTION TO APACHE CALCITE 59
Query Optimization
push down
project
><INTRODUCTION TO APACHE CALCITE 60
Query Optimization
push down
project
push down
filter
><INTRODUCTION TO APACHE CALCITE 61
Query Optimization
><INTRODUCTION TO APACHE CALCITE 62
Key Concepts
Relational algebra
Row expressions
Traits
Conventions
Rules
Planners
Programs
><INTRODUCTION TO APACHE CALCITE 63
Key Concepts
Relational algebra
Row expressions
Traits
Conventions
Rules
Planners
Programs
RelNode
RexNode
RelTrait
Convention
RelOptRule
RelOptPlanner
Program
><INTRODUCTION TO APACHE CALCITE 64
Relational Algebra
• RelNode represents a relational expression
• Largely equivalent to Spark’s DataFrame
methods
• Logical algebra
• Physical algebra
><INTRODUCTION TO APACHE CALCITE 65
Relational Algebra
TableScan
Project
Filter
Aggregate
Join
Union
Intersect
Sort
><INTRODUCTION TO APACHE CALCITE 66
Relational Algebra
TableScan
Project
Filter
Aggregate
Join
Union
Intersect
Sort
SparkTableScan
SparkProject
SparkFilter
SparkAggregate
SparkJoin
SparkUnion
SparkIntersect
SparkSort
><INTRODUCTION TO APACHE CALCITE 67
Row Expressions
• RexNode represents a row-level expression
• Largely equivalent to Spark’s Column
functions
• Projection fields
• Filter condition
• Join condition
• Sort fields
><INTRODUCTION TO APACHE CALCITE 68
Row Expressions
Input column ref
Literal
Struct field access
Function call
Window expression
><INTRODUCTION TO APACHE CALCITE 69
Row Expressions
Input column ref
Literal
Struct field access
Function call
Window expression
RexInputRef
RexLiteral
RexFieldAccess
RexCall
RexOver
><INTRODUCTION TO APACHE CALCITE 70
Row Expressions
><INTRODUCTION TO APACHE CALCITE 71
Row Expressions
input ref
><INTRODUCTION TO APACHE CALCITE 72
Row Expressions
input ref
function call
><INTRODUCTION TO APACHE CALCITE 73
Traits
• Defined by the RelTrait interface
• Represent a trait of a relational expression
that does not alter execution
• Traits are used to validate plan output
• Three primary trait types:
• Convention
• RelCollation
• RelDistribution
><INTRODUCTION TO APACHE CALCITE 74
Conventions
• Convention is a type of RelTrait
• A Convention is associated with a
RelNode interface
• SparkConvention, JdbcConvention,
EnumerableConvention, etc
• Conventions are used to represent a single
data source
• Inputs to a relational expression must be in
the same convention
><INTRODUCTION TO APACHE CALCITE 75
Conventions
><INTRODUCTION TO APACHE CALCITE 76
Conventions
Spark convention
><INTRODUCTION TO APACHE CALCITE 77
Conventions
Spark convention
JDBC
convention
><INTRODUCTION TO APACHE CALCITE 78
Conventions
Spark convention
JDBC
convention
converter
><INTRODUCTION TO APACHE CALCITE 79
Rules
• Rules are used to modify query plans
• Defined by the RelOptRule interface
• Two types of rules: converters and
transformers
• Converter rules implement Converter and
convert from one convention to another
• Rules are matched to elements of a query
plan using pattern matching
• onMatch is called for matched rules
• Converter rules applied via convert
><INTRODUCTION TO APACHE CALCITE 80
Converter Rule
><INTRODUCTION TO APACHE CALCITE 81
Converter Rule
expression type
><INTRODUCTION TO APACHE CALCITE 82
Converter Rule
expression type
input convention
><INTRODUCTION TO APACHE CALCITE 83
Converter Rule
expression type
input convention
converted convention
><INTRODUCTION TO APACHE CALCITE 84
Converter Rule
expression type
input convention
converted convention
converter function
><INTRODUCTION TO APACHE CALCITE 85
Pattern Matching
><INTRODUCTION TO APACHE CALCITE 86
Pattern Matching
><INTRODUCTION TO APACHE CALCITE 87
Pattern Matching
no match
:-(
><INTRODUCTION TO APACHE CALCITE 88
Pattern Matching
no match
:-(
><INTRODUCTION TO APACHE CALCITE 89
Pattern Matching
match!
no match
:-(
><INTRODUCTION TO APACHE CALCITE 90
Planners
• Planners implement the RelOptPlanner
interface
• Two types of planners:
• HepPlanner
• VolcanoPlanner
><INTRODUCTION TO APACHE CALCITE 91
Heuristic Optimization
• HepPlanner is a heuristic optimizer similar
to Spark’s optimizer
• Applies all matching rules until none can be
applied
• Heuristic optimization is faster than cost-
based optimization
• Risk of infinite recursion if rules make
opposing changes to the plan
><INTRODUCTION TO APACHE CALCITE 92
Cost-based Optimization
• VolcanoPlanner is a cost-based
optimizer
• Applies matching rules iteratively, selecting
the plan with the cheapest cost on each
iteration
• Costs are provided by relational expressions
• Not all possible plans can be computed
• Stops optimization when the cost does not
significantly improve through a determinable
number of iterations
><INTRODUCTION TO APACHE CALCITE 93
Cost-based Optimization
• Cost is provided by each RelNode
• Cost is represented by RelOptCost
• Cost typically includes row count, I/O, and
CPU cost
• Cost estimates are relative
• Statistics are used to improve accuracy of
cost estimations
• Calcite provides utilities for computing
various resource-related statistics for use in
cost estimations
><INTRODUCTION TO APACHE CALCITE 94
Cost-based Optimization
><INTRODUCTION TO APACHE CALCITE 95
Cost-based Optimization
><INTRODUCTION TO APACHE CALCITE 96
Cost-based Optimization
><INTRODUCTION TO APACHE CALCITE 97
Cost-based Optimization
><INTRODUCTION TO APACHE CALCITE 98
Cost-based Optimization
PUTTING IT ALL
TOGETHER
next 99
><INTRODUCTION TO APACHE CALCITE 100
Putting it all together
><INTRODUCTION TO APACHE CALCITE 101
Putting it all together
><INTRODUCTION TO APACHE CALCITE 102
Putting it all together
><INTRODUCTION TO APACHE CALCITE 103
Putting it all together
><INTRODUCTION TO APACHE CALCITE 104
Putting it all together
><INTRODUCTION TO APACHE CALCITE 105
Putting it all together
><INTRODUCTION TO APACHE CALCITE 106
Putting it all together
><INTRODUCTION TO APACHE CALCITE 107
Putting it all together
><INTRODUCTION TO APACHE CALCITE 108
Putting it all together
><INTRODUCTION TO APACHE CALCITE 109
Putting it all together

Contenu connexe

Tendances

SQL for NoSQL and how Apache Calcite can help
SQL for NoSQL and how  Apache Calcite can helpSQL for NoSQL and how  Apache Calcite can help
SQL for NoSQL and how Apache Calcite can helpChristian Tzolov
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteChris Baynes
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsDatabricks
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangDatabricks
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeFlink Forward
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQLJulian Hyde
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Julian Hyde
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin HuaiA Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin HuaiDatabricks
 
Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllMichael Mior
 
Presto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performancePresto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performanceDataWorks Summit
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonIgor Anishchenko
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in DeltaDatabricks
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark Summit
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxData
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...Andrew Lamb
 
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Christian Tzolov
 

Tendances (20)

SQL for NoSQL and how Apache Calcite can help
SQL for NoSQL and how  Apache Calcite can helpSQL for NoSQL and how  Apache Calcite can help
SQL for NoSQL and how Apache Calcite can help
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQL
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin HuaiA Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
 
Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them All
 
Presto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performancePresto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performance
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased Comparison
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in Delta
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
 
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
 

Similaire à Introduction to Apache Calcite

Accelerating SQL queries in NoSQL Databases using Apache Drill and Secondary ...
Accelerating SQL queries in NoSQL Databases using Apache Drill and Secondary ...Accelerating SQL queries in NoSQL Databases using Apache Drill and Secondary ...
Accelerating SQL queries in NoSQL Databases using Apache Drill and Secondary ...Aman Sinha
 
ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartEvans Ye
 
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago MolaThe Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago MolaSpark Summit
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveXu Jiang
 
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerDeep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerSachin Aggarwal
 
PL/SQL Tips and Techniques Webinar Presentation
PL/SQL Tips and Techniques Webinar PresentationPL/SQL Tips and Techniques Webinar Presentation
PL/SQL Tips and Techniques Webinar PresentationEmbarcadero Technologies
 
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKai Wähner
 
Analytics with Cassandra & Spark
Analytics with Cassandra & SparkAnalytics with Cassandra & Spark
Analytics with Cassandra & SparkMatthias Niehoff
 
Dan Hotka's Top 10 Oracle 12c New Features
Dan Hotka's Top 10 Oracle 12c New FeaturesDan Hotka's Top 10 Oracle 12c New Features
Dan Hotka's Top 10 Oracle 12c New FeaturesEmbarcadero Technologies
 
What's new in Apache Spark 2.4
What's new in Apache Spark 2.4What's new in Apache Spark 2.4
What's new in Apache Spark 2.4boxu42
 
Web Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectWeb Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectSaltlux Inc.
 
3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sqlŁukasz Grala
 
Data Science at Scale with Apache Spark and Zeppelin Notebook
Data Science at Scale with Apache Spark and Zeppelin NotebookData Science at Scale with Apache Spark and Zeppelin Notebook
Data Science at Scale with Apache Spark and Zeppelin NotebookCarolyn Duby
 
Flink's SQL Engine: Let's Open the Engine Room!
Flink's SQL Engine: Let's Open the Engine Room!Flink's SQL Engine: Let's Open the Engine Room!
Flink's SQL Engine: Let's Open the Engine Room!HostedbyConfluent
 
Pydata london meetup - RiakTS, PySpark and Python by Stephen Etheridge
Pydata london meetup - RiakTS, PySpark and Python by Stephen EtheridgePydata london meetup - RiakTS, PySpark and Python by Stephen Etheridge
Pydata london meetup - RiakTS, PySpark and Python by Stephen EtheridgeEmmanuel Marchal
 
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIsFabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIsFlink Forward
 

Similaire à Introduction to Apache Calcite (20)

Accelerating SQL queries in NoSQL Databases using Apache Drill and Secondary ...
Accelerating SQL queries in NoSQL Databases using Apache Drill and Secondary ...Accelerating SQL queries in NoSQL Databases using Apache Drill and Secondary ...
Accelerating SQL queries in NoSQL Databases using Apache Drill and Secondary ...
 
ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smart
 
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago MolaThe Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
 
Apache HAWQ Architecture
Apache HAWQ ArchitectureApache HAWQ Architecture
Apache HAWQ Architecture
 
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerDeep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
 
PL/SQL Tips and Techniques Webinar Presentation
PL/SQL Tips and Techniques Webinar PresentationPL/SQL Tips and Techniques Webinar Presentation
PL/SQL Tips and Techniques Webinar Presentation
 
KSQL Intro
KSQL IntroKSQL Intro
KSQL Intro
 
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
 
Analytics with Cassandra & Spark
Analytics with Cassandra & SparkAnalytics with Cassandra & Spark
Analytics with Cassandra & Spark
 
Dan Hotka's Top 10 Oracle 12c New Features
Dan Hotka's Top 10 Oracle 12c New FeaturesDan Hotka's Top 10 Oracle 12c New Features
Dan Hotka's Top 10 Oracle 12c New Features
 
What's new in Apache Spark 2.4
What's new in Apache Spark 2.4What's new in Apache Spark 2.4
What's new in Apache Spark 2.4
 
Web Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectWeb Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC Project
 
3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql
 
Data Science at Scale with Apache Spark and Zeppelin Notebook
Data Science at Scale with Apache Spark and Zeppelin NotebookData Science at Scale with Apache Spark and Zeppelin Notebook
Data Science at Scale with Apache Spark and Zeppelin Notebook
 
Spark sql meetup
Spark sql meetupSpark sql meetup
Spark sql meetup
 
Flink's SQL Engine: Let's Open the Engine Room!
Flink's SQL Engine: Let's Open the Engine Room!Flink's SQL Engine: Let's Open the Engine Room!
Flink's SQL Engine: Let's Open the Engine Room!
 
Pydata london meetup - RiakTS, PySpark and Python by Stephen Etheridge
Pydata london meetup - RiakTS, PySpark and Python by Stephen EtheridgePydata london meetup - RiakTS, PySpark and Python by Stephen Etheridge
Pydata london meetup - RiakTS, PySpark and Python by Stephen Etheridge
 
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIsFabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
 

Dernier

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesVictorSzoltysek
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...masabamasaba
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...masabamasaba
 

Dernier (20)

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

Introduction to Apache Calcite

  • 1. >< INTRODUCTIONTO INTRODUCTION TO APACHE CALCITE APACHE CALCITE JORDAN HALTERMAN 1
  • 3. ><INTRODUCTION TO APACHE CALCITE 3 What is Apache Calcite? • A framework for building SQL databases • Developed over more than ten years • Written in Java • Previously known as Optiq • Previously known as Farrago • Became an Apache project in 2013 • Led by Julian Hyde at Hortonworks
  • 4. ><INTRODUCTION TO APACHE CALCITE 4 Projects using Calcite • Apache Hive • Apache Drill • Apache Flink • Apache Phoenix • Apache Samza • Apache Storm • Apache everything…
  • 5. ><INTRODUCTION TO APACHE CALCITE 5 What is Apache Calcite? • SQL parser • SQL validation • Query optimizer • SQL generator • Data federator
  • 6. ><INTRODUCTION TO APACHE CALCITE Parse Queries are parsed using a JavaCC generated parser Validate Queries are validated against known database metadata Optimize Logical plans are optimized and converted into physical expressions Execute P h y s i c a l p l a n s a r e converted into application- specific executions 01 02 03 04 Stages of query execution 6
  • 8. ><INTRODUCTION TO APACHE CALCITE 8 Components of Calcite • Catalog - Defines metadata and namespaces that can be accessed in SQL queries • SQL parser - Parses valid SQL queries into an abstract syntax tree (AST) • SQL validator - Validates abstract syntax trees against metadata provided by the catalog • Query optimizer - Converts AST into logical plans, optimizes logical plans, and converts logical expressions into physical plans • SQL generator - Converts physical plans to SQL
  • 10. ><INTRODUCTION TO APACHE CALCITE 10 Calcite Catalog • Defines namespaces that can be accessed in Calcite queries • Schema • A collection of schemas and tables • Can be arbitrarily nested • Table • Represents a single data set • Fields defined by a RelDataType • RelDataType • Represents fields in a data set • Supports all SQL data types, including structs and
  • 11. ><INTRODUCTION TO APACHE CALCITE 11 Schema • A collection of schemas and tables • Schemas can be arbitrarily nested
  • 12. ><INTRODUCTION TO APACHE CALCITE 12 Schema • A collection of schemas and tables • Schemas can be arbitrarily nested
  • 13. ><INTRODUCTION TO APACHE CALCITE 13 Table • Represents a single data set • Fields are defined by a RelDataType
  • 14. ><INTRODUCTION TO APACHE CALCITE 14 Table • Represents a single data set • Fields are defined by a RelDataType
  • 15. ><INTRODUCTION TO APACHE CALCITE 15 RelDataType • Represents the data type of an object • Supports all SQL data types, including structs and arrays • Similar to Spark’s DataType
  • 16. ><INTRODUCTION TO APACHE CALCITE 16 RelDataType
  • 17. ><INTRODUCTION TO APACHE CALCITE 17 RelDataType data type enum
  • 18. ><INTRODUCTION TO APACHE CALCITE 18 Statistic • Provide table statistics used in optimization
  • 19. ><INTRODUCTION TO APACHE CALCITE 19 Statistic • Provide table statistics used in optimization
  • 20. ><INTRODUCTION TO APACHE CALCITE 20 Usage of the Calcite catalog
  • 21. ><INTRODUCTION TO APACHE CALCITE 21 Usage of the Calcite catalog schema
  • 22. ><INTRODUCTION TO APACHE CALCITE 22 Usage of the Calcite catalog schema table
  • 23. ><INTRODUCTION TO APACHE CALCITE 23 Usage of the Calcite catalog schema table data type
  • 24. ><INTRODUCTION TO APACHE CALCITE 24 Usage of the Calcite catalog schema table data typedata type field
  • 26. ><INTRODUCTION TO APACHE CALCITE 26 Calcite SQL parser • LL(k) parser written in JavaCC • Input queries are parsed into an abstract syntax tree (AST) • Tokens are represented in Calcite by SqlNode • SqlNode can also be converted back to a SQL string via the unparse method
  • 27. ><INTRODUCTION TO APACHE CALCITE 27 JavaCC • Java Compiler Compiler • Created in 1996 at Sun Microsystems • Generates Java code from a domain- specific language • ANTLR is the modern alternative used in projects like Hive and Drill • JavaCC has sparse documentation
  • 28. ><INTRODUCTION TO APACHE CALCITE 28 JavaCC
  • 29. ><INTRODUCTION TO APACHE CALCITE 29 JavaCC
  • 30. ><INTRODUCTION TO APACHE CALCITE 30 JavaCC tokens
  • 31. ><INTRODUCTION TO APACHE CALCITE 31 JavaCC tokens or
  • 32. ><INTRODUCTION TO APACHE CALCITE 32 JavaCC tokens or function call
  • 33. ><INTRODUCTION TO APACHE CALCITE 33 JavaCC tokens Java code or function call
  • 34. ><INTRODUCTION TO APACHE CALCITE 34 SqlNode • SqlNode represents an element in an abstract syntax tree
  • 35. ><INTRODUCTION TO APACHE CALCITE 35 SqlNode • SqlNode represents an element in an abstract syntax tree select
  • 36. ><INTRODUCTION TO APACHE CALCITE 36 SqlNode • SqlNode represents an element in an abstract syntax tree identifiersselect
  • 37. ><INTRODUCTION TO APACHE CALCITE 37 SqlNode • SqlNode represents an element in an abstract syntax tree identifiersselect operator
  • 38. ><INTRODUCTION TO APACHE CALCITE 38 SqlNode • SqlNode represents an element in an abstract syntax tree identifiersselect operator identifier
  • 39. ><INTRODUCTION TO APACHE CALCITE 39 SqlNode • SqlNode represents an element in an abstract syntax tree identifiersselect operator identifier data type
  • 40. ><INTRODUCTION TO APACHE CALCITE 40 SqlNode • SqlNode represents an element in an abstract syntax tree identifiersselect operator identifier data type identifier
  • 41. ><INTRODUCTION TO APACHE CALCITE 41 SqlNode • SqlNode’s unparse method converts a SQL element back into a string
  • 42. ><INTRODUCTION TO APACHE CALCITE 42 SqlNode • SqlNode’s unparse method converts a SQL element back into a string
  • 43. ><INTRODUCTION TO APACHE CALCITE 43 SqlNode
  • 44. ><INTRODUCTION TO APACHE CALCITE 44 SqlNode • SqlDialect indicates the capitalization and quoting rules of specific databases
  • 45. ><INTRODUCTION TO APACHE CALCITE 45 SqlNode • SqlDialect indicates the capitalization and quoting rules of specific databases
  • 47. ><INTRODUCTION TO APACHE CALCITE 47 Query Plans • Query plans represent the steps necessary to execute a query
  • 48. ><INTRODUCTION TO APACHE CALCITE 48 Query Plans • Query plans represent the steps necessary to execute a query
  • 49. ><INTRODUCTION TO APACHE CALCITE 49 Query Plans • Query plans represent the steps necessary to execute a query table scan table scan
  • 50. ><INTRODUCTION TO APACHE CALCITE 50 Query Plans • Query plans represent the steps necessary to execute a query inner join table scan table scan
  • 51. ><INTRODUCTION TO APACHE CALCITE 51 Query Plans • Query plans represent the steps necessary to execute a query filter inner join table scan table scan
  • 52. ><INTRODUCTION TO APACHE CALCITE 52 Query Plans • Query plans represent the steps necessary to execute a query filter inner join project table scan table scan
  • 53. ><INTRODUCTION TO APACHE CALCITE 53 Query Plans • Query plans represent the steps necessary to execute a query filter inner join project table scan table scan
  • 54. ><INTRODUCTION TO APACHE CALCITE 54 Query Optimization • Optimize logical plan • Goal is typically to try to reduce the amount of data that must be processed early in the plan • Convert logical plan into a physical plan • Physical plan is engine specific and represents the physical execution stages
  • 55. ><INTRODUCTION TO APACHE CALCITE 55 Query Optimization • Prune unused fields • Merge projections • Convert subqueries to joins • Reorder joins • Push down projections • Push down filters
  • 56. ><INTRODUCTION TO APACHE CALCITE 56 Query Optimization
  • 57. ><INTRODUCTION TO APACHE CALCITE 57 Query Optimization
  • 58. ><INTRODUCTION TO APACHE CALCITE 58 Query Optimization
  • 59. ><INTRODUCTION TO APACHE CALCITE 59 Query Optimization push down project
  • 60. ><INTRODUCTION TO APACHE CALCITE 60 Query Optimization push down project push down filter
  • 61. ><INTRODUCTION TO APACHE CALCITE 61 Query Optimization
  • 62. ><INTRODUCTION TO APACHE CALCITE 62 Key Concepts Relational algebra Row expressions Traits Conventions Rules Planners Programs
  • 63. ><INTRODUCTION TO APACHE CALCITE 63 Key Concepts Relational algebra Row expressions Traits Conventions Rules Planners Programs RelNode RexNode RelTrait Convention RelOptRule RelOptPlanner Program
  • 64. ><INTRODUCTION TO APACHE CALCITE 64 Relational Algebra • RelNode represents a relational expression • Largely equivalent to Spark’s DataFrame methods • Logical algebra • Physical algebra
  • 65. ><INTRODUCTION TO APACHE CALCITE 65 Relational Algebra TableScan Project Filter Aggregate Join Union Intersect Sort
  • 66. ><INTRODUCTION TO APACHE CALCITE 66 Relational Algebra TableScan Project Filter Aggregate Join Union Intersect Sort SparkTableScan SparkProject SparkFilter SparkAggregate SparkJoin SparkUnion SparkIntersect SparkSort
  • 67. ><INTRODUCTION TO APACHE CALCITE 67 Row Expressions • RexNode represents a row-level expression • Largely equivalent to Spark’s Column functions • Projection fields • Filter condition • Join condition • Sort fields
  • 68. ><INTRODUCTION TO APACHE CALCITE 68 Row Expressions Input column ref Literal Struct field access Function call Window expression
  • 69. ><INTRODUCTION TO APACHE CALCITE 69 Row Expressions Input column ref Literal Struct field access Function call Window expression RexInputRef RexLiteral RexFieldAccess RexCall RexOver
  • 70. ><INTRODUCTION TO APACHE CALCITE 70 Row Expressions
  • 71. ><INTRODUCTION TO APACHE CALCITE 71 Row Expressions input ref
  • 72. ><INTRODUCTION TO APACHE CALCITE 72 Row Expressions input ref function call
  • 73. ><INTRODUCTION TO APACHE CALCITE 73 Traits • Defined by the RelTrait interface • Represent a trait of a relational expression that does not alter execution • Traits are used to validate plan output • Three primary trait types: • Convention • RelCollation • RelDistribution
  • 74. ><INTRODUCTION TO APACHE CALCITE 74 Conventions • Convention is a type of RelTrait • A Convention is associated with a RelNode interface • SparkConvention, JdbcConvention, EnumerableConvention, etc • Conventions are used to represent a single data source • Inputs to a relational expression must be in the same convention
  • 75. ><INTRODUCTION TO APACHE CALCITE 75 Conventions
  • 76. ><INTRODUCTION TO APACHE CALCITE 76 Conventions Spark convention
  • 77. ><INTRODUCTION TO APACHE CALCITE 77 Conventions Spark convention JDBC convention
  • 78. ><INTRODUCTION TO APACHE CALCITE 78 Conventions Spark convention JDBC convention converter
  • 79. ><INTRODUCTION TO APACHE CALCITE 79 Rules • Rules are used to modify query plans • Defined by the RelOptRule interface • Two types of rules: converters and transformers • Converter rules implement Converter and convert from one convention to another • Rules are matched to elements of a query plan using pattern matching • onMatch is called for matched rules • Converter rules applied via convert
  • 80. ><INTRODUCTION TO APACHE CALCITE 80 Converter Rule
  • 81. ><INTRODUCTION TO APACHE CALCITE 81 Converter Rule expression type
  • 82. ><INTRODUCTION TO APACHE CALCITE 82 Converter Rule expression type input convention
  • 83. ><INTRODUCTION TO APACHE CALCITE 83 Converter Rule expression type input convention converted convention
  • 84. ><INTRODUCTION TO APACHE CALCITE 84 Converter Rule expression type input convention converted convention converter function
  • 85. ><INTRODUCTION TO APACHE CALCITE 85 Pattern Matching
  • 86. ><INTRODUCTION TO APACHE CALCITE 86 Pattern Matching
  • 87. ><INTRODUCTION TO APACHE CALCITE 87 Pattern Matching no match :-(
  • 88. ><INTRODUCTION TO APACHE CALCITE 88 Pattern Matching no match :-(
  • 89. ><INTRODUCTION TO APACHE CALCITE 89 Pattern Matching match! no match :-(
  • 90. ><INTRODUCTION TO APACHE CALCITE 90 Planners • Planners implement the RelOptPlanner interface • Two types of planners: • HepPlanner • VolcanoPlanner
  • 91. ><INTRODUCTION TO APACHE CALCITE 91 Heuristic Optimization • HepPlanner is a heuristic optimizer similar to Spark’s optimizer • Applies all matching rules until none can be applied • Heuristic optimization is faster than cost- based optimization • Risk of infinite recursion if rules make opposing changes to the plan
  • 92. ><INTRODUCTION TO APACHE CALCITE 92 Cost-based Optimization • VolcanoPlanner is a cost-based optimizer • Applies matching rules iteratively, selecting the plan with the cheapest cost on each iteration • Costs are provided by relational expressions • Not all possible plans can be computed • Stops optimization when the cost does not significantly improve through a determinable number of iterations
  • 93. ><INTRODUCTION TO APACHE CALCITE 93 Cost-based Optimization • Cost is provided by each RelNode • Cost is represented by RelOptCost • Cost typically includes row count, I/O, and CPU cost • Cost estimates are relative • Statistics are used to improve accuracy of cost estimations • Calcite provides utilities for computing various resource-related statistics for use in cost estimations
  • 94. ><INTRODUCTION TO APACHE CALCITE 94 Cost-based Optimization
  • 95. ><INTRODUCTION TO APACHE CALCITE 95 Cost-based Optimization
  • 96. ><INTRODUCTION TO APACHE CALCITE 96 Cost-based Optimization
  • 97. ><INTRODUCTION TO APACHE CALCITE 97 Cost-based Optimization
  • 98. ><INTRODUCTION TO APACHE CALCITE 98 Cost-based Optimization
  • 100. ><INTRODUCTION TO APACHE CALCITE 100 Putting it all together
  • 101. ><INTRODUCTION TO APACHE CALCITE 101 Putting it all together
  • 102. ><INTRODUCTION TO APACHE CALCITE 102 Putting it all together
  • 103. ><INTRODUCTION TO APACHE CALCITE 103 Putting it all together
  • 104. ><INTRODUCTION TO APACHE CALCITE 104 Putting it all together
  • 105. ><INTRODUCTION TO APACHE CALCITE 105 Putting it all together
  • 106. ><INTRODUCTION TO APACHE CALCITE 106 Putting it all together
  • 107. ><INTRODUCTION TO APACHE CALCITE 107 Putting it all together
  • 108. ><INTRODUCTION TO APACHE CALCITE 108 Putting it all together
  • 109. ><INTRODUCTION TO APACHE CALCITE 109 Putting it all together