Query Processing in InfluxDB IOx
SQL, Storage gRPC, Reorganization
2021-10-13
CC BY-SA
Andrew Lamb
© 2021 InfluxData. All rights reserved.
Today: IOx Team at InfluxData
Past life 1: Query Optimizer @ Vertica, also on Oracle DB server
Past life 2: Chief Architect + VP Engineering roles at some ML startups
Talk Outline
‒ Data Model and Storage Review
‒ Query Processing Overview
‒ Frontends
‒ Execution Plans
Data Layout and Storage
Data Organization: Partitions
Partitions define how data is kept in separate Chunks in storage. Each chunk logically stores part of a partition for one or more relational tables.
Partitioning is used for:
1. Data lifecycle management (drop whole partitions by deleting files)
2. Query performance (partition pruning)
Each row is mapped to a single partition based on partition rules.
[Figure: the cpu, disk, and requests tables, each split into daily partitions such as Jan 1, Jan 2, and Jan 3]
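As a minimal sketch of a partition rule, assume a hypothetical rule of one partition per UTC day, keyed as `YYYY-MM-DD` (the figure suggests daily partitions, but the actual IOx partition-rule configuration is not shown here). The code maps each row's nanosecond timestamp to exactly one partition key. `partition_key` and `partition_rows` are illustrative names, not IOx APIs; the date conversion uses Howard Hinnant's `civil_from_days` algorithm so the example needs only the Rust standard library.

```rust
use std::collections::BTreeMap;

const NANOS_PER_DAY: i64 = 86_400 * 1_000_000_000;

/// Convert days since 1970-01-01 into a (year, month, day) civil date
/// (Howard Hinnant's `civil_from_days`, proleptic Gregorian calendar).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year eras
    let doe = z.rem_euclid(146_097) as u64; // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era, [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year, [0, 365]
    let mp = (5 * doy + 2) / 153; // month index with March = 0, [0, 11]
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe as i64 + era * 400 + i64::from(month <= 2);
    (year, month, day)
}

/// Hypothetical partition rule: one partition per UTC day, keyed "YYYY-MM-DD".
fn partition_key(time_ns: i64) -> String {
    let (y, m, d) = civil_from_days(time_ns.div_euclid(NANOS_PER_DAY));
    format!("{:04}-{:02}-{:02}", y, m, d)
}

/// Map every row (reduced here to its timestamp) to exactly one partition.
fn partition_rows(times: &[i64]) -> BTreeMap<String, Vec<i64>> {
    let mut partitions: BTreeMap<String, Vec<i64>> = BTreeMap::new();
    for &t in times {
        partitions.entry(partition_key(t)).or_default().push(t);
    }
    partitions
}
```

With partitions held in a `BTreeMap`, dropping a whole day for lifecycle management is a single `remove`, which mirrors deleting that partition's files.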
Data Organization: Chunks
Within each partition of a table, data is divided into physical chunks, identified by a chunk id and a chunk order. Chunks with a lower order have older data (by insert time).
There is at most one open chunk per partition. All new data (including deletes and updates) is written into the open chunk.
Once a chunk is closed, it becomes immutable: rows are never added or removed. Over time, its data is compacted / persisted into new chunks and the old chunk is dropped.
[Figure: closed chunks Chunk0 to Chunk3, ordered by age of data and never modified, followed by the open Chunk4, into which new data is written]
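The chunk rules above can be modeled as a small state machine. This is a hypothetical sketch, not IOx's implementation: `Partition`, `Chunk`, `write`, `close_open_chunk`, and `compact` are illustrative names, rows are reduced to bare `i64` values, and the rule that a compacted chunk inherits the largest order of its inputs is an assumption made for the example.

```rust
/// Hypothetical chunk lifecycle states (open vs. closed).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ChunkState {
    Open,
    Closed,
}

#[derive(Debug)]
struct Chunk {
    id: u32,
    order: u32, // lower order = older data (by insert time)
    state: ChunkState,
    rows: Vec<i64>,
}

#[derive(Debug, Default)]
struct Partition {
    chunks: Vec<Chunk>,
    next_id: u32,
    next_order: u32,
}

impl Partition {
    /// All new data goes to the single open chunk, created on demand.
    fn write(&mut self, row: i64) {
        if !self.chunks.iter().any(|c| c.state == ChunkState::Open) {
            self.chunks.push(Chunk {
                id: self.next_id,
                order: self.next_order,
                state: ChunkState::Open,
                rows: Vec::new(),
            });
            self.next_id += 1;
            self.next_order += 1;
        }
        let open = self.chunks.iter_mut().find(|c| c.state == ChunkState::Open).unwrap();
        open.rows.push(row);
    }

    /// Closing freezes the chunk; later writes start a new open chunk.
    fn close_open_chunk(&mut self) {
        for c in self.chunks.iter_mut().filter(|c| c.state == ChunkState::Open) {
            c.state = ChunkState::Closed;
        }
    }

    /// Compact the listed *closed* chunks into one new closed chunk and drop the inputs.
    /// Assumption: the new chunk inherits the largest order among its inputs.
    fn compact(&mut self, ids: &[u32]) -> Option<u32> {
        let (mut inputs, rest): (Vec<Chunk>, Vec<Chunk>) = std::mem::take(&mut self.chunks)
            .into_iter()
            .partition(|c| c.state == ChunkState::Closed && ids.contains(&c.id));
        self.chunks = rest;
        let order = inputs.iter().map(|c| c.order).max()?; // None if nothing to compact
        inputs.sort_by_key(|c| c.order);
        let rows = inputs.into_iter().flat_map(|c| c.rows).collect();
        let id = self.next_id;
        self.next_id += 1;
        self.chunks.push(Chunk { id, order, state: ChunkState::Closed, rows });
        self.chunks.sort_by_key(|c| c.order);
        Some(id)
    }
}
```

Only closed chunks are eligible for compaction, so the open chunk can keep accepting writes while older data is reorganized.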
Data Model
weather,location=us-east temperature=82,humidity=67 1465839830100400200
weather,location=us-midwest temperature=82,humidity=65 1465839830100400200
weather,location=us-west temperature=70,humidity=54 1465839830100400200
weather,location=us-east temperature=83,humidity=69 1465839830200400200
weather,location=us-midwest temperature=87,humidity=78 1465839830200400200
weather,location=us-west temperature=72,humidity=56 1465839830200400200
weather,location=us-east temperature=84,humidity=67 1465839830300400200
weather,location=us-midwest temperature=90,humidity=82 1465839830300400200
weather,location=us-west temperature=71,humidity=57 1465839830300400200
location      temperature  humidity  timestamp
"us-east"     82           67        2016-06-13T17:43:50.1004002Z
"us-midwest"  82           65        2016-06-13T17:43:50.1004002Z
"us-west"     70           54        2016-06-13T17:43:50.1004002Z
"us-east"     83           69        2016-06-13T17:43:50.2004002Z
"us-midwest"  87           78        2016-06-13T17:43:50.2004002Z
"us-west"     72           56        2016-06-13T17:43:50.2004002Z
"us-east"     84           67        2016-06-13T17:43:50.3004002Z
"us-midwest"  90           82        2016-06-13T17:43:50.3004002Z
"us-west"     71           57        2016-06-13T17:43:50.3004002Z
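To make the mapping from line protocol to table columns concrete, here is a simplified parser sketch. It assumes no escaping or quoting, a mandatory timestamp, and that every field value is a float; real line protocol also supports integer, string, and boolean fields, which this hypothetical `parse_line` does not handle.

```rust
/// Parsed form of one line of line protocol (simplified, hypothetical type).
#[derive(Debug, PartialEq)]
struct Point {
    measurement: String,             // becomes the table name
    tags: Vec<(String, String)>,     // tag columns, e.g. location
    fields: Vec<(String, f64)>,      // field columns, e.g. temperature
    timestamp: i64,                  // nanoseconds since the epoch
}

fn parse_line(line: &str) -> Option<Point> {
    // Three space-separated sections: "<measurement>[,k=v...]" "<k=v,...>" "<timestamp>"
    let mut parts = line.split(' ');
    let series = parts.next()?;
    let fields_part = parts.next()?;
    let timestamp = parts.next()?.parse::<i64>().ok()?;
    if parts.next().is_some() {
        return None; // unexpected extra section
    }

    let mut series_iter = series.split(',');
    let measurement = series_iter.next()?.to_string();
    let tags = series_iter
        .map(|kv| kv.split_once('=').map(|(k, v)| (k.to_string(), v.to_string())))
        .collect::<Option<Vec<_>>>()?;
    let fields = fields_part
        .split(',')
        .map(|kv| {
            let (k, v) = kv.split_once('=')?;
            Some((k.to_string(), v.parse::<f64>().ok()?))
        })
        .collect::<Option<Vec<_>>>()?;
    Some(Point { measurement, tags, fields, timestamp })
}
```

Each parsed point contributes one row: tags and fields become columns, and the timestamp fills the `timestamp` column.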
Query Processing
Design: One Execution Engine
1. Query and Data Reorganization*: two sides of the same coin, “moving data around”
2. Reuse as much existing execution machinery as possible (e.g. streaming, a segregated worker pool, etc.)
3. Amplify investment by leveraging Open Source (and contributing back)
⇒ All queries run through a unified planning system based on DataFusion + Arrow, and execute as Rust async streams (`RecordBatchStream`) using the tokio executor.
* Putting data in physical structures (Chunks)
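The streaming execution model can be illustrated with a pull-based operator. The sketch uses a synchronous `Iterator` as a stand-in for the async `RecordBatchStream` (the real engine polls async streams on tokio), and a "batch" is reduced to a single column of timestamps; `FilterStream` and `filter_stream` are hypothetical names.

```rust
/// A toy "record batch": one column of timestamps (hypothetical simplification).
type Batch = Vec<i64>;

/// Pull-based filter operator over an upstream stream of batches.
struct FilterStream<I, P> {
    input: I,
    pred: P,
}

/// Constructor with explicit bounds so closure argument types are inferred.
fn filter_stream<I, P>(input: I, pred: P) -> FilterStream<I, P>
where
    I: Iterator<Item = Batch>,
    P: Fn(i64) -> bool,
{
    FilterStream { input, pred }
}

impl<I, P> Iterator for FilterStream<I, P>
where
    I: Iterator<Item = Batch>,
    P: Fn(i64) -> bool,
{
    type Item = Batch;

    fn next(&mut self) -> Option<Batch> {
        let pred = &self.pred;
        // Keep pulling from upstream until a non-empty output batch is produced.
        for batch in self.input.by_ref() {
            let out: Batch = batch.into_iter().filter(|&t| pred(t)).collect();
            if !out.is_empty() {
                return Some(out);
            }
        }
        None // upstream exhausted
    }
}
```

Because each operator only pulls the next batch when its consumer asks for one, operators compose into a pipeline without materializing intermediate results.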
Query Processing in IOx
Client / language-specific frontends (query input), each producing a DataFusion LogicalPlan:
‒ Storage gRPC frontend: read_group(..)
‒ SQL frontend (from DataFusion): SELECT … FROM ...
‒ Reorg frontend: compact_plan(..)
Shared planning and execution phases, based on DataFusion:
‒ Optimization (storage pruning, pushdown, etc.)
‒ Physical planning (DataFusion physical plan)
‒ Execution (producing Arrow RecordBatches)
Client-specific output formats:
‒ Storage gRPC: gRPC output (SeriesFrame ...)
‒ SQL: Arrow Flight IPC (FlightData)
‒ Reorg: write to ReadBuffer or Parquet files (ReadBuffer / ParquetWriter)
IOx Query Optimization Features
‒ “Classic”: Projection/Filter/Limit pushdown, partial evaluation, ...
‒ Predicate evaluation during scan (IOx: filters pushed down on some metadata queries, and scans in the ReadBuffer)
‒ Chunk pruning on predicates
‒ Parquet row group pruning
‒ Grouping/aggregate pushdown (IOx: the ReadBuffer has support, but there is no query engine support)
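Chunk pruning skips chunks whose statistics prove they cannot contain matching rows. As a sketch, assume each chunk records the min/max of its `time` column and of a `cpu` tag (the `ChunkStats` struct and both functions are hypothetical). A chunk survives only if the half-open query range [start, end) overlaps its time range and the requested tag value falls within its min/max:

```rust
/// Hypothetical per-chunk statistics: min/max of the time column and of the `cpu` tag.
struct ChunkStats {
    id: u32,
    min_time: i64,
    max_time: i64,
    min_cpu: String,
    max_cpu: String,
}

/// Could this chunk contain rows with start <= time < end AND cpu = `cpu`?
/// `false` proves the chunk has no matching rows, so it can be skipped entirely.
fn might_match(c: &ChunkStats, start: i64, end: i64, cpu: &str) -> bool {
    let time_overlaps = c.max_time >= start && c.min_time < end;
    let cpu_in_range = c.min_cpu.as_str() <= cpu && cpu <= c.max_cpu.as_str();
    time_overlaps && cpu_in_range
}

/// Return the ids of the chunks that must still be scanned.
fn prune_chunks(chunks: &[ChunkStats], start: i64, end: i64, cpu: &str) -> Vec<u32> {
    chunks
        .iter()
        .filter(|c| might_match(c, start, end, cpu))
        .map(|c| c.id)
        .collect()
}
```

Pruning is conservative: a chunk that passes may still contain no matching rows, so the Filter still runs on whatever is scanned.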
Front Ends (Logical Plans)
SQL Frontend
[Figure: an Arrow Flight client talks to IOx (port 8082), which reads from the object store]
1. A Flight request is sent to IOx.
2. IOx answers queries by combining data from Parquet files and the in-memory cache.
3. The response is streamed back via Flight RPC.
See DataFusion: An Embeddable Query Engine Written in Rust for more details
SQL: Logical Plan
SELECT cpu, usage_user, time
FROM cpu
WHERE cpu = 'cpu1';
Projection: #cpu, #usage_user, #time
  Filter: #cpu Eq Utf8("cpu1")
    TableScan: cpu projection=Some([0,1,2,12])
The TableScan is accomplished via IOxReadFilterNode.
Chunks are presented to DataFusion as “partitions” (different from IOx partitions).
The IOx query engine handles resolving upserts and deletes.
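A logical plan is a tree of operators in which each node consumes the output of its input node. The sketch models only the three node types from this slide, with expressions kept as plain strings; this `LogicalPlan` enum is a hypothetical simplification, not DataFusion's own `LogicalPlan` type, although it prints the tree in the same root-first, indented style.

```rust
use std::fmt;

/// Hypothetical, heavily simplified logical plan tree (expressions kept as strings).
enum LogicalPlan {
    TableScan { table: String, projection: Option<Vec<usize>> },
    Filter { predicate: String, input: Box<LogicalPlan> },
    Projection { exprs: Vec<String>, input: Box<LogicalPlan> },
}

impl LogicalPlan {
    /// Print this node, then its input indented one level deeper.
    fn fmt_indent(&self, f: &mut fmt::Formatter<'_>, depth: usize) -> fmt::Result {
        let pad = "  ".repeat(depth);
        match self {
            LogicalPlan::TableScan { table, projection } => {
                writeln!(f, "{pad}TableScan: {table} projection={projection:?}")
            }
            LogicalPlan::Filter { predicate, input } => {
                writeln!(f, "{pad}Filter: {predicate}")?;
                input.fmt_indent(f, depth + 1)
            }
            LogicalPlan::Projection { exprs, input } => {
                writeln!(f, "{pad}Projection: {}", exprs.join(", "))?;
                input.fmt_indent(f, depth + 1)
            }
        }
    }
}

impl fmt::Display for LogicalPlan {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.fmt_indent(f, 0)
    }
}

/// The plan for: SELECT cpu, usage_user, time FROM cpu WHERE cpu = 'cpu1';
fn example_plan() -> LogicalPlan {
    LogicalPlan::Projection {
        exprs: vec!["#cpu".into(), "#usage_user".into(), "#time".into()],
        input: Box::new(LogicalPlan::Filter {
            predicate: r#"#cpu Eq Utf8("cpu1")"#.into(),
            input: Box::new(LogicalPlan::TableScan {
                table: "cpu".into(),
                projection: Some(vec![0, 1, 2, 12]),
            }),
        }),
    }
}
```

Printing `example_plan()` reproduces the tree above (with Rust's `Debug` spacing in the projection list).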
Reorg / Life Cycle Planner
Compact: Chunk1, Chunk2, Chunk3 → Compact Plan (resolves upserts / applies deletes) → RecordBatch stream
The compact lifecycle operation writes the stream to a new Read Buffer (RUB) or Parquet chunk.
Split (split_time): Chunk1, Chunk2, Chunk3 → Split Plan (resolves upserts / applies deletes) → two RecordBatch streams: time <= split_time and time > split_time
The persist lifecycle operation writes the two streams to a Parquet chunk and a RUB chunk, respectively.
Reorg / Life Cycle Planner
Compact Plan:
TableScan: cpu
  Chunks = ...
Split Plan (2 DataFusion partitions):
StreamSplit split_time=1004
  TableScan: cpu
    Chunks = ...
StreamSplit is a DataFusion extension node.
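The StreamSplit idea, routing each row of the merged stream to one of two outputs at `split_time`, can be sketched as follows. `split_at_time` is a hypothetical function over plain vectors of `(timestamp, value)` rows, whereas the real node operates on Arrow RecordBatch streams.

```rust
/// Hypothetical row: (timestamp, value).
type Row = (i64, f64);

/// Split each incoming batch at `split_time`, producing two output streams:
/// rows with time <= split_time (to be persisted to Parquet) and rows with
/// time > split_time (to stay in the Read Buffer).
fn split_at_time(batches: Vec<Vec<Row>>, split_time: i64) -> (Vec<Vec<Row>>, Vec<Vec<Row>>) {
    let mut up_to_split = Vec::new();
    let mut after_split = Vec::new();
    for batch in batches {
        let (lo, hi): (Vec<Row>, Vec<Row>) =
            batch.into_iter().partition(|&(time, _)| time <= split_time);
        // Skip empty batches so downstream writers never see them.
        if !lo.is_empty() {
            up_to_split.push(lo);
        }
        if !hi.is_empty() {
            after_split.push(hi);
        }
    }
    (up_to_split, after_split)
}
```

Splitting after upserts and deletes have been resolved means both outputs already contain final row values.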
Storage gRPC frontend: Flux and InfluxQL
[Figure: the Flux runtime and InfluxQL talk to IOx (port 8082), which reads from the object store]
1. Flux/InfluxQL send requests via gRPC.
2. IOx answers queries by combining data from Parquet files and the in-memory cache.
3. The response is streamed back via gRPC.
Storage gRPC Operations
‒ ReadFilter: Scan data out of IOx matching predicates
‒ ReadGroup: Groups/aggregates in IOx returning grouped data
‒ ReadWindowAggregate: Similar to ReadGroup, but windowed by time
‒ TagKeys: tag keys (column names) with data matching predicates.
‒ TagValues: distinct tag values (column values) with data matching predicates for a set of tag keys (columns).
‒ MeasurementNames: table names that satisfy some provided predicate.
‒ MeasurementTagKeys: Same as TagKeys but limited to a single table.
‒ MeasurementTagValues: Same as TagValues but limited to a single table.
‒ MeasurementFields: field names (column names) with rows matching predicates.
Data queries (ReadFilter, ReadGroup, ReadWindowAggregate) return time series; metadata queries (the rest) return strings / times.
(thanks @e-dard)
Metadata Queries
Metadata queries are incredibly common and are often run on more recent data:
‒ measurement_names(range, predicate)
‒ tag_keys(range, predicate)
‒ tag_values(tag_key, range, predicate)
Metadata query: is there a fast path for the predicates?
‒ YES: answer with an optimized* implementation in the chunk.
‒ NO: run a general-purpose (DataFusion) plan.
* The Read Buffer (RUB) in particular is heavily optimized for metadata queries and rarely needs general-purpose plans.
tag_keys (general)
tag_keys
  pred: cpu =~ '.*total'
  ts_range: [1000, 2000)
Plan producing the results:
SchemaPivot
  Filter: #cpu =~ '.*total' AND 1000 <= #time AND #time < 2000
    TableScan: cpu projection=Some([0,1,2,12])
SchemaPivot is a DataFusion extension node. It produces a single output String column containing the name of each input column that had a non-null value.
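SchemaPivot's behavior is easy to state in code. In this hypothetical sketch, each batch is a list of nullable string columns (`None` stands for NULL), and the output is the sorted, de-duplicated list of column names that held at least one non-null value, that is, the tag keys present in the filtered data. `Column`, `column`, and `schema_pivot` are illustrative names, not the IOx extension node's API.

```rust
use std::collections::BTreeSet;

/// Hypothetical nullable string column (None = NULL).
struct Column {
    name: String,
    values: Vec<Option<String>>,
}

/// One record batch = a list of columns (already filtered by the predicate).
type Batch = Vec<Column>;

/// Convenience constructor for the example.
fn column(name: &str, values: &[Option<&str>]) -> Column {
    Column {
        name: name.to_string(),
        values: values.iter().map(|v| v.map(str::to_string)).collect(),
    }
}

/// SchemaPivot-style operator: names of all columns with at least one
/// non-null value in any batch, sorted and de-duplicated.
fn schema_pivot(batches: &[Batch]) -> Vec<String> {
    let mut names = BTreeSet::new();
    for batch in batches {
        for col in batch {
            if col.values.iter().any(Option::is_some) {
                names.insert(col.name.clone());
            }
        }
    }
    names.into_iter().collect()
}
```

Because the pivot only inspects nullness, it never needs to materialize or compare the tag values themselves.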
Handling multiple tables
tag_keys
  pred: host = 'foo'
  ts_range: [1000, 2000)
Each table gets its own SeriesSetPlan(LogicalPlan):
SeriesSetPlan for cpu:
SchemaPivot
  Filter: 1000 <= #time AND #time < 2000 AND host = 'foo'
    TableScan: cpu
SeriesSetPlan for mem:
SchemaPivot
  Filter: 1000 <= #time AND #time < 2000 AND host = 'foo'
    TableScan: mem
SeriesSetPlan for host: {} (no data between 1000 and 2000)
Results from multiple plans and sets are combined at a higher level.
read_filter
  pred: tag(_m)="system" AND tag(_f)="usage_user" AND tag(cpu)="cpu1"
  ts_range: [1000, 2000)
read_filter: Logical Plan
IOx code creates DataFusion LogicalPlan nodes:
Projection: #cpu, #host, #usage_user, #time
  Sort: (#cpu, #host, #time)
    Filter: 1000 <= #time AND #time < 2000 AND #cpu Eq Utf8("cpu1")
      TableScan: system projection=Some([0,1,2,12])
The TableScan is accomplished via the same IOxReadFilterNode.
Predicates are applied using a Filter.
Data is sorted in tag key order, as expected by Flux / InfluxQL.
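Sorting in tag key order amounts to a lexicographic sort on (#cpu, #host, #time). The sketch uses a hypothetical flattened `Row` type; tags are `Option<String>` so that missing tags can be represented, and Rust's `Option` ordering places `None` (NULL) first, which is an assumption of this example rather than documented IOx behavior.

```rust
/// Hypothetical flattened output row; tags may be NULL (None).
#[derive(Debug, Clone, PartialEq)]
struct Row {
    cpu: Option<String>,
    host: Option<String>,
    usage_user: f64,
    time: i64,
}

/// Sort rows by (cpu, host, time) so each series (tag combination) is
/// contiguous and time-ordered. `None` sorts before any `Some` value.
fn sort_by_tag_key_order(rows: &mut [Row]) {
    rows.sort_by(|a, b| (&a.cpu, &a.host, a.time).cmp(&(&b.cpu, &b.host, b.time)));
}
```

Once sorted, each run of rows sharing the same tag values forms one series, which is what lets the frontend emit series frames in a single pass.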
read_filter: Physical Plan
ProjectionExec: #cpu, #host, #usage_user, #time
  SortExec: (#cpu, #host, #time)
    CoalescePartitionsExec
      FilterExec: 1000 <= #time AND #time < 2000 AND #cpu Eq Utf8("cpu1")
        IOxReadFilterNode: PartitionChunk (mutable_buffer)
      FilterExec: 1000 <= #time AND #time < 2000 AND #cpu Eq Utf8("cpu1")
        IOxReadFilterNode: PartitionChunk (read_buffer)
      FilterExec: 1000 <= #time AND #time < 2000 AND #cpu Eq Utf8("cpu1")
        IOxReadFilterNode: PartitionChunk (read_buffer)
      …
The FilterExec / IOxReadFilterNode pair is repeated for each chunk. During execution, IOxReadFilterNode calls PartitionChunk::read_filter to get results.
CoalescePartitionsExec is added by DataFusion physical planning due to requirements from the Sort; it does not preserve sort order.
read_group
  pred: tag(_m)="cpu"
  agg: first
  group_keys: "env"
  ts_range: [1000, 2000)
Assumes env and cpu are tags.
Plan producing the results:
Projection: env, cpu, _time, _value
  Sort: env, cpu, _time, _value
    GroupBy: gby: env, cpu; agg: first.time(usage_user, time) as _time, first.value(usage_user, time) as _value
      Filter: 1000 <= #time AND #time < 2000
        TableScan: cpu projection=Some([0,1,2,12])
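The `first` aggregate keeps, for each group, the value with the earliest timestamp together with that timestamp (the `_time` and `_value` outputs). This is a minimal sketch assuming a hypothetical `Row` with `env` and `cpu` tags and a `usage_user` field; ties on time keep the row seen first. Keying a `BTreeMap` by (env, cpu) also yields the groups already in the sort order used by the plan.

```rust
use std::collections::BTreeMap;

/// Hypothetical input row for read_group: two tags plus one field.
struct Row {
    env: String,
    cpu: String,
    time: i64,
    usage_user: f64,
}

/// For each (env, cpu) group, return (first.time, first.value): the earliest
/// timestamp and the usage_user value at that timestamp.
fn first_by_group(rows: &[Row]) -> BTreeMap<(String, String), (i64, f64)> {
    let mut groups: BTreeMap<(String, String), (i64, f64)> = BTreeMap::new();
    for r in rows {
        groups
            .entry((r.env.clone(), r.cpu.clone()))
            .and_modify(|cur| {
                // Replace only if this row is strictly earlier.
                if r.time < cur.0 {
                    *cur = (r.time, r.usage_user);
                }
            })
            .or_insert((r.time, r.usage_user));
    }
    groups
}
```

Each map entry corresponds to one output row `(env, cpu, _time, _value)`.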
Execution Plans
Table Scan: Reading data from a Chunk
For a single chunk*:
LogicalPlan: TableScan: cpu
ExecutionPlan: IOxReadFilterNode chunk_id = 1
SendableRecordBatchStream: SchemaAdapterStream over a {MUB,RUB,Parquet}Stream
* A chunk that has no deletes or possible updates
Schema Adapter Stream
SchemaAdapterStream, output_schema: {A, B, C}
Input RecordBatch:
A  C
1  10
2  20
3  30
4  40
Output RecordBatch:
A  B     C
1  NULL  10
2  NULL  20
3  NULL  30
4  NULL  40
Missing columns are padded with nulls.
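The padding rule can be sketched directly. Assuming a hypothetical columnar `Batch` of nullable `i64` columns (not Arrow's `RecordBatch`), `adapt_to_schema` emits the columns in `output_schema` order, copies the ones that exist, and fills missing ones with NULLs:

```rust
/// Hypothetical columnar batch: (column name, nullable i64 values).
#[derive(Debug, PartialEq)]
struct Batch {
    columns: Vec<(String, Vec<Option<i64>>)>,
    num_rows: usize,
}

/// Produce a batch with exactly the columns of `output_schema`, in that order.
/// Columns absent from the input are filled with NULL (None).
fn adapt_to_schema(input: &Batch, output_schema: &[&str]) -> Batch {
    let columns = output_schema
        .iter()
        .map(|&name| {
            let values = input
                .columns
                .iter()
                .find(|(n, _)| n == name)
                .map(|(_, v)| v.clone())
                .unwrap_or_else(|| vec![None; input.num_rows]);
            (name.to_string(), values)
        })
        .collect();
    Batch { columns, num_rows: input.num_rows }
}
```

Adapting every chunk's output to the table schema lets chunks written with different column sets be combined in a single plan.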
Read Time Resolution of Updates/Upserts
Applies to chunks that potentially have updates (overlaps) to primary keys but different sort orders.
LogicalPlan:
TableScan: cpu
ExecutionPlan:
DeduplicateExec (IOx extension that implements tag key deduplication)
  SortPreservingMerge (classic N-way merge)
    UnionExec (doesn't combine any DataFusion partitions)
      SortExec (optional), sort_key: tags (may have to re-sort on primary key columns)
        IOxReadFilterNode chunk_id = 1
      SortExec (optional), sort_key: tags
        IOxReadFilterNode chunk_id = 7
Simplified: real plans handle partial overlap scenarios; see the source documentation for more details.
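The merge-then-deduplicate step can be sketched with a binary heap performing the N-way merge. This is a hypothetical sketch, not DeduplicateExec itself: each chunk's rows are assumed to be pre-sorted on a primary key of (tag, time), and when several chunks contain the same key, the row from the chunk with the highest order (the newest data) wins, which is how an upsert is assumed to resolve here.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical row: primary key = (tag, time), one field value.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    tag: String,
    time: i64,
    value: f64,
}

/// One chunk's rows, already sorted by (tag, time), plus its chunk order.
struct ChunkRows {
    order: u32,
    rows: Vec<Row>,
}

/// N-way merge of sorted chunks, keeping one row per primary key.
fn merge_and_dedup(chunks: &[ChunkRows]) -> Vec<Row> {
    // Min-heap entries: (tag, time, chunk order, chunk index, row index).
    let mut heap = BinaryHeap::new();
    for (ci, c) in chunks.iter().enumerate() {
        if let Some(r) = c.rows.first() {
            heap.push(Reverse((r.tag.clone(), r.time, c.order, ci, 0usize)));
        }
    }
    let mut out: Vec<Row> = Vec::new();
    while let Some(Reverse((tag, time, _order, ci, ri))) = heap.pop() {
        let row = chunks[ci].rows[ri].clone();
        // Equal keys pop in ascending chunk order, so the last one seen is the newest.
        let same_key = out.last().map_or(false, |last| last.tag == tag && last.time == time);
        if same_key {
            *out.last_mut().unwrap() = row;
        } else {
            out.push(row);
        }
        // Advance this chunk's cursor.
        if let Some(next) = chunks[ci].rows.get(ri + 1) {
            heap.push(Reverse((next.tag.clone(), next.time, chunks[ci].order, ci, ri + 1)));
        }
    }
    out
}
```

The output stays sorted on the primary key, so downstream operators can still rely on tag key order.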
Read Time Resolution of Deletes
ExecutionPlan:
DeduplicateExec
  SortPreservingMerge
    UnionExec
      SortExec (optional), sort_key: tags
        IOxReadFilterNode chunk_id = 1
      SortExec (optional), sort_key: tags
        IOxReadFilterNode chunk_id = 7
Deletes can vary across chunks. A delete such as "Delete where time < 2021-10-01" adds a FilterExec on time < 2021-10-01 for the chunks it applies to, removing the deleted rows.
Any delete predicates are also applied in these scans (and thus pushed down to the MUB, RUB, etc.) as normal.
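A per-chunk delete filter can be sketched as follows, assuming a hypothetical delete predicate of the form `time < before` (matching `time < 2021-10-01`, with timestamps reduced to small integers). Each chunk carries only the deletes that apply to it, which is why deletes can vary across chunks. `DeleteTimeBefore`, `Chunk`, and `scan_with_deletes` are illustrative names.

```rust
/// Hypothetical delete predicate: "delete where time < before".
#[derive(Debug, Clone, Copy)]
struct DeleteTimeBefore {
    before: i64,
}

/// A chunk's rows plus the delete predicates that apply to this chunk only.
struct Chunk {
    rows: Vec<(i64, f64)>,
    deletes: Vec<DeleteTimeBefore>,
}

/// Acts like a per-chunk FilterExec: drop every row matched by any delete predicate.
fn scan_with_deletes(chunk: &Chunk) -> Vec<(i64, f64)> {
    chunk
        .rows
        .iter()
        .copied()
        .filter(|&(t, _)| !chunk.deletes.iter().any(|d| t < d.before))
        .collect()
}
```

Chunks with no applicable deletes pass through unchanged, so the filter adds cost only where a delete actually applies.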
Thank You

WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 

2021-10-13 IOx Query Processing

  • 1. Query Processing in InfluxDB IOx: SQL, Storage gRPC, Reorganization. 2021-10-13. CC BY-SA. Andrew Lamb
  • 2. © 2021 InfluxData. All rights reserved. Today: IOx Team at InfluxData. Past life 1: Query Optimizer @ Vertica, also on Oracle DB server. Past life 2: Chief Architect + VP Engineering roles at some ML startups.
  • 3. Talk Outline
      ‒ Data Model and Storage Review
      ‒ Query Processing Overview
      ‒ Frontends
      ‒ Execution Plans
  • 4. Data Layout and Storage
  • 5. Data Organization: Partitions. Partitions define how data is kept in separate Chunks in storage. Each chunk logically stores part of a partition for one or more relational tables, and each row is mapped to a single partition based on the Partition Rules. Partitioning is used for:
      1. Data Lifecycle Management (drop whole partitions by deleting files)
      2. Query Performance (partition pruning)
      [Figure: the cpu, disk, and requests tables, each divided into per-day partitions (Jan 1, Jan 2, Jan 3)]
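A minimal std-only Rust sketch of the two uses of partitioning above, assuming a hypothetical partition rule that keys each row by (table, UTC day number since the Unix epoch). The slides do not show IOx's actual partition rules; `partition_key`, `prune_partitions`, and `NANOS_PER_DAY` are illustrative names, not IOx APIs.

```rust
use std::collections::BTreeSet;

/// Nanoseconds in one UTC day (hypothetical day-based partition rule).
const NANOS_PER_DAY: i64 = 86_400 * 1_000_000_000;

/// Hypothetical partition rule: every row maps to exactly one partition,
/// identified by its table name and the UTC day its timestamp falls in.
fn partition_key(table: &str, ts_nanos: i64) -> (String, i64) {
    (table.to_string(), ts_nanos.div_euclid(NANOS_PER_DAY))
}

/// Partition pruning: keep only the partitions of `table` whose day can
/// overlap the half-open time range [start, end); all others are skipped.
fn prune_partitions(
    partitions: &BTreeSet<(String, i64)>,
    table: &str,
    start: i64,
    end: i64,
) -> Vec<(String, i64)> {
    if start >= end {
        return Vec::new();
    }
    let first_day = start.div_euclid(NANOS_PER_DAY);
    let last_day = (end - 1).div_euclid(NANOS_PER_DAY);
    partitions
        .iter()
        .filter(|(t, day)| t.as_str() == table && *day >= first_day && *day <= last_day)
        .cloned()
        .collect()
}
```

Dropping a whole partition for lifecycle management then amounts to removing one key (and its files); a query over one day only has to look at that day's partitions.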
  • 6. Data Organization: Chunks. Within each partition of a table, data is divided into physical chunks, each identified by a chunk id and a chunk order. Chunks with a lower order hold older data (by insert time). There is at most one open chunk per partition, and all new data (including deletes and updates) is written into it. Once a chunk is closed, it becomes immutable: rows are never added or removed. Over time, the data is compacted or persisted into new chunks and the old chunks are dropped.
      [Figure: Chunk0 through Chunk3 (closed, ordered by age of data, never modified) and Chunk4 (open, receives new data)]
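The chunk rules above (at most one open chunk, writes only to the open chunk, closed chunks immutable and ordered by age) can be modeled in a few lines. This is a hypothetical sketch: `Partition`, `Chunk`, `write`, `close_open_chunk`, and `closed_by_age` are illustrative names, not the IOx catalog API, and rows are plain strings for brevity.

```rust
/// Hypothetical chunk: id, order (lower = older data), rows, open flag.
#[derive(Debug)]
struct Chunk {
    id: u32,
    order: u32,
    rows: Vec<String>,
    open: bool,
}

#[derive(Debug, Default)]
struct Partition {
    chunks: Vec<Chunk>,
    next_id: u32,
}

impl Partition {
    /// All new data goes into the single open chunk, creating one if needed.
    fn write(&mut self, row: &str) {
        if !self.chunks.iter().any(|c| c.open) {
            let id = self.next_id;
            self.next_id += 1;
            // A newly created chunk gets the highest order: it holds the newest data.
            self.chunks.push(Chunk { id, order: id, rows: Vec::new(), open: true });
        }
        let open = self.chunks.iter_mut().find(|c| c.open).expect("an open chunk exists");
        open.rows.push(row.to_string());
    }

    /// Closing makes the chunk immutable: `write` only ever appends to the open chunk.
    fn close_open_chunk(&mut self) {
        if let Some(c) = self.chunks.iter_mut().find(|c| c.open) {
            c.open = false;
        }
    }

    /// Ids of the closed chunks, oldest data first (ascending chunk order).
    fn closed_by_age(&self) -> Vec<u32> {
        let mut closed: Vec<&Chunk> = self.chunks.iter().filter(|c| !c.open).collect();
        closed.sort_by_key(|c| c.order);
        closed.iter().map(|c| c.id).collect()
    }
}
```

Compaction and persistence would read closed chunks, write new ones, and drop the originals; that step is omitted here.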
  • 7. Data Model
      Line protocol input:
        weather,location=us-east temperature=82,humidity=67 1465839830100400200
        weather,location=us-midwest temperature=82,humidity=65 1465839830100400200
        weather,location=us-west temperature=70,humidity=54 1465839830100400200
        weather,location=us-east temperature=83,humidity=69 1465839830200400200
        weather,location=us-midwest temperature=87,humidity=78 1465839830200400200
        weather,location=us-west temperature=72,humidity=56 1465839830200400200
        weather,location=us-east temperature=84,humidity=67 1465839830300400200
        weather,location=us-midwest temperature=90,humidity=82 1465839830400400200
        weather,location=us-west temperature=71,humidity=57 1465839830400400200
      Relational view (weather table):
        location      temperature  humidity  timestamp
        "us-east"     82           67        2016-06-13T17:43:50.1004002Z
        "us-midwest"  82           65        2016-06-13T17:43:50.1004002Z
        "us-west"     70           54        2016-06-13T17:43:50.1004002Z
        "us-east"     83           69        2016-06-13T17:43:50.2004002Z
        "us-midwest"  87           78        2016-06-13T17:43:50.2004002Z
        "us-west"     72           56        2016-06-13T17:43:50.2004002Z
        "us-east"     84           67        2016-06-13T17:43:50.3004002Z
        "us-midwest"  90           82        2016-06-13T17:43:50.3004002Z
        "us-west"     71           57        2016-06-13T17:43:50.3004002Z
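A minimal sketch of how line protocol rows map onto the relational columns above. It assumes a simplified line format (no escaping, no quoted strings containing spaces, timestamp always present) and keeps values as strings; `parse_line` and `to_columns` are hypothetical helpers, not IOx's line protocol parser.

```rust
use std::collections::BTreeMap;

/// One parsed line: measurement (table), tags, fields, and timestamp.
#[derive(Debug, PartialEq)]
struct Row {
    measurement: String,
    tags: Vec<(String, String)>,
    fields: Vec<(String, String)>,
    time: i64,
}

/// Parses `measurement,tag=v,... field=v,... timestamp` (simplified: no escaping).
fn parse_line(line: &str) -> Option<Row> {
    let mut parts = line.split(' ');
    let series = parts.next()?;
    let field_set = parts.next()?;
    let time = parts.next()?.parse::<i64>().ok()?;
    let mut series_parts = series.split(',');
    let measurement = series_parts.next()?.to_string();
    let kv = |s: &str| s.split_once('=').map(|(k, v)| (k.to_string(), v.to_string()));
    let tags = series_parts.map(kv).collect::<Option<Vec<_>>>()?;
    let fields = field_set.split(',').map(kv).collect::<Option<Vec<_>>>()?;
    Some(Row { measurement, tags, fields, time })
}

/// Pivots rows of one measurement into named columns (tags, fields, "time").
/// A column that a row does not mention gets `None` (a null) in that row.
fn to_columns(rows: &[Row]) -> BTreeMap<String, Vec<Option<String>>> {
    let mut cols: BTreeMap<String, Vec<Option<String>>> = BTreeMap::new();
    for (i, row) in rows.iter().enumerate() {
        let values = row
            .tags
            .iter()
            .chain(row.fields.iter())
            .map(|(k, v)| (k.clone(), v.clone()))
            .chain(std::iter::once(("time".to_string(), row.time.to_string())));
        for (name, value) in values {
            // A column first seen at row i is back-filled with i nulls.
            let col = cols.entry(name).or_insert_with(|| vec![None; i]);
            col.push(Some(value));
        }
        // Pad every column this row did not mention.
        for col in cols.values_mut() {
            if col.len() < i + 1 {
                col.push(None);
            }
        }
    }
    cols
}
```

Tags (`location`) and fields (`temperature`, `humidity`) both become columns, and the timestamp becomes the `time` column, matching the table above.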
  • 8. Query Processing
  • 9. Design: One Execution Engine
      1. Query and Data Reorganization* are two sides of the same coin: "moving data around".
      2. Reuse as much existing execution machinery as possible (e.g. streaming, a segregated worker pool, etc.).
      3. Amplify investment by leveraging Open Source (and contributing back).
      ⇒ All queries run through a unified planning system based on DataFusion + Arrow, and execute as Rust async streams (`RecordBatchStream`) on the tokio executor.
      * Reorganization: putting data into physical structures (Chunks).
  • 10. Query Processing
      Client / language-specific frontends (query input):
        ‒ Storage gRPC Frontend: read_group(..) → DataFusion LogicalPlan
        ‒ SQL Frontend (from DataFusion): SELECT … FROM ... → DataFusion LogicalPlan
        ‒ Reorg Frontend: compact_plan(..) → DataFusion LogicalPlan
      Shared planning and execution phases, based on DataFusion:
        Optimization (storage pruning, pushdown, etc.) → Physical Planning (DataFusion Physical Plan) → Execution (Arrow Record Batches)
      Client-specific output formats:
        ‒ gRPC output: SeriesFrame ...
        ‒ Arrow Flight IPC: FlightData
        ‒ ReadBuffer / ParquetWriter: write to ReadBuffer or Parquet files
  • 11. IOx Query Optimization Features (compared across DataFusion and IOx)
      ‒ "Classic": Projection/Filter/Limit pushdown, partial evaluation, ...
      ‒ Predicate Evaluation During Scan
      ‒ Chunk Pruning on predicates
      ‒ Parquet Row Group Pruning
      ‒ Grouping/Aggregate Pushdown
      Slide notes: "Filters pushed down on some metadata queries, scans in ReadBuffer"; "N/A"; "ReadBuffer has support, but no query engine support".
  • 12. Front Ends (Logical Plans)
  • 13. SQL Frontend (Arrow Flight client → IOx on port 8082, backed by the Object Store)
      1. A Flight request is sent to IOx.
      2. IOx answers queries by combining data from Parquet files and its in-memory cache.
      3. The response is streamed back via Flight RPC.
      See "DataFusion: An Embeddable Query Engine Written in Rust" for more details.
  • 14. SQL: Logical Plan for SELECT cpu, usage_user, time FROM cpu WHERE cpu = 'cpu1';
        Projection: #cpu, #usage_user, #time
          Filter: #cpu Eq Utf8("cpu1")
            TableScan: cpu projection=Some([0,1,2,12])
      The TableScan is accomplished via IOxReadFilterNode. Chunks are presented to DataFusion as "partitions" (different from IOx partitions). The IOx query engine handles resolving upserts and applying deletes.
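To make the nesting of the logical plan explicit, here is a toy plan tree that prints itself in the same indented style. It is a hypothetical illustration only: the `Plan` enum below is not DataFusion's `LogicalPlan`, and it models just the three node kinds on this slide.

```rust
use std::fmt;

/// A toy logical plan with the three node kinds from the slide.
/// NOT DataFusion's `LogicalPlan`; it only shows how the nodes nest.
enum Plan {
    TableScan { table: String, projection: Vec<usize> },
    Filter { predicate: String, input: Box<Plan> },
    Projection { columns: Vec<String>, input: Box<Plan> },
}

impl Plan {
    /// Writes one line per node, indenting each child two spaces under its parent.
    fn fmt_indent(&self, f: &mut fmt::Formatter<'_>, depth: usize) -> fmt::Result {
        let pad = "  ".repeat(depth);
        match self {
            Plan::TableScan { table, projection } => {
                writeln!(f, "{}TableScan: {} projection=Some({:?})", pad, table, projection)
            }
            Plan::Filter { predicate, input } => {
                writeln!(f, "{}Filter: {}", pad, predicate)?;
                input.fmt_indent(f, depth + 1)
            }
            Plan::Projection { columns, input } => {
                writeln!(f, "{}Projection: {}", pad, columns.join(", "))?;
                input.fmt_indent(f, depth + 1)
            }
        }
    }
}

impl fmt::Display for Plan {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.fmt_indent(f, 0)
    }
}
```

Data flows bottom-up: the scan produces rows, the filter keeps those with cpu = 'cpu1', and the projection selects the three output columns.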
  • 15. Reorg / Life Cycle Planner
      ‒ Compact: a Compact Plan over Chunk1, Chunk2, and Chunk3 resolves upserts and applies deletes, producing a RecordBatch stream. The compact lifecycle operation writes the stream to a new Read Buffer (RUB) or Parquet chunk.
      ‒ Split (split_time): a Split Plan over Chunk1, Chunk2, and Chunk3 resolves upserts and applies deletes, producing two RecordBatch streams, one for time <= split_time and one for time > split_time. The persist lifecycle operation writes the streams to a Parquet chunk and a RUB chunk, respectively.
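The split step can be sketched as routing each row of each incoming batch by comparing its time with `split_time`. This is a std-only illustration under stated assumptions: batches are plain `Vec`s of hypothetical `(time, value)` rows rather than Arrow RecordBatches, and `split_batches` is not the actual StreamSplit node.

```rust
/// A toy row: (timestamp, value). Real plans stream Arrow RecordBatches.
type Row = (i64, f64);

/// Hypothetical stand-in for the split operation: every row with
/// time <= split_time goes to the first output, every other row to the
/// second. Batch boundaries are kept, and empty halves are not emitted.
fn split_batches(
    batches: impl IntoIterator<Item = Vec<Row>>,
    split_time: i64,
) -> (Vec<Vec<Row>>, Vec<Vec<Row>>) {
    let mut older = Vec::new();
    let mut newer = Vec::new();
    for batch in batches {
        let (lo, hi): (Vec<Row>, Vec<Row>) =
            batch.into_iter().partition(|&(t, _)| t <= split_time);
        if !lo.is_empty() {
            older.push(lo);
        }
        if !hi.is_empty() {
            newer.push(hi);
        }
    }
    (older, newer)
}
```

With split_time = 1004 (the value used on the next slide), rows at 1004 and earlier land in the first output (persisted to Parquet) and later rows in the second (kept in the RUB).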
  • 16. Reorg / Life Cycle Planner: Plans
      Compact Plan:
        TableScan: cpu Chunks = ...
      Split Plan (2 DataFusion partitions):
        StreamSplit split_time=1004 (DataFusion extension node)
          TableScan: cpu Chunks = ...
  • 17. Storage gRPC frontend: Flux and InfluxQL (Flux runtime / InfluxQL → IOx on port 8082, backed by the Object Store)
      1. Flux/InfluxQL send requests via gRPC.
      2. IOx answers queries by combining data from Parquet files and its in-memory cache.
      3. The response is streamed back via gRPC.
  • 18. Storage gRPC Operations
      Data queries (return time series):
      ‒ ReadFilter: scan data out of IOx matching predicates
      ‒ ReadGroup: group/aggregate in IOx, returning grouped data
      ‒ ReadWindowAggregate: similar to ReadGroup, but windowed by time
      Metadata queries (return strings / times):
      ‒ TagKeys: tag keys (column names) with data matching predicates
      ‒ TagValues: distinct tag values (column values) with data matching predicates, for a set of tag keys (columns)
      ‒ MeasurementNames: table names that satisfy some provided predicate
      ‒ MeasurementTagKeys: same as TagKeys, but limited to a single table
      ‒ MeasurementTagValues: same as TagValues, but limited to a single table
      ‒ MeasurementFields: field names (column names) with rows matching predicates
      (thanks @e-dard)
  • 19. Metadata Queries. Metadata queries are incredibly common and are often run over recent data:
      ‒ measurement_names(range, predicate)
      ‒ tag_keys(range, predicate)
      ‒ tag_values(tag_key, range, predicate)
      Is there a fast path for the predicates?
      ‒ YES: answer with an optimized* implementation in the chunk.
      ‒ NO: run a general-purpose (DataFusion) plan.
      * The Read Buffer (RUB) in particular is heavily optimized for metadata queries and rarely needs general-purpose plans.
  • 20. tag_keys (general plan) for tag_keys with pred: cpu =~ '.*total', ts_range: [1000, 2000]
        SchemaPivot
          Filter: #cpu =~ '.*total' AND 1000 < #time AND #time < 2000
            TableScan: cpu projection=Some([0,1,2,12])
      SchemaPivot is a DataFusion extension node that produces a single output String column containing the name of every input column that had a non-null value.
  • 21. Handling multiple tables: tag_keys with pred: host = 'foo', ts_range: [1000, 2000]
      ‒ SeriesSetPlan for cpu: SchemaPivot over Filter: 1000 < #time AND #time < 2000 AND host = 'foo' over TableScan: cpu
      ‒ SeriesSetPlan for mem: SchemaPivot over Filter: 1000 < #time AND #time < 2000 AND host = 'foo' over TableScan: mem
      ‒ SeriesSetPlan for host: {} (no data between 1000 and 2000)
      Each SeriesSetPlan wraps a LogicalPlan. Results from multiple plans and sets are combined at a higher level.
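The SchemaPivot idea and the per-table combination above can be sketched over toy columns. Assumptions: columns are named vectors of nullable strings, `schema_pivot` and `combine_tag_keys` are hypothetical helpers (not the IOx extension node), and combining results across tables is modeled as a sorted set union.

```rust
use std::collections::BTreeSet;

/// A toy column: a name plus nullable string values (already filtered).
struct Column {
    name: String,
    values: Vec<Option<String>>,
}

/// SchemaPivot idea: output the name of every input column that has at
/// least one non-null value among the rows that passed the filter.
fn schema_pivot(columns: &[Column]) -> Vec<String> {
    columns
        .iter()
        .filter(|c| c.values.iter().any(|v| v.is_some()))
        .map(|c| c.name.clone())
        .collect()
}

/// One pivot per table, combined at a higher level (modeled as a set union).
/// A table with no matching rows contributes nothing.
fn combine_tag_keys(tables: &[Vec<Column>]) -> BTreeSet<String> {
    tables.iter().flat_map(|t| schema_pivot(t)).collect()
}
```

Evaluating "is any value non-null" per column is what turns a row-oriented filter result into a list of column names, which is exactly what a tag_keys request returns.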
  • 22. read_filter: Logical Plan for read_filter with pred: tag(_m)="system" AND tag(_f)="usage_user" AND tag(cpu)="cpu1", ts_range: [1000, 2000)
        Projection: #cpu, #host, #usage_user, #time
          Sort: (#cpu, #host, #time)
            Filter: 1000 < #time AND #time < 2000 AND #cpu Eq Utf8("cpu1")
              TableScan: system projection=Some([0,1,2,12])
      IOx code creates the DataFusion LogicalPlan nodes. The TableScan is accomplished via the same IOxReadFilterNode, predicates are applied using a Filter, and data is sorted in tag key order, as expected by Flux / InfluxQL.
  • 23. read_filter: Physical Plan
        ProjectionExec: #cpu, #host, #usage_user, #time
          Sort: (#cpu, #host, #time)
            CoalescePartitionsExec
              FilterExec: 1000 < #time AND #time < 2000 AND #cpu Eq Utf8("cpu1")
                IOxReadFilterNode
              FilterExec: 1000 < #time AND #time < 2000 AND #cpu Eq Utf8("cpu1")
                IOxReadFilterNode: ….
      CoalescePartitionsExec does not preserve sort order; DataFusion physical planning adds it because of the requirements of Sort. The FilterExec / IOxReadFilterNode pair is repeated for each chunk (here PartitionChunk (mutable_buffer), PartitionChunk (read_buffer), and PartitionChunk (read_buffer)). Each IOxReadFilterNode calls PartitionChunk::read_filter during execution to get results.
  • 24. read_group for pred: tag(_m)="cpu", agg: first, group_keys: "env", ts_range: [1000, 2000) (assumes env and cpu are tags)
        Projection: env, cpu, _time, _value
          Sort: env, cpu, _time, _value
            GroupBy: gby: env, cpu agg: first.time(usage_user, time) as _time, first.value(usage_user, time) as _value
              Filter: 1000 < #time AND #time < 2000
                TableScan: cpu projection=Some([0,1,2,12])
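The semantics of this read_group plan (filter to the time range, group by the tags env and cpu, and take the first value by time) can be sketched without DataFusion. `read_group_first` and the `Row` type are hypothetical; the real plan uses a DataFusion GroupBy with `first.time` / `first.value` aggregates.

```rust
use std::collections::BTreeMap;

/// A toy input row for the `cpu` table: tags env and cpu, field usage_user.
struct Row {
    env: String,
    cpu: String,
    time: i64,
    usage_user: f64,
}

/// read_group with agg=first over ts_range [start, end): for each (env, cpu)
/// group, keep the row with the smallest time and return (_time, _value).
/// The BTreeMap keeps groups sorted by (env, cpu), like the Sort node.
fn read_group_first(rows: &[Row], start: i64, end: i64) -> BTreeMap<(String, String), (i64, f64)> {
    let mut groups: BTreeMap<(String, String), (i64, f64)> = BTreeMap::new();
    for r in rows.iter().filter(|r| r.time >= start && r.time < end) {
        groups
            .entry((r.env.clone(), r.cpu.clone()))
            .and_modify(|cur| {
                // An earlier timestamp replaces the current "first" value.
                if r.time < cur.0 {
                    *cur = (r.time, r.usage_user);
                }
            })
            .or_insert((r.time, r.usage_user));
    }
    groups
}
```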
  • 25. Execution Plans
  • 26. Table Scan: Reading data from a Chunk (for a single chunk*)
      ‒ LogicalPlan: TableScan: cpu
      ‒ ExecutionPlan: IOxReadFilterNode chunk_id = 1
      ‒ SendableRecordBatchStream: SchemaAdapterStream wrapping a {MUB,RUB,Parquet}Stream
      * A chunk that has no deletes or possible updates
  • 27. Schema Adapter Stream (SchemaAdapterStream, output_schema: {A, B, C}). Missing columns are padded with nulls.
      Input RecordBatch:      Output RecordBatch:
        A  C                    A  B     C
        1  10                   1  NULL  10
        2  20                   2  NULL  20
        3  30                   3  NULL  30
        4  40                   4  NULL  40
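The adapter's behavior on this slide can be sketched on a toy batch type: columns are looked up by name in the output schema's order, and any column the chunk lacks becomes an all-null column of the same length. `Batch` and `adapt` are hypothetical stand-ins, not Arrow's `RecordBatch` or the IOx `SchemaAdapterStream`.

```rust
/// A toy record batch: column names plus columns of nullable i64 values.
#[derive(Debug, PartialEq)]
struct Batch {
    names: Vec<String>,
    columns: Vec<Vec<Option<i64>>>,
}

/// Reorders input columns to match `output_schema` and pads any column the
/// input lacks with nulls (None), so every chunk yields the same schema.
fn adapt(input: &Batch, output_schema: &[&str]) -> Batch {
    let num_rows = input.columns.first().map_or(0, |c| c.len());
    let columns: Vec<Vec<Option<i64>>> = output_schema
        .iter()
        .map(|name| match input.names.iter().position(|n| n.as_str() == *name) {
            Some(idx) => input.columns[idx].clone(),
            None => vec![None; num_rows],
        })
        .collect();
    Batch {
        names: output_schema.iter().map(|s| s.to_string()).collect(),
        columns,
    }
}
```

Using the slide's data, an input with columns A and C adapted to {A, B, C} gets a B column of four nulls.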
  • 28. Read Time Resolution of Updates/Upserts, for chunks that potentially have updates (overlaps) to primary keys but different sort orders
      LogicalPlan: TableScan: cpu
      ExecutionPlan:
        DeduplicateExec (IOx extension that implements tag key deduplication)
          SortPreservingMerge (classic N-way merge)
            UnionExec (doesn't combine any DF partitions)
              SortExec (optional) sort_key: tags
                IOxReadFilterNode chunk_id = 1
              SortExec (optional) sort_key: tags
                IOxReadFilterNode chunk_id = 7
      A SortExec may have to re-sort on the primary key columns. This is simplified: real plans handle partial overlap scenarios; see the source documentation for more details.
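The merge-then-deduplicate idea can be sketched with a min-heap N-way merge followed by a pass that keeps one row per primary key. Assumptions: each input is already sorted by the primary key (tag, time), so the optional SortExec is skipped; ties are broken by chunk order so the row from the newer chunk wins. `Row`, `sort_preserving_merge`, and `deduplicate` are hypothetical names, not the DataFusion or IOx operators.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// A toy row: primary key (tag, time), a field value, and the order of the
/// chunk it came from (higher order = newer data).
#[derive(Debug, Clone, PartialEq)]
struct Row {
    tag: String,
    time: i64,
    value: f64,
    chunk_order: u32,
}

/// Classic N-way merge of inputs already sorted by (tag, time). Rows with the
/// same primary key come out in ascending chunk order, so the newest is last.
fn sort_preserving_merge(inputs: Vec<Vec<Row>>) -> Vec<Row> {
    let mut heap = BinaryHeap::new();
    for (i, input) in inputs.iter().enumerate() {
        if let Some(r) = input.first() {
            heap.push(Reverse((r.tag.clone(), r.time, r.chunk_order, i, 0usize)));
        }
    }
    let mut out = Vec::new();
    // Pop the smallest head, emit it, then push the next row from the same input.
    while let Some(Reverse((_, _, _, i, j))) = heap.pop() {
        out.push(inputs[i][j].clone());
        if let Some(r) = inputs[i].get(j + 1) {
            heap.push(Reverse((r.tag.clone(), r.time, r.chunk_order, i, j + 1)));
        }
    }
    out
}

/// Among consecutive rows with the same (tag, time), keep only the last one,
/// which after the merge above is the row from the newest chunk.
fn deduplicate(sorted: Vec<Row>) -> Vec<Row> {
    let mut out: Vec<Row> = Vec::new();
    for row in sorted {
        let same_key = out.last().map_or(false, |l| l.tag == row.tag && l.time == row.time);
        if same_key {
            *out.last_mut().unwrap() = row;
        } else {
            out.push(row);
        }
    }
    out
}
```

Because the merge preserves sort order, deduplication only needs to compare each row with its predecessor rather than hash every key.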
  • 29. Read Time Resolution of Deletes (Delete where time < 2021-10-01)
      ExecutionPlan:
        DeduplicateExec
          SortPreservingMerge
            UnionExec
              SortExec (optional) sort_key: tags
                FilterExec time < 2021-10-01
                  IOxReadFilterNode chunk_id = 1
              SortExec (optional) sort_key: tags
                IOxReadFilterNode chunk_id = 7
      Deletes can vary across chunks. Any delete predicates are also applied in these scans (and thus pushed down to the MUB, RUB, etc.) as normal.
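Applying per-chunk delete predicates during the scan can be sketched as a filter that drops any row matched by one of that chunk's deletes. Assumptions: timestamps are plain integers rather than dates, every delete has the hypothetical form "delete where time < before", and `Row`, `Delete`, `Chunk`, and `scan_with_deletes` are illustrative names, not IOx types.

```rust
/// A toy row: a timestamp and a value.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    time: i64,
    value: f64,
}

/// Hypothetical delete predicate of the form `delete where time < before`.
struct Delete {
    before: i64,
}

/// A chunk together with the deletes recorded against it (these can differ per chunk).
struct Chunk {
    rows: Vec<Row>,
    deletes: Vec<Delete>,
}

/// Scans a chunk, dropping every row that matches any of its delete
/// predicates: a row survives only if time >= before for every delete.
fn scan_with_deletes(chunk: &Chunk) -> Vec<Row> {
    chunk
        .rows
        .iter()
        .filter(|r| chunk.deletes.iter().all(|d| r.time >= d.before))
        .cloned()
        .collect()
}
```

A chunk with no recorded deletes is scanned unchanged, which is why the filter appears above only some of the scans.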