SlideShare une entreprise Scribd logo
1  sur  38
Télécharger pour lire hors ligne
Technical Deep Dive:
How to Think Vectorized
Alex Behm
Tech Lead, Photon
Agenda
Introduction
Delta Engine, vectorization, micro-benchmarks
Expressions
Compute kernels, adaptivity, lazy filters
Aggregation
Hash tables, mixed row/columnar kernels
End-to-End Performance
Hardware Changes since 2015
2010 2015 2020
Storage
50 MB/s
(HDD)
500 MB/s
(SSD)
16 GB/s
(NVMe)
10X
Network 1 Gbps 10 Gbps 100 Gbps 10X
CPU ~3 GHz ~3 GHz ~3 GHz ☹
CPUs continue to be the bottleneck.
How do we achieve next level performance?
Workload Trends
Businesses are moving faster, and as a result organizations spend less
time in data modeling, leading to worse performance:
▪ Most columns don’t have “NOT NULL” defined
▪ Strings are convenient, and many date columns are stored as strings
▪ Raw → Bronze → Silver → Gold: from nothing to pristine schema/quality
Can we get both agility and performance?
Query
Optimizer
Photon
Execution
Engine
SQL
Spark
DataFrame
Koalas
Caching
Delta Engine
Photon
New execution engine for Delta Engine to accelerate Spark SQL
Built from scratch in C++, for performance:
▪ Vectorization: data-level and instruction-level parallelism
▪ Optimize for modern structured and semi-structured workloads
Vectorization
● Decompose query into compute kernels that process vectors of data
● Typically: Columnar in-memory format
● Cache and CPU friendly: simple predictable loops, many data items, SIMD
● Adaptive: Batch-level specialization, e.g., NULLs or no NULLs
● Modular: Can optimize individual kernels as needed
Sounds great! But… what does it really mean? How does it work? Is it worth
it?
This talk: I will teach you how to think vectorized!
Microbenchmarks
Does not necessarily reflect speedups on end-to-end queries
Let’s build a simple engine from scratch.
1. Expression evaluation and adaptivity
2. Filters and laziness
3. Hash tables and mixed column/row operations
Vectorization: Basic Building Blocks
Expressions
Running Example
SELECT SUM(c3) FROM t
WHERE c1 + c2 < 10
GROUP BY g1, g2
Scan
Filter
c1 + c2 < 10
Aggregate
SUM(c3)
We’re not covering this part
Operators pass
batches of
columnar data
Expression Evaluation
c1 c2
+
<
10
Out
SELECT SUM(c3) FROM t
WHERE c1 + c2 < 10
GROUP BY g1, g2
Expression Evaluation
c1 c2
+
<
10
Out
Kernels!
SELECT SUM(c3) FROM t
WHERE c1 + c2 < 10
GROUP BY g1, g2
Expression Evaluation
void PlusKernel(const int64_t* left, const int64_t* right
int32_t num_rows, int64_t* output) {
for (int32_t i = 0; i < num_rows; ++i) {
output[i] = left[i] + right[i]
}
}
SELECT SUM(c3) FROM t
WHERE c1 + c2 < 10
GROUP BY g1, g2
Expression Evaluation
void PlusKernel(const int64_t* left, const int64_t* right
int32_t num_rows, int64_t* output) {
for (int32_t i = 0; i < num_rows; ++i) {
output[i] = left[i] + right[i]
}
}
🤔
What about NULLs?
SELECT SUM(c3) FROM t
WHERE c1 + c2 < 10
GROUP BY g1, g2
Expression Evaluation
void PlusKernel(const int64_t* left, const bool* left_nulls,
const int64_t* right, const bool* right_nulls,
int32_t num_rows,
int64_t* output, bool* output_nulls) {
for (int32_t i = 0; i < num_rows; ++i) {
bool is_null = left_nulls[i] || right[nulls];
if (!is_null) output[i] = left[i] + right[i];
output_nulls[i] = is_null;
}
}
SELECT SUM(c3) FROM t
WHERE c1 + c2 < 10
GROUP BY g1, g2
Expression Evaluation
void PlusKernel(const int64_t* left, const bool* left_nulls,
const int64_t* right, const bool* right_nulls,
int32_t num_rows,
int64_t* output, bool* output_nulls) {
for (int32_t i = 0; i < num_rows; ++i) {
bool is_null = left_nulls[i] || right[nulls];
if (!is_null) output[i] = left[i] + right[i];
output_nulls[i] = is_null;
}
}
> 30% slower with NULL checks
SELECT SUM(c3) FROM t
WHERE c1 + c2 < 10
GROUP BY g1, g2
Expression Evaluation: Runtime Adaptivity
void PlusKernelNoNulls(...);
void PlusKernel(...);
void PlusEval(Column left, Column right, Column output) {
if (!left.has_nulls() && !right.has_nulls()) {
PlusKernelNoNulls(left.data(), right.data(), output.data());
} else {
PlusKernel(left.data(), left.nulls(), …);
}
}
But what if my data rarely has NULLs?
Expression Evaluation
c1 c2
+
<
10
Out
● Similar kernel approach
● Can optimize for literals,
~25% faster
SELECT SUM(c3) FROM t
WHERE c1 + c2 < 10
GROUP BY g1, g2
Filters
SELECT SUM(c3) FROM t
WHERE c1 + c2 < 10
GROUP BY g1, g2
c1 c2
+
<
10
Out ???
● What exactly is the output?
● What should we do with our
input column batch?
Filters: Lazy Representation as Active Rows
5
4
3
2
1
7
2
3
8
5
1
1
1
1
1
7
7
7
7
7
4
5
3
4
5
c1 c
2
c3 g1 g
2
Scan
Filter
c1 + c2 < 10
Aggregate
SUM(c3)
{c1, c2, c3, g1, g2}
{c1, c2, c3, g1, g2}
Column Batch
3
2
0
c1 + c2 < 10
Active RowsSELECT SUM(c3) FROM t
WHERE c1 + c2 < 10
GROUP BY g1, g2
Filters: Lazy Representation as Active Rows
void PlusNoNullsSomeActiveKernel(
const int64_t* left, const int64_t* right,
const int32_t* active_rows, int32_t num_rows,
int64_t* output) {
for (int32_t i = 0; i < num_rows; ++i) {
int32_t active_idx = active_rows[i];
output[active_idx] = left[active_idx] * right[active_idx]
}
}
Active rows concept must be supported throughout the engine
● Adds complexity, code
● Will come in handy for advanced operations like aggregation/join
Aggregation
Hash Aggregation
Basic Algorithm
1. Hash and find bucket
2. If bucket empty, initialize entry with
keys and aggregation buffers
3. Compare keys and follow probing
strategy to resolve collisions
4. Update aggregation buffers
according to aggregation function
and input
Hash Table
{g1, g2, SUM}
Hash Aggregation
Think vectorized!
● Columnar, batch-oriented
● Type specialized
Basic Algorithm
1. Hash and find bucket
2. If bucket empty, initialize entry with
keys and aggregation buffers
3. Compare keys and follow probing
strategy to resolve collisions
4. Update aggregation buffers
according to aggregation function
and input
Microbenchmarks
Does not necessarily reflect speedups on end-to-end queries
SELECT co1l, SUM(col2)
FROM t
GROUP BY col1
Hash Aggregation
Hash Table
{g1, g2, SUM}
{7, 5, 10}
{7, 4, 3}
1
1
1
1
1
7
7
7
7
7
4
5
3
4
5
c3 g1 g
2
Column Batch
h2
h1
h1
h2
h1
hashes
Hash Aggregation
1
1
1
1
1
7
7
7
7
7
4
5
3
4
5
c3 g1 g
2
Column Batch
h2
h1
h1
h2
h1
hashes buckets
Hash Table
{g1, g2, SUM}
{7, 5, 10}
{7, 4, 3}
Hash Aggregation
1
1
1
1
1
7
7
7
7
7
4
5
3
4
5
c3 g1 g
2
Column Batch
h2
h1
h1
h2
h1
hashes buckets
Hash Table
{g1, g2, SUM}
{7, 5, 10}
{7, 4, 3}
● Compare keys
● Create an active rows
for non-matches
(collisions)
Collision
Hash Aggregation
1
1
1
1
1
7
7
7
7
7
4
5
3
4
5
c3 g1 g
2
Column Batch
h2
h1
h1
h2
h1
hashes buckets
Hash Table
{g1, g2, SUM}
{7, 5, 10}
{7, 3, 0}
{7, 4, 3}
● Advance buckets for all
collisions and compare keys
● Repeat until match or
empty bucket
Hash Aggregation
1
1
1
1
1
7
7
7
7
7
4
5
3
4
5
c3 g1 g
2
Column Batch
h2
h1
h1
h2
h1
hashes buckets
Hash Table
{g1, g2, SUM}
{7, 5, 12}
{7, 3, 1}
{7, 4, 5}
● Update the aggregation
state for each aggregate
Mixed Column/Row Kernel Example
void AggKernel(AggFn* fn,
int64_t* input,
int8_t** buckets,
int64_t buffer_offset,
int32_t num_rows) {
for (int32_t i = 0; i < num_rows; ++i) {
// Memory access into large array. Good to have a tight loop.
int8_t* bucket = buckets[i];
// Make sure this gets inlined.
fn->update(input[i], bucket + buffer_offset);
}
}
A “column” whose values are sprayed
across rows in the hash table
End-to-End Performance
Why go to the trouble? TPC-DS 30TB Queries/Hour
3.3x
speedup
110
32
(Higher is better)
32
23 columns
mixed types
1 column
Real-World Queries
▪ Several preview customers from different industries
▪ Need to have a suitable workload with sufficient Photon feature coverage
▪ Typical experience: 2-3x speedup end-to-end
▪ Mileage varies, best speedup: From 80 → 5 minutes!
▪ Vectorization: Decompose query into simple loops over vectors of data
▪ Batch-level adaptivity, e.g., NULLs vs no-NULLs
▪ Lazy filter evaluation with an active rows → useful concept
▪ Mixed column/row operations for accessing hash tables
Recap
Feedback
Your feedback is important to us.
Don’t forget to rate
and review the sessions.

Contenu connexe

Tendances

A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin HuaiA Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin HuaiDatabricks
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesFlink Forward
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Databricks
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code GenerationDatabricks
 
Common Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseCommon Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseDatabricks
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data PlatformAmazon Web Services
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekVenkata Naga Ravi
 
Json in Postgres - the Roadmap
 Json in Postgres - the Roadmap Json in Postgres - the Roadmap
Json in Postgres - the RoadmapEDB
 
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...Spark Summit
 
On Improving Broadcast Joins in Apache Spark SQL
On Improving Broadcast Joins in Apache Spark SQLOn Improving Broadcast Joins in Apache Spark SQL
On Improving Broadcast Joins in Apache Spark SQLDatabricks
 
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Spark Summit
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IODatabricks
 
Parallelizing with Apache Spark in Unexpected Ways
Parallelizing with Apache Spark in Unexpected WaysParallelizing with Apache Spark in Unexpected Ways
Parallelizing with Apache Spark in Unexpected WaysDatabricks
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeDatabricks
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroDatabricks
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationDatabricks
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsDatabricks
 

Tendances (20)

Explain that explain
Explain that explainExplain that explain
Explain that explain
 
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin HuaiA Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code Generation
 
Common Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseCommon Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta Lakehouse
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
 
Json in Postgres - the Roadmap
 Json in Postgres - the Roadmap Json in Postgres - the Roadmap
Json in Postgres - the Roadmap
 
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
 
On Improving Broadcast Joins in Apache Spark SQL
On Improving Broadcast Joins in Apache Spark SQLOn Improving Broadcast Joins in Apache Spark SQL
On Improving Broadcast Joins in Apache Spark SQL
 
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IO
 
Parallelizing with Apache Spark in Unexpected Ways
Parallelizing with Apache Spark in Unexpected WaysParallelizing with Apache Spark in Unexpected Ways
Parallelizing with Apache Spark in Unexpected Ways
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical Optimization
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 

Similaire à Photon Technical Deep Dive: How to Think Vectorized

lecture8_Cuong.ppt
lecture8_Cuong.pptlecture8_Cuong.ppt
lecture8_Cuong.pptHongV34104
 
Rainer Grimm, “Functional Programming in C++11”
Rainer Grimm, “Functional Programming in C++11”Rainer Grimm, “Functional Programming in C++11”
Rainer Grimm, “Functional Programming in C++11”Platonov Sergey
 
C++20 the small things - Timur Doumler
C++20 the small things - Timur DoumlerC++20 the small things - Timur Doumler
C++20 the small things - Timur Doumlercorehard_by
 
IBM Informix dynamic server 11 10 Cheetah Sql Features
IBM Informix dynamic server 11 10 Cheetah Sql FeaturesIBM Informix dynamic server 11 10 Cheetah Sql Features
IBM Informix dynamic server 11 10 Cheetah Sql FeaturesKeshav Murthy
 
Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...
Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...
Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...Provectus
 
The Ring programming language version 1.6 book - Part 9 of 189
The Ring programming language version 1.6 book - Part 9 of 189The Ring programming language version 1.6 book - Part 9 of 189
The Ring programming language version 1.6 book - Part 9 of 189Mahmoud Samir Fayed
 
Automatically Documenting Program Changes
Automatically Documenting Program ChangesAutomatically Documenting Program Changes
Automatically Documenting Program ChangesRay Buse
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOAltinity Ltd
 
The Ring programming language version 1.7 book - Part 10 of 196
The Ring programming language version 1.7 book - Part 10 of 196The Ring programming language version 1.7 book - Part 10 of 196
The Ring programming language version 1.7 book - Part 10 of 196Mahmoud Samir Fayed
 
Whats new in_csharp4
Whats new in_csharp4Whats new in_csharp4
Whats new in_csharp4Abed Bukhari
 
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介Masayuki Matsushita
 
CS-102 DS-class_01_02 Lectures Data .pdf
CS-102 DS-class_01_02 Lectures Data .pdfCS-102 DS-class_01_02 Lectures Data .pdf
CS-102 DS-class_01_02 Lectures Data .pdfssuser034ce1
 
Simplifying SQL with CTE's and windowing functions
Simplifying SQL with CTE's and windowing functionsSimplifying SQL with CTE's and windowing functions
Simplifying SQL with CTE's and windowing functionsClayton Groom
 
LDCQ paper Dec21 with answer key_62cb2996afc60f6aedeb248c1d9283e5.pdf
LDCQ paper Dec21 with answer key_62cb2996afc60f6aedeb248c1d9283e5.pdfLDCQ paper Dec21 with answer key_62cb2996afc60f6aedeb248c1d9283e5.pdf
LDCQ paper Dec21 with answer key_62cb2996afc60f6aedeb248c1d9283e5.pdfVedant Gavhane
 
The MySQL Query Optimizer Explained Through Optimizer Trace
The MySQL Query Optimizer Explained Through Optimizer TraceThe MySQL Query Optimizer Explained Through Optimizer Trace
The MySQL Query Optimizer Explained Through Optimizer Traceoysteing
 
Intro to tsql unit 10
Intro to tsql   unit 10Intro to tsql   unit 10
Intro to tsql unit 10Syed Asrarali
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...DataWorks Summit/Hadoop Summit
 

Similaire à Photon Technical Deep Dive: How to Think Vectorized (20)

lecture8_Cuong.ppt
lecture8_Cuong.pptlecture8_Cuong.ppt
lecture8_Cuong.ppt
 
Rainer Grimm, “Functional Programming in C++11”
Rainer Grimm, “Functional Programming in C++11”Rainer Grimm, “Functional Programming in C++11”
Rainer Grimm, “Functional Programming in C++11”
 
C++20 the small things - Timur Doumler
C++20 the small things - Timur DoumlerC++20 the small things - Timur Doumler
C++20 the small things - Timur Doumler
 
GCC
GCCGCC
GCC
 
IBM Informix dynamic server 11 10 Cheetah Sql Features
IBM Informix dynamic server 11 10 Cheetah Sql FeaturesIBM Informix dynamic server 11 10 Cheetah Sql Features
IBM Informix dynamic server 11 10 Cheetah Sql Features
 
Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...
Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...
Федор Поляков (Looksery) “Face Tracking на мобильных устройствах в режиме реа...
 
Explain this!
Explain this!Explain this!
Explain this!
 
The Ring programming language version 1.6 book - Part 9 of 189
The Ring programming language version 1.6 book - Part 9 of 189The Ring programming language version 1.6 book - Part 9 of 189
The Ring programming language version 1.6 book - Part 9 of 189
 
Automatically Documenting Program Changes
Automatically Documenting Program ChangesAutomatically Documenting Program Changes
Automatically Documenting Program Changes
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
 
The Ring programming language version 1.7 book - Part 10 of 196
The Ring programming language version 1.7 book - Part 10 of 196The Ring programming language version 1.7 book - Part 10 of 196
The Ring programming language version 1.7 book - Part 10 of 196
 
Whats new in_csharp4
Whats new in_csharp4Whats new in_csharp4
Whats new in_csharp4
 
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
 
CS-102 DS-class_01_02 Lectures Data .pdf
CS-102 DS-class_01_02 Lectures Data .pdfCS-102 DS-class_01_02 Lectures Data .pdf
CS-102 DS-class_01_02 Lectures Data .pdf
 
Simplifying SQL with CTE's and windowing functions
Simplifying SQL with CTE's and windowing functionsSimplifying SQL with CTE's and windowing functions
Simplifying SQL with CTE's and windowing functions
 
LDCQ paper Dec21 with answer key_62cb2996afc60f6aedeb248c1d9283e5.pdf
LDCQ paper Dec21 with answer key_62cb2996afc60f6aedeb248c1d9283e5.pdfLDCQ paper Dec21 with answer key_62cb2996afc60f6aedeb248c1d9283e5.pdf
LDCQ paper Dec21 with answer key_62cb2996afc60f6aedeb248c1d9283e5.pdf
 
The MySQL Query Optimizer Explained Through Optimizer Trace
The MySQL Query Optimizer Explained Through Optimizer TraceThe MySQL Query Optimizer Explained Through Optimizer Trace
The MySQL Query Optimizer Explained Through Optimizer Trace
 
Intro to tsql unit 10
Intro to tsql   unit 10Intro to tsql   unit 10
Intro to tsql unit 10
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 
Novos recursos do postgre sql para sharding
Novos recursos do postgre sql para shardingNovos recursos do postgre sql para sharding
Novos recursos do postgre sql para sharding
 

Plus de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Plus de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Dernier

办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 

Dernier (20)

办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 

Photon Technical Deep Dive: How to Think Vectorized

  • 1. Technical Deep Dive: How to Think Vectorized Alex Behm Tech Lead, Photon
  • 2. Agenda Introduction Delta Engine, vectorization, micro-benchmarks Expressions Compute kernels, adaptivity, lazy filters Aggregation Hash tables, mixed row/columnar kernels End-to-End Performance
  • 3. Hardware Changes since 2015 2010 2015 2020 Storage 50 MB/s (HDD) 500 MB/s (SSD) 16 GB/s (NVMe) 10X Network 1 Gbps 10 Gbps 100 Gbps 10X CPU ~3 GHz ~3 GHz ~3 GHz ☹ CPUs continue to be the bottleneck. How do we achieve next level performance?
  • 4. Workload Trends Businesses are moving faster, and as a result organizations spend less time in data modeling, leading to worse performance: ▪ Most columns don’t have “NOT NULL” defined ▪ Strings are convenient, and many date columns are stored as strings ▪ Raw → Bronze → Silver → Gold: from nothing to pristine schema/quality Can we get both agility and performance?
  • 6. Photon New execution engine for Delta Engine to accelerate Spark SQL Built from scratch in C++, for performance: ▪ Vectorization: data-level and instruction-level parallelism ▪ Optimize for modern structured and semi-structured workloads
  • 7. Vectorization ● Decompose query into compute kernels that process vectors of data ● Typically: Columnar in-memory format ● Cache and CPU friendly: simple predictable loops, many data items, SIMD ● Adaptive: Batch-level specialization, e.g., NULLs or no NULLs ● Modular: Can optimize individual kernels as needed Sounds great! But… what does it really mean? How does it work? Is it worth it? This talk: I will teach you how to think vectorized!
  • 8. Microbenchmarks Does not necessarily reflect speedups on end-to-end queries
  • 9. Let’s build a simple engine from scratch. 1. Expression evaluation and adaptivity 2. Filters and laziness 3. Hash tables and mixed column/row operations Vectorization: Basic Building Blocks
  • 11. Running Example SELECT SUM(c3) FROM t WHERE c1 + c2 < 10 GROUP BY g1, g2 Scan Filter c1 + c2 < 10 Aggregate SUM(c3) We’re not covering this part Operators pass batches of columnar data
  • 12. Expression Evaluation c1 c2 + < 10 Out SELECT SUM(c3) FROM t WHERE c1 + c2 < 10 GROUP BY g1, g2
  • 13. Expression Evaluation c1 c2 + < 10 Out Kernels! SELECT SUM(c3) FROM t WHERE c1 + c2 < 10 GROUP BY g1, g2
  • 14. Expression Evaluation void PlusKernel(const int64_t* left, const int64_t* right int32_t num_rows, int64_t* output) { for (int32_t i = 0; i < num_rows; ++i) { output[i] = left[i] + right[i] } } SELECT SUM(c3) FROM t WHERE c1 + c2 < 10 GROUP BY g1, g2
  • 15. Expression Evaluation void PlusKernel(const int64_t* left, const int64_t* right int32_t num_rows, int64_t* output) { for (int32_t i = 0; i < num_rows; ++i) { output[i] = left[i] + right[i] } } 🤔 What about NULLs? SELECT SUM(c3) FROM t WHERE c1 + c2 < 10 GROUP BY g1, g2
  • 16. Expression Evaluation void PlusKernel(const int64_t* left, const bool* left_nulls, const int64_t* right, const bool* right_nulls, int32_t num_rows, int64_t* output, bool* output_nulls) { for (int32_t i = 0; i < num_rows; ++i) { bool is_null = left_nulls[i] || right[nulls]; if (!is_null) output[i] = left[i] + right[i]; output_nulls[i] = is_null; } } SELECT SUM(c3) FROM t WHERE c1 + c2 < 10 GROUP BY g1, g2
  • 17. Expression Evaluation void PlusKernel(const int64_t* left, const bool* left_nulls, const int64_t* right, const bool* right_nulls, int32_t num_rows, int64_t* output, bool* output_nulls) { for (int32_t i = 0; i < num_rows; ++i) { bool is_null = left_nulls[i] || right[nulls]; if (!is_null) output[i] = left[i] + right[i]; output_nulls[i] = is_null; } } > 30% slower with NULL checks SELECT SUM(c3) FROM t WHERE c1 + c2 < 10 GROUP BY g1, g2
  • 18. Expression Evaluation: Runtime Adaptivity void PlusKernelNoNulls(...); void PlusKernel(...); void PlusEval(Column left, Column right, Column output) { if (!left.has_nulls() && !right.has_nulls()) { PlusKernelNoNulls(left.data(), right.data(), output.data()); } else { PlusKernel(left.data(), left.nulls(), …); } } But what if my data rarely has NULLs?
  • 19. Expression Evaluation c1 c2 + < 10 Out ● Similar kernel approach ● Can optimize for literals, ~25% faster SELECT SUM(c3) FROM t WHERE c1 + c2 < 10 GROUP BY g1, g2
  • 20. Filters SELECT SUM(c3) FROM t WHERE c1 + c2 < 10 GROUP BY g1, g2 c1 c2 + < 10 Out ??? ● What exactly is the output? ● What should we do with our input column batch?
  • 21. Filters: Lazy Representation as Active Rows 5 4 3 2 1 7 2 3 8 5 1 1 1 1 1 7 7 7 7 7 4 5 3 4 5 c1 c 2 c3 g1 g 2 Scan Filter c1 + c2 < 10 Aggregate SUM(c3) {c1, c2, c3, g1, g2} {c1, c2, c3, g1, g2} Column Batch 3 2 0 c1 + c2 < 10 Active RowsSELECT SUM(c3) FROM t WHERE c1 + c2 < 10 GROUP BY g1, g2
  • 22. Filters: Lazy Representation as Active Rows void PlusNoNullsSomeActiveKernel( const int64_t* left, const int64_t* right, const int32_t* active_rows, int32_t num_rows, int64_t* output) { for (int32_t i = 0; i < num_rows; ++i) { int32_t active_idx = active_rows[i]; output[active_idx] = left[active_idx] * right[active_idx] } } Active rows concept must be supported throughout the engine ● Adds complexity, code ● Will come in handy for advanced operations like aggregation/join
  • 24. Hash Aggregation Basic Algorithm 1. Hash and find bucket 2. If bucket empty, initialize entry with keys and aggregation buffers 3. Compare keys and follow probing strategy to resolve collisions 4. Update aggregation buffers according to aggregation function and input Hash Table {g1, g2, SUM}
  • 25. Hash Aggregation Think vectorized! ● Columnar, batch-oriented ● Type specialized Basic Algorithm 1. Hash and find bucket 2. If bucket empty, initialize entry with keys and aggregation buffers 3. Compare keys and follow probing strategy to resolve collisions 4. Update aggregation buffers according to aggregation function and input
  • 26. Microbenchmarks Does not necessarily reflect speedups on end-to-end queries SELECT co1l, SUM(col2) FROM t GROUP BY col1
  • 27. Hash Aggregation Hash Table {g1, g2, SUM} {7, 5, 10} {7, 4, 3} 1 1 1 1 1 7 7 7 7 7 4 5 3 4 5 c3 g1 g 2 Column Batch h2 h1 h1 h2 h1 hashes
  • 28. Hash Aggregation 1 1 1 1 1 7 7 7 7 7 4 5 3 4 5 c3 g1 g 2 Column Batch h2 h1 h1 h2 h1 hashes buckets Hash Table {g1, g2, SUM} {7, 5, 10} {7, 4, 3}
  • 29. Hash Aggregation 1 1 1 1 1 7 7 7 7 7 4 5 3 4 5 c3 g1 g 2 Column Batch h2 h1 h1 h2 h1 hashes buckets Hash Table {g1, g2, SUM} {7, 5, 10} {7, 4, 3} ● Compare keys ● Create an active rows for non-matches (collisions) Collision
  • 30. Hash Aggregation 1 1 1 1 1 7 7 7 7 7 4 5 3 4 5 c3 g1 g 2 Column Batch h2 h1 h1 h2 h1 hashes buckets Hash Table {g1, g2, SUM} {7, 5, 10} {7, 3, 0} {7, 4, 3} ● Advance buckets for all collisions and compare keys ● Repeat until match or empty bucket
  • 31. Hash Aggregation 1 1 1 1 1 7 7 7 7 7 4 5 3 4 5 c3 g1 g 2 Column Batch h2 h1 h1 h2 h1 hashes buckets Hash Table {g1, g2, SUM} {7, 5, 12} {7, 3, 1} {7, 4, 5} ● Update the aggregation state for each aggregate
  • 32. Mixed Column/Row Kernel Example void AggKernel(AggFn* fn, int64_t* input, int8_t** buckets, int64_t buffer_offset, int32_t num_rows) { for (int32_t i = 0; i < num_rows; ++i) { // Memory access into large array. Good to have a tight loop. int8_t* bucket = buckets[i]; // Make sure this gets inlined. fn->update(input[i], bucket + buffer_offset); } } A “column” whose values are sprayed across rows in the hash table
  • 34. Why go to the trouble? TPC-DS 30TB Queries/Hour 3.3x speedup 110 32 (Higher is better)
  • 36. Real-World Queries ▪ Several preview customers from different industries ▪ Need to have a suitable workload with sufficient Photon feature coverage ▪ Typical experience: 2-3x speedup end-to-end ▪ Mileage varies, best speedup: From 80 → 5 minutes!
  • 37. ▪ Vectorization: Decompose query into simple loops over vectors of data ▪ Batch-level adaptivity, e.g., NULLs vs no-NULLs ▪ Lazy filter evaluation with an active rows → useful concept ▪ Mixed column/row operations for accessing hash tables Recap
  • 38. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.