cstore_fdw – Columnar store 
for analytic workloads 
Hadi Moshayedi & 
Ben Redman
What is CitusDB? 
• CitusDB is a scalable analytics database that 
extends PostgreSQL 
– Citus shards your data and automatically parallelizes 
your queries 
– Citus isn’t a fork of Postgres. Rather, it hooks onto the 
planner and executor for distributed query execution. 
– Always rebased to newest Postgres version 
– Natively supports new data types and extensions
[Diagram: a master node (extended PostgreSQL) stores shard and shard placement metadata. Worker nodes #1, #2, #3, . . . (each extended PostgreSQL) hold the shards, labeled A, C, D in the figure, with some shards placed on more than one worker. 1 shard = 1 Postgres table.]
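To make the idea of sharding plus parallel execution concrete, here is a minimal sketch in Python. It is not Citus code: the modulo hashing, the shard count, and the function names (`shard_for`, `distribute`, `worker_partial_avg`, `master_avg`) are all assumptions made for illustration. It shows only why an average can be computed per shard and merged on the master.

```python
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, chosen for the example


def shard_for(key: int) -> int:
    """Map a distribution-key value to a shard (simple modulo hashing)."""
    return key % NUM_SHARDS


def distribute(rows):
    """Split rows into shards; each shard stands in for one Postgres table."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row["id"])].append(row)
    return shards


def worker_partial_avg(shard_rows):
    """A worker returns (sum, count) so the master can merge exactly."""
    prices = [r["price"] for r in shard_rows]
    return sum(prices), len(prices)


def master_avg(shards):
    """The master merges per-shard partial results into the final AVG."""
    partials = [worker_partial_avg(rows) for rows in shards.values()]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


rows = [{"id": i, "price": float(i)} for i in range(1, 11)]
shards = distribute(rows)
print(master_avg(shards))  # 5.5
```

Returning (sum, count) rather than a per-shard average is the design point: averaging the averages would be wrong whenever shards hold different numbers of rows.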
Talk Overview 
1. Why customers want columnar stores 
2. Live demo 
3. Optimized Row Columnar (ORC) format 
4. PostgreSQL benefits 
5. Benchmark numbers
[Figure: a wide table with columns Id, Sz, Ln, Ht, … holding 30M rows and 700 columns]
Example SQL query 
SELECT 
id, AVG(price), MAX(price) 
FROM 
items 
WHERE 
quantity > 100 AND 
last_stock_date < '2013-10-01' 
GROUP BY 
weight;
Row-oriented store 
Id … price … … quant … … last_stm … … … … … weight 
1 … 3.90 … … 31 … … 2013-… … … … … … 0.6 
2 … 13 … … 70 … … 2010-… … … … … … 0.8 
3 … 4.25 … … 432 … … 2013-… … … … … … 1 
4 … 4 … … 45 … … 2013-… … … … … … 6 
… 
4… … 95 … … 37 … … 2013-… … … … … … 0.6 
4… … 59 … … 90 … … 2012-… … … … … … 1.5
Cost of row storage 
• Read 700 columns instead of 5 
• >39 GB of unnecessary I/O 

Input Type   Estimated Input Rate   Cost to query performance
Memory       10 GB/s                3.9 seconds
SSD          600 MB/s               >60 seconds
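
The figures in the table follow from dividing the unnecessary I/O by the input rate. A small sketch of that arithmetic, assuming 1 GB = 1000 MB and treating 39 GB as a lower bound (the slide says ">39 GB"); the function name `scan_seconds` is made up for the example:

```python
def scan_seconds(gigabytes: float, rate_gb_per_s: float) -> float:
    """Time to read `gigabytes` of data at a sustained rate in GB/s."""
    return gigabytes / rate_gb_per_s


UNNEEDED_GB = 39.0  # lower bound taken from the slide

memory = scan_seconds(UNNEEDED_GB, 10.0)  # memory at 10 GB/s
ssd = scan_seconds(UNNEEDED_GB, 0.6)      # SSD at 600 MB/s = 0.6 GB/s

print(f"memory: {memory:.1f} s, SSD: {ssd:.0f} s")  # memory: 3.9 s, SSD: 65 s
```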
Column-oriented store 
Id sz price … … quant … … last_stm … … … … … weight 
1 4 3.90 … … 31 … … 2013-… … … … … … 0.6 
2 3 13 … … 70 … … 2010-… … … … … … 0.8 
3 2 4.25 … … 432 … … 2013-… … … … … … 1 
4 4 4 … … 45 … … 2013-… … … … … … 6 
… 
4… 19 95 … … 37 … … 2013-… … … … … … 0.6 
4… 2 59 … … 90 … … 2012-… … … … … … 1.5
Columnar Store Motivation 
• Read subset of columns to reduce I/O 
• Better compression 
– Less disk usage 
– Less disk I/O
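
A minimal sketch of the "read a subset of columns" point, in Python. The sample rows are made up (the slide values are truncated), and "color" and "notes" stand in for the hundreds of columns the query never touches. The sketch groups by weight and reports the average and maximum price per weight, leaving `id` out of the result; the helper name `avg_max_price_by_weight` is an assumption for illustration.

```python
from collections import defaultdict

# Made-up sample data; the last two fields are columns the query ignores.
names = ["id", "price", "quantity", "last_stock_date", "weight", "color", "notes"]
rows = [
    (1, 3.90, 31, "2013-11-02", 0.6, "red", "n/a"),
    (2, 13.00, 170, "2010-05-01", 0.8, "blue", "n/a"),
    (3, 4.25, 432, "2013-01-15", 1.0, "red", "n/a"),
    (4, 4.00, 145, "2013-03-20", 0.8, "green", "n/a"),
]

# Column-oriented layout: one contiguous list per column.
columns = {name: [row[i] for row in rows] for i, name in enumerate(names)}


def avg_max_price_by_weight(cols, min_qty, before):
    """Evaluate the filter and aggregates while touching only four columns."""
    needed = ["price", "quantity", "last_stock_date", "weight"]
    price, qty, date, weight = (cols[n] for n in needed)
    groups = defaultdict(list)
    for p, q, d, w in zip(price, qty, date, weight):
        if q > min_qty and d < before:  # ISO dates compare correctly as strings
            groups[w].append(p)
    values_read = len(needed) * len(price)
    result = {w: (sum(ps) / len(ps), max(ps)) for w, ps in groups.items()}
    return result, values_read


result, values_read = avg_max_price_by_weight(columns, 100, "2013-10-01")
print(result)       # {0.8: (8.5, 13.0), 1.0: (4.25, 4.25)}
print(values_read)  # 16 of the 28 stored values
```

With 700 columns instead of 7, the same query would still read only the handful of columns it names, which is where the I/O saving comes from.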
State of the Columnar Store 
1. Fork a popular database, swap in your 
storage engine, and never look back 
2. Develop an open columnar store format for 
the Hadoop Distributed Filesystem (HDFS) 
3. Use PostgreSQL extension machinery for in-memory 
stores / external databases
ORC File Layout benefits 
1. Columnar layout – reads only the columns 
related to the query 
2. Compression – groups column values 
(10K) together and compresses them 
3. Skip indexes – apply predicate filtering 
to skip over blocks of unrelated values
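
The skip-index idea can be sketched as follows, assuming each block records the minimum and maximum of its values (the min/max collection the deck mentions under drawbacks). This is an illustration, not the cstore_fdw implementation: the block size is shrunk to 4 values so the example stays readable, and `build_skip_index` and `scan_greater_than` are hypothetical names.

```python
BLOCK_SIZE = 4  # tiny for illustration; the slides mention 10K values per block


def build_skip_index(values, block_size=BLOCK_SIZE):
    """Record (start offset, min, max) for each block of column values."""
    index = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        index.append((start, min(block), max(block)))
    return index


def scan_greater_than(values, index, threshold, block_size=BLOCK_SIZE):
    """Return values > threshold, reading only blocks that may contain matches."""
    matches, blocks_read = [], 0
    for start, lo, hi in index:
        if hi <= threshold:
            continue  # no value in this block can satisfy value > threshold
        blocks_read += 1
        block = values[start:start + block_size]
        matches.extend(v for v in block if v > threshold)
    return matches, blocks_read


quantity = [31, 70, 45, 37, 90, 12, 8, 60, 432, 150, 101, 99]
index = build_skip_index(quantity)
matches, blocks_read = scan_greater_than(quantity, index, 100)
print(matches, blocks_read)  # [432, 150, 101] 1
```

Two of the three blocks are skipped without being read because their maximum is not above 100. The benefit depends on how well the data is clustered on the filtered column.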
[Diagram: the table file is divided into groups of 150K rows (configurable); within each group, column values are stored in blocks (Block 1 to Block 7 shown) of 10K column values (configurable) per block]
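
As a rough sketch of how these two sizes interact, the snippet below maps a zero-based row number to its row group, its block within that group, and its position inside the block. The constants are the values from the slide; the word "stripe" (ORC's term for such a row group) and the function name `locate` are assumptions used for illustration.

```python
STRIPE_ROWS = 150_000  # rows per stripe (configurable)
BLOCK_ROWS = 10_000    # column values per block (configurable)


def locate(row_number: int):
    """Return (stripe, block within stripe, offset within block) for a row."""
    stripe, offset = divmod(row_number, STRIPE_ROWS)
    block, index_in_block = divmod(offset, BLOCK_ROWS)
    return stripe, block, index_in_block


print(STRIPE_ROWS // BLOCK_ROWS)  # 15 blocks per column in a full stripe
print(locate(0))                  # (0, 0, 0)
print(locate(149_999))            # (0, 14, 9999)
print(locate(150_000))            # (1, 0, 0)
print(locate(325_123))            # (2, 2, 5123)
```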
Compression 
• Current compression method is PG_LZ 
from PostgreSQL core 
• Easy to add new compression methods 
depending on the CPU / disk trade-off 
• cstore_fdw enables using different 
compression methods at the column block 
level
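
A sketch of what choosing a compression method at the column block level could look like. PG_LZ is not available in the Python standard library, so `zlib` is used here purely as a stand-in; the method tags and the functions `compress_block` and `decompress_block` are hypothetical and do not reflect the cstore_fdw file format.

```python
import zlib


def compress_block(raw: bytes):
    """Compress one column block; keep it uncompressed if that is smaller."""
    packed = zlib.compress(raw)
    if len(packed) < len(raw):
        return "zlib", packed
    return "none", raw


def decompress_block(method: str, payload: bytes) -> bytes:
    """Reverse compress_block using the method tag stored with the block."""
    if method == "zlib":
        return zlib.decompress(payload)
    if method == "none":
        return payload
    raise ValueError(f"unknown compression method: {method}")


repetitive = b"2013-10-01," * 1000  # repeated values compress well
distinct = bytes(range(256))        # no repetition, so zlib cannot shrink it

blocks = [compress_block(repetitive), compress_block(distinct)]
methods = [method for method, _ in blocks]
print(methods)  # ['zlib', 'none']
```

Because the method is recorded per block, a reader only needs the tag to decode each block, and a new codec can be added without rewriting blocks stored with an older one.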
Table sizes normalized to 1.0
Drawbacks to ORC 
• Limited data type support. Each data 
type also needs a separate code path 
for min/max value collection and 
constraint exclusion. 
• Gathering statistics from the data and 
table JOINs are an afterthought.
Recent Benchmark Results 
• TPC-H is a standard benchmark 
• Performed in-memory, SSD, and HDD tests 
on 10 GB of data 
• Used m2.2xlarge and m3.2xlarge on EC2 
• Compared vanilla PostgreSQL, CStore, 
CStore with compression
10GB of uncached data on m2.2xlarge
10GB of uncached data on m3.2xlarge
Total issued disk I/O measured with iotop
10GB of cached data on m2/m3.2xlarge
1.1 Release 
• CStore is an open source project actively in 
development: github.com/citusdata/cstore_fdw 
– Improved statistics gathering 
– Automatic management of table filenames 
– Management of table file data
Future Work 
– Improve memory usage 
– Native Delete / Insert / Update support 
– Improve read query performance (vectorized 
execution) 
– Different compression codecs 
– Many more; contribute to the discussion on 
GitHub!
Summary 
• CStore: Open source columnar store foreign data 
wrapper (FDW) for Postgres 
• Improves query times, reduces disk I/O, and 
reduces disk utilization 
• Uses foreign data wrapper APIs 
1. Supports all PostgreSQL data types 
2. Statistics collection for better query plans 
3. Load extension. Create Table. Copy
cstore_fdw – Columnar Store 
for Analytic Workloads 
Hadi Moshayedi – hadi@citusdata.com 
Ben Redman – ben@citusdata.com

cstore_fdw: Columnar Storage for PostgreSQL

  • 1. cstore_fdw – Columnar store for analytic workloads Hadi Moshayedi & Ben Redman
  • 2.
  • 3. What is CitusDB? • CitusDB is a scalable analytics database that extends PostgreSQL – Citus shards your data and automatically parallelizes your queries – Citus isn’t a fork of Postgres. Rather, it hooks onto the planner and executor for distributed query execution. – Always rebased to newest Postgres version – Natively supports new data types and extensions
  • 4. A C D worker node #1 (extended PostgreSQL) C worker node #2 (extended PostgreSQL) A worker node #3 (extended PostgreSQL) 1 shard = 1 Postgres table . . . . master node (extended PostgreSQL) shard and shard placement metadata
  • 5. Talk Overview 1. Why customers want columnar stores 2. Live demo 3. Optimized Row Columnar (ORC) format 4. PostgreSQL benefits 5. Benchmark numbers
  • 6. Id Sz Ln Ht … … … … … … … … … … … 1 4 3 4 … … … … … … … … … … … 2 4 11 3 … … … … … … … … … … … 3 1 4 2 … … … … … … … … … … … 4 8 4 12 … … … … … … … … … … … … 4 … … … … … … … … … … … … … … … 4 … … … … … … … … … … … … … … … 4 … … … … … … … … … … … … … … … 30M rows 700 columns
  • 7. Example SQL query SELECT id, AVG(price), MAX(price) FROM items WHERE quantity > 100 AND last_stock_date < ‘2013-10-01’ GROUP BY weight;
  • 8. Row-oriented store Id … price … … quant … … last_stm … … … … … weight 1 … 3.90 … … 31 … … 2013-… … … … … … 0.6 2 … 13 … … 70 … … 2010-… … … … … … 0.8 3 … 4.25 … … 432 … … 2013-… … … … … … 1 4 … 4 … … 45 … … 2013-… … … … … … 6 … 4… … 95 … … 37 … … 2013-… … … … … … 0.6 4… … 59 … … 90 … … 2012-… … … … … … 1.5
  • 9. Row-oriented store Id … price … … quant … … last_stm … … … … … weight 1 … 3.90 … … 31 … … 2013-… … … … … … 0.6 2 … 13 … … 70 … … 2010-… … … … … … 0.8 3 … 4.25 … … 432 … … 2013-… … … … … … 1 4 … 4 … … 45 … … 2013-… … … … … … 6 … 4… … 95 … … 37 … … 2013-… … … … … … 0.6 4… … 59 … … 90 … … 2012-… … … … … … 1.5
  • 10. Row-oriented store Id … price … … quant … … last_stm … … … … … weight 1 … 3.90 … … 31 … … 2013-… … … … … … 0.6 2 … 13 … … 70 … … 2010-… … … … … … 0.8 3 … 4.25 … … 432 … … 2013-… … … … … … 1 4 … 4 … … 45 … … 2013-… … … … … … 6 … 4… … 95 … … 37 … … 2013-… … … … … … 0.6 4… … 59 … … 90 … … 2012-… … … … … … 1.5
  • 11. Row-oriented store Id … price … … quant … … last_stm … … … … … weight 1 … 3.90 … … 31 … … 2013-… … … … … … 0.6 2 … 13 … … 70 … … 2010-… … … … … … 0.8 3 … 4.25 … … 432 … … 2013-… … … … … … 1 4 … 4 … … 45 … … 2013-… … … … … … 6 … 4… … 95 … … 37 … … 2013-… … … … … … 0.6 4… … 59 … … 90 … … 2012-… … … … … … 1.5
  • 12. Cost of row storage • Read 700 columns instead of 5 • >39 GB of unnecessary I/O Input Type Estimated Input Rate Cost to query performance Memory 10 GB/s 3.9 seconds SSD 600 MB/s >60 seconds
  • 13. Example SQL query SELECT id, AVG(price), MAX(price) FROM items WHERE quantity > 100 AND last_stock_date < ‘2013-10-01’ GROUP BY weight;
  • 14. Column-oriented store Id sz price … … quant … … last_stm … … … … … weight 1 4 3.90 … … 31 … … 2013-… … … … … … 0.6 2 3 13 … … 70 … … 2010-… … … … … … 0.8 3 2 4.25 … … 432 … … 2013-… … … … … … 1 4 4 4 … … 45 … … 2013-… … … … … … 6 … 4… 19 95 … … 37 … … 2013-… … … … … … 0.6 4… 2 59 … … 90 … … 2012-… … … … … … 1.5
  • 17. Columnar Store Motivation • Read subset of columns to reduce I/O • Better compression – Less disk usage – Less disk I/O
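The I/O argument on slide 17 can be illustrated with a toy model that counts how many values each layout has to read. This is only a sketch under assumptions: the table has six columns instead of 700, the `pad1`/`pad2` columns and all function names are hypothetical, and the query is simplified to a single MAX(price) with a quantity filter. The numeric values are taken from the example rows on the slides.

```python
COLUMNS = ["id", "price", "quantity", "weight", "pad1", "pad2"]
ROWS = [
    (1, 3.90, 31, 0.6, None, None),
    (2, 13.0, 70, 0.8, None, None),
    (3, 4.25, 432, 1.0, None, None),
    (4, 4.0, 45, 6.0, None, None),
]

# Column-oriented layout: one contiguous list per column.
COLUMN_STORE = {name: [row[i] for row in ROWS] for i, name in enumerate(COLUMNS)}


def max_price_row_store(min_quantity):
    """Scan whole rows, so every value of every row is read."""
    values_read = 0
    best = None
    for row in ROWS:
        values_read += len(row)
        record = dict(zip(COLUMNS, row))
        if record["quantity"] > min_quantity:
            best = record["price"] if best is None else max(best, record["price"])
    return best, values_read


def max_price_column_store(min_quantity):
    """Read only the two columns the query references."""
    quantities = COLUMN_STORE["quantity"]
    prices = COLUMN_STORE["price"]
    values_read = len(quantities) + len(prices)
    matching = [p for p, q in zip(prices, quantities) if q > min_quantity]
    return (max(matching) if matching else None), values_read


print(max_price_row_store(100))     # (4.25, 24)
print(max_price_column_store(100))  # (4.25, 8)
```

Both layouts return the same answer, but the column layout touches 8 values instead of 24. The gap grows with the number of columns the query does not reference.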
  • 18. State of the Columnar Store 1. Fork a popular database, swap in your storage engine, and never look back 2. Develop an open columnar store format for the Hadoop Distributed Filesystem (HDFS) 3. Use PostgreSQL extension machinery for in-memory stores / external databases
  • 19. ORC File Layout benefits 1. Columnar layout – reads only the columns related to the query 2. Compression – groups column values (10K) together and compresses them 3. Skip indexes – applies predicate filtering to skip over unrelated values
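The skip-index idea from slide 19 can be sketched as a per-block min/max check. This is an illustration under assumptions, not cstore_fdw's actual code: the block size is a toy value of 4 (the slides use 10K column values per block), and the function names and sample data are hypothetical.

```python
BLOCK_SIZE = 4  # toy value; the slides use 10K column values per block


def build_blocks(values, block_size=BLOCK_SIZE):
    """Split a column into blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append({"min": min(chunk), "max": max(chunk), "values": chunk})
    return blocks


def scan_greater_than(blocks, threshold):
    """Return values > threshold, skipping blocks whose max rules them out."""
    matches, blocks_read = [], 0
    for block in blocks:
        if block["max"] <= threshold:
            continue  # no value in this block can satisfy the predicate
        blocks_read += 1
        matches.extend(v for v in block["values"] if v > threshold)
    return matches, blocks_read


quantity = [31, 70, 45, 37, 90, 12, 8, 64, 432, 150, 99, 101]
blocks = build_blocks(quantity)
matches, blocks_read = scan_greater_than(blocks, 100)
print(matches, blocks_read, len(blocks))  # [432, 150, 101] 1 3
```

For the predicate `quantity > 100`, the first two blocks have maxima of 70 and 90, so only the third block is read. The benefit depends on how well the data is clustered: if large values were spread across every block, no block could be skipped.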
  • 20. [Figure: file layout diagram showing Block 1 through Block 7. Rows are grouped in sets of 150K rows (configurable); within each set, column data is split into blocks of 10K column values (configurable).]
  • 21. Compression • Current compression method is PG_LZ from PostgreSQL core • Easy to add new compression methods depending on the CPU / disk trade-off • cstore_fdw enables using different compression methods at the column block level
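A minimal sketch of what choosing a compression method per column block could look like. This is not cstore_fdw's implementation: Python's standard library has no PG_LZ codec, so `zlib` stands in for it here, and the codec registry, block dictionary format, and function names are all assumptions made for illustration.

```python
import zlib

# Codec registry: name -> (compress, decompress).
# zlib is a stand-in for PG_LZ, which Python's stdlib does not provide.
CODECS = {
    "none": (lambda data: data, lambda data: data),
    "zlib": (zlib.compress, zlib.decompress),
}


def write_block(raw: bytes, codec: str) -> dict:
    """Compress one column block and tag it with the codec that was used."""
    compress, _ = CODECS[codec]
    return {"codec": codec, "payload": compress(raw)}


def read_block(block: dict) -> bytes:
    """Decompress a block using the codec recorded in its tag."""
    _, decompress = CODECS[block["codec"]]
    return decompress(block["payload"])


repetitive = b"2013-10-01," * 1000  # hypothetical low-cardinality column data
block_a = write_block(repetitive, "zlib")
block_b = write_block(b"\x07\x13\x42", "none")  # tiny block, left uncompressed

print(len(repetitive), len(block_a["payload"]))
```

Storing the codec name with each block lets a reader decompress only the blocks a query needs, and lets a new codec be added by registering one more entry without rewriting existing blocks.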
  • 23. Drawbacks to ORC • Support for limited data types. Each data type further needs to have a separate code path for min/max value collection and constraint exclusion. • Gathering statistics from the data and table JOINs are an afterthought.
  • 24. Recent Benchmark Results • TPC-H is a standard benchmark • Performed in-memory, SSD, and HDD tests on 10 GB of data • Used m2.2xlarge and m3.2xlarge on EC2 • Compared vanilla PostgreSQL, CStore, CStore with compression
  • 25. 10GB of uncached data on m2.2xlarge
  • 26. 10GB of uncached data on m3.2xlarge
  • 27. Total issued disk I/O measured with iotop
  • 28. 10GB of cached data on m2/m3.2xlarge
  • 29. 1.1 Release • CStore is an open source project actively in development: github.com/citusdata/cstore_fdw – Improved statistics gathering – Automatic management of table filenames – Management of table file data
  • 30. Future Work – Improve memory usage – Native Delete / Insert / Update support – Improve read query performance (vectorized execution) – Different compression codecs – Many more; contribute to the discussion on GitHub!
  • 31. Summary • CStore: Open source columnar store fdw for Postgres • Improves query times, reduces disk I/O, and reduces disk utilization • Uses foreign data wrapper APIs 1. Supports all PostgreSQL data types 2. Statistics collection for better query plans 3. Load extension. Create Table. Copy
  • 32. cstore_fdw – Columnar Store for Analytic Workloads Hadi Moshayedi – hadi@citusdata.com Ben Redman – ben@citusdata.com

Editor's notes

  1. Columnar store for PostgreSQL. Ozgun .. founder at Citus Data, SF and Istanbul <short bio>. Hadi did the bulk of the work on the columnar store. Have about 30 slides and a demo. I’ll put things into context with 2 slides on Citus. Technical talk. If you have questions, please feel free to interrupt. Speak slowly.
  2. Team trip in Ayvalık
  3. Why did we build cstore_fdw? Context around what we built and why cstore_fdw is very applicable to our users. When I say extends, we didn’t take a particular version of Postgres and fork from there. Instead we went from 8.4 to 9.0, etc. We used the existing API and integration points: query planner and executor hooks are an example.
  4. Let’s take an example distributed table, and see how it’s spread across the worker nodes. The yellow boxes here are shards that make up the distributed table. Worker node extensions. Master node extensions. 1 shard = 1 postgres table = 1 cstore table. I/O bottlenecks can be even more of an issue because of parallelism.
  5. Column “Id” is sequentially laid out on disk. And then we have size sequentially laid out on disk, and so forth.
  6. I just spoke about how we reduce I/O; you also get better compression. Why? Now that we’re motivated, let’s do a demo!
  7. Before we started, we wanted to get a picture of the landscape (1/ you could integrate your storage engine back into a popular database) Talk about how Hadoop was working on solving this problem because they have similar needs (read/write bytes), all open source and shared. ORC file format developed by FB and Hortonworks * Pick the best of the latter two approaches
  8. RCFile paper published in ICDE ’11 – Performance comparisons in the paper. FB and Ohio State. First do some horizontal partitioning, then do vertical partitioning (use examples). Adopted by Hive and Pig – projects within the Hadoop ecosystem. The second generation specification supersedes the first one. The specification is open on the web.
  9. Reiterate how indexes work Second generation. Developed by Hortonworks and Facebook Talk
  10. ORC columnar file layout Lightweight indexes fit into memory (min/max values for each column) (Stripes allow you to benefit from sequential I/O read benefits – you read in bigger chunks from disk – not so applicable to SSDs) Decompress only related blocks (lower decompression overhead) -> evolutionary approach Block – indexes are per block Block – compression is per block (talk a bit about this in a second) Index data kept in protocol buffer format -> backward compatible
  11. Difference between TOAST tables and us is that we compress at the block level, so compression should hopefully be better.
  12. What does this mean for you? Lineitem goes from 9.1GB -> 2.4GB. 1/ In-memory: Effective memory size increases (if you have 1GB of RAM, your working set of 3-4GB can now fit into RAM). 2/ SSD: SSDs are expensive and you save notably on storage costs. You also read less from disk. Reduce disk bottlenecks. 3/ Rotational: Your disk I/O bound query performance significantly improves. Also, if the user stores PB of data in a distributed cluster, the customer saves on hardware costs.
  13. Cstore can also keep min, max, sum, count, etc.
  14. Limited set of types INT**, BOOL, TEXT**, DECIMAL**, TIMESTAMP Decided to use PostgreSQL’s datum representation for saving the data.
  15. - How to restructure this slide?
  16. FDWs offer a nice API to collect a random sample from the data. Looking to improve cost estimation for cstore_fdw query costs.
  17. TPC-H is an ad-hoc, decision support benchmark. Each table has between 10-20 columns. So not the best benchmark to demonstrate column store performance. Talk about what graphs are going to show m3.2xlarge (2 x 80G SSD, 30G ram, 4x3.25 ECU - 10G tests) m2.2xlarge (1 x 850G HDD, 34.2G ram, 4x3.25 ECU - 10G tests)
  18. Representative queries. Q6: 68s -> 25s (Q3: 85s -> 44s). 1/ Reduces disk bottlenecks 2/ Saves on disk costs
  19. Q6: 26s -> 14s (Q3: 37s -> 26s) 1/ Reduces SSD storage costs 2/ Query performance starts increasing with CitusDB (use of multiple cores)
  20. * Q6: 9GB -> 1.8GB -> 0.8GB
  21. cstore is slightly faster. cstore with compression is slightly slower due to the compression’s CPU cost. Effective memory size increases: 1/ Compression (instead of fitting 1GB, users can now fit in 2-3GB) DONE? 2/ If queries always select a subset of the columns, then those columns occupy the working set. 3/ Ideally, skip indexes are always kept in memory (they get referenced on each query).
  22. Bug fixes! Better cost estimates for join operations!
  23. Improves query times, reduces disk I/O, and reduces disk utilization
  24. Improves query times, reduces disk I/O, and reduces disk utilization
  25. Questions?