VLDB 2009 Tutorial on Column-Stores
VLDB 2009 Tutorial on Column-Stores by Daniel Abadi, Peter Boncz, and Stavros Harizopoulos
1.
Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)
VLDB 2009 Tutorial: Column-Oriented Database Systems
Part 1: Stavros Harizopoulos (HP Labs)
Part 2: Daniel Abadi (Yale)
Part 3: Peter Boncz (CWI)
2.
What is a column-store?
[Figure: the same table (Date, Store, Product, Customer, Price) laid out as a row-store and as a column-store]
row-store:
+ easy to add/modify a record
- might read in unnecessary data
column-store:
+ only need to read in relevant data
- tuple writes require multiple accesses
=> suitable for read-mostly, read-intensive, large data repositories
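The two layouts can be sketched in a few lines of Python. This is only an illustration, not how any particular system stores data: the sample records, the `NAMES` tuple, and all function names are hypothetical; only the five attribute names come from the slide.

```python
# Hypothetical sales table with the five attributes named on the slide.
rows = [
    ("2009-08-24", "Lyon", "widget", "alice", 9.5),
    ("2009-08-24", "Lyon", "gadget", "bob", 20.0),
    ("2009-08-25", "Paris", "widget", "carol", 9.5),
]
NAMES = ("date", "store", "product", "customer", "price")

# Row-store: one record per tuple, so appending a record is a single write.
row_store = list(rows)

# Column-store: one array per attribute; position i in every array
# belongs to the same logical tuple.
column_store = {name: [r[i] for r in rows] for i, name in enumerate(NAMES)}


def total_price_rows(store):
    # Walks over whole records even though only the price is needed.
    return sum(record[4] for record in store)


def total_price_columns(store):
    # Touches only the price array.
    return sum(store["price"])


def append_tuple(store, record):
    # A tuple write needs one access per column.
    for name, value in zip(NAMES, record):
        store[name].append(value)
```

Both aggregate functions return the same total; the difference is how much data each has to walk over, and how many separate writes `append_tuple` needs.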
3.
Are these two fundamentally different?
• The only fundamental difference is the storage layout
• However: we need to look at the big picture
[Figure: timeline from the '70s through the '80s, '90s, and '00s to today. Different storage layouts are proposed; row-stores evolve into row-stores++; new applications and a new bottleneck in hardware lead to column-stores; converge?]
• How did we get here, and where are we heading? (Part 1)
• What are the column-specific optimizations? (Part 2)
• How do we improve CPU efficiency when operating on column-stores? (Part 3)
4.
Outline
• Part 1: Basic concepts (Stavros)
  • Introduction to key features
  • From DSM to column-stores and performance tradeoffs
  • Column-store architecture overview
  • Will rows and columns ever converge?
• Part 2: Column-oriented execution (Daniel)
• Part 3: MonetDB/X100 and CPU efficiency (Peter)
5.
Telco Data Warehousing example
• Typical DW installation
• Real-world example: "One Size Fits All? - Part 2: Benchmarking Results" Stonebraker et al. CIDR 2007
[Figure: star schema with fact table usage and dimension tables account, toll, source]

QUERY 2
    SELECT account.account_number,
           sum(usage.toll_airtime),
           sum(usage.toll_price)
    FROM usage, toll, source, account
    WHERE usage.toll_id = toll.toll_id
      AND usage.source_id = source.source_id
      AND usage.account_id = account.account_id
      AND toll.type_ind IN ('AE', 'AA')
      AND usage.toll_price > 0
      AND source.type != 'CIBER'
      AND toll.rating_method = 'IS'
      AND usage.invoice_date = 20051013
    GROUP BY account.account_number

             Column-store   Row-store
    Query 1  2.06           300
    Query 2  2.20           300
    Query 3  0.09           300
    Query 4  5.24           300
    Query 5  2.88           300

Why? Three main factors (next slides)
6.
Telco example explained (1/3): read efficiency
row store: read pages containing entire rows (one row = 212 columns!)
column store: read only columns needed (in this example: 7 columns)
is this typical? (it depends)
caveats:
• "select *" not any faster
• clever disk prefetching
• clever tuple reconstruction
What about vertical partitioning? (it does not work with ad-hoc queries)
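A back-of-the-envelope calculation shows why the column counts matter. The figures 212 and 7 come from the slide; the assumption that all columns are equally wide and uncompressed is added here for illustration and is not stated in the source.

```python
TOTAL_COLUMNS = 212   # from the slide: one row = 212 columns
NEEDED_COLUMNS = 7    # from the slide: the query touches 7 columns


def fraction_read(needed, total):
    # Assumes equal-width, uncompressed columns (an illustrative assumption).
    return needed / total


column_fraction = fraction_read(NEEDED_COLUMNS, TOTAL_COLUMNS)
row_fraction = fraction_read(TOTAL_COLUMNS, TOTAL_COLUMNS)
select_star_fraction = fraction_read(TOTAL_COLUMNS, TOTAL_COLUMNS)

print(f"column store reads {column_fraction:.1%} of the table")
print(f"row store reads {row_fraction:.0%} of the table")
print(f"select * reads {select_star_fraction:.0%} in either layout")
```

Under that assumption the column store reads 7/212 of the data, and the last line restates the first caveat: a query that needs every column gains nothing from the layout.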
7.
Telco example explained (2/3): compression efficiency
• Columns compress better than rows
  • Typical row-store compression ratio 1 : 3
  • Column-store 1 : 10
• Why?
  • Rows contain values from different domains => more entropy, difficult to dense-pack
  • Columns exhibit significantly less entropy
  • Examples:
    Male, Female, Female, Female, Male
    1998, 1998, 1999, 1999, 1999, 2000
• Caveat: CPU cost (use lightweight compression)
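Run-length encoding is one lightweight scheme that benefits from low-entropy columns. The slide does not name a specific scheme, so RLE is chosen here purely as an illustration; the helper names and the `mixed_row` sample are hypothetical, while the two column samples are the ones from the slide.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, length in pairs for _ in range(length)]


years = [1998, 1998, 1999, 1999, 1999, 2000]
genders = ["Male", "Female", "Female", "Female", "Male"]
# A row interleaves values from different domains, so runs rarely form.
mixed_row = ["Male", 1998, "Female", 1998]

encoded_years = rle_encode(years)
encoded_genders = rle_encode(genders)
encoded_row = rle_encode(mixed_row)
```

The sorted year column collapses from six values to three runs, while the interleaved row does not shrink at all.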
8.
Telco example explained (3/3): sorting & indexing efficiency
• Compression and dense-packing free up space
  • Use multiple overlapping column collections
  • Sorted columns compress better
  • Range queries are faster
  • Use sparse clustered indexes
What about heavily-indexed row-stores? (works well for single column access, cross-column joins become increasingly expensive)
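A sparse clustered index over a sorted column can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the block size, the function names, and the choice to index the first value of each block are all hypothetical, not taken from the tutorial.

```python
from bisect import bisect_left, bisect_right

BLOCK_SIZE = 4  # values per block; hypothetical, chosen for illustration


def build_sparse_index(sorted_column, block_size=BLOCK_SIZE):
    """Keep only the first value of each block instead of one entry per value."""
    return [sorted_column[i] for i in range(0, len(sorted_column), block_size)]


def range_positions(sorted_column, sparse_index, low, high, block_size=BLOCK_SIZE):
    """Return the positions p with low <= sorted_column[p] <= high."""
    # The block before the first index entry >= low may still end with `low`.
    first_block = max(bisect_left(sparse_index, low) - 1, 0)
    # Blocks whose first value is > high cannot contain a match.
    end_block = bisect_right(sparse_index, high)
    start = first_block * block_size
    stop = min(end_block * block_size, len(sorted_column))
    # Only the candidate blocks are searched.
    lo = bisect_left(sorted_column, low, start, stop)
    hi = bisect_right(sorted_column, high, start, stop)
    return list(range(lo, hi))


column = [1, 3, 3, 5, 7, 7, 7, 9, 11, 13]
index = build_sparse_index(column)
matches = range_positions(column, index, 3, 7)
```

Because the column is sorted, a range predicate maps to one contiguous run of positions, and the index needs only one entry per block to locate it.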
9.
Additional opportunities for column-stores
• Block-tuple / vectorized processing
  • Easier to build block-tuple operators
    • Amortizes function-call cost, improves CPU cache performance
  • Easier to apply vectorized primitives
    • Software-based: bitwise operations
    • Hardware-based: SIMD (Part 3)
• Opportunities with compressed columns
  • Avoid decompression: operate directly on compressed data
  • Delay decompression (and tuple reconstruction)
    • Also known as: late materialization (more in Part 2)
• Exploit columnar storage in other DBMS components
  • Physical design (both static and dynamic). See: Database Cracking, from CWI
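Two of these ideas can be illustrated in plain Python. The sketch assumes a run-length-encoded column represented as (value, run_length) pairs; that representation, the function names, and the default block size are assumptions made for this example, not details from the tutorial.

```python
def sum_rle(runs):
    """Sum a run-length-encoded column without expanding it."""
    return sum(value * length for value, length in runs)


def count_greater_rle(runs, threshold):
    """Count values > threshold with one comparison per run, not per value."""
    return sum(length for value, length in runs if value > threshold)


def sum_in_blocks(column, block_size=1024):
    """Block-at-a-time processing: each inner call handles a block of values."""
    total = 0
    for start in range(0, len(column), block_size):
        total += sum(column[start:start + block_size])
    return total


runs = [(5, 3), (7, 2), (0, 4)]       # encodes 5,5,5,7,7,0,0,0,0
rle_total = sum_rle(runs)
rle_count = count_greater_rle(runs, 4)
block_total = sum_in_blocks(list(range(10)), block_size=4)
```

The first two functions do work proportional to the number of runs rather than the number of values; the third shows the block-tuple pattern, where per-call overhead is paid once per block rather than once per tuple.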
10.
Effect on C-Store performance
"Column-Stores vs Row-Stores: How Different are They Really?" Abadi, Hachem, and Madden. SIGMOD 2008.
[Figure: bar chart of Time (sec), average for SSBM queries on C-store. Bar labels: original C-store; enable late materialization; enable compression & operate on compressed; column-oriented join algorithm]
11.
Summary of column-store key features
• Storage layout: columnar storage, header/ID elimination, compression, multiple sort orders
• Execution engine: column operators, avoid decompression, late materialization, vectorized operations
• Design tools, optimizer
(On the slide, each feature is tagged with the tutorial part that covers it: Part 1, Part 2, or Part 3.)
12.
Outline
• Part 1: Basic concepts (Stavros)
  • Introduction to key features
  • From DSM to column-stores and performance tradeoffs
  • Column-store architecture overview
  • Will rows and columns ever converge?
• Part 2: Column-oriented execution (Daniel)
• Part 3: MonetDB/X100 and CPU efficiency (Peter)
13.
From DSM to Column-stores
• 70s - 1985:
  • TOD: Time Oriented Database. Wiederhold et al. "A Modular, Self-Describing Clinical Databank System," Computers and Biomedical Research, 1975
  • More 1970s: Transposed files, Lorie, Batory, Svensson.
  • "An overview of cantor: a new system for data analysis" Karasalo, Svensson, SSDBM 1983
• 1985: DSM paper. "A decomposition storage model" Copeland and Khoshafian. SIGMOD 1985.
• 1990s: Commercialization through SybaseIQ
• Late 90s - 2000s: Focus on main-memory performance
  • DSM "on steroids" [1997 - now]. CWI: MonetDB
  • Hybrid DSM/NSM [2001 - 2004]. Wisconsin: PAX, Fractured Mirrors; Michigan: Data Morphing; CMU: Clotho
• 2005 - : Re-birth of read-optimized DSM as "column-store"
  • MIT: C-Store
  • CWI: MonetDB/X100
  • 10+ startups
14.
The original DSM paper
"A decomposition storage model" Copeland and Khoshafian. SIGMOD 1985.
• Proposed as an alternative to NSM
• 2 indexes: clustered on ID, non-clustered on value
• Speeds up queries projecting few columns
• Requires more storage
[Figure: one DSM column shown as two aligned lists. ID: 1, 2, 3, 4, ..; value: 0100, 0962, 1000, ..]
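The decomposition can be sketched as follows. This is a toy in-memory illustration, not the paper's implementation: the sample rows, the attribute names, and the functions `decompose`, `value_index`, and `project` are all hypothetical.

```python
def decompose(rows, attribute_names):
    """Split NSM rows into one (ID, value) binary relation per attribute."""
    return {
        name: [(tuple_id, row[i]) for tuple_id, row in enumerate(rows, start=1)]
        for i, name in enumerate(attribute_names)
    }


def value_index(relation):
    """Second copy of a relation ordered by value (the non-clustered index)."""
    return sorted(relation, key=lambda pair: pair[1])


def project(dsm, names):
    """Rebuild only the requested attributes by joining relations on ID."""
    lookups = [dict(dsm[name]) for name in names]
    ids = sorted(lookups[0])
    return [tuple(lookup[i] for lookup in lookups) for i in ids]


rows = [("Joe", 45, "NY"), ("Sue", 37, "LA")]   # made-up sample data
dsm = decompose(rows, ("name", "age", "city"))
projected = project(dsm, ["name", "age"])
```

Here `project` never touches the `city` relation, which is the source of the speedup for queries projecting few columns. The extra storage cost is also visible: every value carries its ID, and `value_index` keeps a second ordering of the same pairs.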
15.
Memory wall and PAX
• 90s: Cache-conscious research
  • from: "Cache Conscious Algorithms for Relational Query Processing." Shatdal, Kant, Naughton. VLDB 1994.
  • to: "Database Architecture Optimized for the New Bottleneck: Memory Access." Boncz, Manegold, Kersten. VLDB 1999.
  • and: "DBMSs on a modern processor: Where does time go?" Ailamaki, DeWitt, Hill, Wood. VLDB 1999.
• PAX: Partition Attributes Across
  • Retains NSM I/O pattern
  • Optimizes cache-to-RAM communication
  • "Weaving Relations for Cache Performance." Ailamaki, DeWitt, Hill, Skounakis, VLDB 2001.
16.
More hybrid NSM/DSM schemes
• Dynamic PAX: Data Morphing
  • "Data morphing: an adaptive, cache-conscious storage technique." Hankins, Patel, VLDB 2003.
• Clotho: custom layout using scatter-gather I/O
  • "Clotho: Decoupling Memory Page Layout from Storage Organization." Shao, Schindler, Schlosser, Ailamaki, and Ganger. VLDB 2004.
• Fractured mirrors
  • Smart mirroring with both NSM/DSM copies
  • "A Case For Fractured Mirrors." Ramamurthy, DeWitt, Su, VLDB 2002.
17.
MonetDB (more in Part 3)
• Late 1990s, CWI: Boncz, Manegold, and Kersten
• Motivation:
  • Main-memory
  • Improve computational efficiency by avoiding expression interpreter
  • DSM with virtual IDs natural choice
  • Developed new query execution algebra
• Initial contributions:
  • Pointed out memory-wall in DBMSs
  • Cache-conscious projections and joins
  • …
18.
2005: the (re)birth of column-stores
• New hardware and application realities
  • Faster CPUs, larger memories, disk bandwidth limit
  • Multi-terabyte Data Warehouses
• New approach: combine several techniques
  • Read-optimized, fast multi-column access, disk/CPU efficiency, light-weight compression
• C-store paper:
  • First comprehensive design description of a column-store
• MonetDB/X100
  • "proper" disk-based column store
• Explosion of new products
19.
Performance tradeoffs: columns vs. rows
DSM traditionally was not favored by technology trends. How has this changed?
• Optimized DSM in "Fractured Mirrors," 2002
• "Apples-to-apples" comparison
  • "Performance Tradeoffs in Read-Optimized Databases" Harizopoulos, Liang, Abadi, Madden, VLDB'06
• Follow-up study
  • "Read-Optimized Databases, In-Depth" Holloway, DeWitt, VLDB'08
• Main-memory DSM vs. NSM
  • "DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing" Boncz, Zukowski, Nes, DaMoN'08
• Flash-disks: a come-back for PAX?
  • "Fast Scans and Joins Using Flash Drives" Shah, Harizopoulos, Wiener, Graefe. DaMoN'08
  • "Query Processing Techniques for Solid State Drives" Tsirogiannis, Harizopoulos, Shah, Wiener, Graefe, SIGMOD'09
20.
Fractured mirrors: a closer look
"A Case For Fractured Mirrors" Ramamurthy, DeWitt, Su, VLDB 2002.
• Store DSM relations inside a B-tree
  • Leaf nodes contain values
  • Eliminate IDs, amortize header overhead
  • Custom implementation on Shore
[Figure: sparse B-tree on ID. Leaf entries that each carry a tuple header, TID, and column data (1 a1, 2 a2, 3 a3, 4 a4, 5 a5) are contrasted with densely packed leaves holding a1 a2 a3 a4 a5]
• Storage density comparable to column stores
• Similar: "Efficient columnar storage in B-trees" Graefe. Sigmod Record 03/2007.
21.
Fractured mirrors: performance
[Figure, from the PAX paper: time vs. columns projected (1 to 5) for row, regular DSM, and optimized DSM]
• Chunk-based tuple merging
  • Read in segments of M pages
  • Merge segments in memory
  • Becomes CPU-bound after 5 pages
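The chunk-based merging idea can be sketched as a generator. This is only an in-memory stand-in: `read_segment` slices a Python list where a real system would read M pages from a column file, and both function names and the `values_per_segment` parameter are hypothetical. The sketch also assumes all columns have the same length.

```python
def read_segment(column, start, count):
    """Stand-in for reading one segment of M pages from a column file."""
    return column[start:start + count]


def merge_in_chunks(columns, values_per_segment):
    """Read one segment per column, then merge those segments in memory."""
    length = len(columns[0])  # assumes all columns are equally long
    for start in range(0, length, values_per_segment):
        segments = [read_segment(c, start, values_per_segment) for c in columns]
        # Tuple reconstruction happens purely in memory, one chunk at a time.
        yield from zip(*segments)


ids = [1, 2, 3, 4, 5]
letters = ["a", "b", "c", "d", "e"]
merged = list(merge_in_chunks([ids, letters], values_per_segment=2))
```

Reading a whole segment from one column before switching to the next keeps each read sequential; the merge cost then shifts to the CPU, in line with the slide's last bullet.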
22.
Column-scanner implementation
"Performance Tradeoffs in Read-Optimized Databases" Harizopoulos, Liang, Abadi, Madden, VLDB'06
SELECT name, age WHERE age > 40
[Figure: row scanner vs. column scanner. The row scanner reads full tuples (1 Joe 45, 2 Sue 37, ...) and applies the predicate(s) in a single scan node S. The column scanner applies predicate #1 in a scan node over the age column (45, 37, ...) and passes qualifying positions (#POS) with their values to a second scan node over the name column (Joe, Sue, ...). Both use direct I/O and prefetch ~100ms worth of data.]
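The pipeline in the figure can be approximated in Python. This sketch works on in-memory lists and ignores I/O and prefetching entirely; the function names are hypothetical, and the "Ann" and "Bob" rows are made up to extend the two tuples shown on the slide.

```python
names = ["Joe", "Sue", "Ann", "Bob"]   # "Ann" and "Bob" are made-up rows
ages = [45, 37, 52, 29]


def scan_with_predicate(column, predicate):
    """First scan node: emit (position, value) for qualifying values only."""
    return [(pos, v) for pos, v in enumerate(column) if predicate(v)]


def scan_at_positions(column, position_value_pairs):
    """Next scan node: read this column only at the qualifying positions."""
    return [(column[pos], v) for pos, v in position_value_pairs]


def row_scan(rows, predicate):
    """Row scanner: every full tuple is read before the predicate is applied."""
    return [(name, age) for name, age in rows if predicate(age)]


def older_than_40(age):
    return age > 40


# SELECT name, age WHERE age > 40
column_result = scan_at_positions(names, scan_with_predicate(ages, older_than_40))
row_result = row_scan(list(zip(names, ages)), older_than_40)
```

Both scanners return the same tuples. The column version reads the name column only at positions that passed the predicate, so the more selective the predicate, the less of that column it touches.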
23.
Scan performance
• Large prefetch hides disk seeks in columns
• Column-CPU efficiency with lower selectivity
• Row-CPU suffers from memory stalls
• Memory stalls disappear in narrow tuples
• Compression: similar to narrow
(not shown, details in the paper)
24.
Even more results
"Read-Optimized Databases, In-Depth" Holloway, DeWitt, VLDB'08
• Same engine as before
• Additional findings
[Figure: Time (s) vs. Columns Returned (1 to 10); series C-25%, C-10%, R-50%. Annotations: narrow & compressed tuple: CPU-bound!; wide attributes: same as before]
• Non-selective queries, narrow tuples, favor well-compressed rows
• Materialized views are a win
• Scan times determine early materialized joins
Column-joins are covered in part 2!
25.
Speedup of columns over rows
Reference: “Performance Tradeoffs in Read-Optimized Databases” Harizopoulos, Liang, Abadi, Madden, VLDB’06
[Figure: speedup regions (_, =, +, ++, +++) plotted over cycles per disk byte (cpdb: 9, 18, 36, 72, 144) vs. tuple width (8 to 36)]
- Rows favored by narrow tuples and low cpdb
- Disk-bound workloads have higher cpdb
26.
Varying prefetch size: no competing disk traffic
[Figure: time (sec, 0 to 40) vs. selected bytes per tuple (4 to 32). Curves: Column with prefetch size 2, 8, 16, and 48 (x 128KB); Row (any prefetch size)]
- No prefetching hurts columns in single scans
27.
Varying prefetch size: with competing disk traffic
[Figure: two panels of time (sec, 0 to 40) vs. selected bytes per tuple (4 to 28). One panel compares Column, 48 with Row, 48; the other compares Column, 8 with Row, 8]
- No prefetching hurts columns in single scans
- Under competing traffic, columns outperform rows for any prefetch size
28.
CPU Performance
Reference: “DSM vs. NSM: CPU performance trade offs in block-oriented query processing” Boncz, Zukowski, Nes, DaMoN’08
- Benefit in on-the-fly conversion between NSM and DSM
- DSM: sequential access (block fits in L2), random in L1
- NSM: random access, SIMD for grouped Aggregation
29.
New storage technology: Flash SSDs
- Performance characteristics
  - very fast random reads, slow random writes
  - fast sequential reads and writes
- Price per bit (capacity follows)
  - cheaper than RAM, order of magnitude more expensive than Disk
- Flash Translation Layer introduces unpredictability
  - avoid random writes!
- Form factors not ideal yet
  - SSD (→ small reads still suffer from SATA overhead/OS limitations)
  - PCI card (→ high price, limited expandability)
- Boost Sequential I/O in a simple package
  - Flash RAID: very tight bandwidth/cm3 packing (4GB/sec inside the box)
- Column Store Updates
  - useful for delta structures and logs
- Random I/O on flash fixes unclustered index access
  - still suboptimal if I/O block size > record size
  - therefore column stores profit much less than horizontal stores
- Random I/O useful to exploit secondary, tertiary table orderings
  - the larger the data, the deeper clustering one can exploit
30.
Even faster column scans on flash SSDs
- New-generation SSDs (30K read IOPS, 3K write IOPS; 250MB/s read BW, 200MB/s write)
  - Very fast random reads, slower random writes
  - Fast sequential RW, comparable to HDD arrays
- No expensive seeks across columns
- FlashScan and FlashJoin: PAX on SSDs, inside Postgres
  - mini-pages with no qualified attributes are not accessed
Reference: “Query Processing Techniques for Solid State Drives” Tsirogiannis, Harizopoulos, Shah, Wiener, Graefe, SIGMOD’09
31.
Column-scan performance over time
[Figure: column-scan performance relative to rows for four systems: regular DSM (2001), optimized DSM (2002), column-store (2006), SSD Postgres/PAX (2009). Annotations, in the order extracted: "..to 1.2x slower", "from 7x slower", "..to 2x slower", "..to same and 3x faster!"]
32.
Outline
- Part 1: Basic concepts (Stavros)
  - Introduction to key features
  - From DSM to column-stores and performance tradeoffs
  - Column-store architecture overview
  - Will rows and columns ever converge?
- Part 2: Column-oriented execution (Daniel)
- Part 3: MonetDB/X100 and CPU efficiency (Peter)
33.
Architecture of a column-store
storage layout
- read-optimized: dense-packed, compressed
- organize in extents, batch updates
- multiple sort orders
- sparse indexes
engine
- block-tuple operators
- new access methods
- optimized relational operators
system-level
- system-wide column support
- loading / updates
- scaling through multiple nodes
- transactions / redundancy
34.
C-Store
Reference: “C-Store: A Column-Oriented DBMS.” Stonebraker et al. VLDB 2005.
- Compress columns
- No alignment
- Big disk blocks
- Only materialized views (perhaps many)
- Focus on Sorting not indexing
- Data ordered on anything, not just time
- Automatic physical DBMS design
- Optimize for grid computing
- Innovative redundancy
- Xacts, but no need for Mohan
- Column optimizer and executor
35.
C-Store: only materialized views (MVs)
- Projection (MV) is some number of columns from a fact table
- Plus columns in a dimension table, with a 1-n join between Fact and Dimension table
- Stored in order of a storage key(s)
- Several may be stored!
- With a permutation, if necessary, to map between them
- Table (as the user specified it and sees it) is not stored!
- No secondary indexes (they are a one column sorted MV plus a permutation, if you really want one)
User view:
  EMP (name, age, salary, dept)
  Dept (dname, floor)
Possible set of MVs:
  MV-1 (name, dept, floor) in floor order
  MV-2 (salary, age) in age order
  MV-3 (dname, salary, name) in salary order
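To make the projection idea concrete, here is a small sketch that builds MV-1 and MV-2 from the user-visible tables. The sample rows and the helper `make_projection` are invented for illustration; C-Store's actual storage format (compression, join indexes, permutations) is not modeled.

```python
# Hypothetical sample data for the user-visible tables EMP and Dept.
emp = [  # (name, age, salary, dept)
    ("Ann", 41, 90, "Sales"),
    ("Bob", 29, 60, "Eng"),
    ("Cy", 35, 75, "Sales"),
]
dept = {"Sales": 2, "Eng": 1}  # dname -> floor

def make_projection(rows, sort_index):
    """Sort rows on one attribute, then store each attribute as its own column."""
    ordered = sorted(rows, key=lambda r: r[sort_index])
    return [list(col) for col in zip(*ordered)]

# MV-1 (name, dept, floor) in floor order: EMP joined with Dept.
mv1 = make_projection([(n, d, dept[d]) for (n, _a, _s, d) in emp], sort_index=2)
# MV-2 (salary, age) in age order.
mv2 = make_projection([(s, a) for (_n, a, s, _d) in emp], sort_index=1)

print(mv1)
print(mv2)
```

Note that neither EMP nor Dept is stored as such; only the sorted column groups are.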
36.
Continuous Load and Query (Vertica)
Hybrid Storage Architecture
Data path: Trickle Load → Write Optimized Store (WOS) → TUPLE MOVER (Asynchronous Data Transfer) → Read Optimized Store (ROS)
Write Optimized Store (WOS):
- Memory based
- Unsorted / Uncompressed
- Segmented
- Low latency / Small quick inserts
Read Optimized Store (ROS):
- On disk
- Sorted / Compressed
- Segmented
- Large data loaded direct
[Figure: columns A B C shown in both stores; projection notation (A B C | A)]
37.
Loading Data (Vertica)
- INSERT, UPDATE, DELETE
- Bulk and Trickle Loads
  - COPY
  - COPY DIRECT
- User loads data into logical Tables
- Vertica loads atomically into storage
[Figure: Write-Optimized Store (WOS), in-memory → Automatic Tuple Mover → Read-Optimized Store (ROS), on-disk]
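The WOS / tuple mover / ROS split can be sketched as a toy class. This is only a model of the idea under simplifying assumptions: the class and method names are invented, compression and segmentation are omitted, and the tuple mover is an explicit method call rather than an asynchronous background task.

```python
class HybridStore:
    """Toy model: an unsorted in-memory write store plus a sorted read store."""

    def __init__(self):
        self.wos = []  # write-optimized: append-only, unsorted
        self.ros = []  # read-optimized: kept sorted (compression omitted)

    def insert(self, row):
        """Trickle load: a cheap append to the WOS."""
        self.wos.append(row)

    def bulk_load(self, rows):
        """Large loads bypass the WOS and go directly into the sorted ROS."""
        self.ros = sorted(self.ros + list(rows))

    def move_tuples(self):
        """Tuple mover: merge WOS contents into the ROS, then empty the WOS."""
        self.ros = sorted(self.ros + self.wos)
        self.wos = []

    def scan(self):
        """Queries must see the contents of both stores."""
        return sorted(self.ros + self.wos)

store = HybridStore()
store.bulk_load([(3,), (1,)])
store.insert((2,))
print(store.scan())
```

The design point is that small inserts never pay the cost of re-sorting the read-optimized data; that cost is batched into the tuple mover.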
38.
Applications for column-stores
- Data Warehousing
  - High end (clustering)
  - Mid end/Mass Market
  - Personal Analytics
- Data Mining
  - E.g. Proximity
- Google BigTable
- RDF
  - Semantic web data management
- Information retrieval
  - Terabyte TREC
- Scientific datasets
  - SciDB initiative
  - SLOAN Digital Sky Survey on MonetDB
39.
List of column-store systems
- Cantor (history)
- Sybase IQ
- SenSage (former Addamark Technologies)
- Kdb
- 1010data
- MonetDB
- C-Store/Vertica
- X100/VectorWise
- KickFire
- SAP Business Accelerator
- Infobright
- ParAccel
- Exasol
40.
Outline
- Part 1: Basic concepts (Stavros)
  - Introduction to key features
  - From DSM to column-stores and performance tradeoffs
  - Column-store architecture overview
  - Will rows and columns ever converge?
- Part 2: Column-oriented execution (Daniel)
- Part 3: MonetDB/X100 and CPU efficiency (Peter)
41.
Simulate a Column-Store inside a Row-Store
Original table:
  Date   Store  Product  Customer  Price
  01/01  BOS    Table    Mesa      $20
  01/01  NYC    Chair    Lutz      $13
  01/01  BOS    Bed      Mudd      $79
Option A: Vertical Partitioning (one (TID, Value) table per attribute)
  Date:     (1, 01/01), (2, 01/01), (3, 01/01)
  Store:    (1, BOS), (2, NYC), (3, BOS)
  Product:  (1, Table), (2, Chair), (3, Bed)
  Customer: (1, Mesa), (2, Lutz), (3, Mudd)
  Price:    (1, $20), (2, $13), (3, $79)
Option B: Index Every Column
  Date Index, Store Index, …
42.
Simulate a Column-Store inside a Row-Store
Original table:
  Date   Store  Product  Customer  Price
  01/01  BOS    Table    Mesa      $20
  01/01  NYC    Chair    Lutz      $13
  01/01  BOS    Bed      Mudd      $79
Option A: Vertical Partitioning
  Date (Value, StartPos, Length): (01/01, 1, 3)
  Store (TID, Value):    (1, BOS), (2, NYC), (3, BOS)
  Product (TID, Value):  (1, Table), (2, Chair), (3, Bed)
  Customer (TID, Value): (1, Mesa), (2, Lutz), (3, Mudd)
  Price (TID, Value):    (1, $20), (2, $13), (3, $79)
  Can explicitly run-length encode date
Option B: Index Every Column
  Date Index, Store Index, …
Reference: “Teaching an Old Elephant New Tricks.” Bruno, CIDR 2009.
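Option A can be sketched in a few lines. The sketch below is an illustration under stated assumptions: TIDs start at 1 as on the slide, prices are plain integers, and the helper names `rle` and `reconstruct` are invented. It shows the per-attribute (TID, Value) tables, the run-length encoded date column, and the TID join needed to put tuples back together.

```python
table = [
    ("01/01", "BOS", "Table", "Mesa", 20),
    ("01/01", "NYC", "Chair", "Lutz", 13),
    ("01/01", "BOS", "Bed", "Mudd", 79),
]
columns = ["date", "store", "product", "customer", "price"]

# Option A: one (TID, value) table per attribute; TIDs start at 1.
partitions = {
    name: [(tid, row[i]) for tid, row in enumerate(table, start=1)]
    for i, name in enumerate(columns)
}

def rle(partition):
    """Collapse a (TID, value) partition into [value, start_pos, length] runs."""
    runs = []
    for tid, value in partition:
        if runs and runs[-1][0] == value:
            runs[-1][2] += 1
        else:
            runs.append([value, tid, 1])
    return runs

def reconstruct(names):
    """Join the requested partitions on TID to rebuild (partial) tuples."""
    lookups = [dict(partitions[n]) for n in names]
    return [tuple(lk[tid] for lk in lookups) for tid in sorted(lookups[0])]

print(rle(partitions["date"]))
print(reconstruct(["store", "price"]))
```

Every extra attribute in a query adds one more join on TID, which is the overhead the later slides measure.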
43.
Experiments
- Star Schema Benchmark (SSBM): “Adjoined Dimension Column Index (ADC Index) to Improve Star Schema Query Performance”. O’Neil et al. ICDE 2008.
- Implemented by professional DBA
- Original row-store plus 2 column-store simulations on same row-store product
Reference: “Column-Stores vs Row-Stores: How Different are They Really?” Abadi, Hachem, and Madden. SIGMOD 2008.
Average time (seconds):
  Normal Row-Store                 25.7
  Vertically Partitioned Row-Store 79.9
  Row-Store With All Indexes       221.2
44.
What’s Going On? Vertical Partitions
- Vertical partitions in row-stores:
  - Work well when workload is known
  - ..and queries access disjoint sets of columns
  - See automated physical design
- Do not work well as full-columns
  - TupleID overhead significant
  - Excessive joins
[Figure: each vertical-partition row stores Tuple Header, TID, and Column Data]
- Queries touch 3-4 foreign keys in fact table, 1-2 numeric columns
- Complete fact table takes up ~4 GB (compressed)
- Vertically partitioned tables take up 0.7-1.1 GB (compressed)
Reference: “Column-Stores vs. Row-Stores: How Different Are They Really?” Abadi, Madden, and Hachem. SIGMOD 2008.
45.
What’s Going On? All Indexes Case
- Tuple construction
  - Common type of query:
      SELECT store_name, SUM(revenue)
      FROM Facts, Stores
      WHERE Facts.store_id = Stores.store_id
        AND Stores.country = 'Canada'
      GROUP BY store_name
  - Result of lower part of query plan is a set of TIDs that passed all predicates
  - Need to extract SELECT attributes at these TIDs
    - BUT: index maps value to TID
    - You really want to map TID to value (i.e., a vertical partition)
→ Tuple construction is SLOW
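The direction-of-mapping problem can be shown with a deliberately simplified sketch. The column contents and function names are hypothetical, and a real row-store would typically use joins over index scans rather than the linear search shown here; the point is only that an index answers "which TIDs have this value?" cheaply, while tuple construction asks the reverse question.

```python
store_id = [7, 3, 7, 5]  # hypothetical fact-table column, TIDs 0..3

# What a column index provides: value -> list of TIDs.
index = {}
for tid, value in enumerate(store_id):
    index.setdefault(value, []).append(tid)

def value_at_via_index(tid):
    """Without a TID -> value mapping, index entries have to be searched."""
    for value, tids in index.items():
        if tid in tids:
            return value
    raise KeyError(tid)

def value_at_via_partition(tid):
    """A vertical partition (TID -> value) answers the same question directly."""
    return store_id[tid]

print(index)
print(value_at_via_index(2), value_at_via_partition(2))
```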
46.
So….
- All indexes approach is a poor way to simulate a column-store
- Problems with vertical partitioning are NOT fundamental
  - Store tuple header in a separate partition
  - Allow virtual TIDs
  - Combine clustered indexes, vertical partitioning
- So can row-stores simulate column-stores?
  - Might be possible, BUT:
    - Need better support for vertical partitioning at the storage layer
    - Need support for column-specific optimizations at the executor level
    - Full integration: buffer pool, transaction manager, ..
- When will this happen?
  - Most promising features = soon (see Part 2, Part 3 for most promising features)
  - ..unless new technology / new objectives change the game (SSDs, Massively Parallel Platforms, Energy-efficiency)
47.
End of Part 1
- Basic concepts (Stavros)
  - Introduction to key features
  - From DSM to column-stores and performance tradeoffs
  - Column-store architecture overview
  - Will rows and columns ever converge?
- Part 2: Column-oriented execution (Daniel)
- Part 3: MonetDB/X100 and CPU efficiency (Peter)
48.
Part 2 Outline
- Compression
- Tuple Materialization
- Joins
49.
Compression
References:
- “Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE’06
- “Integrating Compression and Execution in Column-Oriented Database Systems” Abadi, Madden, and Ferreira, SIGMOD ’06
- “Query optimization in compressed database systems” Chen, Gehrke, Korn, SIGMOD’01
50.
Compression
- Trades I/O for CPU
- Increased column-store opportunities:
  - Higher data value locality in column stores
  - Techniques such as run length encoding far more useful
  - Can use extra space to store multiple copies of data in different sort orders
51.
Run-length Encoding
Encoding: (value, start_pos, run_length)
Uncompressed columns:
  Quarter  Product ID  Price
  Q1       1           5
  Q1       1           7
  Q1       1           2
  Q1       1           9
  Q1       1           6
  Q1       2           8
  Q1       2           5
  …        …           …
  Q2       1           3
  Q2       1           8
  Q2       1           1
  Q2       2           4
  …        …           …
Compressed columns:
  Quarter:    (Q1, 1, 300), (Q2, 301, 350), (Q3, 651, 500), (Q4, 1151, 600)
  Product ID: (1, 1, 5), (2, 6, 2), …, (1, 301, 3), (2, 304, 1), …
  Price (left as is): 5, 7, 2, 9, 6, 8, 5, …, 3, 8, 1, 4, …
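A minimal sketch of the (value, start_pos, run_length) encoding, assuming 1-based positions as on the slide. The function names are chosen for this example only.

```python
def rle_encode(column):
    """Return (value, start_pos, run_length) triples; positions start at 1."""
    runs = []
    for pos, value in enumerate(column, start=1):
        if runs and runs[-1][0] == value:
            v, start, length = runs[-1]
            runs[-1] = (v, start, length + 1)  # extend the current run
        else:
            runs.append((value, pos, 1))       # start a new run
    return runs

def rle_decode(runs):
    """Expand the triples back into the original column."""
    out = []
    for value, _start, length in runs:
        out.extend([value] * length)
    return out

product_id = [1, 1, 1, 1, 1, 2, 2]  # first seven Product ID values from the slide
print(rle_encode(product_id))
```

The price column on the slide has no runs to speak of, which is why it is left uncompressed there.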
52.
Bit-vector Encoding
Reference: “Integrating Compression and Execution in Column-Oriented Database Systems” Abadi et al., SIGMOD ’06
- For each unique value, v, in column c, create bit-vector b
  - b[i] = 1 if c[i] = v
- Good for columns with few unique values
- Each bit-vector can be further compressed if sparse
Example:
  Product ID  ID: 1  ID: 2  ID: 3  …
  1           1      0      0      0
  1           1      0      0      0
  1           1      0      0      0
  1           1      0      0      0
  1           1      0      0      0
  2           0      1      0      0
  2           0      1      0      0
  …           …      …      …      …
  1           1      0      0      0
  1           1      0      0      0
  2           0      1      0      0
  3           0      0      1      0
  …           …      …      …      …
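The rule b[i] = 1 if c[i] = v translates almost directly into code. This is a sketch that uses plain Python lists as bit-vectors; a real system would pack bits and possibly compress sparse vectors further.

```python
def bitvector_encode(column):
    """Map each distinct value v to a bit list b with b[i] = 1 iff column[i] == v."""
    vectors = {}
    for i, value in enumerate(column):
        vectors.setdefault(value, [0] * len(column))[i] = 1
    return vectors

product_id = [1, 1, 2, 3]  # last four Product ID values from the slide
for value, bits in bitvector_encode(product_id).items():
    print(value, bits)
```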
53.
Dictionary Encoding
Reference: “Integrating Compression and Execution in Column-Oriented Database Systems” Abadi et al., SIGMOD ’06
- For each unique value create dictionary entry
- Dictionary can be per-block or per-column
- Column-stores have the advantage that dictionary entries may encode multiple values at once
Example:
  Quarter column: Q1, Q2, Q4, Q1, Q3, Q1, Q1, Q1, Q2, Q4, Q3, Q3, …
  Single-value dictionary: 0: Q1, 1: Q2, 2: Q3, 3: Q4
  OR multi-value dictionary: 24: Q1, Q2, Q4, Q1; …; 122: Q2, Q4, Q3, Q3; …; 128: Q3, Q1, Q1, Q1
  Column encoded with the multi-value dictionary: 24, 128, 122, …
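Both dictionary variants on this slide can be sketched as follows. The dictionaries are copied from the slide; the function names and the fixed group size of four are assumptions made for the example.

```python
def dictionary_encode(column, dictionary):
    """Replace each value with its integer code."""
    return [dictionary[v] for v in column]

def dictionary_encode_groups(column, group_dictionary, group_size):
    """Encode fixed-size groups of consecutive values with one code each."""
    groups = [tuple(column[i:i + group_size])
              for i in range(0, len(column), group_size)]
    return [group_dictionary[g] for g in groups]

quarter = ["Q1", "Q2", "Q4", "Q1", "Q3", "Q1", "Q1", "Q1", "Q2", "Q4", "Q3", "Q3"]
single = {"Q1": 0, "Q2": 1, "Q3": 2, "Q4": 3}
grouped = {
    ("Q1", "Q2", "Q4", "Q1"): 24,
    ("Q2", "Q4", "Q3", "Q3"): 122,
    ("Q3", "Q1", "Q1", "Q1"): 128,
}

print(dictionary_encode(quarter, single))
print(dictionary_encode_groups(quarter, grouped, 4))
```

Encoding several values per entry is only practical when consecutive values belong to the same attribute, which is the column-store advantage the slide mentions.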
54.
Frame Of Reference Encoding
- Encodes values as b bit offset from chosen frame of reference
- Special escape code (e.g. all bits set to 1) indicates a difference larger than can be stored in b bits
  - After escape code, original (uncompressed) value is written
Example (Frame: 50, 4 bits per value):
  Price:   45, 54, 48, 55, 51, 53, 40, 50, 49, 62, 52, 50, …
  Encoded: -5, 4, -2, 5, 1, 3, ∞, 40, 0, -1, ∞, 62, 2, 0, …
  (∞ marks an exception; see part 3 for a better way to deal with exceptions)
Reference: “Compressing Relations and Indexes” Goldstein, Ramakrishnan, Shaft, ICDE’98
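The frame-of-reference example can be reproduced with a short sketch. Two things here are assumptions rather than details from the slide: the string `ESCAPE` stands in for the all-ones bit pattern (shown as ∞ on the slide), and the representable offset range for b bits is taken to be symmetric, from -(2^(b-1) - 1) to +(2^(b-1) - 1). Actual bit packing is not shown.

```python
ESCAPE = "ESC"  # stands in for the all-ones escape bit pattern

def for_encode(values, frame, bits):
    """Frame-of-reference encode. Assumption: offsets within
    +/-(2**(bits-1) - 1) fit in `bits` bits; anything else becomes an
    exception written as ESCAPE followed by the raw value."""
    limit = 2 ** (bits - 1) - 1
    out = []
    for v in values:
        offset = v - frame
        if -limit <= offset <= limit:
            out.append(offset)
        else:
            out.extend([ESCAPE, v])
    return out

def for_decode(encoded, frame):
    """Invert for_encode: add the frame back, or copy the raw value after ESCAPE."""
    out, i = [], 0
    while i < len(encoded):
        if encoded[i] == ESCAPE:
            out.append(encoded[i + 1])
            i += 2
        else:
            out.append(frame + encoded[i])
            i += 1
    return out

prices = [45, 54, 48, 55, 51, 53, 40, 50, 49, 62, 52, 50]
print(for_encode(prices, frame=50, bits=4))
```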
55.
Differential Encoding
- Encodes values as b bit offset from previous value
- Special escape code (just like frame of reference encoding) indicates a difference larger than can be stored in b bits
  - After escape code, original (uncompressed) value is written
- Performs well on columns containing increasing/decreasing sequences
  - inverted lists
  - timestamps
  - object IDs
  - sorted / clustered columns
Example (2 bits per value):
  Time:    5:00, 5:02, 5:03, 5:03, 5:04, 5:06, 5:07, 5:08, 5:10, 5:15, 5:16, 5:16, …
  Encoded: 5:00, 2, 1, 0, 1, 2, 1, 1, 2, ∞, 5:15, 1, 0, …
  (∞ marks an exception; see part 3 for a better way to deal with exceptions)
Reference: “Improved Word-Aligned Binary Compression for Text Indexing” Ahn, Moffat, TKDE’06
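A sketch of differential encoding that reproduces the slide's example, with times converted to minutes (5:00 becomes 300). As assumptions for this illustration, `ESCAPE` stands in for the all-ones code, and with b bits the storable deltas are taken to be 0 through 2^b - 2, which matches the 2-bit example where a delta of 5 triggers an exception.

```python
ESCAPE = "ESC"  # stands in for the all-ones escape bit pattern

def delta_encode(values, bits):
    """First value is stored raw; later values as offsets from the previous one.
    Assumption: deltas 0 .. 2**bits - 2 fit; the all-ones code is the escape."""
    max_delta = 2 ** bits - 2
    out = [values[0]]
    for prev, cur in zip(values, values[1:]):
        delta = cur - prev
        if 0 <= delta <= max_delta:
            out.append(delta)
        else:
            out.extend([ESCAPE, cur])
    return out

def delta_decode(encoded):
    """Invert delta_encode by accumulating deltas from the previous value."""
    out, i = [encoded[0]], 1
    while i < len(encoded):
        if encoded[i] == ESCAPE:
            out.append(encoded[i + 1])
            i += 2
        else:
            out.append(out[-1] + encoded[i])
            i += 1
    return out

# 5:00, 5:02, 5:03, 5:03, 5:04, 5:06, 5:07, 5:08, 5:10, 5:15, 5:16, 5:16 in minutes
times = [300, 302, 303, 303, 304, 306, 307, 308, 310, 315, 316, 316]
print(delta_encode(times, bits=2))
```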
56.
What Compression Scheme To Use?
Does column appear in the sort key?
- yes: Is the average run-length > 2?
  - yes: RLE
  - no: Differential Encoding
- no: Are number of unique values < ~50000?
  - yes: Does this column appear frequently in selection predicates?
    - yes: Bit-vector Compression
    - no: Dictionary Compression
  - no: Is the data numerical and exhibit good locality?
    - yes: Frame of Reference Encoding
    - no: Leave Data Uncompressed OR Heavyweight Compression
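One reading of this slide's decision tree, written as a function. The parameter names are invented, and the branch structure follows the tree as reconstructed from the flattened slide text, so treat it as an interpretation rather than the authors' exact rule.

```python
def choose_scheme(in_sort_key, avg_run_length, num_unique,
                  frequent_in_predicates, numeric_with_locality):
    """Walk the decision tree from the slide and return a scheme name."""
    if in_sort_key:
        return "RLE" if avg_run_length > 2 else "differential"
    if num_unique < 50_000:  # the slide says "~50000"
        return "bit-vector" if frequent_in_predicates else "dictionary"
    if numeric_with_locality:
        return "frame-of-reference"
    return "uncompressed or heavyweight"

print(choose_scheme(True, 300, 4, False, False))
print(choose_scheme(False, 1, 10_000_000, False, True))
```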
57.
Heavy-Weight Compression Schemes
Reference: “Super-Scalar RAM-CPU Cache Compression” Zukowski, Heman, Nes, Boncz, ICDE’06
- Modern disk arrays can achieve > 1GB/s
- 1/3 CPU for decompression → 3GB/s needed
→ Lightweight compression schemes are better
→ Even better: operate directly on compressed data
58.
Operating Directly on Compressed Data
Reference: “Integrating Compression and Execution in Column-Oriented Database Systems” Abadi et al., SIGMOD ’06
- I/O - CPU tradeoff is no longer a tradeoff
- Reduces memory–CPU bandwidth requirements
- Opens up possibility of operating on multiple records at once
59.
Operating Directly on Compressed Data
Reference: “Integrating Compression and Execution in Column-Oriented Database Systems” Abadi et al., SIGMOD ’06
Query:
  SELECT ProductID, Count(*)
  FROM table
  WHERE (Quarter = Q2)
  GROUP BY ProductID
Quarter (RLE): (Q1, 1, 300), (Q2, 301, 6), (Q3, 307, 500), (Q4, 807, 600)
Product ID: one bit-vector per value (1, 2, 3, …)
Steps: Index Lookup + Offset jump to positions 301-306, then count the set bits of each bit-vector in that range
Result (ProductID, COUNT(*)): (1, 3), (2, 1), (3, 2)
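The query on this slide can be answered without decompressing either column, roughly as sketched below. The RLE runs are taken from the slide; the bit-vector contents at positions 301-306 are hypothetical values chosen to be consistent with the slide's result, and all other positions are left at zero.

```python
quarter_rle = [("Q1", 1, 300), ("Q2", 301, 6), ("Q3", 307, 500), ("Q4", 807, 600)]

N = 1406  # total positions covered by the runs above
# Hypothetical bit-vectors; only positions 301-306 are filled in here.
product_bits = {pid: [0] * (N + 1) for pid in (1, 2, 3)}  # index 0 unused
for pos, pid in zip(range(301, 307), [1, 3, 2, 1, 1, 3]):
    product_bits[pid][pos] = 1

def count_by_product(quarter):
    # "Index lookup": find the run for the quarter without expanding it.
    start, length = next((s, n) for v, s, n in quarter_rle if v == quarter)
    # "Offset jump": count set bits only inside that position range.
    return {pid: sum(bits[start:start + length])
            for pid, bits in product_bits.items()}

print(count_by_product("Q2"))
```

The selection touches one RLE triple instead of six values, and the aggregation is a bit count over a slice of each bit-vector.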
60.
Operating Directly on Compressed Data
Reference: “Integrating Compression and Execution in Column-Oriented Database Systems” Abadi et al., SIGMOD ’06
Query:
  SELECT ProductID, Count(*)
  FROM table
  WHERE (Quarter = Q2)
  GROUP BY ProductID
Plan (bottom to top): Compression-Aware Scan Operator → Selection Operator → Aggregation Operator
Data: (Q1, 1, 300), (Q2, 301, 6), (Q3, 307, 500), (Q4, 807, 600)
Block API:
- isOneValue()
- isValueSorted()
- isPosContiguous()
- isSparse()
- getNext()
- decompressIntoArray()
- getValueAtPosition(pos)
- getMin()
- getMax()
- getSize()
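To illustrate why operators benefit from a block interface like the one listed on this slide, here is a hedged sketch of an RLE block exposing a few of those methods. The class, its snake_case method names, and `count_matching` are Python renderings invented for this example; they are not the API of C-Store or any other system.

```python
class RLEBlock:
    """One run (value, start_pos, run_length) behind a block interface.
    Method names are Python renderings of the slide's list, not a real library."""

    def __init__(self, value, start_pos, run_length):
        self.value = value
        self.start_pos = start_pos
        self.run_length = run_length

    def is_one_value(self):
        return True  # an RLE run holds a single distinct value

    def is_pos_contiguous(self):
        return True  # covers start_pos .. start_pos + run_length - 1

    def get_size(self):
        return self.run_length

    def get_min(self):
        return self.value

    def get_max(self):
        return self.value

    def decompress_into_array(self):
        return [self.value] * self.run_length

def count_matching(blocks, wanted):
    """Selection + COUNT that uses block properties instead of decompressing."""
    total = 0
    for b in blocks:
        if b.is_one_value():
            if b.get_min() == wanted:
                total += b.get_size()
        else:  # generic fallback for blocks that hold several values
            total += sum(1 for v in b.decompress_into_array() if v == wanted)
    return total

blocks = [RLEBlock("Q1", 1, 300), RLEBlock("Q2", 301, 6),
          RLEBlock("Q3", 307, 500), RLEBlock("Q4", 807, 600)]
print(count_matching(blocks, "Q2"))
```

The operator code never names the compression scheme; it only asks the block about its properties, so new schemes can be added without touching the operators.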
61.
Tuple Materialization and Column-Oriented Join Algorithms
References:
- “Materialization Strategies in a Column-Oriented DBMS” Abadi, Myers, DeWitt, and Madden. ICDE 2007.
- “Query Processing Techniques for Solid State Drives” Tsirogiannis, Harizopoulos, Shah, Wiener, and Graefe. SIGMOD 2009.
- “Self-organizing tuple reconstruction in column-stores”, Idreos, Manegold, Kersten, SIGMOD’09
- “Cache-Conscious Radix-Decluster Projections”, Manegold, Boncz, Nes, VLDB’04
- “Column-Stores vs Row-Stores: How Different are They Really?” Abadi, Madden, and Hachem. SIGMOD 2008.
62.
When should columns be projected?
- Where should column projection operators be placed in a query plan?
  - Row-store:
    - Column projection involves removing unneeded columns from tuples
    - Generally done as early as possible
  - Column-store:
    - Operation is almost completely opposite from a row-store
    - Column projection involves reading needed columns from storage and extracting values for a listed set of tuples
      - This process is called “materialization”
    - Early materialization: project columns at beginning of query plan
      - Straightforward since there is a one-to-one mapping across columns
    - Late materialization: wait as long as possible for projecting columns
      - More complicated since selection and join operators on one column obfuscate the mapping to other columns from the same table
    - Most column-stores construct tuples at column projection time
      - Many database interfaces expect output in regular tuples (rows)
      - Rest of discussion will focus on this case
63.
When should tuples be constructed?
QUERY: SELECT custID, SUM(price) FROM table WHERE (prodID = 4) AND (storeID = 1) GROUP BY custID
Table (prodID is stored compressed as the run (4, 1, 4)):
  prodID  storeID  custID  price
  4       2        2       7
  4       1        3       13
  4       3        3       42
  4       1        3       80
Plan (bottom to top): Construct → Select + Aggregate
- Solution 1: Create rows first (EM). But:
  - Need to construct ALL tuples
  - Need to decompress data
  - Poor memory bandwidth utilization
64.
Solution 2: Operate on columns
QUERY: SELECT custID, SUM(price) FROM table WHERE (prodID = 4) AND (storeID = 1) GROUP BY custID
Plan (bottom to top): Data Source (prodID), Data Source (storeID) → AND → Data Source (custID), Data Source (price) → AGG
Step shown: each predicate is applied to its own column and yields a bitmap over positions
  prodID = 4:   1 1 1 1
  storeID = 1:  0 1 0 1
Table:
  prodID  storeID  custID  price
  4       2        2       7
  4       1        3       13
  4       3        3       42
  4       1        3       80
65.
Solution 2: Operate on columns
QUERY: SELECT custID, SUM(price) FROM table WHERE (prodID = 4) AND (storeID = 1) GROUP BY custID
Plan (bottom to top): Data Source (prodID), Data Source (storeID) → AND → Data Source (custID), Data Source (price) → AGG
Step shown: the AND operator combines the two bitmaps
  1 1 1 1 AND 0 1 0 1 = 0 1 0 1
(Same table as on the previous slides.)
66.
Solution 2: Operate on columns
QUERY: SELECT custID, SUM(price) FROM table WHERE (prodID = 4) AND (storeID = 1) GROUP BY custID
Plan (bottom to top): Data Source (prodID), Data Source (storeID) → AND → Data Source (custID), Data Source (price) → AGG
Step shown: the bitmap 0 1 0 1 is applied to custID (2, 3, 3, 3) and price (7, 13, 42, 80); only positions with a 1 are passed on
  custID: 3, 3
  price:  13, 80
(Same table as on the previous slides.)
67.
Solution 2: Operate on columns
QUERY: SELECT custID, SUM(price) FROM table WHERE (prodID = 4) AND (storeID = 1) GROUP BY custID
Plan (bottom to top): Data Source (prodID), Data Source (storeID) → AND → Data Source (custID), Data Source (price) → AGG
Step shown: AGG groups the surviving values (custID 3, 3; price 13, 80) and emits the result
  (custID, SUM(price)) = (3, 93)
(Same table as on the previous slides.)
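The four animation steps of Solution 2 can be traced in a short sketch using the slide's four-row table. The early-materialization function is included only for contrast; both function names are made up, and compression of the prodID column is not modeled.

```python
prod_id = [4, 4, 4, 4]
store_id = [2, 1, 3, 1]
cust_id = [2, 3, 3, 3]
price = [7, 13, 42, 80]

def late_materialization():
    # 1. Each predicate scans only its own column and yields a bitmap.
    prod_bits = [int(v == 4) for v in prod_id]    # [1, 1, 1, 1]
    store_bits = [int(v == 1) for v in store_id]  # [0, 1, 0, 1]
    # 2. AND the bitmaps position by position.
    match = [a & b for a, b in zip(prod_bits, store_bits)]
    # 3. Fetch custID and price only at matching positions, then aggregate.
    totals = {}
    for pos, bit in enumerate(match):
        if bit:
            totals[cust_id[pos]] = totals.get(cust_id[pos], 0) + price[pos]
    return totals

def early_materialization():
    # Build every tuple first, then filter and aggregate.
    totals = {}
    for p, s, c, pr in zip(prod_id, store_id, cust_id, price):
        if p == 4 and s == 1:
            totals[c] = totals.get(c, 0) + pr
    return totals

print(late_materialization())
print(early_materialization())
```

Both strategies return the same result; the late variant never reads custID or price for the two rows that fail the predicates.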
68.
For plans without joins, late materialization is a win
Reference: “Materialization Strategies in a Column-Oriented DBMS” Abadi, Myers, DeWitt, and Madden. ICDE 2007.
QUERY: SELECT C1, SUM(C2) FROM table WHERE (C1 < CONST) AND (C2 < CONST) GROUP BY C1
- Ran on 2 compressed columns from TPC-H scale 10 data
[Figure: time (seconds, 0 to 10) for Early Materialization vs. Late Materialization at low, medium, and high selectivity]