Optimizing Columnar Stores
StreamBright Data
2016-05-19
Introduction
Istvan Szukacs
CTO - StreamBright Data
Working with (big) data since 2009.
Building & optimizing data pipelines for
companies like Amazon, Riot Games, and Symantec
StreamBright Data
“On Demand DevOps and Data Expertise”
Founded in 2015, serving US and Western
European clients
Building “Decision Pipelines” - end-to-end,
scalable solutions to get business insights
from data
Development base in Budapest - looking for
big data and data science talent
Row vs Column Oriented Data Stores
Row Oriented
PROS:
- easy to add/modify a record
- suitable for write-heavy loads (UPDATE, INSERT)
CONS:
- may read in unnecessary data when only a few columns are needed
Column Oriented
PROS:
- only needs to read the relevant columns
- suitable for read-heavy analytical loads (SELECT)
CONS:
- writing a single row requires multiple accesses (one per column)
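A minimal sketch of the difference, in plain Python rather than any particular engine (all data hypothetical):

# The same three records laid out row-wise and column-wise. A query that
# only needs "age" touches every byte of the row layout, but only one
# contiguous array in the column layout.

rows = [
    ("alice", 34, "US"),
    ("bob",   29, "DE"),
    ("carol", 41, "HU"),
]

# Row-oriented: whole records stored together; good for INSERT/UPDATE.
row_store = rows

# Column-oriented: one array per attribute; good for analytical SELECTs.
col_store = {
    "name":    [r[0] for r in rows],
    "age":     [r[1] for r in rows],
    "country": [r[2] for r in rows],
}

# Analytical query: average age.
# The row store must walk every full record...
avg_row = sum(r[1] for r in row_store) / len(row_store)
# ...while the column store reads exactly one column.
avg_col = sum(col_store["age"]) / len(col_store["age"])
assert avg_row == avg_col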
Short History Of Columnar Stores
● "A Modular, Self-Describing Clinical Databank
System," Computers and Biomedical Research,
1975
● “An Overview Of Cantor: A New System For Data
Analysis” Karasalo, Svensson, SSDBM 1983
● “The Design of Cantor: A New System For Data
Analysis” Karasalo, Svensson, SSDBM 1986
Short History Of Columnar Stores
Fully transposed file reference:
"On Searching Transposed Files," Don Steve Batory, University of Toronto, 1979
“A transposed file is a collection of nonsequential files called
subfiles. Each subfile contains selected attribute data for all
records. It is shown that transposed file performance can be
enhanced by using a proper strategy to process queries. Analytic
cost expressions for processing conjunctive, disjunctive, and
batched queries are developed and an effective heuristic for
minimizing query processing costs is presented.”
Notable Features For Columnar Stores
● Data Encoding
● Efficient Compression
● Lazy Decompression
Data Encoding
From the smallest to the largest data types:
- Boolean (1 bit)
- Integer (1-8 bytes)
- Float (4-8 bytes)
- Datetime (3-8 bytes)
- String, UTF-8 (1 to 4 bytes per character; 64 chars -> 64-256 bytes)
- Complex Structures (depends)
Data Encoding
How can we save space?
- Let's address the widest columns: strings
- Assigning an integer code to each distinct value saves space on every row
- Real-world example: storing SHA2 hashes
- 64 - 512 bytes -> 1-8 bytes / row
- Storing the dictionary + encoded data << unchanged data
- This is called dictionary encoding
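A small sketch of the idea (hypothetical data: 100,000 SHA-256 hex digests drawn from only 10 distinct values, standing in for a wide, repetitive column):

import hashlib

values = [hashlib.sha256(str(i % 10).encode()).hexdigest()
          for i in range(100_000)]          # 64-char hex strings

# Dictionary encoding: store each distinct string once, keep one small
# integer code per row instead of the 64-byte string.
dictionary = {}                             # value -> code
codes = []
for v in values:
    codes.append(dictionary.setdefault(v, len(dictionary)))

raw_bytes = sum(len(v) for v in values)                   # 64 bytes per row
enc_bytes = sum(len(v) for v in dictionary) + len(codes)  # codes fit in 1 byte here
print(raw_bytes, enc_bytes)  # 6400000 vs 100640 -> dictionary + codes << raw data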
Run Length Encoding
- If there is repetition in the data, store the value once together with the number of times it repeats
- A,A,A,A,A -> A,5
- This works best on sorted data
- Sometimes multiple columns can be sorted within the same data block
Run Length Encoding
RLE example:
A B C               A B C
-----               -----------------------
a 3 e      =>       (4, a) (2, 3) (2, e)
a 3 e               (4, b) (3, 2) (2, g)
a 2 g                      (3, 1) (4, f)
a 2 g
b 2 f
b 1 f
b 1 f
b 1 f
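The example above can be reproduced in a few lines; Python's itertools.groupby already yields consecutive runs:

from itertools import groupby

def rle(column):
    """Collapse consecutive runs into (run_length, value) pairs."""
    return [(len(list(run)), value) for value, run in groupby(column)]

# The three columns from the example above.
a = ["a", "a", "a", "a", "b", "b", "b", "b"]
b = [3, 3, 2, 2, 2, 1, 1, 1]
c = ["e", "e", "g", "g", "f", "f", "f", "f"]

print(rle(a))  # [(4, 'a'), (4, 'b')]
print(rle(b))  # [(2, 3), (3, 2), (3, 1)]
print(rle(c))  # [(2, 'e'), (2, 'g'), (4, 'f')]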
Notable Features For Columnar Stores
● Data Encoding
● Compression
● Lazy Decompression
Compression
- Compression is applied on top of the encodings
- There are trade-offs between compression time and space savings
- Widely used codecs:
- Snappy (fast, modest space savings)
- Zlib (slower, better space efficiency)
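A quick way to feel the trade-off locally (a sketch; assumes the third-party python-snappy package, while zlib ships with Python):

import time
import zlib
import snappy  # pip install python-snappy

# Repetitive, encoding-friendly data, e.g., a dictionary-encoded column.
data = b"".join(str(i % 100).zfill(8).encode() for i in range(1_000_000))

for name, compress in [("snappy", snappy.compress),
                       ("zlib",   lambda d: zlib.compress(d, level=6))]:
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(out) / len(data):.1%} of original, {elapsed:.3f}s")

# Expected shape of the result: Snappy finishes faster, Zlib produces
# the smaller output.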
Notable Features For Columnar Stores
● Data Encoding
● Compression
● Lazy Decompression
Lazy Decompression
- Lazy decompression means values are decompressed only at the reader, when they are actually needed
- It saves bandwidth and speeds up queries
- The dictionary has to be shipped to the reader
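A minimal sketch of the idea on a dictionary-encoded column (all names hypothetical): the predicate is evaluated on cheap integer codes, and strings are only materialized for matching rows.

dictionary = ["apple", "banana", "cherry"]   # code -> decoded value
codes = [0, 2, 2, 1, 0, 2]                   # the column, still encoded

def filter_eq(codes, dictionary, wanted):
    # Encode the predicate value once...
    wanted_code = dictionary.index(wanted)
    # ...then compare cheap integers instead of strings.
    return [i for i, c in enumerate(codes) if c == wanted_code]

matches = filter_eq(codes, dictionary, "cherry")
print(matches)                                   # row ids [1, 2, 5]
print([dictionary[codes[i]] for i in matches])   # decode only the hits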
Hadoop/Hive Columnar Stores
RCFILE -- (Note: Available in Hive 0.6.0 and later)
ORC -- (Note: Available in Hive 0.11.0 and later)
PARQUET -- (Note: Available in Hive 0.13.0 and later)
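Outside of Hive, both ORC and Parquet files can also be produced locally for experimentation, e.g. with pyarrow (a sketch assuming a recent pyarrow version; the data and parameters are illustrative):

import pyarrow as pa
import pyarrow.orc as orc
import pyarrow.parquet as pq

table = pa.table({
    "some_sha2": ["ab" * 32] * 1000,   # wide, highly repetitive column
    "value": list(range(1000)),
})

orc.write_table(table, "test.orc", compression="zlib",
                stripe_size=134217728)  # 128 MB stripes, illustrative
pq.write_table(table, "test.parquet", compression="snappy")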
ORC 101
- Data is stored in stripes within a file
- Each stripe has its own index
- The index holds basic statistics (min, max) per column
- ORC also supports:
- Predicate pushdown (see the sketch below)
- Bloom filters
- Lazy decompression
- Snappy and Zlib compression
- Bucketing (requires sorting)
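A conceptual sketch of predicate pushdown against stripe-level min/max statistics (the idea only, not the actual ORC reader): stripes whose [min, max] range cannot contain the predicate value are skipped without being read or decompressed.

stripes = [  # (min, max) stats as an ORC-style stripe index might hold them
    {"min": 100, "max": 199, "rows": [...]},
    {"min": 200, "max": 299, "rows": [...]},
    {"min": 300, "max": 399, "rows": [...]},
]

def stripes_to_read(stripes, value):
    """Keep only stripes that could satisfy the predicate col = value."""
    return [i for i, s in enumerate(stripes)
            if s["min"] <= value <= s["max"]]

print(stripes_to_read(stripes, 250))  # [1] -> two of three stripes skipped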
Optimizing A Petabyte Scale DWH
- We know many of the moving parts, so let's look at a real-world use case
- A client asked for performance improvements
- 1 TB/day, 83 columns in the table, 1 PB full size, Snappy compressed
- We could not change anything that would break the application using the table
- No explicit sorting anywhere
- A few extremely wide columns with high repetition
Optimizing A Petabyte Scale DWH
Our assumption:
The problem is IO bound, hence decreasing the size
on disk will result in better performance.
How can we assume this?
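A rough back-of-envelope model of why (all throughput figures hypothetical): for an IO-bound scan, query time is approximately bytes read divided by aggregate disk throughput, so any reduction in on-disk size should translate almost directly into a faster query.

AGGREGATE_GB_PER_S = 10  # assumed cluster-wide sequential scan rate

def scan_seconds(tb_scanned, gb_per_s=AGGREGATE_GB_PER_S):
    """Approximate scan time for an IO-bound query."""
    return tb_scanned * 1024 / gb_per_s

print(scan_seconds(1.0))   # ~102 s to scan one day's 1 TB partition
print(scan_seconds(0.25))  # ~26 s if the same data is 4x smaller on disk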
Optimizing A Petabyte Scale DWH
- I'll spare you the iteration steps we took
- We ended up with the following changes:
- Explicit sorting on the widest column
- Explicit sorting on the other wide columns
- Snappy -> Zlib
- Bucketing 20 -> 256
- Stripe size from 64 MB -> 128 MB
Optimizing A Petabyte Scale DWH
CLUSTERED BY (some_sha2)
SORTED BY (some_sha2, some_other_sha2, some_md5)
INTO 256 BUCKETS
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://cluster/orc_test'
TBLPROPERTIES (
  'orc.compress'='ZLIB',
  'orc.create.index'='true',
  'orc.stripe.size'='130023424',
  'orc.row.index.stride'='64000'
)
Optimizing A Petabyte Scale DWH
Baseline:
Time taken: 84.259 seconds, Size: 860.1 G
Improved:
Time taken: 27.697 seconds, Size: 205.7 G
That is roughly a 4.2x reduction in size and a 3x speedup.
Optimizing A Petabyte Scale DWH
Key changes & findings:
- Introduced explicit sorting, saving a huge amount of space
- Traded insertion speed for better compression, saving additional space (a good trade-off)
- Space savings correspond almost linearly to query-execution speedups
- Disk IO is still the biggest bottleneck for large-scale DWHs
- Default settings are not good enough at petabyte scale
- Knowing the details of your columnar store helps you decide what to change
- You can change the physical layout without breaking anything on top
Q & A