SlideShare une entreprise Scribd logo
1  sur  31
Page1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC: 2015
Gopal Vijayaraghavan
Page2 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC – Optimized Row-Columnar File
Columnar Storage+
Row-groups & Fixed splits
Protobuf Metadata Storage+
+
Type-safe Vectorization+
Hive ACID transactions+
Single SerDe for Format+
Page3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Need for Speed: The Stinger Initiative
Stinger: An Open Roadmap to improve Apache Hive’s performance 100x.
Launched: February 2013; Delivered: April 2014.
Delivered in 100% Apache Open Source.
SQL Engine
Vectorized
SQL Engine
Columnar
Storage
ORC
= 100X+ +
Distributed
Execution
Apache Tez
Page4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC at Facebook
Saved more than 1,400
servers worth of storage.
Compressioni
Compression ratio
increased from 5x to 8x
globally.
Compressioni
[1]
Page5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC at Spotify
16x less HDFS read when
using ORC versus Avro.(5)
IOi
32x less CPU when using
ORC versus Avro.(5)
CPUi
[2]
Page 6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC: Today
What is Optimized about ORC?
Page7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC – Optimized Row-Columnar File
Columnar Storage+
Row-groups & Stripe splits
Protobuf Metadata Storage+
+
Type-safe Vectorization+
Hive ACID transactions+
Single SerDe for Format+
Page8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Columnar Storage
Storage Performance
● Compress each column differently
● Detect & compress common sub-sequences
● Auto-increment ids
● String Enums
● Large Integers (uid scale)
● Unique strings (UUIDS)
Read Performance
● Column projection
● Columnar deserializers
● Data locality
Write Throughput
● Stats auto-gather
Page9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Row-groups & Stripe splits
Split Parallelism
● Effective parallelism
● No seeks to find boundaries
● No splits with zero data
● Decompress fixed chunks
Stripes
● Single unsplittable chunk
● Will reside in 1 HDFS block entirely
● Is self-contained for all read ops
Page10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
A Single SerDe for all ORC Files
A Single Writer
● No mismatch of serialization
● Forward compatibility
Readers
● Multiple reader implementations
● Allows for vector readers
● And row-mode readers
● Similar loop – good JIT hit-rate
Page11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Protobuf Metadata Storage
Standardized Metadata
● Readers are easier to write
● Metadata readers are auto-generated
Metadata Forward Compatibility
● Protobuf Optional fields
Statistics Storage in Metadata
● Standard serialization for stats
● Allows for PPD into the IO layer
Page12 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Type-safe Vectorization
Schema on Write
● Write ORC Structs with types
● SerDe & Inputformat
Read Performance
● Data is read with few copies
● Primitive types are fast
● Primitives are also unboxed
● Predicates are typed too
Page 13 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC: ETL Improvements
Always more new data
Page14 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC (Zlib): Compress Differently
674
389
433
ORC (old zlib) ORC SNAPPY ORC (new zlib)
ETL for TPC-H LineItem (scale 1 Tb)
Time Taken
Different Zlib algorithms for encoding
● Z_FILTERED
● Z_DEFAULT
● Z_BEST_SPEED
● Z_DEFAULT_COMPRESSION
In detail
● Compress IS_NULL bitsets lightly
● Compress Integers differently from Doubles
● Compress string dictionaries differently
● Allow for user choice
Page15 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC (Zlib): Compress Differently
Different Zlib algorithms for encoding
● Z_FILTERED
● Z_DEFAULT
● Z_BEST_SPEED
● Z_DEFAULT_COMPRESSION
In detail
● Compress IS_NULL bitsets lightly
● Compress Integers differently from Doubles
● Compress string dictionaries differently
● Allow for user choice
178.5
225.1
172.2
ORC (old zlib) ORC SNAPPY ORC (new zlib)
Data Sizes for TPC-H Lineitem (Scale 1 Tb)
Size on Disk
Page16 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Using JDK8 SIMD: Integer Writers
Integer encodings
● Base + Delta
● Run-length
● Direct
Trade-off for Size/Speed
● Use fixed bit-width loops
● Snap to nearest bit-width
0
200
400
600
800
1000
1200
1400
1600
1800
2000
1 2 4 8 16 24 32 40 48 56 64
MeanTime(ms)
Bit Width
ORC Write Integer Performance
(smaller better)
hive 0.13 bitpacking
hive 1.0 bitpacking (new)
Page17 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Double Writers
273.331
247.634
231.741
0
50
100
150
200
250
300
old buffered + BE buffered + LE
MeanTime(ms)
Double Write Modes
ORC Write Double Performance
(smaller is better)
Double Writers
● JVM is big-endian
● X86 is little-endian
● Special handling of NaN
Page18 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC: Scale compression buffers
269.4
263.3
258.5 258.4 258.4 258.4
184.8 183.5 182.2 180.1 178.3 177.4
140
160
180
200
220
240
260
280
300
320
8 16 32 64 128 256
SizeinMB
Compression Buffer Size in KB
File Size
ZLIB
SNAPPY
Large Columns vs More Columns
● Adjust when >1000 columns
Trade offs
● Compression
● Low memory use
More additions
● Dynamically partitioned insert
Page19 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC: Streaming Ingest + ACID
Broken pattern: Partitions for Atomicity-
- Isolation & Consistency on retries+
Transactions are pluggable (txn.manager)+
Cache/Replication friendly (base + deltas)+
Page 20 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC: LLAP and Sub-second
ORC – Pushing for Sub-second
Page21 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC: Row Indexes
Min-Max pruning
● Evaluate on statistics
Bloom filters
● Better String filters
● Filter a random distribution
LLAP Future
● Row-level vector SARGs
5999989709
540,000
10,000
No Indexes Min-Max Indexes Bloomfilter Indexes
from tpch_1000.lineitem where l_orderkey = 1212000001;
(log scale)
Rows Read
Page22 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC: Row Indexes
Min-Max pruning
● Evaluate on Statistics
Bloom filters
● Better String filters
● Filter a random distribution
LLAP Future
● Row-level vector SARGs
74
4.5 1.34
No Indexes Min-Max Indexes Bloomfilter Indexes
* from tpch_1000.lineitem where l_orderkey=1212000001;
(smaller better)
Time Taken (seconds)
Page23 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC: JDK8 SIMD Readers
Integer encodings
● Base + Delta
● Run-length
● Direct
Trade-off for Size/Speed
● Use fixed bit-width loops
● Snap to nearest bit-width
0
200
400
600
800
1000
1200
1400
1600
1800
1 2 4 8 16 24 32 40 48 56 64
MeanTime(ms)
Bit Width
ORC Read Integer Performance
hive 0.13 unpacking
hive-1.0 unpacking (new)
Page24 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC: Vectorization + SIMD
Advantage of a Single SerDe
● Primitive Types
Allocation free tight inner loops
● JDK8 has auto-vectorization
Vectorized Early Filter
● Vectors can be filtered early in ORC
● StringDictionary can be used to binary-search
Vectorized SIMD Join
● Performance for single key joins
0x00007f13d2e6afb0: vmovdqu 0x10(%rsi,%rax,8),%ymm2
0x00007f13d2e6afb6: vaddpd %ymm1,%ymm2,%ymm2
0x00007f13d2e6afba: movslq %eax,%r10
0x00007f13d2e6afbd: vmovdqu 0x30(%rsi,%r10,8),%ymm3
;*daload vector.expressions.gen.DoubleColAddDoubleColumn::evaluate
(line 94)
0x00007f13d2e6afc4: vmovdqu %ymm2,0x10(%rdx,%rax,8)
0x00007f13d2e6afca: vaddpd %ymm1,%ymm3,%ymm2
0x00007f13d2e6afce: vmovdqu %ymm2,0x30(%rdx,%r10,8)
;*dastore vector.expressions.gen.DoubleColAddDoubleColumn::evaluate
(line 94)
Page25 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC: Split Strategies + Tez Grouping
Amdahl’s Law
● As fast as the slowest task
● Slice work thinly, but not too thin
Split-generation vs Execution time
● ETL
● BI
● Hybrid
Split-grouping & estimation
● ColumnarSplit size
● Group by estimate, not file size
● Bucket pruning
Slow split
Page26 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC: LLAP
- JIT Performance for short queries+
Row-group level caching+
Asynchronous IO Elevator+
+ Multi-threaded Column Vector processing+
Page27 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC: LLAP (+ SIMD + Split Strategies + Row Indexes)
Page28 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Questions?
?
Interested? Stop by the Hortonworks booth to learn more
Page29 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Endnotes
(1) https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/
(2) http://www.slideshare.net/AdamKawa/a-perfect-hive-query-for-a-perfect-meeting-hadoop-summit-2014
Page30 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
END
Page31 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
But wait, there’s more
178.5
225.1
172.2
220.1
674
389
433 446
0
50
100
150
200
250
0
100
200
300
400
500
600
700
800
ORC (old zlib) ORC SNAPPY ORC (new zlib) Parquet (snappy)
TPC-H Lineitem (1000 scale)
Time Taken (seconds) Data Sizes (on disk)

Contenu connexe

Tendances

Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
DataWorks Summit
 

Tendances (20)

Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
 
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on TezAchieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
 
File Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & ParquetFile Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & Parquet
 
Hadoop and Kerberos
Hadoop and KerberosHadoop and Kerberos
Hadoop and Kerberos
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersHBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
 
Transactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and futureTransactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and future
 
On Improving Broadcast Joins in Apache Spark SQL
On Improving Broadcast Joins in Apache Spark SQLOn Improving Broadcast Joins in Apache Spark SQL
On Improving Broadcast Joins in Apache Spark SQL
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...
 
Apache Hadoopの新機能Ozoneの現状
Apache Hadoopの新機能Ozoneの現状Apache Hadoopの新機能Ozoneの現状
Apache Hadoopの新機能Ozoneの現状
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
 
Sqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionSqoop on Spark for Data Ingestion
Sqoop on Spark for Data Ingestion
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 

Similaire à ORC: 2015 Faster, Better, Smaller

Ceph Day Berlin: Deploying Flash Storage for Ceph without Compromising Perfor...
Ceph Day Berlin: Deploying Flash Storage for Ceph without Compromising Perfor...Ceph Day Berlin: Deploying Flash Storage for Ceph without Compromising Perfor...
Ceph Day Berlin: Deploying Flash Storage for Ceph without Compromising Perfor...
Ceph Community
 
Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014
alanfgates
 
Hive for Analytic Workloads
Hive for Analytic WorkloadsHive for Analytic Workloads
Hive for Analytic Workloads
DataWorks Summit
 

Similaire à ORC: 2015 Faster, Better, Smaller (20)

ORC 2015
ORC 2015ORC 2015
ORC 2015
 
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, SmallerORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
 
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, SmallerORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
 
Sql server 2016 it just runs faster sql bits 2017 edition
Sql server 2016 it just runs faster   sql bits 2017 editionSql server 2016 it just runs faster   sql bits 2017 edition
Sql server 2016 it just runs faster sql bits 2017 edition
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Using Apache Hive with High Performance
Using Apache Hive with High PerformanceUsing Apache Hive with High Performance
Using Apache Hive with High Performance
 
Ceph
CephCeph
Ceph
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it final
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
 
Oracle SPARC T7 a M7 servery
Oracle SPARC T7 a M7 serveryOracle SPARC T7 a M7 servery
Oracle SPARC T7 a M7 servery
 
Ceph Day Berlin: Deploying Flash Storage for Ceph without Compromising Perfor...
Ceph Day Berlin: Deploying Flash Storage for Ceph without Compromising Perfor...Ceph Day Berlin: Deploying Flash Storage for Ceph without Compromising Perfor...
Ceph Day Berlin: Deploying Flash Storage for Ceph without Compromising Perfor...
 
What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS
 
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitchDPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Hortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices WorkshopHortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices Workshop
 
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance BarriersCeph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
 
Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014
 
Hive for Analytic Workloads
Hive for Analytic WorkloadsHive for Analytic Workloads
Hive for Analytic Workloads
 
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
 

Plus de DataWorks Summit

HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 

Plus de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Dernier

Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
NazaninKarimi6
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
Areesha Ahmad
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Sérgio Sacani
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
levieagacer
 

Dernier (20)

Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai YoungDubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 

ORC: 2015 Faster, Better, Smaller

  • 1. Page1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC: 2015 Gopal Vijayaraghavan
  • 2. Page2 © Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC – Optimized Row-Columnar File Columnar Storage+ Row-groups & Fixed splits Protobuf Metadata Storage+ + Type-safe Vectorization+ Hive ACID transactions+ Single SerDe for Format+
  • 3. Page3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Need for Speed: The Stinger Initiative Stinger: An Open Roadmap to improve Apache Hive’s performance 100x. Launched: February 2013; Delivered: April 2014. Delivered in 100% Apache Open Source. SQL Engine Vectorized SQL Engine Columnar Storage ORC = 100X+ + Distributed Execution Apache Tez
  • 4. Page4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC at Facebook Saved more than 1,400 servers worth of storage. Compressioni Compression ratio increased from 5x to 8x globally. Compressioni [1]
  • 5. Page5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC at Spotify 16x less HDFS read when using ORC versus Avro.(5) IOi 32x less CPU when using ORC versus Avro.(5) CPUi [2]
  • 6. Page 6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC: Today What is Optimized about ORC?
  • 7. Page7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC – Optimized Row-Columnar File Columnar Storage+ Row-groups & Stripe splits Protobuf Metadata Storage+ + Type-safe Vectorization+ Hive ACID transactions+ Single SerDe for Format+
  • 8. Page8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Columnar Storage Storage Performance ● Compress each column differently ● Detect & compress common sub-sequences ● Auto-increment ids ● String Enums ● Large Integers (uid scale) ● Unique strings (UUIDS) Read Performance ● Column projection ● Columnar deserializers ● Data locality Write Throughput ● Stats auto-gather
  • 9. Page9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Row-groups & Stripe splits Split Parallelism ● Effective parallelism ● No seeks to find boundaries ● No splits with zero data ● Decompress fixed chunks Stripes ● Single unsplittable chunk ● Will reside in 1 HDFS block entirely ● Is self-contained for all read ops
  • 10. Page10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved A Single SerDe for all ORC Files A Single Writer ● No mismatch of serialization ● Forward compatibility Readers ● Multiple reader implementations ● Allows for vector readers ● And row-mode readers ● Similar loop – good JIT hit-rate
  • 11. Page11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Protobuf Metadata Storage Standardized Metadata ● Readers are easier to write ● Metadata readers are auto-generated Metadata Forward Compatibility ● Protobuf Optional fields Statistics Storage in Metadata ● Standard serialization for stats ● Allows for PPD into the IO layer
  • 12. Page12 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Type-safe Vectorization Schema on Write ● Write ORC Structs with types ● SerDe & Inputformat Read Performance ● Data is read with few copies ● Primitive types are fast ● Primitives are also unboxed ● Predicates are typed too
  • 13. Page 13 © Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC: ETL Improvements Always more new data
  • 14. Page14 © Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC (Zlib): Compress Differently 674 389 433 ORC (old zlib) ORC SNAPPY ORC (new zlib) ETL for TPC-H LineItem (scale 1 Tb) Time Taken Different Zlib algorithms for encoding ● Z_FILTERED ● Z_DEFAULT ● Z_BEST_SPEED ● Z_DEFAULT_COMPRESSION In detail ● Compress IS_NULL bitsets lightly ● Compress Integers differently from Doubles ● Compress string dictionaries differently ● Allow for user choice
  • 15. Page15 © Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC (Zlib): Compress Differently Different Zlib algorithms for encoding ● Z_FILTERED ● Z_DEFAULT ● Z_BEST_SPEED ● Z_DEFAULT_COMPRESSION In detail ● Compress IS_NULL bitsets lightly ● Compress Integers differently from Doubles ● Compress string dictionaries differently ● Allow for user choice 178.5 225.1 172.2 ORC (old zlib) ORC SNAPPY ORC (new zlib) Data Sizes for TPC-H Lineitem (Scale 1 Tb) Size on Disk
  • 16. Page16 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Using JDK8 SIMD: Integer Writers Integer encodings ● Base + Delta ● Run-length ● Direct Trade-off for Size/Speed ● Use fixed bit-width loops ● Snap to nearest bit-width 0 200 400 600 800 1000 1200 1400 1600 1800 2000 1 2 4 8 16 24 32 40 48 56 64 MeanTime(ms) Bit Width ORC Write Integer Performance (smaller better) hive 0.13 bitpacking hive 1.0 bitpacking (new)
  • 17. Page17 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Double Writers 273.331 247.634 231.741 0 50 100 150 200 250 300 old buffered + BE buffered + LE MeanTime(ms) Double Write Modes ORC Write Double Performance (smaller is better) Double Writers ● JVM is big-endian ● X86 is little-endian ● Special handling of NaN
  • 18. Page18 © Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC: Scale compression buffers 269.4 263.3 258.5 258.4 258.4 258.4 184.8 183.5 182.2 180.1 178.3 177.4 140 160 180 200 220 240 260 280 300 320 8 16 32 64 128 256 SizeinMB Compression Buffer Size in KB File Size ZLIB SNAPPY Large Columns vs More Columns ● Adjust when >1000 columns Trade offs ● Compression ● Low memory use More additions ● Dynamically partitioned insert
  • 19. Page19 © Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC: Streaming Ingest + ACID Broken pattern: Partitions for Atomicity- - Isolation & Consistency on retries+ Transactions are pluggable (txn.manager)+ Cache/Replication friendly (base + deltas)+
  • 20. Page 20 © Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC: LLAP and Sub-second ORC – Pushing for Sub-second
  • 21. Page21 © Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC: Row Indexes Min-Max pruning ● Evaluate on statistics Bloom filters ● Better String filters ● Filter a random distribution LLAP Future ● Row-level vector SARGs 5999989709 540,000 10,000 No Indexes Min-Max Indexes Bloomfilter Indexes from tpch_1000.lineitem where l_orderkey = 1212000001; (log scale) Rows Read
  • 22. Page22 © Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC: Row Indexes Min-Max pruning ● Evaluate on Statistics Bloom filters ● Better String filters ● Filter a random distribution LLAP Future ● Row-level vector SARGs 74 4.5 1.34 No Indexes Min-Max Indexes Bloomfilter Indexes * from tpch_1000.lineitem where l_orderkey=1212000001; (smaller better) Time Taken (seconds)
  • 23. Page23 © Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC: JDK8 SIMD Readers Integer encodings ● Base + Delta ● Run-length ● Direct Trade-off for Size/Speed ● Use fixed bit-width loops ● Snap to nearest bit-width 0 200 400 600 800 1000 1200 1400 1600 1800 1 2 4 8 16 24 32 40 48 56 64 MeanTime(ms) Bit Width ORC Read Integer Performance hive 0.13 unpacking hive-1.0 unpacking (new)
  • 24. Page24 © Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC: Vectorization + SIMD Advantage of a Single SerDe ● Primitive Types Allocation free tight inner loops ● JDK8 has auto-vectorization Vectorized Early Filter ● Vectors can be filtered early in ORC ● StringDictionary can be used to binary-search Vectorized SIMD Join ● Performance for single key joins 0x00007f13d2e6afb0: vmovdqu 0x10(%rsi,%rax,8),%ymm2 0x00007f13d2e6afb6: vaddpd %ymm1,%ymm2,%ymm2 0x00007f13d2e6afba: movslq %eax,%r10 0x00007f13d2e6afbd: vmovdqu 0x30(%rsi,%r10,8),%ymm3 ;*daload vector.expressions.gen.DoubleColAddDoubleColumn::evaluate (line 94) 0x00007f13d2e6afc4: vmovdqu %ymm2,0x10(%rdx,%rax,8) 0x00007f13d2e6afca: vaddpd %ymm1,%ymm3,%ymm2 0x00007f13d2e6afce: vmovdqu %ymm2,0x30(%rdx,%r10,8) ;*dastore vector.expressions.gen.DoubleColAddDoubleColumn::evaluate (line 94)
  • 25. Page25 © Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC: Split Strategies + Tez Grouping Amdahl’s Law ● As fast as the slowest task ● Slice work thinly, but not too thin Split-generation vs Execution time ● ETL ● BI ● Hybrid Split-grouping & estimation ● ColumnarSplit size ● Group by estimate, not file size ● Bucket pruning Slow split
  • 26. Page26 © Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC: LLAP - JIT Performance for short queries+ Row-group level caching+ Asynchronous IO Elevator+ + Multi-threaded Column Vector processing+
  • 27. Page27 © Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC: LLAP (+ SIMD + Split Strategies + Row Indexes)
  • 28. Page28 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Questions? ? Interested? Stop by the Hortonworks booth to learn more
  • 29. Page29 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Endnotes (1) https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/ (2) http://www.slideshare.net/AdamKawa/a-perfect-hive-query-for-a-perfect-meeting-hadoop-summit-2014
  • 30. Page30 © Hortonworks Inc. 2011 – 2015. All Rights Reserved END
  • 31. Page31 © Hortonworks Inc. 2011 – 2015. All Rights Reserved But wait, there’s more 178.5 225.1 172.2 220.1 674 389 433 446 0 50 100 150 200 250 0 100 200 300 400 500 600 700 800 ORC (old zlib) ORC SNAPPY ORC (new zlib) Parquet (snappy) TPC-H Lineitem (1000 scale) Time Taken (seconds) Data Sizes (on disk)