ORC: 2015 Faster, Better, Smaller
DataWorks Summit
Gopal Vijayaraghavan, Hortonworks
1.
ORC: 2015
Gopal Vijayaraghavan
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
2.
ORC – Optimized Row-Columnar File
+ Columnar Storage
+ Row-groups & Fixed splits
+ Protobuf Metadata Storage
+ Type-safe Vectorization
+ Hive ACID transactions
+ Single SerDe for Format
3.
Need for Speed: The Stinger Initiative
Stinger: An Open Roadmap to improve Apache Hive’s performance 100x.
Launched: February 2013; Delivered: April 2014.
Delivered in 100% Apache Open Source.
SQL Engine (Vectorized SQL Engine) + Columnar Storage (ORC) + Distributed Execution (Apache Tez) = 100X
4.
ORC at Facebook [1]
● Compression: Saved more than 1,400 servers' worth of storage.
● Compression: Compression ratio increased from 5x to 8x globally.
5.
ORC at Spotify [2]
● IO: 16x less HDFS read when using ORC versus Avro.(5)
● CPU: 32x less CPU when using ORC versus Avro.(5)
6.
ORC: Today
What is Optimized about ORC?
7.
ORC – Optimized Row-Columnar File
+ Columnar Storage
+ Row-groups & Stripe splits
+ Protobuf Metadata Storage
+ Type-safe Vectorization
+ Hive ACID transactions
+ Single SerDe for Format
8.
Columnar Storage
Storage Performance
● Compress each column differently
● Detect & compress common sub-sequences
● Auto-increment ids
● String Enums
● Large Integers (uid scale)
● Unique strings (UUIDs)
Read Performance
● Column projection
● Columnar deserializers
● Data locality
Write Throughput
● Stats auto-gather
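To make "compress each column differently" and "column projection" concrete, here is a minimal sketch in plain Python. It is not ORC's on-disk format: the JSON serialization, the three-column sample table, and the helper names `write_columnar` and `read_columns` are all assumptions made for illustration. The point is only that each column becomes its own compressed stream, so a reader that wants one column touches only that stream.

```python
import json
import zlib

# Hypothetical sample table: 1,000 rows with three columns.
rows = [
    {"id": i, "state": ["CA", "NY", "TX"][i % 3], "amount": float(i)}
    for i in range(1000)
]

def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}

def write_columnar(rows):
    """Serialize and compress every column independently (one stream per column)."""
    return {
        name: zlib.compress(json.dumps(values).encode("utf-8"))
        for name, values in to_columns(rows).items()
    }

def read_columns(streams, wanted):
    """Column projection: decompress only the requested column streams."""
    return {name: json.loads(zlib.decompress(streams[name])) for name in wanted}

streams = write_columnar(rows)
projected = read_columns(streams, ["state"])

# Bytes touched by the projection versus bytes in the whole "file".
bytes_read = len(streams["state"])
bytes_total = sum(len(stream) for stream in streams.values())
print(f"read {bytes_read} of {bytes_total} compressed bytes")
```

A row-oriented layout would have to decode every row to answer the same single-column query.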
9.
Row-groups & Stripe splits
Split Parallelism
● Effective parallelism
● No seeks to find boundaries
● No splits with zero data
● Decompress fixed chunks
Stripes
● Single unsplittable chunk
● Will reside in 1 HDFS block entirely
● Is self-contained for all read ops
10.
A Single SerDe for all ORC Files
A Single Writer
● No mismatch of serialization
● Forward compatibility
Readers
● Multiple reader implementations
● Allows for vector readers
● And row-mode readers
● Similar loop – good JIT hit-rate
11.
Protobuf Metadata Storage
Standardized Metadata
● Readers are easier to write
● Metadata readers are auto-generated
Metadata Forward Compatibility
● Protobuf Optional fields
Statistics Storage in Metadata
● Standard serialization for stats
● Allows for PPD into the IO layer
12.
Type-safe Vectorization
Schema on Write
● Write ORC Structs with types
● SerDe & InputFormat
Read Performance
● Data is read with few copies
● Primitive types are fast
● Primitives are also unboxed
● Predicates are typed too
13.
ORC: ETL Improvements
Always more new data
14.
ORC (Zlib): Compress Differently
Different Zlib algorithms for encoding
● Z_FILTERED
● Z_DEFAULT
● Z_BEST_SPEED
● Z_DEFAULT_COMPRESSION
In detail
● Compress IS_NULL bitsets lightly
● Compress Integers differently from Doubles
● Compress string dictionaries differently
● Allow for user choice
Chart: ETL for TPC-H LineItem (scale 1 TB), time taken
● ORC (old zlib): 674
● ORC SNAPPY: 389
● ORC (new zlib): 433
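The idea of choosing a zlib level and strategy per stream can be sketched with Python's standard `zlib` module, which exposes the same constants. The mapping of stream kinds to settings in `STREAM_SETTINGS` is an assumption made for illustration, not the mapping the ORC writer uses, and `Z_DEFAULT` on the slide is read here as `Z_DEFAULT_STRATEGY`. The sample streams are synthetic.

```python
import struct
import zlib

def deflate(data, level, strategy):
    """Compress bytes with an explicit zlib level and strategy."""
    compressor = zlib.compressobj(level=level, strategy=strategy)
    return compressor.compress(data) + compressor.flush()

# Hypothetical per-stream settings (assumption): light compression for the
# IS_NULL bitset, Z_FILTERED for numeric data, defaults for the dictionary.
STREAM_SETTINGS = {
    "is_null": (zlib.Z_BEST_SPEED, zlib.Z_DEFAULT_STRATEGY),
    "integers": (zlib.Z_DEFAULT_COMPRESSION, zlib.Z_FILTERED),
    "doubles": (zlib.Z_BEST_SPEED, zlib.Z_FILTERED),
    "dictionary": (zlib.Z_DEFAULT_COMPRESSION, zlib.Z_DEFAULT_STRATEGY),
}

# Synthetic streams standing in for the parts of one column chunk.
streams = {
    "is_null": bytes(1250),  # 10,000 "not null" bits
    "integers": struct.pack("<1000q", *range(1000)),
    "doubles": struct.pack("<1000d", *(i * 0.25 for i in range(1000))),
    "dictionary": "\n".join(["AIR", "MAIL", "RAIL", "SHIP", "TRUCK"]).encode("utf-8"),
}

compressed = {
    name: deflate(data, *STREAM_SETTINGS[name]) for name, data in streams.items()
}

for name, blob in compressed.items():
    print(f"{name}: {len(streams[name])} -> {len(blob)} bytes")
```

Whatever strategy the writer picks, the output is ordinary zlib data, so a reader needs no per-stream configuration to decompress it.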
15.
ORC (Zlib): Compress Differently
Different Zlib algorithms for encoding
● Z_FILTERED
● Z_DEFAULT
● Z_BEST_SPEED
● Z_DEFAULT_COMPRESSION
In detail
● Compress IS_NULL bitsets lightly
● Compress Integers differently from Doubles
● Compress string dictionaries differently
● Allow for user choice
Chart: Data Sizes for TPC-H Lineitem (scale 1 TB), size on disk
● ORC (old zlib): 178.5
● ORC SNAPPY: 225.1
● ORC (new zlib): 172.2
16.
Using JDK8 SIMD: Integer Writers
Integer encodings
● Base + Delta
● Run-length
● Direct
Trade-off for Size/Speed
● Use fixed bit-width loops
● Snap to nearest bit-width
Chart: ORC Write Integer Performance (smaller is better). Mean time (ms) vs. bit width (1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64); series: hive 0.13 bitpacking, hive 1.0 bitpacking (new).
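A small sketch of "Base + Delta" combined with "snap to nearest bit-width" follows. The set of fixed widths is taken from the chart's bit-width axis. The function names, the rounding-up rule, and the restriction to non-negative deltas are assumptions for illustration; the real ORC integer encoder is more involved.

```python
# Fixed widths as they appear on the chart's bit-width axis.
FIXED_WIDTHS = (1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64)

def snap_bit_width(bits_needed):
    """Round a required bit count up to the next supported fixed width."""
    for width in FIXED_WIDTHS:
        if bits_needed <= width:
            return width
    raise ValueError("value does not fit in 64 bits")

def delta_encode(values):
    """Base + delta: keep the first value, then successive differences."""
    return values[0], [b - a for a, b in zip(values, values[1:])]

def delta_decode(base, deltas):
    """Rebuild the original sequence from the base and the differences."""
    out = [base]
    for delta in deltas:
        out.append(out[-1] + delta)
    return out

def pack(values, width):
    """Pack non-negative integers MSB-first, `width` bits each."""
    acc = 0
    for value in values:
        acc = (acc << width) | value
    total_bits = width * len(values)
    padding = (-total_bits) % 8  # pad the tail up to a whole byte
    return (acc << padding).to_bytes((total_bits + padding) // 8, "big")

def unpack(data, width, count):
    """Inverse of pack(): read `count` integers of `width` bits each."""
    acc = int.from_bytes(data, "big") >> (len(data) * 8 - width * count)
    mask = (1 << width) - 1
    return [(acc >> (width * (count - 1 - i))) & mask for i in range(count)]

ids = [1000, 1003, 1005, 1010, 1011]          # auto-increment style ids
base, deltas = delta_encode(ids)              # 1000, [3, 2, 5, 1]
width = snap_bit_width(max(deltas).bit_length())  # 3 bits needed, snapped to 4
packed = pack(deltas, width)
```

Snapping to a small set of widths costs a little space (4 bits instead of 3 here) but lets each width be handled by its own fixed-stride loop, which is the size/speed trade-off the slide names.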
17.
Double Writers
● JVM is big-endian
● X86 is little-endian
● Special handling of NaN
Chart: ORC Write Double Performance (smaller is better), mean time (ms) by double write mode
● old: 273.331
● buffered + BE: 247.634
● buffered + LE: 231.741
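The byte-order point can be illustrated with Python's `struct` module. This is a sketch of the two write modes named in the chart, not the ORC writer itself; the function names are hypothetical, and the timings on the slide come from the JVM, so this code shows only the layouts, not the speed difference.

```python
import math
import struct

def write_per_value_be(values):
    """Old style (assumed): one big-endian write call per double."""
    out = bytearray()
    for value in values:
        out += struct.pack(">d", value)
    return bytes(out)

def write_buffered_le(values):
    """Buffered style: pack the whole batch at once, little-endian."""
    return struct.pack(f"<{len(values)}d", *values)

def read_le(data):
    """Decode a buffer of little-endian doubles."""
    return list(struct.unpack(f"<{len(data) // 8}d", data))

values = [1.0, -2.5, float("nan")]
big_endian = write_per_value_be(values)
little_endian = write_buffered_le(values)
decoded = read_le(little_endian)

# 1.0 is 0x3FF0000000000000; the little-endian form is the same bytes reversed.
print(big_endian[:8].hex(), little_endian[:8].hex())
```

NaN needs care because NaN never compares equal to itself, so round-trip checks have to use `math.isnan` rather than `==`.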
18.
ORC: Scale compression buffers
Large Columns vs More Columns
● Adjust when >1000 columns
Trade offs
● Compression
● Low memory use
More additions
● Dynamically partitioned insert
Chart: File Size, size in MB vs. compression buffer size in KB (legend: ZLIB, SNAPPY)
● Buffer size (KB): 8, 16, 32, 64, 128, 256
● Upper series (MB): 269.4, 263.3, 258.5, 258.4, 258.4, 258.4
● Lower series (MB): 184.8, 183.5, 182.2, 180.1, 178.3, 177.4
19.
ORC: Streaming Ingest + ACID
- Broken pattern: Partitions for Atomicity
+ Isolation & Consistency on retries
+ Transactions are pluggable (txn.manager)
+ Cache/Replication friendly (base + deltas)
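The "base + deltas" layout can be pictured with a heavily simplified sketch: an immutable base plus small delta files of later changes, merged at read time up to the reader's snapshot. The record shape `(txn_id, op, row_id, value)` and the function `read_snapshot` are assumptions for illustration and do not reproduce Hive's actual ACID file layout or merge logic.

```python
def read_snapshot(base, deltas, max_txn):
    """Merge the base with every delta event whose transaction id <= max_txn."""
    rows = dict(base)  # never mutate the base: it stays cache/replication friendly
    for txn_id, op, row_id, value in sorted(deltas, key=lambda event: event[0]):
        if txn_id > max_txn:
            break  # later transactions are invisible to this reader
        if op == "delete":
            rows.pop(row_id, None)
        else:  # "insert" or "update"
            rows[row_id] = value
    return rows

# Hypothetical data: a compacted base and three later transactions.
base = {1: "a", 2: "b"}
deltas = [
    (11, "insert", 3, "c"),
    (12, "update", 1, "A"),
    (13, "delete", 2, None),
]

before = read_snapshot(base, deltas, max_txn=10)
middle = read_snapshot(base, deltas, max_txn=12)
latest = read_snapshot(base, deltas, max_txn=13)
```

Because each reader picks its own `max_txn`, a retried or concurrent writer can append new deltas without changing what an in-flight reader sees.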
20.
ORC: LLAP and Sub-second
ORC – Pushing for Sub-second
21.
ORC: Row Indexes
Min-Max pruning
● Evaluate on statistics
Bloom filters
● Better String filters
● Filter a random distribution
LLAP Future
● Row-level vector SARGs
Chart: Rows Read (log scale) for "from tpch_1000.lineitem where l_orderkey = 1212000001;"
● No Indexes: 5,999,989,709
● Min-Max Indexes: 540,000
● Bloomfilter Indexes: 10,000
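Min-max pruning can be sketched in a few lines: keep a minimum and maximum per row group, and skip any group whose range cannot contain the search key. The row-group size of 10,000, the sorted sample column, and the names `RowGroup`, `build_row_groups`, and `scan_equals` are assumptions for illustration, not ORC's reader API.

```python
from dataclasses import dataclass

ROWS_PER_GROUP = 10_000  # assumed row-group size for this sketch

@dataclass
class RowGroup:
    values: list
    minimum: int
    maximum: int

def build_row_groups(column, rows_per_group=ROWS_PER_GROUP):
    """Split a column into row groups and record min/max statistics for each."""
    groups = []
    for start in range(0, len(column), rows_per_group):
        chunk = column[start:start + rows_per_group]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups

def scan_equals(groups, key):
    """Evaluate `column = key`, reading only groups whose range may hold the key."""
    rows_read = 0
    matches = []
    for group in groups:
        if key < group.minimum or key > group.maximum:
            continue  # pruned on statistics alone: no data is decoded
        rows_read += len(group.values)
        matches.extend(value for value in group.values if value == key)
    return matches, rows_read

column = list(range(1_000_000))  # sorted keys give tight, disjoint ranges
groups = build_row_groups(column)
matches, rows_read = scan_equals(groups, 123_456)
print(f"read {rows_read} of {len(column)} rows")
```

The pruning is only as good as the ranges are tight: on sorted or clustered keys almost every group is skipped, while on randomly distributed keys every group's range spans nearly the whole domain and little is pruned.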
22.
ORC: Row Indexes
Min-Max pruning
● Evaluate on Statistics
Bloom filters
● Better String filters
● Filter a random distribution
LLAP Future
● Row-level vector SARGs
Chart: Time Taken in seconds (smaller is better) for "* from tpch_1000.lineitem where l_orderkey=1212000001;"
● No Indexes: 74
● Min-Max Indexes: 4.5
● Bloomfilter Indexes: 1.34
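When keys are randomly distributed, a per-row-group Bloom filter can still rule groups out where min-max ranges cannot. The sketch below is a generic Bloom filter using SHA-256 and double hashing. The filter size, hash count, shuffled sample data, and class name are assumptions for illustration and are unrelated to ORC's actual Bloom filter encoding.

```python
import hashlib
import random

class BloomFilter:
    """A small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits          # assumed to be a multiple of 8
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Double hashing: derive every probe position from two 64-bit hashes.
        digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )

# Randomly distributed keys: every row group spans almost the full key range.
rng = random.Random(0)
keys = list(range(100_000))
rng.shuffle(keys)
groups = [keys[i:i + 10_000] for i in range(0, len(keys), 10_000)]

filters = []
for group in groups:
    bloom = BloomFilter()
    for key in group:
        bloom.add(key)
    filters.append(bloom)

target = 4242
candidates = [i for i, bloom in enumerate(filters) if bloom.might_contain(target)]
owner = next(i for i, group in enumerate(groups) if target in group)
print(f"must read {len(candidates)} of {len(groups)} row groups")
```

The group that really holds the key is always among the candidates; any extra candidates are false positives, whose rate depends on the filter size and hash count chosen.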
23.
ORC: JDK8 SIMD Readers
Integer encodings
● Base + Delta
● Run-length
● Direct
Trade-off for Size/Speed
● Use fixed bit-width loops
● Snap to nearest bit-width
Chart: ORC Read Integer Performance. Mean time (ms) vs. bit width (1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64); series: hive 0.13 unpacking, hive-1.0 unpacking (new).
24.
ORC: Vectorization + SIMD
Advantage of a Single SerDe
● Primitive Types
Allocation free tight inner loops
● JDK8 has auto-vectorization
Vectorized Early Filter
● Vectors can be filtered early in ORC
● StringDictionary can be used to binary-search
Vectorized SIMD Join
● Performance for single key joins
JIT-compiled output:
    0x00007f13d2e6afb0: vmovdqu 0x10(%rsi,%rax,8),%ymm2
    0x00007f13d2e6afb6: vaddpd %ymm1,%ymm2,%ymm2
    0x00007f13d2e6afba: movslq %eax,%r10
    0x00007f13d2e6afbd: vmovdqu 0x30(%rsi,%r10,8),%ymm3 ;*daload vector.expressions.gen.DoubleColAddDoubleColumn::evaluate (line 94)
    0x00007f13d2e6afc4: vmovdqu %ymm2,0x10(%rdx,%rax,8)
    0x00007f13d2e6afca: vaddpd %ymm1,%ymm3,%ymm2
    0x00007f13d2e6afce: vmovdqu %ymm2,0x30(%rdx,%r10,8) ;*dastore vector.expressions.gen.DoubleColAddDoubleColumn::evaluate (line 94)
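The "StringDictionary can be used to binary-search" bullet can be sketched as follows: with a sorted dictionary, an equality predicate on a string column becomes one binary search followed by integer comparisons over the encoded ids. The function names and the tiny sample column are assumptions for illustration, not Hive's vectorized reader.

```python
from bisect import bisect_left

def dictionary_encode(strings):
    """Encode a string column as a sorted dictionary plus one integer id per row."""
    dictionary = sorted(set(strings))
    index = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [index[value] for value in strings]

def lookup_id(dictionary, value):
    """Binary-search the sorted dictionary; return the id, or None if absent."""
    pos = bisect_left(dictionary, value)
    if pos < len(dictionary) and dictionary[pos] == value:
        return pos
    return None

def filter_equals(dictionary, ids, value):
    """Early filter: one string lookup, then only integer comparisons per row."""
    target = lookup_id(dictionary, value)
    if target is None:
        return []  # value not in the dictionary: the whole batch is skipped
    return [row for row, encoded in enumerate(ids) if encoded == target]

column = ["NY", "CA", "TX", "CA", "NY", "CA"]
dictionary, ids = dictionary_encode(column)
selected = filter_equals(dictionary, ids, "CA")
missing = filter_equals(dictionary, ids, "WA")
```

The per-row loop compares plain integers and allocates nothing per row, which is the kind of tight loop a JIT compiler can vectorize.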
25.
ORC: Split Strategies + Tez Grouping
Amdahl’s Law
● As fast as the slowest task
● Slice work thinly, but not too thin
Split-generation vs Execution time
● ETL
● BI
● Hybrid
Split-grouping & estimation
● ColumnarSplit size
● Group by estimate, not file size
● Bucket pruning
(Figure label: Slow split)
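"Group by estimate, not file size" can be illustrated with a simple greedy grouper: with columnar files, the bytes a task actually reads depend on the projected columns, so grouping on raw file size can slice the work too thin. The split tuples, the 64 MB target, and the function `group_splits` are hypothetical; this is not the Tez grouping algorithm.

```python
def group_splits(splits, target_size):
    """Greedily pack (name, size) pairs into groups of at most target_size.

    A single split larger than the target still gets a group of its own.
    """
    groups, current, current_size = [], [], 0
    for name, size in splits:
        if current and current_size + size > target_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups

# Hypothetical splits: (name, file size in MB, estimated projected bytes in MB).
splits = [("a", 256, 32), ("b", 256, 32), ("c", 64, 60), ("d", 128, 16)]
TARGET_MB = 64

by_file_size = group_splits([(n, size) for n, size, _ in splits], TARGET_MB)
by_estimate = group_splits([(n, est) for n, _, est in splits], TARGET_MB)
print(len(by_file_size), "tasks by file size;", len(by_estimate), "tasks by estimate")
```

In this example, grouping on the estimate merges the two wide-but-lightly-read files into one task, while grouping on file size gives each its own task.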
26.
ORC: LLAP
+ JIT Performance for short queries
+ Row-group level caching
+ Asynchronous IO Elevator
+ Multi-threaded Column Vector processing
27.
ORC: LLAP (+ SIMD + Split Strategies + Row Indexes)
28.
Questions?
Interested? Stop by the Hortonworks booth to learn more.
29.
Endnotes
[1] https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/
[2] http://www.slideshare.net/AdamKawa/a-perfect-hive-query-for-a-perfect-meeting-hadoop-summit-2014
30.
END
31.
But wait, there’s more
Chart: TPC-H Lineitem (1000 scale), time taken (seconds) and data sizes (on disk)
● ORC (old zlib): time taken 674, data size 178.5
● ORC SNAPPY: time taken 389, data size 225.1
● ORC (new zlib): time taken 433, data size 172.2
● Parquet (snappy): time taken 446, data size 220.1