ORC File Introduction
Owen O'Malley
I present the Optimized Row Columnar (ORC) file format for Apache Hive.
1.
ORC Files
Owen O’Malley, owen@hortonworks.com
December 2012
© Hortonworks Inc. 2012 Page 1
2.
Top Level
3.
File Structure
4.
Stripe Structure
5.
File Layout
6.
Integer Column Serialization
7.
String Column Serialization
8.
Compression
9.
Projection and Predicate Filtering
10.
Example File Sizes
11.
Final notes
12.
Comparison
Feature                       RC File   Trevni   ORC File
Hive Type Model               N         N        Y
Separate complex columns      N         Y        Y
Splits found quickly          N         Y        Y
Default column group size     4MB       64MB*    250MB
Files per a bucket            1         >1       1
Store min, max, sum, count    N         N        Y
Versioned metadata            N         Y        Y
Run length data encoding      N         N        Y
Store strings in dictionary   N         N        Y
Store row count               N         Y        Y
Skip compressed blocks        N         N        Y
Store internal indexes        N         N        Y
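The comparison lists ORC File as storing min, max, sum, and count and as able to skip compressed blocks. The slides do not show how the two relate, so the following is only a minimal Python sketch of the general idea, not ORC's actual on-disk format or reader API. The names `BlockStats`, `summarize`, and `blocks_to_read` are hypothetical, and the "blocks" are plain in-memory lists.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockStats:
    """Summary statistics kept for one block of an integer column (hypothetical layout)."""
    minimum: int
    maximum: int
    total: int
    count: int


def summarize(values: List[int]) -> BlockStats:
    """Compute min, max, sum, and count for one non-empty block of values."""
    return BlockStats(min(values), max(values), sum(values), len(values))


def blocks_to_read(stats: List[BlockStats], low: int, high: int) -> List[int]:
    """Return indexes of blocks whose [minimum, maximum] range overlaps [low, high].

    Blocks outside the range cannot contain a matching value, so a reader
    could skip them without decompressing their data.
    """
    return [
        i for i, s in enumerate(stats)
        if s.maximum >= low and s.minimum <= high
    ]


# Three illustrative blocks of one integer column.
blocks = [[1, 5, 9], [20, 22, 31], [40, 41, 47]]
stats = [summarize(b) for b in blocks]

# A predicate such as "value BETWEEN 21 AND 30" only needs the second block.
selected = blocks_to_read(stats, 21, 30)
print(selected)
```

With statistics stored next to the data, the filter is evaluated against a few integers per block rather than against every row, which is why the two table rows tend to go together.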