© 2015 Dremio Corporation
If you have your own Columnar format,
stop now and use Parquet
😛
Julien Le Dem
Principal Architect, Dremio
VP Apache Parquet
About Dremio
Jacques Nadeau
Founder & CTO
• Apache Drill PMC Chair
• Recognized SQL & NoSQL expert
• Quigo (AOL); Offermatica (ADBE); aQuantive (MSFT)
Tomer Shiran
Founder & CEO
• Apache Drill Founder
• MapR (VP Product); Microsoft; IBM Research
• Carnegie Mellon, Technion
Julien Le Dem
Architect
• Apache Parquet Founder
• Apache Pig PMC Member
• Twitter (Lead, Analytics Data Pipeline); Yahoo! (Architect)
Top Silicon Valley VCs
• Enabling self-service data discovery, exploration and analysis on modern data
• Founded in June 2015
• Building on open source technologies including Drill, Parquet, Spark
Background of Parquet
• Twitter's data
  • Lots of data: Instrumentation, User graph, Derived data, ...
  • Complex: deeply nested structures
• Analytics infrastructure:
  • Hadoop clusters with several thousand nodes
  • Log collection to HDFS in Thrift
• Parquet
  • Columnar: space and query efficient
  • Inspired by the Google Dremel paper
  • Supports complex data
  • Interoperable
Caillebotte: The Parquet Planers
Parquet timeline
• Fall 2012: Twitter & Cloudera's Impala team merge efforts to develop columnar formats.
• March 2013: OSS announcement; Criteo signs on for Hive integration.
• July 2013: 1.0 release. 18 contributors from more than 5 organizations.
• August 2013: Drill chose Parquet as its primary storage format.
• May 2014: Apache Incubator. 40+ contributors, 18 with 1000+ LOC. 26 incremental releases.
• Apr 2015: Parquet graduates from the Apache Incubator.
What does Parquet do?
Interoperability
Space efficiency
Query efficiency
@EmrgencyKittens
Columnar storage
Nested schema: a, b, c
Logical table representation:
a  b  c
a1 b1 c1
a2 b2 c2
a3 b3 c3
a4 b4 c4
a5 b5 c5
Row layout: a1 b1 c1 a2 b2 c2 a3 b3 c3 a4 b4 c4 a5 b5 c5
Column layout: a1 a2 a3 a4 a5 b1 b2 b3 b4 b5 c1 c2 c3 c4 c5
On disk: one encoded chunk per column (encoded chunk, encoded chunk, encoded chunk)
Encodings: Dictionary, RLE, Delta, Prefix
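The difference between the two layouts, and why a column layout makes encodings such as dictionary and RLE effective, can be sketched in a few lines of Python. This is an illustrative sketch only: the `country` column and the helpers `dictionary_encode` and `run_length_encode` are hypothetical names introduced here, and the code does not reproduce Parquet's actual on-disk encodings.

```python
from itertools import groupby

# Hypothetical 5-row table with columns a, b, c, mirroring the slide.
rows = [(f"a{i}", f"b{i}", f"c{i}") for i in range(1, 6)]

# Row layout: the values of one record are stored next to each other.
row_layout = [value for row in rows for value in row]

# Column layout: the values of one column are stored next to each other.
table_columns = list(zip(*rows))
column_layout = [value for column in table_columns for value in column]


def dictionary_encode(values):
    """Replace each value with an index into a list of distinct values."""
    dictionary = []
    positions = {}
    indices = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# A made-up low-cardinality column, the case where these encodings pay off.
country = ["us", "us", "us", "fr", "fr", "us"]
dictionary, indices = dictionary_encode(country)
runs = run_length_encode(indices)
```

Because similar values sit together in a column chunk, the dictionary stays small and the runs stay long, which is harder to achieve when values of different columns are interleaved in a row layout.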
Nested representation
Schema (borrowed from the Google Dremel paper):
Document
  DocId
  Links
    Backward
    Forward
  Name
    Language
      Code
      Country
    Url
Columns:
docid
links.backward
links.forward
name.language.code
name.language.country
name.url
https://blog.twitter.com/2013/dremel-made-simple-with-parquet
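The mapping from the nested schema to the list of columns can be sketched as a depth-first walk that emits one dotted path per leaf field. The nested-dict encoding of the schema and the function name `leaf_columns` are assumptions made for illustration; the sketch also leaves out the repetition and definition levels that the Dremel approach stores alongside each value to rebuild the nesting.

```python
# Hypothetical encoding of the Document schema from the slide:
# a dict is a group of fields, None marks a leaf (primitive) field.
schema = {
    "docid": None,
    "links": {"backward": None, "forward": None},
    "name": {
        "language": {"code": None, "country": None},
        "url": None,
    },
}


def leaf_columns(group, prefix=()):
    """Yield the dotted path of every leaf field, depth first."""
    for field, children in group.items():
        path = prefix + (field,)
        if children is None:
            yield ".".join(path)
        else:
            yield from leaf_columns(children, path)


columns = list(leaf_columns(schema))
for column in columns:
    print(column)
```

Each leaf becomes its own column, so a query that only needs `name.url` never has to touch the data stored for `links.backward`.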
Statistics
Vertical partitioning (projection push down) + Horizontal partitioning (predicate push down) = Read only the data you need!
[Figure: the table with columns a, b, c and rows 1 to 5, shown three times to illustrate column selection, row selection, and their combination]
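How the two push downs combine can be sketched with a small in-memory stand-in for a columnar file. Everything here is hypothetical and chosen for illustration: the row-group layout, the helper names `make_row_group` and `scan`, and the single "greater than or equal" predicate. A real reader works on file metadata rather than Python dicts.

```python
# Hypothetical stand-in for a columnar file: each row group holds one chunk
# per column, and each chunk carries min/max statistics.
def make_row_group(rows, names):
    chunks = {}
    for position, name in enumerate(names):
        values = [row[position] for row in rows]
        chunks[name] = {"values": values, "min": min(values), "max": max(values)}
    return chunks


names = ("a", "b", "c")
table = [(i, i * 10, f"c{i}") for i in range(1, 7)]
# Two row groups of three rows each.
row_groups = [make_row_group(table[:3], names), make_row_group(table[3:], names)]


def scan(row_groups, projection, column, lower_bound):
    """Return (rows, groups_read) for rows where `column` >= lower_bound."""
    result = []
    groups_read = 0
    for group in row_groups:
        stats = group[column]
        # Predicate push down: skip the whole group if no value can match.
        if stats["max"] < lower_bound:
            continue
        groups_read += 1
        keep = [i for i, v in enumerate(stats["values"]) if v >= lower_bound]
        # Projection push down: only the requested column chunks are touched.
        for i in keep:
            result.append(tuple(group[name]["values"][i] for name in projection))
    return result, groups_read


rows, groups_read = scan(row_groups, projection=("a", "c"), column="a", lower_bound=5)
```

In this sketch the first row group is skipped using its statistics alone, and column `b` is never read at all.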
Interoperability
Library level integration:
• Object model: Avro, Thrift, Protocol Buffer, Pig Tuple, Hive SerDe, ...
• Converters: parquet-avro, parquet-thrift, parquet-proto, parquet-pig, parquet-hive, ...
• Assembly/striping (model agnostic)
• Column encodings
Query engine integration:
• Impala, Presto, Drill, …
• Encodings (C)
Parquet file format (language agnostic)
Query engines, frameworks and libraries integrated with Parquet (non-exhaustive)
Query engines: Hive, Impala, HAWQ, IBM Big SQL, Drill, Tajo, Pig, Presto, SparkSQL
Frameworks: Spark, MapReduce, Cascading, Crunch, Scalding, Kite
Data Models: Avro, Thrift, ProtocolBuffers, POJOs
Loose coupling
• Users don't want to load their data into every tool.
• Many tools are available and show up every day.
• The cost of trying a new tool should be minimal.
[Diagram: Storage (HDFS/S3/…) holding a query-efficient format (Parquet), shared by:]
• Interactive queries (Drill, Impala, Presto, …)
• Batch computation (Pig, Cascading, Scalding, Spark, …)
• Graph processing (Giraph, …)
• Automated dashboard
• Machine learning
Get involved
Twitter:
- @ApacheParquet
Mailing list:
- dev@parquet.apache.org
GitHub repo:
- https://github.com/apache/parquet-mr
Parquet sync ups:
- Regular meetings on Google Hangouts

More Related Content

What's hot

Apache parquet - Apache big data North America 2017
Apache parquet - Apache big data North America 2017Apache parquet - Apache big data North America 2017
Apache parquet - Apache big data North America 2017
techmaddy
Ā 
How Apache Arrow and Parquet boost cross-language interoperability
How Apache Arrow and Parquet boost cross-language interoperabilityHow Apache Arrow and Parquet boost cross-language interoperability
How Apache Arrow and Parquet boost cross-language interoperability
Uwe Korn
Ā 
Strata NY 2018: The deconstructed database
Strata NY 2018: The deconstructed databaseStrata NY 2018: The deconstructed database
Strata NY 2018: The deconstructed database
Julien Le Dem
Ā 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drill
Julien Le Dem
Ā 
HUG_Ireland_Apache_Arrow_Tomer_Shiran
HUG_Ireland_Apache_Arrow_Tomer_Shiran HUG_Ireland_Apache_Arrow_Tomer_Shiran
HUG_Ireland_Apache_Arrow_Tomer_Shiran
John Mulhall
Ā 
From flat files to deconstructed database
From flat files to deconstructed databaseFrom flat files to deconstructed database
From flat files to deconstructed database
Julien Le Dem
Ā 
Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)
Wes McKinney
Ā 
Parquet and AVRO
Parquet and AVROParquet and AVRO
Parquet and AVRO
airisData
Ā 
Python Data Wrangling: Preparing for the Future
Python Data Wrangling: Preparing for the FuturePython Data Wrangling: Preparing for the Future
Python Data Wrangling: Preparing for the Future
Wes McKinney
Ā 
ORC Deep Dive 2020
ORC Deep Dive 2020ORC Deep Dive 2020
ORC Deep Dive 2020
Owen O'Malley
Ā 
Apache Arrow - An Overview
Apache Arrow - An OverviewApache Arrow - An Overview
Apache Arrow - An Overview
Dremio Corporation
Ā 
HUG France - Apache Drill
HUG France - Apache DrillHUG France - Apache Drill
HUG France - Apache Drill
MapR Technologies
Ā 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In Practice
Dremio Corporation
Ā 
HDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency
HDFS Erasure Code Storage - Same Reliability at Better Storage EfficiencyHDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency
HDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency
DataWorks Summit
Ā 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
Databricks
Ā 
PyData London 2017 ā€“ Efficient and portable DataFrame storage with Apache Par...
PyData London 2017 ā€“ Efficient and portable DataFrame storage with Apache Par...PyData London 2017 ā€“ Efficient and portable DataFrame storage with Apache Par...
PyData London 2017 ā€“ Efficient and portable DataFrame storage with Apache Par...
Uwe Korn
Ā 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
DataWorks Summit
Ā 
Big Data's Journey to ACID
Big Data's Journey to ACIDBig Data's Journey to ACID
Big Data's Journey to ACID
Owen O'Malley
Ā 
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Skillspeed
Ā 
Data Science Languages and Industry Analytics
Data Science Languages and Industry AnalyticsData Science Languages and Industry Analytics
Data Science Languages and Industry Analytics
Wes McKinney
Ā 

What's hot (20)

Apache parquet - Apache big data North America 2017
Apache parquet - Apache big data North America 2017Apache parquet - Apache big data North America 2017
Apache parquet - Apache big data North America 2017
Ā 
How Apache Arrow and Parquet boost cross-language interoperability
How Apache Arrow and Parquet boost cross-language interoperabilityHow Apache Arrow and Parquet boost cross-language interoperability
How Apache Arrow and Parquet boost cross-language interoperability
Ā 
Strata NY 2018: The deconstructed database
Strata NY 2018: The deconstructed databaseStrata NY 2018: The deconstructed database
Strata NY 2018: The deconstructed database
Ā 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drill
Ā 
HUG_Ireland_Apache_Arrow_Tomer_Shiran
HUG_Ireland_Apache_Arrow_Tomer_Shiran HUG_Ireland_Apache_Arrow_Tomer_Shiran
HUG_Ireland_Apache_Arrow_Tomer_Shiran
Ā 
From flat files to deconstructed database
From flat files to deconstructed databaseFrom flat files to deconstructed database
From flat files to deconstructed database
Ā 
Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)
Ā 
Parquet and AVRO
Parquet and AVROParquet and AVRO
Parquet and AVRO
Ā 
Python Data Wrangling: Preparing for the Future
Python Data Wrangling: Preparing for the FuturePython Data Wrangling: Preparing for the Future
Python Data Wrangling: Preparing for the Future
Ā 
ORC Deep Dive 2020
ORC Deep Dive 2020ORC Deep Dive 2020
ORC Deep Dive 2020
Ā 
Apache Arrow - An Overview
Apache Arrow - An OverviewApache Arrow - An Overview
Apache Arrow - An Overview
Ā 
HUG France - Apache Drill
HUG France - Apache DrillHUG France - Apache Drill
HUG France - Apache Drill
Ā 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In Practice
Ā 
HDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency
HDFS Erasure Code Storage - Same Reliability at Better Storage EfficiencyHDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency
HDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency
Ā 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
Ā 
PyData London 2017 ā€“ Efficient and portable DataFrame storage with Apache Par...
PyData London 2017 ā€“ Efficient and portable DataFrame storage with Apache Par...PyData London 2017 ā€“ Efficient and portable DataFrame storage with Apache Par...
PyData London 2017 ā€“ Efficient and portable DataFrame storage with Apache Par...
Ā 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
Ā 
Big Data's Journey to ACID
Big Data's Journey to ACIDBig Data's Journey to ACID
Big Data's Journey to ACID
Ā 
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Ā 
Data Science Languages and Industry Analytics
Data Science Languages and Industry AnalyticsData Science Languages and Industry Analytics
Data Science Languages and Industry Analytics
Ā 

Viewers also liked

Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
Julien Le Dem
Ā 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Cloudera, Inc.
Ā 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
StampedeCon
Ā 
į„€į…¢į†Øį„Žį…¦į„Œį…µį„’į…£į†¼į„Œį…„į†Øį„‹į…µį†« į„ƒį…©į„†į…¦į„‹į…µį†« į„…į…¦į„‹į…µį„‹į…„ į„€į…®į„Žį…®į†Øį„’į…”į„€į…µ
į„€į…¢į†Øį„Žį…¦į„Œį…µį„’į…£į†¼į„Œį…„į†Øį„‹į…µį†« į„ƒį…©į„†į…¦į„‹į…µį†« į„…į…¦į„‹į…µį„‹į…„ į„€į…®į„Žį…®į†Øį„’į…”į„€į…µį„€į…¢į†Øį„Žį…¦į„Œį…µį„’į…£į†¼į„Œį…„į†Øį„‹į…µį†« į„ƒį…©į„†į…¦į„‹į…µį†« į„…į…¦į„‹į…µį„‹į…„ į„€į…®į„Žį…®į†Øį„’į…”į„€į…µ
į„€į…¢į†Øį„Žį…¦į„Œį…µį„’į…£į†¼į„Œį…„į†Øį„‹į…µį†« į„ƒį…©į„†į…¦į„‹į…µį†« į„…į…¦į„‹į…µį„‹į…„ į„€į…®į„Žį…®į†Øį„’į…”į„€į…µYoung-Ho Cho
Ā 
[NEXT ķ”„ģ—° Week2] UNIX ėŖ…ė ¹ģ–“ ź°„ė‹Øķ•˜ź²Œ ģ‚“ķŽ“ė³“źø°
[NEXT ķ”„ģ—° Week2] UNIX ėŖ…ė ¹ģ–“ ź°„ė‹Øķ•˜ź²Œ ģ‚“ķŽ“ė³“źø°[NEXT ķ”„ģ—° Week2] UNIX ėŖ…ė ¹ģ–“ ź°„ė‹Øķ•˜ź²Œ ģ‚“ķŽ“ė³“źø°
[NEXT ķ”„ģ—° Week2] UNIX ėŖ…ė ¹ģ–“ ź°„ė‹Øķ•˜ź²Œ ģ‚“ķŽ“ė³“źø°
Young-Ho Cho
Ā 
į„‹į…¢į„‘į…³į†Æį„…į…µį„į…¦į„‹į…µį„‰į…§į†« į„‹į…”į„į…µį„į…¦į†Øį„Žį…„į„‹į…Ŗ į„€į…¢į†Øį„Žį…¦į„Œį…µį„’į…£į†¼
į„‹į…¢į„‘į…³į†Æį„…į…µį„į…¦į„‹į…µį„‰į…§į†« į„‹į…”į„į…µį„į…¦į†Øį„Žį…„į„‹į…Ŗ į„€į…¢į†Øį„Žį…¦į„Œį…µį„’į…£į†¼ į„‹į…¢į„‘į…³į†Æį„…į…µį„į…¦į„‹į…µį„‰į…§į†« į„‹į…”į„į…µį„į…¦į†Øį„Žį…„į„‹į…Ŗ į„€į…¢į†Øį„Žį…¦į„Œį…µį„’į…£į†¼
į„‹į…¢į„‘į…³į†Æį„…į…µį„į…¦į„‹į…µį„‰į…§į†« į„‹į…”į„į…µį„į…¦į†Øį„Žį…„į„‹į…Ŗ į„€į…¢į†Øį„Žį…¦į„Œį…µį„’į…£į†¼
Young-Ho Cho
Ā 
Domain Driven Design
Domain Driven DesignDomain Driven Design
Domain Driven Design
Young-Ho Cho
Ā 
į„ƒį…©į„†į…¦į„‹į…µį†« į„Œį…®į„ƒį…© į„‰į…„į†Æį„€į…Øį„‹į…“ į„‡į…©į†«į„Œį…µį†Æ
į„ƒį…©į„†į…¦į„‹į…µį†« į„Œį…®į„ƒį…© į„‰į…„į†Æį„€į…Øį„‹į…“ į„‡į…©į†«į„Œį…µį†Æį„ƒį…©į„†į…¦į„‹į…µį†« į„Œį…®į„ƒį…© į„‰į…„į†Æį„€į…Øį„‹į…“ į„‡į…©į†«į„Œį…µį†Æ
į„ƒį…©į„†į…¦į„‹į…µį†« į„Œį…®į„ƒį…© į„‰į…„į†Æį„€į…Øį„‹į…“ į„‡į…©į†«į„Œį…µį†Æ
Young-Ho Cho
Ā 
What Makes Great Infographics
What Makes Great InfographicsWhat Makes Great Infographics
What Makes Great Infographics
SlideShare
Ā 
Masters of SlideShare
Masters of SlideShareMasters of SlideShare
Masters of SlideShare
Kapost
Ā 
STOP! VIEW THIS! 10-Step Checklist When Uploading to Slideshare
STOP! VIEW THIS! 10-Step Checklist When Uploading to SlideshareSTOP! VIEW THIS! 10-Step Checklist When Uploading to Slideshare
STOP! VIEW THIS! 10-Step Checklist When Uploading to Slideshare
Empowered Presentations
Ā 
You Suck At PowerPoint!
You Suck At PowerPoint!You Suck At PowerPoint!
You Suck At PowerPoint!
Jesse Desjardins - @jessedee
Ā 
10 Ways to Win at SlideShare SEO & Presentation Optimization
10 Ways to Win at SlideShare SEO & Presentation Optimization10 Ways to Win at SlideShare SEO & Presentation Optimization
10 Ways to Win at SlideShare SEO & Presentation Optimization
Oneupweb
Ā 
How To Get More From SlideShare - Super-Simple Tips For Content Marketing
How To Get More From SlideShare - Super-Simple Tips For Content MarketingHow To Get More From SlideShare - Super-Simple Tips For Content Marketing
How To Get More From SlideShare - Super-Simple Tips For Content Marketing
Content Marketing Institute
Ā 
Poster Hadoop summit 2011: pig embedding in scripting languages
Poster Hadoop summit 2011: pig embedding in scripting languagesPoster Hadoop summit 2011: pig embedding in scripting languages
Poster Hadoop summit 2011: pig embedding in scripting languages
Julien Le Dem
Ā 
How to Make Awesome SlideShares: Tips & Tricks
How to Make Awesome SlideShares: Tips & TricksHow to Make Awesome SlideShares: Tips & Tricks
How to Make Awesome SlideShares: Tips & Tricks
SlideShare
Ā 
Embedding Pig in scripting languages
Embedding Pig in scripting languagesEmbedding Pig in scripting languages
Embedding Pig in scripting languages
Julien Le Dem
Ā 
Inside Parquet Format
Inside Parquet FormatInside Parquet Format
Inside Parquet Format
Yue Chen
Ā 
Processing edges on apache giraph
Processing edges on apache giraphProcessing edges on apache giraph
Processing edges on apache giraphDataWorks Summit
Ā 
ORC File Introduction
ORC File IntroductionORC File Introduction
ORC File Introduction
Owen O'Malley
Ā 

Viewers also liked (20)

Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
Ā 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
Ā 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Ā 
į„€į…¢į†Øį„Žį…¦į„Œį…µį„’į…£į†¼į„Œį…„į†Øį„‹į…µį†« į„ƒį…©į„†į…¦į„‹į…µį†« į„…į…¦į„‹į…µį„‹į…„ į„€į…®į„Žį…®į†Øį„’į…”į„€į…µ
į„€į…¢į†Øį„Žį…¦į„Œį…µį„’į…£į†¼į„Œį…„į†Øį„‹į…µį†« į„ƒį…©į„†į…¦į„‹į…µį†« į„…į…¦į„‹į…µį„‹į…„ į„€į…®į„Žį…®į†Øį„’į…”į„€į…µį„€į…¢į†Øį„Žį…¦į„Œį…µį„’į…£į†¼į„Œį…„į†Øį„‹į…µį†« į„ƒį…©į„†į…¦į„‹į…µį†« į„…į…¦į„‹į…µį„‹į…„ į„€į…®į„Žį…®į†Øį„’į…”į„€į…µ
į„€į…¢į†Øį„Žį…¦į„Œį…µį„’į…£į†¼į„Œį…„į†Øį„‹į…µį†« į„ƒį…©į„†į…¦į„‹į…µį†« į„…į…¦į„‹į…µį„‹į…„ į„€į…®į„Žį…®į†Øį„’į…”į„€į…µ
Ā 
[NEXT ķ”„ģ—° Week2] UNIX ėŖ…ė ¹ģ–“ ź°„ė‹Øķ•˜ź²Œ ģ‚“ķŽ“ė³“źø°
[NEXT ķ”„ģ—° Week2] UNIX ėŖ…ė ¹ģ–“ ź°„ė‹Øķ•˜ź²Œ ģ‚“ķŽ“ė³“źø°[NEXT ķ”„ģ—° Week2] UNIX ėŖ…ė ¹ģ–“ ź°„ė‹Øķ•˜ź²Œ ģ‚“ķŽ“ė³“źø°
[NEXT ķ”„ģ—° Week2] UNIX ėŖ…ė ¹ģ–“ ź°„ė‹Øķ•˜ź²Œ ģ‚“ķŽ“ė³“źø°
Ā 
į„‹į…¢į„‘į…³į†Æį„…į…µį„į…¦į„‹į…µį„‰į…§į†« į„‹į…”į„į…µį„į…¦į†Øį„Žį…„į„‹į…Ŗ į„€į…¢į†Øį„Žį…¦į„Œį…µį„’į…£į†¼
į„‹į…¢į„‘į…³į†Æį„…į…µį„į…¦į„‹į…µį„‰į…§į†« į„‹į…”į„į…µį„į…¦į†Øį„Žį…„į„‹į…Ŗ į„€į…¢į†Øį„Žį…¦į„Œį…µį„’į…£į†¼ į„‹į…¢į„‘į…³į†Æį„…į…µį„į…¦į„‹į…µį„‰į…§į†« į„‹į…”į„į…µį„į…¦į†Øį„Žį…„į„‹į…Ŗ į„€į…¢į†Øį„Žį…¦į„Œį…µį„’į…£į†¼
į„‹į…¢į„‘į…³į†Æį„…į…µį„į…¦į„‹į…µį„‰į…§į†« į„‹į…”į„į…µį„į…¦į†Øį„Žį…„į„‹į…Ŗ į„€į…¢į†Øį„Žį…¦į„Œį…µį„’į…£į†¼
Ā 
Domain Driven Design
Domain Driven DesignDomain Driven Design
Domain Driven Design
Ā 
į„ƒį…©į„†į…¦į„‹į…µį†« į„Œį…®į„ƒį…© į„‰į…„į†Æį„€į…Øį„‹į…“ į„‡į…©į†«į„Œį…µį†Æ
į„ƒį…©į„†į…¦į„‹į…µį†« į„Œį…®į„ƒį…© į„‰į…„į†Æį„€į…Øį„‹į…“ į„‡į…©į†«į„Œį…µį†Æį„ƒį…©į„†į…¦į„‹į…µį†« į„Œį…®į„ƒį…© į„‰į…„į†Æį„€į…Øį„‹į…“ į„‡į…©į†«į„Œį…µį†Æ
į„ƒį…©į„†į…¦į„‹į…µį†« į„Œį…®į„ƒį…© į„‰į…„į†Æį„€į…Øį„‹į…“ į„‡į…©į†«į„Œį…µį†Æ
Ā 
What Makes Great Infographics
What Makes Great InfographicsWhat Makes Great Infographics
What Makes Great Infographics
Ā 
Masters of SlideShare
Masters of SlideShareMasters of SlideShare
Masters of SlideShare
Ā 
STOP! VIEW THIS! 10-Step Checklist When Uploading to Slideshare
STOP! VIEW THIS! 10-Step Checklist When Uploading to SlideshareSTOP! VIEW THIS! 10-Step Checklist When Uploading to Slideshare
STOP! VIEW THIS! 10-Step Checklist When Uploading to Slideshare
Ā 
You Suck At PowerPoint!
You Suck At PowerPoint!You Suck At PowerPoint!
You Suck At PowerPoint!
Ā 
10 Ways to Win at SlideShare SEO & Presentation Optimization
10 Ways to Win at SlideShare SEO & Presentation Optimization10 Ways to Win at SlideShare SEO & Presentation Optimization
10 Ways to Win at SlideShare SEO & Presentation Optimization
Ā 
How To Get More From SlideShare - Super-Simple Tips For Content Marketing
How To Get More From SlideShare - Super-Simple Tips For Content MarketingHow To Get More From SlideShare - Super-Simple Tips For Content Marketing
How To Get More From SlideShare - Super-Simple Tips For Content Marketing
Ā 
Poster Hadoop summit 2011: pig embedding in scripting languages
Poster Hadoop summit 2011: pig embedding in scripting languagesPoster Hadoop summit 2011: pig embedding in scripting languages
Poster Hadoop summit 2011: pig embedding in scripting languages
Ā 
How to Make Awesome SlideShares: Tips & Tricks
How to Make Awesome SlideShares: Tips & TricksHow to Make Awesome SlideShares: Tips & Tricks
How to Make Awesome SlideShares: Tips & Tricks
Ā 
Embedding Pig in scripting languages
Embedding Pig in scripting languagesEmbedding Pig in scripting languages
Embedding Pig in scripting languages
Ā 
Inside Parquet Format
Inside Parquet FormatInside Parquet Format
Inside Parquet Format
Ā 
Processing edges on apache giraph
Processing edges on apache giraphProcessing edges on apache giraph
Processing edges on apache giraph
Ā 
ORC File Introduction
ORC File IntroductionORC File Introduction
ORC File Introduction
Ā 

Similar to If you have your own Columnar format, stop now and use Parquet šŸ˜›

The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
Dremio Corporation
Ā 
The Big Connection: Integrating Cloud with Enterprise Systems
The Big Connection: Integrating Cloud with Enterprise SystemsThe Big Connection: Integrating Cloud with Enterprise Systems
The Big Connection: Integrating Cloud with Enterprise Systems
Inside Analysis
Ā 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
Timothy Spann
Ā 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
DataWorks Summit/Hadoop Summit
Ā 
Steve Totman Syncsort Big Data Warehousing hug 23 sept Final
Steve Totman Syncsort Big Data Warehousing hug 23 sept FinalSteve Totman Syncsort Big Data Warehousing hug 23 sept Final
Steve Totman Syncsort Big Data Warehousing hug 23 sept Final
Steven Totman
Ā 
Getting Started with HTML5 in Tech Com (STC 2012)
Getting Started with HTML5 in Tech Com (STC 2012)Getting Started with HTML5 in Tech Com (STC 2012)
Getting Started with HTML5 in Tech Com (STC 2012)
Peter Lubbers
Ā 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real World
Sean Roberts
Ā 
Advantages of the Cloud_Q2_2017.pptx
Advantages of the Cloud_Q2_2017.pptxAdvantages of the Cloud_Q2_2017.pptx
Advantages of the Cloud_Q2_2017.pptx
SaboneSabone
Ā 
PyData: The Next Generation
PyData: The Next GenerationPyData: The Next Generation
PyData: The Next Generation
Wes McKinney
Ā 
Level Up ā€“ How to Achieve Hadoop Acceleration
Level Up ā€“ How to Achieve Hadoop AccelerationLevel Up ā€“ How to Achieve Hadoop Acceleration
Level Up ā€“ How to Achieve Hadoop Acceleration
Inside Analysis
Ā 
Octo and the DevSecOps Evolution at Oracle by Ian Van Hoven
Octo and the DevSecOps Evolution at Oracle by Ian Van HovenOcto and the DevSecOps Evolution at Oracle by Ian Van Hoven
Octo and the DevSecOps Evolution at Oracle by Ian Van Hoven
InfluxData
Ā 
Modernizing i5 Applications
Modernizing i5 ApplicationsModernizing i5 Applications
Modernizing i5 Applications
ZendCon
Ā 
Cloud Native Applications Containers Microservices Platforms CICD Oh my
Cloud Native Applications Containers Microservices Platforms CICD Oh myCloud Native Applications Containers Microservices Platforms CICD Oh my
Cloud Native Applications Containers Microservices Platforms CICD Oh my
Fabio Chiodini
Ā 
Enabling Data centric Teams
Enabling Data centric TeamsEnabling Data centric Teams
Enabling Data centric Teams
Data Con LA
Ā 
Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)
Kent Graziano
Ā 
Senior C++ engineer
Senior C++ engineerSenior C++ engineer
Senior C++ engineer
Nataliya Zhuk
Ā 
BPM and SOA Are Going Mobile: An Architectural Perspective
BPM and SOA Are Going Mobile: An Architectural PerspectiveBPM and SOA Are Going Mobile: An Architectural Perspective
BPM and SOA Are Going Mobile: An Architectural PerspectiveGuido Schmutz
Ā 
Big Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big DataBig Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big DataPentaho
Ā 
Mastering Docker and Docker Swarm
Mastering Docker and Docker Swarm Mastering Docker and Docker Swarm
Mastering Docker and Docker Swarm
Ankit Yadav
Ā 
Run Your First Hadoop 2.x Program
Run Your First Hadoop 2.x ProgramRun Your First Hadoop 2.x Program
Run Your First Hadoop 2.x Program
Skillspeed
Ā 

Similar to If you have your own Columnar format, stop now and use Parquet šŸ˜› (20)

The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
Ā 
The Big Connection: Integrating Cloud with Enterprise Systems
The Big Connection: Integrating Cloud with Enterprise SystemsThe Big Connection: Integrating Cloud with Enterprise Systems
The Big Connection: Integrating Cloud with Enterprise Systems
Ā 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
Ā 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
Ā 
Steve Totman Syncsort Big Data Warehousing hug 23 sept Final
Steve Totman Syncsort Big Data Warehousing hug 23 sept FinalSteve Totman Syncsort Big Data Warehousing hug 23 sept Final
Steve Totman Syncsort Big Data Warehousing hug 23 sept Final
Ā 
Getting Started with HTML5 in Tech Com (STC 2012)
Getting Started with HTML5 in Tech Com (STC 2012)Getting Started with HTML5 in Tech Com (STC 2012)
Getting Started with HTML5 in Tech Com (STC 2012)
Ā 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real World
Ā 
Advantages of the Cloud_Q2_2017.pptx
Advantages of the Cloud_Q2_2017.pptxAdvantages of the Cloud_Q2_2017.pptx
Advantages of the Cloud_Q2_2017.pptx
Ā 
PyData: The Next Generation
PyData: The Next GenerationPyData: The Next Generation
PyData: The Next Generation
Ā 
Level Up ā€“ How to Achieve Hadoop Acceleration
Level Up ā€“ How to Achieve Hadoop AccelerationLevel Up ā€“ How to Achieve Hadoop Acceleration
Level Up ā€“ How to Achieve Hadoop Acceleration
Ā 
Octo and the DevSecOps Evolution at Oracle by Ian Van Hoven
Octo and the DevSecOps Evolution at Oracle by Ian Van HovenOcto and the DevSecOps Evolution at Oracle by Ian Van Hoven
Octo and the DevSecOps Evolution at Oracle by Ian Van Hoven
Ā 
Modernizing i5 Applications
Modernizing i5 ApplicationsModernizing i5 Applications
Modernizing i5 Applications
Ā 
Cloud Native Applications Containers Microservices Platforms CICD Oh my
Cloud Native Applications Containers Microservices Platforms CICD Oh myCloud Native Applications Containers Microservices Platforms CICD Oh my
Cloud Native Applications Containers Microservices Platforms CICD Oh my
Ā 
Enabling Data centric Teams
Enabling Data centric TeamsEnabling Data centric Teams
Enabling Data centric Teams
Ā 
Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)
Ā 
Senior C++ engineer
Senior C++ engineerSenior C++ engineer
Senior C++ engineer
Ā 
BPM and SOA Are Going Mobile: An Architectural Perspective
BPM and SOA Are Going Mobile: An Architectural PerspectiveBPM and SOA Are Going Mobile: An Architectural Perspective
BPM and SOA Are Going Mobile: An Architectural Perspective
Ā 
Big Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big DataBig Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big Data
Ā 
Mastering Docker and Docker Swarm
Mastering Docker and Docker Swarm Mastering Docker and Docker Swarm
Mastering Docker and Docker Swarm
Ā 
Run Your First Hadoop 2.x Program
Run Your First Hadoop 2.x ProgramRun Your First Hadoop 2.x Program
Run Your First Hadoop 2.x Program
Ā 

More from Julien Le Dem

Data and AI summit: data pipelines observability with open lineage
Data and AI summit: data pipelines observability with open lineageData and AI summit: data pipelines observability with open lineage
Data and AI summit: data pipelines observability with open lineage
Julien Le Dem
Ā 
Data pipelines observability: OpenLineage & Marquez
Data pipelines observability:  OpenLineage & MarquezData pipelines observability:  OpenLineage & Marquez
Data pipelines observability: OpenLineage & Marquez
Julien Le Dem
Ā 
Open core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageOpen core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineage
Julien Le Dem
Ā 
Data platform architecture principles - ieee infrastructure 2020
Data platform architecture principles - ieee infrastructure 2020Data platform architecture principles - ieee infrastructure 2020
Data platform architecture principles - ieee infrastructure 2020
Julien Le Dem
Ā 
Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020
Julien Le Dem
Ā 
How to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analyticsHow to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analytics
Julien Le Dem
Ā 
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Julien Le Dem
Ā 
Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013
Julien Le Dem
Ā 
Parquet Twitter Seattle open house
Parquet Twitter Seattle open houseParquet Twitter Seattle open house
Parquet Twitter Seattle open house
Julien Le Dem
Ā 
Parquet overview
Parquet overviewParquet overview
Parquet overview
Julien Le Dem
Ā 

If you have your own Columnar format, stop now and use Parquet 😛

  • 1. © 2015 Dremio Corporation
    If you have your own Columnar format, stop now and use Parquet 😛
    Julien Le Dem, Principal Architect, Dremio; VP Apache Parquet
  • 2. About Dremio
    Jacques Nadeau, Founder & CTO
    • Apache Drill PMC Chair
    • Recognized SQL & NoSQL expert
    • Quigo (AOL); Offermatica (ADBE); aQuantive (MSFT)
    Tomer Shiran, Founder & CEO
    • Apache Drill Founder
    • MapR (VP Product); Microsoft; IBM Research
    • Carnegie Mellon, Technion
    Julien Le Dem, Architect
    • Apache Parquet Founder
    • Apache Pig PMC Member
    • Twitter (Lead, Analytics Data Pipeline); Yahoo! (Architect)
    Top Silicon Valley VCs
    • Enabling self-service data discovery, exploration and analysis on modern data
    • Founded in June 2015
    • Building on open source technologies including Drill, Parquet, Spark
  • 3. Background of Parquet
    • Twitter’s data
      • Lots of data: instrumentation, user graph, derived data, ...
      • Complex: deeply nested structures
    • Analytics infrastructure:
      • Hadoop clusters of several thousand nodes
      • Log collection to HDFS in Thrift
    • Parquet
      • Columnar: space and query efficient
      • Inspired by the Google Dremel paper
      • Supports complex data
      • Interoperable
    Image: Caillebotte, The Parquet Planers
  • 4. Parquet timeline
    • Fall 2012: Twitter & Cloudera’s Impala team merge efforts to develop columnar formats.
    • March 2013: OSS announcement; Criteo signs on for Hive integration.
    • July 2013: 1.0 release. 18 contributors from more than 5 organizations.
    • August 2013: Drill chose Parquet as its primary storage format.
    • May 2014: Apache Incubator. 40+ contributors, 18 with 1000+ LOC. 26 incremental releases.
    • April 2015: Parquet graduates from the Apache Incubator.
  • 5. What does Parquet do?
    • Interoperability
    • Space efficiency
    • Query efficiency
    @EmrgencyKittens
  • 6. Columnar storage
    [Figure: a table with columns a, b, c and rows a1 b1 c1 through a5 b5 c5. Panels: Logical table representation, Row layout, Column layout, Nested schema. On disk, each column is stored as an encoded chunk.]
    Encodings: Dictionary, RLE, Delta, Prefix
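The difference between the two layouts on slide 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`row_layout`, `column_layout`, `dictionary_encode`, `rle_encode`) are hypothetical, and the encoders are simplified textbook versions, not Parquet's actual on-disk encodings.

```python
from itertools import groupby

# Logical table from the slide: 5 rows with columns a, b, c.
rows = [(f"a{i}", f"b{i}", f"c{i}") for i in range(1, 6)]


def row_layout(rows):
    """Store values row by row: a1 b1 c1 a2 b2 c2 ..."""
    return [value for row in rows for value in row]


def column_layout(rows):
    """Store values column by column: a1 ... a5, then b1 ... b5, then c1 ... c5."""
    return [value for column in zip(*rows) for value in column]


def dictionary_encode(values):
    """Replace each value with its index in a dictionary of distinct values."""
    dictionary, index, ids = [], {}, []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        ids.append(index[value])
    return dictionary, ids


def rle_encode(values):
    """Run-length encode: collapse each run of equal values into (value, count)."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Hypothetical low-cardinality column: similar values sit next to each other
# in a column layout, which is what makes these encodings effective.
country = ["us", "us", "us", "fr", "fr"]
dictionary, ids = dictionary_encode(country)
runs = rle_encode(ids)
```

Because a column holds values of one type and often of low cardinality, dictionary ids followed by run-length encoding tend to compress far better than the same values interleaved in a row layout.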
  • 7. Nested representation
    Schema (borrowed from the Google Dremel paper):
      Document
        DocId
        Links
          Backward
          Forward
        Name
          Language
            Code
            Country
          Url
    Columns: docid, links.backward, links.forward, name.language.code, name.language.country, name.url
    https://blog.twitter.com/2013/dremel-made-simple-with-parquet
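The mapping from the nested schema on slide 7 to its column list can be sketched as a walk over the schema tree that emits one dotted path per leaf field. The dict-based `schema` and the `leaf_columns` helper are assumptions made for this illustration; the sketch also leaves out the repetition and definition levels that the Dremel approach stores alongside each value.

```python
# Nested schema from the slide, modelled as a dict:
# a value of None marks a leaf (primitive) field, a dict marks a group.
schema = {
    "DocId": None,
    "Links": {"Backward": None, "Forward": None},
    "Name": {
        "Language": {"Code": None, "Country": None},
        "Url": None,
    },
}


def leaf_columns(node, prefix=()):
    """Return one dotted, lower-cased path per leaf field, in schema order."""
    columns = []
    for name, child in node.items():
        path = prefix + (name.lower(),)
        if child is None:
            columns.append(".".join(path))  # leaf: becomes its own column
        else:
            columns.extend(leaf_columns(child, path))  # group: recurse
    return columns


columns = leaf_columns(schema)
```

Only leaf fields become columns; groups such as Links and Name exist only as path prefixes.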
  • 8. Statistics
    Vertical partitioning (projection push down) + Horizontal partitioning (predicate push down) = Read only the data you need!
    [Figure: three copies of the table with columns a, b, c and rows a1 b1 c1 through a5 b5 c5, combined with "+" and "=".]
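A minimal sketch of how the two push downs on slide 8 combine, assuming data split into row groups that each keep per-column min/max statistics. The `RowGroup` class, the `scan` function and the sample data are hypothetical and exist only for this illustration; they are not the Parquet API.

```python
from dataclasses import dataclass, field


@dataclass
class RowGroup:
    """A horizontal slice of the table, stored column by column."""
    columns: dict                      # column name -> list of values
    stats: dict = field(init=False)    # column name -> (min, max)

    def __post_init__(self):
        # Statistics are computed once at write time and kept with the group.
        self.stats = {name: (min(v), max(v)) for name, v in self.columns.items()}


def scan(row_groups, wanted, filter_column, lower_bound):
    """Return the wanted columns of rows where filter_column >= lower_bound."""
    result, groups_read = [], 0
    for group in row_groups:
        _, column_max = group.stats[filter_column]
        if column_max < lower_bound:
            continue  # predicate push down: skip the whole group using statistics
        groups_read += 1
        matches = [i for i, v in enumerate(group.columns[filter_column])
                   if v >= lower_bound]
        for i in matches:
            # projection push down: only the wanted columns are touched
            result.append(tuple(group.columns[name][i] for name in wanted))
    return result, groups_read


groups = [
    RowGroup({"a": [1, 2, 3], "b": ["x", "y", "z"], "c": [0.1, 0.2, 0.3]}),
    RowGroup({"a": [4, 5, 6], "b": ["u", "v", "w"], "c": [0.4, 0.5, 0.6]}),
]
selected, groups_read = scan(groups, wanted=["b"], filter_column="a", lower_bound=5)
```

Here the first row group is skipped without reading any values because its maximum for `a` is below the bound, and column `c` is never read at all.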
  • 9. Interoperability
    [Diagram labels]
    Library level integration:
      Object model: Avro, Thrift, Protocol Buffer, Pig Tuple, Hive SerDe
      Converters: parquet-avro, parquet-thrift, parquet-proto, parquet-pig, parquet-hive
      Assembly/striping (model agnostic)
      Column encodings
    Query engine integration: Impala, Presto, Drill, …; Encodings (C)
    Parquet file format (language agnostic)
  • 10. Query engines, frameworks and libraries integrated with Parquet (non exhaustive)
    Query engines: Hive, Impala, HAWQ, IBM Big SQL, Drill, Tajo, Pig, Presto, SparkSQL
    Frameworks: Spark, MapReduce, Cascading, Crunch, Scalding, Kite
    Data Models: Avro, Thrift, ProtocolBuffers, POJOs
  • 11. Loose coupling
    • Users don’t want to load their data into every tool.
    • Many tools are available, and new ones show up every day.
    • The cost of trying a new tool should be minimal.
    [Diagram: Storage (HDFS/S3/…) holding a query-efficient format (Parquet), shared by interactive queries (Drill, Impala, Presto, …), batch computation (Pig, Cascading, Scalding, Spark, …), graph processing (Giraph, …), automated dashboard, and machine learning.]
  • 12. Get involved
    Twitter: @ApacheParquet
    Mailing list: dev@parquet.apache.org
    Github repo: https://github.com/apache/parquet-mr
    Parquet sync ups: regular meetings on Google Hangout