SlideShare une entreprise Scribd logo
1  sur  24
Apache Tez : Accelerating Hadoop
Query Processing
Bikas Saha
@bikassaha

© Hortonworks Inc. 2013

Page 1
Tez – Introduction
• Distributed execution framework
targeted towards data-processing
applications.
• Based on expressing a computation
as a dataflow graph.
• Highly customizable to meet a
broad spectrum of use cases.
• Built on top of YARN – the resource
management framework for
Hadoop.
• Open source Apache incubator
project and Apache licensed.
© Hortonworks Inc. 2013

Page 2
Tez – Design Themes
• Empowering End Users
• Execution Performance

© Hortonworks Inc. 2013

Page 3
Tez – Empowering End Users
• Expressive dataflow definition API’s
• Flexible Input-Processor-Output runtime model
• Data type agnostic
• Simplifying deployment

© Hortonworks Inc. 2013

Page 4
Tez – Empowering End Users
• Expressive dataflow definition API’s
Task-1

Task-2

Preprocessor Stage

Task-1

Task-2

Partition Stage

Samples

Sampler
Ranges

Distributed Sort

Task-1

© Hortonworks Inc. 2013

Task-2

Aggregate Stage

Page 5
Tez – Empowering End Users
• Flexible Input-Processor-Output runtime model
– Construct physical runtime executors dynamically by connecting
different inputs, processors and outputs.
– End goal is to have a library of inputs, outputs and processors that can
be programmatically composed to generate useful tasks.

HDFSInput

ShuffleInput

MapProcessor

ReduceProcessor

JoinProcessor

FileSortedOutput

HDFSOutput

FileSortedOutput

Mapper

FinalReduce

IntermediateJoiner

© Hortonworks Inc. 2013

Input1

Input2

Page 6
Tez – Empowering End Users
• Data type agnostic
– Tez is only concerned with the movement of data. Files and streams of
bytes.
– Clean separation between logical application layer and physical
framework layer. Design important to be a platform for a variety of
applications.

Tez Task

File

User Code
Key Value

Bytes

Bytes
Tuples

Stream

© Hortonworks Inc. 2013

Page 7
Tez – Empowering End Users
• Simplifying deployment
– Tez is a completely client side application.
– No deployments to do. Simply upload to any accessible FileSystem and
change local Tez configuration to point to that.
– Enables running different versions concurrently. Easy to test new
functionality while keeping stable versions for production.
– Leverages YARN local resources.
HDFS
Tez Lib 1

Tez Lib 2

TezClient

TezTask

TezTask

TezClient

Client
Machine

Node
Manager

Node
Manager

Client
Machine

© Hortonworks Inc. 2013

Page 8
Tez – Empowering End Users
• Expressive dataflow definition API’s
• Flexible Input-Processor-Output runtime model
• Data type agnostic
• Simplifying usage
With great power API’s come great responsibilities 
Tez is a framework on which end user applications can be built

© Hortonworks Inc. 2013

Page 9
Tez – Execution Performance
• Performance gains over Map Reduce
• Optimal resource management
• Plan reconfiguration at runtime
• Dynamic physical data flow decisions

© Hortonworks Inc. 2013

Page 10
Tez – Execution Performance
• Performance gains over Map Reduce
– Eliminate replicated write barrier between successive computations.
– Eliminate job launch overhead of workflow jobs.
– Eliminate extra stage of map reads in every workflow job.
– Eliminate queue and resource contention suffered by workflow jobs that
are started after a predecessor job completes.

Pig/Hive - Tez

Pig/Hive - MR

© Hortonworks Inc. 2013

Page 11
Tez – Execution Performance
• Plan reconfiguration at runtime
– Dynamic runtime concurrency control based on data size, user operator
resources, available cluster resources and locality.
– Advanced changes in dataflow graph structure.
– Progressive graph construction in concert with user optimizer.

HDFS
Blocks
Stage 1
50 maps
100
partitions

Stage 2
100
reducers

Stage 1
50 maps
100
partitions

Only 10GB’s
of data

Stage 2
100 10
reducers

YARN
Resources

© Hortonworks Inc. 2013

Page 12
Tez – Execution Performance
• Optimal resource management
– Reuse YARN containers to launch new tasks.
– Reuse YARN containers to enable shared objects across tasks.
– TezSession to encapsulate all this for the user

Start Task

Tez
Application Master

Task Done

Start Task

YARN Container

TezTask1

TezTask2

Shared Objects

TezTask Host

YARN Container

© Hortonworks Inc. 2013

Page 13
Tez – Execution Performance
• Dynamic physical data flow decisions
– Decide the type of physical byte movement and storage on the fly.
– Store intermediate data on distributed store, local store or in-memory.
– Transfer bytes via blocking files or streaming and the spectrum in
between.
Producer
(small size)

Producer

Local File

At Runtime

In-Memory

Consumer

Consumer

© Hortonworks Inc. 2013

Page 14
Tez – Automatic Reduce Parallelism
Event Model
Map tasks send
data statistics
events to the
Reduce Vertex
Manager.

Vertex Manager

Map Vertex

Vertex Manager
Pluggable user logic
that understands the
data statistics and
can formulate the
correct parallelism.
Advises vertex
controller on
parallelism

Vertex State
Machine

App Master

Reduce Vertex
Cancel Task

© Hortonworks Inc. 2013

Page 15
Tez – Automatic Reduce Parallelism
Event Model
Map tasks send
data statistics
events to the
Reduce Vertex
Manager.

Data Size Statistics

Vertex Manager

Map Vertex

Vertex Manager
Pluggable user logic
that understands the
data statistics and
can formulate the
correct parallelism.
Advises vertex
controller on
parallelism

Vertex State
Machine

App Master

Reduce Vertex
Cancel Task

© Hortonworks Inc. 2013

Page 16
Tez – Automatic Reduce Parallelism
Event Model
Map tasks send
data statistics
events to the
Reduce Vertex
Manager.
Vertex Manager
Pluggable user logic
that understands the
data statistics and
can formulate the
correct parallelism.
Advises vertex
controller on
parallelism

Data Size Statistics

Vertex Manager

Map Vertex

Set Parallelism
Re-Route

Vertex State
Machine

App Master

Reduce Vertex
Cancel Task

© Hortonworks Inc. 2013

Page 17
Tez – Now and Next

© Hortonworks Inc. 2013

Page 18
Tez – Bridge the Data Spectrum
Fact Table

Dimension
Table 1

Dimension
Table 1

Fact Table

Broadcast
Join

Result
Table 1

Dimension
Table 2

Broadcast join
for small data sets

Dimension
Table 1
Dimension
Table 1

Broadcast
Join

Result
Table 2

Dimension
Table 3
Shuffle
Join

Typical pattern in a
TPC-DS query

Result
Table 3

© Hortonworks Inc. 2013

Based on data size,
the query optimizer
can run either plan
as a single Tez job

Page 19
Tez – Current status
• Apache Incubator Project
– Rapid development. Over 800 jiras opened. Over 600 resolved.
– Growing community of contributors and users

• Focus on stability
– Testing and quality are highest priority.
– Code ready and deployed on multi-node environments.

• Support for a vast topology of DAGs
– Already functionally equivalent to Map Reduce. Existing Map Reduce
jobs can be executed on Tez with few or no changes.
– Hive retargeted to use Tez for execution of queries (HIVE-4660).
– Pig to use Tez for execution of scripts (PIG-3446).

© Hortonworks Inc. 2013

Page 20
Tez – Roadmap
• Richer DAG support
– Support for co-scheduling
– Efficient iterations

• Performance optimizations
– More efficiencies in transfer of data
– Improve session performance

• Usability.
– Stability and testability
– Recovery and history
– Tools for performance analysis and debugging

© Hortonworks Inc. 2013

Page 21
Tez – Community
• Early adopters and code contributors welcome
– Adopters to drive more scenarios. Contributors to make them happen.
– Hive and Pig communities are on-board and making great progress - HIVE-4660
and PIG-3446

• Tez meetup for developers and users
– http://www.meetup.com/Apache-Tez-User-Group

• Technical blog series
– http://hortonworks.com/blog/apache-tez-a-new-chapter-in-hadoop-dataprocessing/ (will soon be available on the Apache Wiki)

• Useful links
– Work tracking: https://issues.apache.org/jira/browse/TEZ
– Code: https://github.com/apache/incubator-tez
– Developer list: dev@tez.incubator.apache.org
User list: user@tez.incubator.apache.org
Issues list: issues@tez.incubator.apache.org
© Hortonworks Inc. 2013

Page 22
Tez – Takeaways
• Distributed execution framework that works on computations
represented as dataflow graphs
• Naturally maps to execution plans produced by query
optimizers
• Customizable execution architecture designed to enable
dynamic performance optimizations at runtime
• Works out of the box with the platform figuring out the hard
stuff
• Span the spectrum of interactive latency to batch
• Open source Apache project – your use-cases and code are
welcome
• It works and is already being used by Hive and Pig
© Hortonworks Inc. 2013

Page 23
Tez
Thanks for your time and attention!
Deep dive on Tez video at
http://www.infoq.com/presentations/apache-tez
Questions?
@bikassaha

© Hortonworks Inc. 2013

Page 24

Contenu connexe

Tendances

Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Managementrightsize
 
Demystify Big Data Breakfast Briefing: Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing:  Herb Cunitz, HortonworksDemystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing: Herb Cunitz, HortonworksHortonworks
 
Stinger Initiative - Deep Dive
Stinger Initiative - Deep DiveStinger Initiative - Deep Dive
Stinger Initiative - Deep DiveHortonworks
 
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with Yarn
Scale 12 x   Efficient Multi-tenant Hadoop 2 Workloads with YarnScale 12 x   Efficient Multi-tenant Hadoop 2 Workloads with Yarn
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with YarnDavid Kaiser
 
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Uwe Printz
 
Apache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data ProcessingApache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data Processinghitesh1892
 
Quick Introduction to Apache Tez
Quick Introduction to Apache TezQuick Introduction to Apache Tez
Quick Introduction to Apache TezGetInData
 
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Sumeet Singh
 
Pig on Tez: Low Latency Data Processing with Big Data
Pig on Tez: Low Latency Data Processing with Big DataPig on Tez: Low Latency Data Processing with Big Data
Pig on Tez: Low Latency Data Processing with Big DataDataWorks Summit
 
Pig on Tez - Low Latency ETL with Big Data
Pig on Tez - Low Latency ETL with Big DataPig on Tez - Low Latency ETL with Big Data
Pig on Tez - Low Latency ETL with Big DataDataWorks Summit
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Tez Data Processing over Yarn
Tez Data Processing over YarnTez Data Processing over Yarn
Tez Data Processing over YarnInMobi Technology
 
2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive TuningAdam Muise
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop EcosystemJ Singh
 
MapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR Technologies
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksDataWorks Summit
 
The Zoo Expands: Labrador *Loves* Elephant, Thanks to Hamster
The Zoo Expands: Labrador *Loves* Elephant, Thanks to HamsterThe Zoo Expands: Labrador *Loves* Elephant, Thanks to Hamster
The Zoo Expands: Labrador *Loves* Elephant, Thanks to HamsterMilind Bhandarkar
 

Tendances (20)

Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Management
 
Demystify Big Data Breakfast Briefing: Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing:  Herb Cunitz, HortonworksDemystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing: Herb Cunitz, Hortonworks
 
Stinger Initiative - Deep Dive
Stinger Initiative - Deep DiveStinger Initiative - Deep Dive
Stinger Initiative - Deep Dive
 
Big Data Journey
Big Data JourneyBig Data Journey
Big Data Journey
 
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with Yarn
Scale 12 x   Efficient Multi-tenant Hadoop 2 Workloads with YarnScale 12 x   Efficient Multi-tenant Hadoop 2 Workloads with Yarn
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with Yarn
 
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
 
Apache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data ProcessingApache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data Processing
 
Quick Introduction to Apache Tez
Quick Introduction to Apache TezQuick Introduction to Apache Tez
Quick Introduction to Apache Tez
 
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
 
Pig on Tez: Low Latency Data Processing with Big Data
Pig on Tez: Low Latency Data Processing with Big DataPig on Tez: Low Latency Data Processing with Big Data
Pig on Tez: Low Latency Data Processing with Big Data
 
Pig on Tez - Low Latency ETL with Big Data
Pig on Tez - Low Latency ETL with Big DataPig on Tez - Low Latency ETL with Big Data
Pig on Tez - Low Latency ETL with Big Data
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Tez Data Processing over Yarn
Tez Data Processing over YarnTez Data Processing over Yarn
Tez Data Processing over Yarn
 
2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop Ecosystem
 
MapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document Database
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
The Zoo Expands: Labrador *Loves* Elephant, Thanks to Hamster
The Zoo Expands: Labrador *Loves* Elephant, Thanks to HamsterThe Zoo Expands: Labrador *Loves* Elephant, Thanks to Hamster
The Zoo Expands: Labrador *Loves* Elephant, Thanks to Hamster
 
Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
 

Similaire à February 2014 HUG : Tez Details and Insides

Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingTeddy Choi
 
Tez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaTez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaData Con LA
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing enginebigdatagurus_meetup
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingBikas Saha
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingHortonworks
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez Hortonworks
 
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014Modern Data Stack France
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopHortonworks
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and FutureJianfeng Zhang
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and FutureRajesh Balamohan
 
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...Caserta
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingDataWorks Summit
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Data Con LA
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN ApplicationsHortonworks
 
Tez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelTez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelt3rmin4t0r
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 DataWorks Summit
 
Running Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache HadoopRunning Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache Hadoophitesh1892
 
Gunther hagleitner:apache hive & stinger
Gunther hagleitner:apache hive & stingerGunther hagleitner:apache hive & stinger
Gunther hagleitner:apache hive & stingerhdhappy001
 
La big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixitLa big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixitData Con LA
 
Interactive query in hadoop
Interactive query in hadoopInteractive query in hadoop
Interactive query in hadoopRommel Garcia
 

Similaire à February 2014 HUG : Tez Details and Insides (20)

Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
 
Tez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaTez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_saha
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing engine
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
 
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
 
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN Applications
 
Tez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelTez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthel
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 
Running Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache HadoopRunning Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache Hadoop
 
Gunther hagleitner:apache hive & stinger
Gunther hagleitner:apache hive & stingerGunther hagleitner:apache hive & stinger
Gunther hagleitner:apache hive & stinger
 
La big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixitLa big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixit
 
Interactive query in hadoop
Interactive query in hadoopInteractive query in hadoop
Interactive query in hadoop
 

Plus de Yahoo Developer Network

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaYahoo Developer Network
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Yahoo Developer Network
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanYahoo Developer Network
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Yahoo Developer Network
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathYahoo Developer Network
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuYahoo Developer Network
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolYahoo Developer Network
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Yahoo Developer Network
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Yahoo Developer Network
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathYahoo Developer Network
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Yahoo Developer Network
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathYahoo Developer Network
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsYahoo Developer Network
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Yahoo Developer Network
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondYahoo Developer Network
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Yahoo Developer Network
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...Yahoo Developer Network
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexYahoo Developer Network
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsYahoo Developer Network
 

Plus de Yahoo Developer Network (20)

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
 
CICD at Oath using Screwdriver
CICD at Oath using ScrewdriverCICD at Oath using Screwdriver
CICD at Oath using Screwdriver
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, Oath
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI Applications
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step Beyond
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 

Dernier

GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxnelietumpap1
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 

Dernier (20)

LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptx
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 

February 2014 HUG : Tez Details and Insides

  • 1. Apache Tez : Accelerating Hadoop Query Processing Bikas Saha @bikassaha © Hortonworks Inc. 2013 Page 1
  • 2. Tez – Introduction • Distributed execution framework targeted towards data-processing applications. • Based on expressing a computation as a dataflow graph. • Highly customizable to meet a broad spectrum of use cases. • Built on top of YARN – the resource management framework for Hadoop. • Open source Apache incubator project and Apache licensed. © Hortonworks Inc. 2013 Page 2
  • 3. Tez – Design Themes • Empowering End Users • Execution Performance © Hortonworks Inc. 2013 Page 3
  • 4. Tez – Empowering End Users • Expressive dataflow definition API’s • Flexible Input-Processor-Output runtime model • Data type agnostic • Simplifying deployment © Hortonworks Inc. 2013 Page 4
  • 5. Tez – Empowering End Users • Expressive dataflow definition API’s Task-1 Task-2 Preprocessor Stage Task-1 Task-2 Partition Stage Samples Sampler Ranges Distributed Sort Task-1 © Hortonworks Inc. 2013 Task-2 Aggregate Stage Page 5
  • 6. Tez – Empowering End Users • Flexible Input-Processor-Output runtime model – Construct physical runtime executors dynamically by connecting different inputs, processors and outputs. – End goal is to have a library of inputs, outputs and processors that can be programmatically composed to generate useful tasks. HDFSInput ShuffleInput MapProcessor ReduceProcessor JoinProcessor FileSortedOutput HDFSOutput FileSortedOutput Mapper FinalReduce IntermediateJoiner © Hortonworks Inc. 2013 Input1 Input2 Page 6
  • 7. Tez – Empowering End Users • Data type agnostic – Tez is only concerned with the movement of data. Files and streams of bytes. – Clean separation between logical application layer and physical framework layer. Design important to be a platform for a variety of applications. Tez Task File User Code Key Value Bytes Bytes Tuples Stream © Hortonworks Inc. 2013 Page 7
  • 8. Tez – Empowering End Users • Simplifying deployment – Tez is a completely client side application. – No deployments to do. Simply upload to any accessible FileSystem and change local Tez configuration to point to that. – Enables running different versions concurrently. Easy to test new functionality while keeping stable versions for production. – Leverages YARN local resources. HDFS Tez Lib 1 Tez Lib 2 TezClient TezTask TezTask TezClient Client Machine Node Manager Node Manager Client Machine © Hortonworks Inc. 2013 Page 8
  • 9. Tez – Empowering End Users • Expressive dataflow definition API’s • Flexible Input-Processor-Output runtime model • Data type agnostic • Simplifying usage With great power API’s come great responsibilities  Tez is a framework on which end user applications can be built © Hortonworks Inc. 2013 Page 9
  • 10. Tez – Execution Performance • Performance gains over Map Reduce • Optimal resource management • Plan reconfiguration at runtime • Dynamic physical data flow decisions © Hortonworks Inc. 2013 Page 10
  • 11. Tez – Execution Performance • Performance gains over Map Reduce – Eliminate replicated write barrier between successive computations. – Eliminate job launch overhead of workflow jobs. – Eliminate extra stage of map reads in every workflow job. – Eliminate queue and resource contention suffered by workflow jobs that are started after a predecessor job completes. Pig/Hive - Tez Pig/Hive - MR © Hortonworks Inc. 2013 Page 11
  • 12. Tez – Execution Performance • Plan reconfiguration at runtime – Dynamic runtime concurrency control based on data size, user operator resources, available cluster resources and locality. – Advanced changes in dataflow graph structure. – Progressive graph construction in concert with user optimizer. HDFS Blocks Stage 1 50 maps 100 partitions Stage 2 100 reducers Stage 1 50 maps 100 partitions Only 10GB’s of data Stage 2 100 10 reducers YARN Resources © Hortonworks Inc. 2013 Page 12
  • 13. Tez – Execution Performance • Optimal resource management – Reuse YARN containers to launch new tasks. – Reuse YARN containers to enable shared objects across tasks. – TezSession to encapsulate all this for the user Start Task Tez Application Master Task Done Start Task YARN Container TezTask1 TezTask2 Shared Objects TezTask Host YARN Container © Hortonworks Inc. 2013 Page 13
  • 14. Tez – Execution Performance • Dynamic physical data flow decisions – Decide the type of physical byte movement and storage on the fly. – Store intermediate data on distributed store, local store or in-memory. – Transfer bytes via blocking files or streaming and the spectrum in between. Producer (small size) Producer Local File At Runtime In-Memory Consumer Consumer © Hortonworks Inc. 2013 Page 14
  • 15. Tez – Automatic Reduce Parallelism Event Model Map tasks send data statistics events to the Reduce Vertex Manager. Vertex Manager Map Vertex Vertex Manager Pluggable user logic that understands the data statistics and can formulate the correct parallelism. Advises vertex controller on parallelism Vertex State Machine App Master Reduce Vertex Cancel Task © Hortonworks Inc. 2013 Page 15
  • 16. Tez – Automatic Reduce Parallelism Event Model Map tasks send data statistics events to the Reduce Vertex Manager. Data Size Statistics Vertex Manager Map Vertex Vertex Manager Pluggable user logic that understands the data statistics and can formulate the correct parallelism. Advises vertex controller on parallelism Vertex State Machine App Master Reduce Vertex Cancel Task © Hortonworks Inc. 2013 Page 16
  • 17. Tez – Automatic Reduce Parallelism Event Model Map tasks send data statistics events to the Reduce Vertex Manager. Vertex Manager Pluggable user logic that understands the data statistics and can formulate the correct parallelism. Advises vertex controller on parallelism Data Size Statistics Vertex Manager Map Vertex Set Parallelism Re-Route Vertex State Machine App Master Reduce Vertex Cancel Task © Hortonworks Inc. 2013 Page 17
  • 18. Tez – Now and Next © Hortonworks Inc. 2013 Page 18
  • 19. Tez – Bridge the Data Spectrum Fact Table Dimension Table 1 Dimension Table 1 Fact Table Broadcast Join Result Table 1 Dimension Table 2 Broadcast join for small data sets Dimension Table 1 Dimension Table 1 Broadcast Join Result Table 2 Dimension Table 3 Shuffle Join Typical pattern in a TPC-DS query Result Table 3 © Hortonworks Inc. 2013 Based on data size, the query optimizer can run either plan as a single Tez job Page 19
  • 20. Tez – Current status • Apache Incubator Project – Rapid development. Over 800 jiras opened. Over 600 resolved. – Growing community of contributors and users • Focus on stability – Testing and quality are highest priority. – Code ready and deployed on multi-node environments. • Support for a vast topology of DAGs – Already functionally equivalent to Map Reduce. Existing Map Reduce jobs can be executed on Tez with few or no changes. – Hive retargeted to use Tez for execution of queries (HIVE-4660). – Pig to use Tez for execution of scripts (PIG-3446). © Hortonworks Inc. 2013 Page 20
  • 21. Tez – Roadmap • Richer DAG support – Support for co-scheduling – Efficient iterations • Performance optimizations – More efficiencies in transfer of data – Improve session performance • Usability. – Stability and testability – Recovery and history – Tools for performance analysis and debugging © Hortonworks Inc. 2013 Page 21
  • 22. Tez – Community • Early adopters and code contributors welcome – Adopters to drive more scenarios. Contributors to make them happen. – Hive and Pig communities are on-board and making great progress - HIVE-4660 and PIG-3446 • Tez meetup for developers and users – http://www.meetup.com/Apache-Tez-User-Group • Technical blog series – http://hortonworks.com/blog/apache-tez-a-new-chapter-in-hadoop-dataprocessing/ (will soon be available on the Apache Wiki) • Useful links – Work tracking: https://issues.apache.org/jira/browse/TEZ – Code: https://github.com/apache/incubator-tez – Developer list: dev@tez.incubator.apache.org User list: user@tez.incubator.apache.org Issues list: issues@tez.incubator.apache.org © Hortonworks Inc. 2013 Page 22
  • 23. Tez – Takeaways • Distributed execution framework that works on computations represented as dataflow graphs • Naturally maps to execution plans produced by query optimizers • Customizable execution architecture designed to enable dynamic performance optimizations at runtime • Works out of the box with the platform figuring out the hard stuff • Span the spectrum of interactive latency to batch • Open source Apache project – your use-cases and code are welcome • It works and is already being used by Hive and Pig © Hortonworks Inc. 2013 Page 23
  • 24. Tez Thanks for your time and attention! Deep dive on Tez video at http://www.infoq.com/presentations/apache-tez Questions? @bikassaha © Hortonworks Inc. 2013 Page 24