Hadoop 3.0
Revolution or evolution?
uweprintz
/whoami
&
/disclaimer
Some Hadoop history

2006 - Hadoop 1: HDFS (redundant, reliable storage) + MapReduce (cluster resource mgmt. + data processing).
Let there be batch! Era of Silicon Valley Hadoop.

Oct. 2013 - Hadoop 2: HDFS (redundant, reliable storage), YARN (cluster resource management), MapReduce (data processing), plus Hive (SQL), Spark (In-Memory), …
Let there be YARN Apps! Era of Enterprise Hadoop.

Late 2017 - Hadoop 3: ? - surrounded by IoT, Machine Learning, GPUs, TensorFlow, Data Science, Streaming Data, Cloud, FPGAs, Artificial Intelligence, Kafka.
Let there be …? Era of ?
Why Hadoop 3.0?
• Deprecated APIs can only be removed in a major release
• Wire-compatibility will be broken
  • Change of default ports
  • A Hadoop 2.x client cannot talk to a Hadoop 3.x server (and vice versa)
• Hadoop command scripts rewrite
• Big features that need a stabilizing major release
What is Hadoop 3.0?
Release timeline 2009-2017 (source: Akira Ajisaka, with additions by Uwe Printz):

• branch-1 (branch-0.20) - Hadoop 1, now EOL: 0.20.1, 0.20.205 (Security), 1.0.0, 1.1.0, 1.2.1 (Stable)
• 0.21.0 (New append) and 0.22.0
• branch-0.23: 0.23.0 up to 0.23.11 (Final)
• branch-2 - Hadoop 2: 2.0.0-alpha, 2.1.0-beta, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0, 2.8.0
• Milestones along the 0.23/2.x line include NameNode Federation, YARN, NameNode HA, HDFS Snapshots, NFSv3 support, HDFS ACLs, Heterogeneous storage, HDFS In-Memory Caching, HDFS Extended attributes, HDFS & YARN Rolling Upgrades, RM Automatic Failover, Transparent Encryption, Archival Storage, Docker Container in Linux, ATS 1.5, dropping JDK6 support and the File Truncate API
• trunk - Hadoop 3: 3.0.0-alpha1, 3.0.0-alpha2

Hadoop 2 and 3 diverged 5+ years ago.
Hadoop 3.0 in a nutshell
• HDFS
• Erasure codes
• Low-Level Performance enhancements with Intel ISA-L
• 2+ NameNodes
• Intra-DataNode Balancer
• YARN
• Better support for long-running services
• Improved isolation & Docker support
• Scheduler enhancements
• Application Timeline Service v2
• New UI
• MapReduce
• Task-level native optimization
• Derive heap-size automatically
• DevOps
• Drop JDK7 & Move to JDK8
• Change of default ports
• Library & Dependency Upgrade
• Client-side classpath Isolation
• Shell Script Rewrite & ShellDoc
• .hadooprc & .hadoop-env
• Metrics2 Sink plugin for Kafka
HDFS
HDFS - Current implementation
• 3 replicas by default
• Tolerates a maximum of 2 failures
• Simple, scalable & robust
• 200% space overhead

Write path: the HDFS client sends a write request to the NameNode and receives a lease for the file, splits the data into blocks and requests DataNodes for each block; the NameNode answers with a list of DataNodes. The client calculates the checksum and writes each block plus checksum to DataNode 1, which forwards it through the write pipeline to DataNode 2 and DataNode 3; ACKs travel back along the pipeline until the write is complete.
Erasure Coding (EC)
• k data blocks + m parity blocks
• Example: Reed-Solomon (6,3)
• Raw data is split into groups of data blocks; encoding computes the parity blocks, and data plus parity are stored
• Key Points
  • XOR coding -> saves space, slower recovery
  • Missing or corrupt data will be restored from available data and parity (see the sketch below)
  • Parity can be smaller than data
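To make the recovery idea concrete, here is a minimal Python sketch of the simplest scheme mentioned above, XOR coding (k data blocks, one parity block): any single lost block can be rebuilt by XOR-ing the parity with the surviving blocks. This only illustrates the principle; Hadoop's production coders implement Reed-Solomon, e.g. RS(6,3), which tolerates multiple failures.

# Minimal XOR parity sketch: k data blocks, 1 parity block.
# Illustrates the EC principle only; HDFS uses Reed-Solomon coders.

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR an arbitrary number of equal-sized blocks byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

# Split raw data into k equal-sized data blocks.
k, block_size = 3, 4
raw = b"HADOOP3_EC__"
data = [raw[i:i + block_size] for i in range(0, len(raw), block_size)]

parity = xor_blocks(*data)                    # encode: compute the parity block

lost_index = 1                                # simulate a lost/corrupt data block
survivors = [d for i, d in enumerate(data) if i != lost_index]
recovered = xor_blocks(parity, *survivors)    # decode: rebuild from parity + survivors

assert recovered == data[lost_index]
print("recovered block:", recovered)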
EC - Main characteristics
                          Replication   Replication   Reed-Solomon   Reed-Solomon
                          (Factor 1)    (Factor 3)    (6,3)          (10,4)
Maximum fault tolerance   0             2             3              4
Space efficiency          100 %         33 %          67 %           71 %
Data locality             Yes           Yes           No (Phase 1) / Yes (Phase 2)
Write performance         High          High          Low            Low
Read performance          High          High          Medium         Medium
Recovery costs            Low           Low           High           High

Pluggable implementation, first choice:

Storage tier      Hot          Warm       Cold         Frozen
Media             Memory/SSD   Disk       Dense Disk   EC
Access frequency  20 x Day     5 x Week   5 x Month    2 x Year
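The space-efficiency and fault-tolerance figures in the table follow directly from the scheme parameters; a small sketch of the arithmetic (k data blocks, m parity blocks or replicas):

# Space efficiency and fault tolerance follow from the scheme parameters.

def replication(factor):
    """factor-fold replication: tolerates factor - 1 lost copies."""
    return factor - 1, 1 / factor

def reed_solomon(k, m):
    """RS(k, m): any m of the k + m blocks may be lost."""
    return m, k / (k + m)

schemes = [
    ("Replication (Factor 1)", replication(1)),
    ("Replication (Factor 3)", replication(3)),
    ("Reed-Solomon (6,3)",     reed_solomon(6, 3)),
    ("Reed-Solomon (10,4)",    reed_solomon(10, 4)),
]
for name, (tolerance, efficiency) in schemes:
    print(f"{name:22s} fault tolerance = {tolerance}, space efficiency = {efficiency:.0%}")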
EC - Contiguous blocks
• Approach 1: Retain block size and add parity
• In the figure: six 128 MB blocks from Files A, B and C (Blocks 1-6 on DataNodes 3, 2, 12, 7, 5 and 1) are encoded into three parity blocks on DataNodes 6, 4 and 8
• Pro: Better for locality
• Con: Significant overhead for smaller files, always 3 parity blocks needed
• Con: Client potentially needs to process GBs of data for encoding
EC - Striping
• Approach 2: Splitting blocks into smaller cells (1 MB)
• Cells are written round-robin into stripes: in the figure, each stripe of six data cells (Blocks 1-6 on DataNodes 7, 3, 4, 1, 6 and 12) is encoded into three parity cells (DataNodes 8, 9 and 10); see the sketch below
• Pro: Works for small files
• Pro: Allows parallel write
• Con: No data locality -> increased read latency & more complicated recovery process
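A rough sketch of the striped layout under the slide's example values (1 MB cells, 6 data + 3 parity cells per stripe); the XOR parity below is only a stand-in for Reed-Solomon, and real HDFS striping is considerably more involved:

# Toy sketch of the striped layout: 1 MB cells dealt round-robin into
# stripes of 6 data cells, each stripe encoded into 3 parity cells.

CELL = 1024 * 1024                      # 1 MB cell size (the slide's example)
DATA_CELLS, PARITY_CELLS = 6, 3         # RS(6,3)-style stripe geometry

def stripes(file_bytes):
    """Yield lists of up to DATA_CELLS cells, filled round-robin from the file."""
    cells = [file_bytes[i:i + CELL] for i in range(0, len(file_bytes), CELL)]
    for start in range(0, len(cells), DATA_CELLS):
        yield cells[start:start + DATA_CELLS]

def encode(data_cells):
    """Placeholder parity: XOR of the data cells (stands in for Reed-Solomon)."""
    width = max(len(c) for c in data_cells)
    acc = 0
    for cell in data_cells:
        acc ^= int.from_bytes(cell.ljust(width, b"\x00"), "big")
    return [acc.to_bytes(width, "big")] * PARITY_CELLS

blob = bytes(10 * CELL)                 # a 10 MB example file
for index, data_cells in enumerate(stripes(blob)):
    parity_cells = encode(data_cells)
    print(f"stripe {index}: {len(data_cells)} data cells -> {len(parity_cells)} parity cells")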
EC - Apache Hadoop's decision (HDFS-7285)
• Existing systems cover the combinations of block layout (contiguous vs. striping) and redundancy (replication vs. erasure coding), e.g. HDFS, Facebook f4, Azure, Ceph (before Firefly), Lustre, Ceph (with Firefly) and QFS
• Start from striping to deal with smaller files
• Phase 1.1 (HDFS-7285) and Phase 1.2 (HDFS-8031): erasure coding with striped layout
• Phase 2 (HDFS-8030): erasure coding with contiguous layout
• Phase 3: future work
• Hadoop 3.0.x implements Phase 1.1 and 1.2
EC - Shell Command
• Create an EC zone on an empty directory
• All files under a zone directory are automatically erasure coded
• Renames across zones with different EC schemas are disallowed
Usage: hdfs erasurecode [generic options]
[-getPolicy <path>]
[-help [cmd ...]]
[-listPolicies]
[-setPolicy [-p <policyName>] <path>]
-getPolicy <path> :
Get erasure coding policy information about at specified path
-listPolicies :
Get the list of erasure coding policies supported
-setPolicy [-p <policyName>] <path> :
Set a specified erasure coding policy to a directory
Options :
-p <policyName> erasure coding policy name to encode files. If not passed the
default policy will be used
<path> Path to a directory. Under this directory files will be
encoded using specified erasure coding policy
EC - Write Path
• Parallel write
  • Client writes to 9 data nodes at the same time
  • Calculate parity at the client, at write time
• Durability
  • Reed-Solomon (6,3) can tolerate max. 3 failures
• Visibility
  • Read is supported for files being written
• Consistency
  • Client can start reading from any 6 of the 9 data nodes
• Appendable
  • Files can be reopened for appending data

(Figure: the HDFS client streams 1 MB data cells to DataNodes 1-6 and parity cells to DataNodes 7-9; each DataNode ACKs back to the client.)
EC - Write Failure Handling
• Data node failure
  • Client ignores the failed data node and continues writing
  • Write path is able to tolerate 3 data node failures
  • Requires at least 6 data nodes
  • Missing blocks will be reconstructed later

(Figure: same layout as the write path; the failed DataNode is skipped while the client keeps writing data and parity cells to the remaining nodes.)
EC - Read Path
• Read data from 6 data nodes in parallel

(Figure: the HDFS client reads 1 MB data cells from DataNodes 1-6 in parallel.)
EC - Read Failure Handling
• Read data from 6 arbitrary data nodes in parallel
• Read parity blocks to reconstruct missing data blocks

(Figure: one data block is missing, so the client reads parity cells from the parity DataNodes and reconstructs the missing block.)
EC - Network behavior
• Pros
• Low latency because of parallel read & write
• Good for small file sizes
• Cons
• Requires high network bandwidth between client & server
• Dead data nodes result in high network traffic and reconstruction
time
EC - Coder implementation
• Legacy coder
• From Facebook’s HDFS-RAID project
• [Umbrella] HADOOP-11264
• Pure Java coder
• Code improvements over HDFS-RAID
• HADOOP-11542
• Intel ISA-L coder
• Native coder with Intel’s Intelligent Storage Acceleration Library
• Accelerates EC-related linear algebra calculations by exploiting advanced hardware
instruction sets like SSE, AVX, and AVX2
• HADOOP-11540
EC - Coder performance I
EC - Coder performance II
EC - Coder performance III
2+ NameNodes (HDFS-6440)
• Hadoop 1
  • No built-in High Availability
  • Needed to be solved yourself, e.g. via VMware
• Hadoop 2
  • High Availability out-of-the-box via an Active-Passive pattern (one active, one standby NameNode)
  • A failed NameNode needs to be recovered immediately, since only one standby remains
• Hadoop 3
  • 1 active NameNode with N standby NameNodes
  • Trade-off between operational costs and hardware costs
Intra-DataNode Balancer (HDFS-1312)
• Hadoop already has a Balancer between
DataNodes
• Needs to be called manually by design
• Typically used after adding additional worker nodes
• The Disk Balancer lets administrators rebalance
data across multiple disks of a DataNode
• It is useful to correct skewed data distribution often seen after adding or
replacing disks
• Adds an hdfs diskbalancer command that submits a plan and returns without waiting for the plan to finish; the DataNode performs the moves itself (a toy planning sketch follows below)
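To illustrate what such a plan amounts to, here is a toy greedy planner (purely hypothetical, not the actual hdfs diskbalancer logic): it proposes moves from the fullest disk to the emptiest until all disks sit within a threshold of the node-wide average utilization.

# Toy sketch of an intra-DataNode balancing plan (hypothetical, greedy):
# repeatedly move data from the fullest to the emptiest disk until every
# disk is within `threshold` of the node-wide average utilization.
# Purely illustrative; the real planner lives in `hdfs diskbalancer`.

def plan_moves(used_gb, capacity_gb=1000, threshold=0.10, step_gb=50):
    disks = list(used_gb)
    avg = sum(disks) / (len(disks) * capacity_gb)     # node-wide average utilization
    moves = []
    while True:
        util = [u / capacity_gb for u in disks]
        hi, lo = util.index(max(util)), util.index(min(util))
        if util[hi] - avg <= threshold and avg - util[lo] <= threshold:
            break                                      # every disk is close enough
        gb = min(step_gb, (disks[hi] - disks[lo]) // 2)  # never move past the midpoint
        if gb <= 0:
            break
        disks[hi] -= gb
        disks[lo] += gb
        moves.append({"from_disk": hi, "to_disk": lo, "gb": gb})
    return moves

# Skewed DataNode, e.g. after replacing disk 3: utilizations 90%, 85%, 80%, 5%.
for move in plan_moves([900, 850, 800, 50]):
    print(move)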
YARN
YARN - Scheduling enhancements
• Application priorities within a queue (YARN-1963)
• For example, in the Marketing queue: Hive jobs > MapReduce jobs
• Inter-Queue priorities (YARN-4945)
• Queue 1 > Queue 2, irrespective of demand & capacity
• Previously based only on unconsumed capacity
• Affinity / Anti-Affinity (YARN-1042)
• More fine-grained constraints on placement, e.g. do not allocate HBase RegionServers and Storm workers on the same host
• Global Scheduling (YARN-5139)
• Currently YARN scheduling is done one node at a time, on arrival of heartbeats, which can lead to suboptimal decisions
• With global scheduling, YARN scheduler looks at more nodes and selects the best nodes based on application
requirements which leads to a globally optimal placement and enhanced container scheduling throughput
• Gang Scheduling (YARN-624)
• Allow allocation of sets of containers, e.g. 1 container with 128GB of RAM and 16 cores OR 100 containers with 2GB of
RAM and 1 core
• Can be achieved already by holding on to containers but might lead to deadlocks and decreased cluster utilization
YARN - Built-in support for long-running services
• Simplified and first-class support for services (YARN-4692)
• Abstract common framework to support long-running services (similar to Apache Slider)
• More simplified API for managing the service lifecycle of YARN apps
• Better support for long-running services
• Recognition of long-running services (YARN-4725)
• Auto-restart of containers
• Containers for long-running services are retried on the same node if they hold local state
• Service/Application upgrade support (YARN-4726)
• Hold on to containers during an upgrade of the YARN App
• Dynamic container resizing (YARN-1197)
• Ask only for minimum resources at start and adjust them at runtime
• Currently the only way is releasing containers and allocating new containers with the
expected size
YARN - Resource Isolation & Docker
• Better Resource Isolation
• Support for disk isolation (YARN-2619)
• Support for network isolation (YARN-2140)
• Uses cgroups to give containers their fair share
• Docker support in LinuxContainerExecutor (YARN-3611)
• The LinuxContainerExecutor already provides functionality around localization,
cgroups based resource management and isolation for CPU, network, disk, etc. as
well as security mechanisms
• Support Docker containers to be run inside of LinuxContainerExecutor
• Offers packaging and resource isolation
• Complements YARN’s support for long-running services
YARN - Service Discovery
• Services can run on any YARN node
• Dynamic IP, can change in case of node failures, etc.
• YARN Service Discovery via DNS (YARN-4757)
• The YARN service registry already provides facilities for applications to register their
endpoints and for clients to discover them but they are only available via Java API and REST
• Expose service information via an already available discovery mechanism: DNS
• Current YARN Service Registry records need to be converted into DNS entries
• Discovery of the container IP and service port via standard DNS lookups (see the sketch below)
• Mapping of Applications, e.g.
zkapp1.griduser.yarncluster.com -> 172.17.0.2
• Mapping of containers, e.g.
container-e3741-1454001598828-0131-01000004.yarncluster.com -> 172.17.0.3
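A minimal sketch of what DNS-based discovery buys a client: once the registry records are exposed as DNS entries, any client can resolve a service with standard name resolution, without touching the YARN API. The hostnames are the slide's examples; resolving service ports would additionally need SRV lookups (e.g. via a DNS library).

# Minimal sketch: resolving a YARN-registered service via plain DNS.
# Hostnames taken from the slide's example mapping; no YARN client API involved.
import socket

app_host = "zkapp1.griduser.yarncluster.com"                       # application-level record
container_host = "container-e3741-1454001598828-0131-01000004.yarncluster.com"  # container-level record

for host in (app_host, container_host):
    try:
        print(host, "->", socket.gethostbyname(host))              # standard A-record lookup
    except socket.gaierror:
        print(host, "-> not resolvable from here (expected outside the cluster)")

# Service ports would come from SRV records, e.g. via a DNS library such as dnspython.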
YARN - Use the force!
(Figure: YARN with MapReduce, Tez and Spark running on top.)
YARN - New UI (YARN-3368)
Application Timeline Service v2 (YARN-2928)
Why? (ATS v1 limitations and v2 goals)
• Scalability & Performance
  • Single global instance of writer/reader
  • Local-disk-based LevelDB storage
• Reliability
  • Failure handling tied to local disk
  • Single point of failure
• Usability
  • Add configuration and metrics as first-class members
  • Better support for queries
• Flexibility
  • Data model is more describable
Core Concepts
• Distributed write path
• Logical per app collector
• Separate reader instances
• Pluggable backend storage
• HBase
• Enhanced internal data model
• Metrics Aggregation
• Richer REST API for queries
Revolution or evolution?

Summary
• Major release, incompatible with Hadoop 2
• Main features are Erasure Coding and better support for long-running services & Docker
• Good fit for IoT and Deep Learning use cases

Release timeline
• 3.0.0-alpha1 - Sep/3/2016
• Alpha2 - Jan/25/2017
• Alpha3 - Q2 2017 (estimated)
• Beta/GA - Q3/Q4 2017 (estimated)
…but it’s not a revolution!
Twitter: @uweprintz
Mail: uwe.printz@codecentric.de / uwe.seiler@codecentric.de
Phone: +49 176 1076531
XING: https://www.xing.com/profile/Uwe_Printz
Thank you!
Slide 1: https://unsplash.com/photos/CIXoFys3gsw
Slide 2: Copyright by Uwe Printz
Slide 7: https://unsplash.com/photos/LHlwgjbSo3k
Slide 34: https://unsplash.com/photos/Cvf1IqUel9w
Slide 36: https://imgflip.com/i/mkovb
Slide 37: Copyright by Uwe Printz
All pictures CC0 or shot by the author