SlideShare a Scribd company logo
1 of 54
Download to read offline
1© Cloudera, Inc. All rights reserved.
Apache Hadoop 3
Andrew Wang Daniel Templeton
andrew.wang@cloudera.com daniel@cloudera.com
2© Cloudera, Inc. All rights reserved.
Who We Are
Andrew Wang
● HDFS @ Cloudera
● Hadoop PMC Member
● Release Manager for Hadoop 3.0
Daniel Templeton
● YARN @ Cloudera
● Hadoop PMC Member
3© Cloudera, Inc. All rights reserved.
An Abbreviated History of Hadoop Releases
Date Release Major Notes
2007-11-04 0.14.1 First release at the ASF
2011-12-27 1.0.0 Security, HBase support
2012-05-23 2.0.0 YARN, NameNode HA, wire compat
2014-11-18 2.6.0 HDFS encryption, rolling upgrade, node labels
2015-04-21 2.7.0 Truncate, Variable-length blocks, YARN Global Caching,
2017-03-22 2.8.0 Cloud improvement, Azure Data Lake, and etc.
2017-11-17 2.9.0 Stability Improvement
2017-12-13 3.0.0 Java 8, Erasure Coding, S3Guard, YARN Timeline Service
4© Cloudera, Inc. All rights reserved.
Motivation for Hadoop 3
● Upgrade minimum Java version to Java 8
○ Java 7 end-of-life in April 2015
○ Many Java libraries now only support Java 8
● HDFS erasure coding
○ Major feature that refactored core pieces of HDFS
○ Too big to backport to 2.x
● Classpath isolation
○ Potentially impacts all clients
● Other miscellaneous incompatible bugfixes and improvements
○ Hadoop 2.x was branched in 2011
○ 6 years of changes waiting for 3.0
5© Cloudera, Inc. All rights reserved.
Hadoop 3 Status and Release Plan
● After four alphas and one beta, 3.0.0 is out!
● Took close to two years from inception
● 3.0.1 and 3.1.0 are already in progress
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0.0+release
Release Date
3.0.0-alpha1 2016-09-03 ✔
3.0.0-alpha2 2017-01-25 ✔
3.0.0-alpha3 2017-05-26 ✔
3.0.0-alpha4 2017-07-07 ✔
3.0.0-beta1 2017-10-03 ✔
3.0.0 GA 2017-12-13 ✔
3.0.1 2017 Mar
6© Cloudera, Inc. All rights reserved.
HDFS & Hadoop Features
7© Cloudera, Inc. All rights reserved.
3x replication vs. Erasure coding
b1 b2 b3
/foo.csv - 3 block file
8© Cloudera, Inc. All rights reserved.
3x replication vs. Erasure coding
b1 b2 b3
/foo.csv - 3 block file
b1 b2 b3
b1 b2 b3
9© Cloudera, Inc. All rights reserved.
3x replication vs. Erasure coding
b1 b2 b3
/foo.csv - 3 block file
b1 b2 b3
b1 b2 b3
3 replicas
3 blocks
3 x 3 = 9 total replicas
9 / 3 = 200% overhead!
10© Cloudera, Inc. All rights reserved.
3x replication vs. Erasure coding
b1 b2 b3
/foo.csv - 3 block file
11© Cloudera, Inc. All rights reserved.
3x replication vs. Erasure coding
b1 b2 b3
/foo.csv - 3 block file
p1 p2
12© Cloudera, Inc. All rights reserved.
3x replication vs. Erasure coding
b1 b2 b3
/foo.csv - 3 block file
p1 p2
3 data blocks 2 parity blocks
3 + 2 = 5 replicas
5 / 3 = 67% overhead!
13© Cloudera, Inc. All rights reserved.
3x replication vs. Erasure coding
b1 b2 b3
/foo.csv - 3 block file
p1 p2
3 data blocks 2 parity blocks
3 + 2 = 5 replicas
5 / 3 = 67% overhead!
b1 b2 b10
/bigfoo.csv - 10 block file
p1 p4
10 data blocks 4 parity blocks
10 + 4 = 14 replicas
14 / 10 = 40% overhead!
... ...
14© Cloudera, Inc. All rights reserved.
EC Reconstruction
b1 b2 b3
/foo.csv - 3 block file
p1 p2 Reed-Solomon (3,2)
15© Cloudera, Inc. All rights reserved.
EC Reconstruction
b1 b2 b3
/foo.csv - 3 block file
p1 p2 Reed-Solomon (3,2)
X
16© Cloudera, Inc. All rights reserved.
EC Reconstruction
b1 b2 b3
/foo.csv - 3 block file
p1 p2 Reed-Solomon (3,2)
Read 3 remaining blocks
b3
Run RS to recover b3
New copy of b3 recovered
X
17© Cloudera, Inc. All rights reserved.
Erasure coding (HDFS-7285)
● Motivation: improve storage efficiency of HDFS
○ ~2x the storage efficiency compared to 3x replication
○ Reduction of overhead from 200% to 40%
● Uses Reed-Solomon(k,m) erasure codes instead of replication
○ Support for multiple erasure coding policies
○ RS(3,2), RS(6,3), RS(10,4)
● Can improves data durability
○ RS(6,3) can tolerate 3 failures
○ RS(10,4) can tolerate 4 failures
● Missing blocks reconstructed from remaining blocks
18© Cloudera, Inc. All rights reserved.
EC implications
● File data is striped across multiple nodes and racks
● Reads and writes are remote and cross-rack
● Reconstruction is network-intensive, reads m blocks cross-rack
● Important to use Intel’s optimized ISA-L for performance
○ 1+ GB/s encode/decode speed, much faster than Java implementation
● Combine data into larger files to avoid an explosion in # replicas
○ Bad: 1x1GB file -> RS(10,4) -> 14x100MB EC blocks (4.6x # replicas)
○ Good: 10x1GB file -> RS(10,4) -> 14x1GB EC blocks (0.46x # replicas)
● Works best for archival / cold data use cases
19© Cloudera, Inc. All rights reserved.
EC performance
20© Cloudera, Inc. All rights reserved.
EC performance
21© Cloudera, Inc. All rights reserved.
EC performance
22© Cloudera, Inc. All rights reserved.
Erasure coding status
● Massive development effort by the Hadoop community
○ 20+ contributors from many companies
■ Cloudera, Intel, Hortonworks, Huawei, Y! JP, …
○ 100s of commits over more than three years (started in 2014)
● Erasure coding is ready in 3.0.0 GA!
● Current focus is on testing and integration efforts
○ Want the complete Hadoop stack to work with HDFS erasure coding enabled
○ Ongoing stress / endurance testing to ensure stability at scale
23© Cloudera, Inc. All rights reserved.
● Hadoop leaks lots of dependencies
onto the application’s classpath
○ Known offenders: Guava,
Protobuf, Jackson, Jetty, …
● No separate HDFS client jar means
server jars are leaked
● YARN / MR clients not shaded
● HDFS-6200: Split HDFS client into
separate JAR
● HADOOP-11804: Shaded
hadoop-client dependency
● YARN-6466: Shade the task
umbilical for a clean YARN
container environment (ongoing)
Classpath isolation (HADOOP-11656)
24© Cloudera, Inc. All rights reserved.
Miscellaneous
● Supportability improvements
○ Shell script rewrite
○ Intra-DataNode balancer
○ Move default ports out of the ephemeral range
● Support for multiple Standby NameNodes
● Cloud enhancements
○ Support for Microsoft Azure Data Lake and Aliyun OSS
○ S3 consistency and performance improvements
● Tightened Hadoop compatibility policy
25© Cloudera, Inc. All rights reserved.
YARN Features
26© Cloudera, Inc. All rights reserved.
Job History Server
Resource
Manager
27© Cloudera, Inc. All rights reserved.
Job History Server
Resource
Manager
jobs
28© Cloudera, Inc. All rights reserved.
Job History Server
Resource
Manager
jobs
Job
History
Server
29© Cloudera, Inc. All rights reserved.
Job History Server
Resource
Manager
jobs
Job
History
Server
HDFS
Node
Manager
30© Cloudera, Inc. All rights reserved.
Job History Server
Resource
Manager
jobs
Job
History
Server
Spark
History
Server
31© Cloudera, Inc. All rights reserved.
Job History Server
Resource
Manager
jobs
Job
History
Server
Spark
History
Server
?
32© Cloudera, Inc. All rights reserved.
Application Timeline Service v2
● Store for application and system events and data
○ Distributed
○ Scalable
○ Structured Data Model
● Updated in real time
○ Application status
○ Application metrics
○ System metrics
● Fed by resource manager, node manager, and application masters
● REST API
33© Cloudera, Inc. All rights reserved.
Application Timeline Service v2
Resource
Manager
jobs
Application
Timeline
Service
HBase
34© Cloudera, Inc. All rights reserved.
Timeline
Reader
Timeline
Reader
Application Timeline Service v2
Resource
Manager
Timeline
Collecter
HBase Node
Manager
Application
Master
Timeline
Collecter
Timeline
Reader
35© Cloudera, Inc. All rights reserved.
Application Timeline Service v2 Flows
36© Cloudera, Inc. All rights reserved.
Application Timeline Service v2 Flows
37© Cloudera, Inc. All rights reserved.
Application Timeline Service v2 Flows
38© Cloudera, Inc. All rights reserved.
Old YARN UI
39© Cloudera, Inc. All rights reserved.
New YARN UI
● Rich client application
○ Built on Node.js and Ember
● Improved visibility into cluster usage
○ Memory, CPU
○ By queues and applications
○ Sunburst graphs for hierarchical queues
○ NodeManager heatmap
● ATSv2 integration
○ Plot container start/stop events
○ Easy to capture delays in app execution
40© Cloudera, Inc. All rights reserved.
New YARN UI: Cluster Overview
41© Cloudera, Inc. All rights reserved.
New YARN UI: Queues
42© Cloudera, Inc. All rights reserved.
● Before Hadoop 3 memory and CPU are the only managed resources
● Resource Types allows adding new managed resources
○ Countable resources: GPUs, Disks etc.
○ Static resources: Java version, Python version, hardware profile, ...
■ Still in proposal stage
● Resource profiles
○ Similar conceptually to EC2 instance types
○ Capture complex resource request
● DRF for scheduling
● Current virtual CPU cores and memory resources work as before
Resource Types
43© Cloudera, Inc. All rights reserved.
YARN Federation
● YARN scalability
○ Twitter runs a 10k node cluster with fair scheduler
○ Yahoo! runs 4k node cluster with capacity scheduler
● Federation
○ Restrict users to sub-clusters based on policy
○ Scalability to 100k nodes and beyond
○ Independent cluster scheduling
44© Cloudera, Inc. All rights reserved.
YARN Federation
Router
Resource
Manager
Node Manager
Node Manager
Node Manager
Node Manager
Resource
Manager
Node Manager
Node Manager
Node Manager
Node Manager
Policy
Admin
45© Cloudera, Inc. All rights reserved.
Opportunistic Containers
● Scheduler’s job is to keep all resources busy
● Scheduling gaps
○ Nothing to run
○ Resource contention
○ Resource reservations
● Opportunistic containers fill those gaps
○ Requested explicitly
○ Dedicated scheduler
○ Queued at the node managers
○ Scheduled locally when resources are available
○ Preempted when guaranteed containers need to run
● Coming in 2.9 and 3.0
46© Cloudera, Inc. All rights reserved.
Oversubscription
● Resource utilization is typically
low in most clusters (20-50%)
○ Provision for peak usage
● Usage < Allocation
○ Mean Usage = ½ Peak Usage
47© Cloudera, Inc. All rights reserved.
Oversubscription
● Oversubscription
○ Allocate opportunistic containers to use allocated-but-unused resources
○ Jobs automatically use these unless they opt-out
○ Threshold to control aggressiveness of oversubscription
○ Threshold to trigger preemption
● Currently in progress
48© Cloudera, Inc. All rights reserved.
● Long Running Services
○ Slider merging into YARN
○ Docker support
● Scheduler improvements
○ Capacity scheduler
■ Performance and preemption
improvements
■ Online scheduling (“global
scheduler”)
■ Queue management
○ Fair scheduler
■ Performance and preemption
improvements
● High availability improvements
○ Better handling of transient
network issues
○ ZK-store scalability: Limit number
of children under a znode
● MapReduce Native Collector
(MAPREDUCE-2841)
○ Native implementation of the map
output collector
○ Up to 30% faster for
shuffle-intensive jobs
Other YARN Improvements
49© Cloudera, Inc. All rights reserved.
Summary: What’s new in Hadoop 3.0?
● Storage Optimization
○ HDFS: Erasure codes
● Improved Visibility into Cluster Operations
○ YARN: ATSv2
○ YARN: New UI
● Scalability & Multi-tenancy
○ YARN: Federation
● Improved Utilization
○ YARN: Opportunistic Containers
○ YARN: Oversubscription
● Refactor Base
○ Lots of Trunk content
○ JDK8 and newer dependent libraries
50© Cloudera, Inc. All rights reserved.
Compatibility and Testing
51© Cloudera, Inc. All rights reserved.
Compatibility
● Strong feedback from large users on the need for compatibility
● Preserves wire-compatibility with Hadoop 2 clients
○ Impossible to coordinate upgrading off-cluster Hadoop clients
● Will support rolling upgrade from Hadoop 2 to Hadoop 3
○ Can’t take downtime to upgrade a business-critical cluster
● Not fully preserving API compatibility!
○ Dependency version bumps
○ Removal of deprecated APIs and tools
○ Shell script rewrite, rework of Hadoop tools scripts
○ Incompatible bug fixes
52© Cloudera, Inc. All rights reserved.
Testing and Validation
● Cloudera CDH 6 is based on upstream Hadoop 3.0.0
○ Running full test suite
○ Integration of Hadoop 3 with all components in CDH stack
○ Same integration tests used to validate CDH5
● Plans for extensive HDFS EC testing by Cloudera and Intel
● Happy synergy between 2.8.x and 3.0.x lines
○ Shares much of the same code, fixes flow into both
○ Yahoo! doing scale testing of 2.8.0
53© Cloudera, Inc. All rights reserved.
Conclusion
● Hadoop 3.0.0 GA is out!
● Shiny new features
○ HDFS erasure coding
○ Client classpath isolation
○ YARN ATSv2
○ YARN Federation
○ Opportunistic containers and oversubscription
● Great time to get involved in testing and validation
54© Cloudera, Inc. All rights reserved.
Thank you
Andrew Wang Daniel Templeton
andrew.wang@cloudera.com daniel@cloudera.com

More Related Content

What's hot

Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Simplilearn
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3DataWorks Summit
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesNishith Agarwal
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxData
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...Simplilearn
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
 
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...Edureka!
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to CassandraGokhan Atil
 
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...Simplilearn
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Simplilearn
 

What's hot (20)

Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 
Snowflake Overview
Snowflake OverviewSnowflake Overview
Snowflake Overview
 
What's New in Apache Hive
What's New in Apache HiveWhat's New in Apache Hive
What's New in Apache Hive
 
Introduction to Amazon DynamoDB
Introduction to Amazon DynamoDBIntroduction to Amazon DynamoDB
Introduction to Amazon DynamoDB
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
Hadoop hdfs
Hadoop hdfsHadoop hdfs
Hadoop hdfs
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
 
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
 

Similar to Apache Hadoop 3

Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Junping Du
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateDataWorks Summit
 
Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Wei-Chiu Chuang
 
Ozone - Evolution of hdfs scalability
Ozone - Evolution of hdfs scalabilityOzone - Evolution of hdfs scalability
Ozone - Evolution of hdfs scalabilityDinesh Chitlangia
 
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...Big Data Spain
 
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NYApache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NYWangda Tan
 
Cloudera Enabling Native Integration of NoSQL HBase with Cloud Providers.pdf
Cloudera Enabling Native Integration of NoSQL HBase with Cloud Providers.pdfCloudera Enabling Native Integration of NoSQL HBase with Cloud Providers.pdf
Cloudera Enabling Native Integration of NoSQL HBase with Cloud Providers.pdfwchevreuil
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith SharmaNewton Alex
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Cloudera, Inc.
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Uwe Printz
 
How to build leakproof stream processing pipelines with Apache Kafka and Apac...
How to build leakproof stream processing pipelines with Apache Kafka and Apac...How to build leakproof stream processing pipelines with Apache Kafka and Apac...
How to build leakproof stream processing pipelines with Apache Kafka and Apac...Cloudera, Inc.
 
How to build leakproof stream processing pipelines with Apache Kafka and Apac...
How to build leakproof stream processing pipelines with Apache Kafka and Apac...How to build leakproof stream processing pipelines with Apache Kafka and Apac...
How to build leakproof stream processing pipelines with Apache Kafka and Apac...JordanHambleton
 
Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2hdhappy001
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
Hadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of OzoneHadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of OzoneErik Krogen
 

Similar to Apache Hadoop 3 (20)

Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)
 
Ozone - Evolution of hdfs scalability
Ozone - Evolution of hdfs scalabilityOzone - Evolution of hdfs scalability
Ozone - Evolution of hdfs scalability
 
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
 
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NYApache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
 
Cloudera Enabling Native Integration of NoSQL HBase with Cloud Providers.pdf
Cloudera Enabling Native Integration of NoSQL HBase with Cloud Providers.pdfCloudera Enabling Native Integration of NoSQL HBase with Cloud Providers.pdf
Cloudera Enabling Native Integration of NoSQL HBase with Cloud Providers.pdf
 
DDN Product Update from SC13
DDN Product Update from SC13DDN Product Update from SC13
DDN Product Update from SC13
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
 
How to build leakproof stream processing pipelines with Apache Kafka and Apac...
How to build leakproof stream processing pipelines with Apache Kafka and Apac...How to build leakproof stream processing pipelines with Apache Kafka and Apac...
How to build leakproof stream processing pipelines with Apache Kafka and Apac...
 
How to build leakproof stream processing pipelines with Apache Kafka and Apac...
How to build leakproof stream processing pipelines with Apache Kafka and Apac...How to build leakproof stream processing pipelines with Apache Kafka and Apac...
How to build leakproof stream processing pipelines with Apache Kafka and Apac...
 
Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2
 
Kudu austin oct 2015.pptx
Kudu austin oct 2015.pptxKudu austin oct 2015.pptx
Kudu austin oct 2015.pptx
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
Hadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of OzoneHadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of Ozone
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Recently uploaded

Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
SHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions PresentationSHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions PresentationShrmpro
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...Nitya salvi
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsBert Jan Schrijver
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdfPearlKirahMaeRagusta1
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfproinshot.com
 

Recently uploaded (20)

Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
SHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions PresentationSHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions Presentation
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 

Apache Hadoop 3

  • 1. 1© Cloudera, Inc. All rights reserved. Apache Hadoop 3 Andrew Wang Daniel Templeton andrew.wang@cloudera.com daniel@cloudera.com
  • 2. 2© Cloudera, Inc. All rights reserved. Who We Are Andrew Wang ● HDFS @ Cloudera ● Hadoop PMC Member ● Release Manager for Hadoop 3.0 Daniel Templeton ● YARN @ Cloudera ● Hadoop PMC Member
  • 3. 3© Cloudera, Inc. All rights reserved. An Abbreviated History of Hadoop Releases Date Release Major Notes 2007-11-04 0.14.1 First release at the ASF 2011-12-27 1.0.0 Security, HBase support 2012-05-23 2.0.0 YARN, NameNode HA, wire compat 2014-11-18 2.6.0 HDFS encryption, rolling upgrade, node labels 2015-04-21 2.7.0 Truncate, Variable-length blocks, YARN Global Caching, 2017-03-22 2.8.0 Cloud improvement, Azure Data Lake, and etc. 2017-11-17 2.9.0 Stability Improvement 2017-12-13 3.0.0 Java 8, Erasure Coding, S3Guard, YARN Timeline Service
  • 4. 4© Cloudera, Inc. All rights reserved. Motivation for Hadoop 3 ● Upgrade minimum Java version to Java 8 ○ Java 7 end-of-life in April 2015 ○ Many Java libraries now only support Java 8 ● HDFS erasure coding ○ Major feature that refactored core pieces of HDFS ○ Too big to backport to 2.x ● Classpath isolation ○ Potentially impacts all clients ● Other miscellaneous incompatible bugfixes and improvements ○ Hadoop 2.x was branched in 2011 ○ 6 years of changes waiting for 3.0
  • 5. 5© Cloudera, Inc. All rights reserved. Hadoop 3 Status and Release Plan ● After four alphas and one beta, 3.0.0 is out! ● Took close to two years from inception ● 3.0.1 and 3.1.0 are already in progress https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0.0+release Release Date 3.0.0-alpha1 2016-09-03 ✔ 3.0.0-alpha2 2017-01-25 ✔ 3.0.0-alpha3 2017-05-26 ✔ 3.0.0-alpha4 2017-07-07 ✔ 3.0.0-beta1 2017-10-03 ✔ 3.0.0 GA 2017-12-13 ✔ 3.0.1 2017 Mar
  • 6. 6© Cloudera, Inc. All rights reserved. HDFS & Hadoop Features
  • 7. 7© Cloudera, Inc. All rights reserved. 3x replication vs. Erasure coding b1 b2 b3 /foo.csv - 3 block file
  • 8. 8© Cloudera, Inc. All rights reserved. 3x replication vs. Erasure coding b1 b2 b3 /foo.csv - 3 block file b1 b2 b3 b1 b2 b3
  • 9. 9© Cloudera, Inc. All rights reserved. 3x replication vs. Erasure coding b1 b2 b3 /foo.csv - 3 block file b1 b2 b3 b1 b2 b3 3 replicas 3 blocks 3 x 3 = 9 total replicas 9 / 3 = 200% overhead!
  • 10. 10© Cloudera, Inc. All rights reserved. 3x replication vs. Erasure coding b1 b2 b3 /foo.csv - 3 block file
  • 11. 11© Cloudera, Inc. All rights reserved. 3x replication vs. Erasure coding b1 b2 b3 /foo.csv - 3 block file p1 p2
  • 12. 12© Cloudera, Inc. All rights reserved. 3x replication vs. Erasure coding b1 b2 b3 /foo.csv - 3 block file p1 p2 3 data blocks 2 parity blocks 3 + 2 = 5 replicas 5 / 3 = 67% overhead!
  • 13. 13© Cloudera, Inc. All rights reserved. 3x replication vs. Erasure coding b1 b2 b3 /foo.csv - 3 block file p1 p2 3 data blocks 2 parity blocks 3 + 2 = 5 replicas 5 / 3 = 67% overhead! b1 b2 b10 /bigfoo.csv - 10 block file p1 p4 10 data blocks 4 parity blocks 10 + 4 = 14 replicas 14 / 10 = 40% overhead! ... ...
  • 14. 14© Cloudera, Inc. All rights reserved. EC Reconstruction b1 b2 b3 /foo.csv - 3 block file p1 p2 Reed-Solomon (3,2)
  • 15. 15© Cloudera, Inc. All rights reserved. EC Reconstruction b1 b2 b3 /foo.csv - 3 block file p1 p2 Reed-Solomon (3,2) X
  • 16. 16© Cloudera, Inc. All rights reserved. EC Reconstruction b1 b2 b3 /foo.csv - 3 block file p1 p2 Reed-Solomon (3,2) Read 3 remaining blocks b3 Run RS to recover b3 New copy of b3 recovered X
  • 17. 17© Cloudera, Inc. All rights reserved. Erasure coding (HDFS-7285) ● Motivation: improve storage efficiency of HDFS ○ ~2x the storage efficiency compared to 3x replication ○ Reduction of overhead from 200% to 40% ● Uses Reed-Solomon(k,m) erasure codes instead of replication ○ Support for multiple erasure coding policies ○ RS(3,2), RS(6,3), RS(10,4) ● Can improves data durability ○ RS(6,3) can tolerate 3 failures ○ RS(10,4) can tolerate 4 failures ● Missing blocks reconstructed from remaining blocks
  • 18. 18© Cloudera, Inc. All rights reserved. EC implications ● File data is striped across multiple nodes and racks ● Reads and writes are remote and cross-rack ● Reconstruction is network-intensive, reads m blocks cross-rack ● Important to use Intel’s optimized ISA-L for performance ○ 1+ GB/s encode/decode speed, much faster than Java implementation ● Combine data into larger files to avoid an explosion in # replicas ○ Bad: 1x1GB file -> RS(10,4) -> 14x100MB EC blocks (4.6x # replicas) ○ Good: 10x1GB file -> RS(10,4) -> 14x1GB EC blocks (0.46x # replicas) ● Works best for archival / cold data use cases
  • 19. 19© Cloudera, Inc. All rights reserved. EC performance
  • 20. 20© Cloudera, Inc. All rights reserved. EC performance
  • 21. 21© Cloudera, Inc. All rights reserved. EC performance
  • 22. 22© Cloudera, Inc. All rights reserved. Erasure coding status ● Massive development effort by the Hadoop community ○ 20+ contributors from many companies ■ Cloudera, Intel, Hortonworks, Huawei, Y! JP, … ○ 100s of commits over more than three years (started in 2014) ● Erasure coding is ready in 3.0.0 GA! ● Current focus is on testing and integration efforts ○ Want the complete Hadoop stack to work with HDFS erasure coding enabled ○ Ongoing stress / endurance testing to ensure stability at scale
  • 23. 23© Cloudera, Inc. All rights reserved. ● Hadoop leaks lots of dependencies onto the application’s classpath ○ Known offenders: Guava, Protobuf, Jackson, Jetty, … ● No separate HDFS client jar means server jars are leaked ● YARN / MR clients not shaded ● HDFS-6200: Split HDFS client into separate JAR ● HADOOP-11804: Shaded hadoop-client dependency ● YARN-6466: Shade the task umbilical for a clean YARN container environment (ongoing) Classpath isolation (HADOOP-11656)
  • 24. 24© Cloudera, Inc. All rights reserved. Miscellaneous ● Supportability improvements ○ Shell script rewrite ○ Intra-DataNode balancer ○ Move default ports out of the ephemeral range ● Support for multiple Standby NameNodes ● Cloud enhancements ○ Support for Microsoft Azure Data Lake and Aliyun OSS ○ S3 consistency and performance improvements ● Tightened Hadoop compatibility policy
  • 25. 25© Cloudera, Inc. All rights reserved. YARN Features
  • 26. 26© Cloudera, Inc. All rights reserved. Job History Server Resource Manager
  • 27. 27© Cloudera, Inc. All rights reserved. Job History Server Resource Manager jobs
  • 28. 28© Cloudera, Inc. All rights reserved. Job History Server Resource Manager jobs Job History Server
  • 29. 29© Cloudera, Inc. All rights reserved. Job History Server Resource Manager jobs Job History Server HDFS Node Manager
  • 30. 30© Cloudera, Inc. All rights reserved. Job History Server Resource Manager jobs Job History Server Spark History Server
  • 31. 31© Cloudera, Inc. All rights reserved. Job History Server Resource Manager jobs Job History Server Spark History Server ?
  • 32. 32© Cloudera, Inc. All rights reserved. Application Timeline Service v2 ● Store for application and system events and data ○ Distributed ○ Scalable ○ Structured Data Model ● Updated in real time ○ Application status ○ Application metrics ○ System metrics ● Fed by resource manager, node manager, and application masters ● REST API
  • 33. 33© Cloudera, Inc. All rights reserved. Application Timeline Service v2 Resource Manager jobs Application Timeline Service HBase
  • 34. 34© Cloudera, Inc. All rights reserved. Timeline Reader Timeline Reader Application Timeline Service v2 Resource Manager Timeline Collecter HBase Node Manager Application Master Timeline Collecter Timeline Reader
  • 35. 35© Cloudera, Inc. All rights reserved. Application Timeline Service v2 Flows
  • 36. 36© Cloudera, Inc. All rights reserved. Application Timeline Service v2 Flows
  • 37. 37© Cloudera, Inc. All rights reserved. Application Timeline Service v2 Flows
  • 38. 38© Cloudera, Inc. All rights reserved. Old YARN UI
  • 39. 39© Cloudera, Inc. All rights reserved. New YARN UI ● Rich client application ○ Built on Node.js and Ember ● Improved visibility into cluster usage ○ Memory, CPU ○ By queues and applications ○ Sunburst graphs for hierarchical queues ○ NodeManager heatmap ● ATSv2 integration ○ Plot container start/stop events ○ Easy to capture delays in app execution
  • 40. 40© Cloudera, Inc. All rights reserved. New YARN UI: Cluster Overview
  • 41. 41© Cloudera, Inc. All rights reserved. New YARN UI: Queues
  • 42. 42© Cloudera, Inc. All rights reserved. ● Before Hadoop 3 memory and CPU are the only managed resources ● Resource Types allows adding new managed resources ○ Countable resources: GPUs, Disks etc. ○ Static resources: Java version, Python version, hardware profile, ... ■ Still in proposal stage ● Resource profiles ○ Similar conceptually to EC2 instance types ○ Capture complex resource request ● DRF for scheduling ● Current virtual CPU cores and memory resources work as before Resource Types
  • 43. 43© Cloudera, Inc. All rights reserved. YARN Federation ● YARN scalability ○ Twitter runs a 10k node cluster with fair scheduler ○ Yahoo! runs 4k node cluster with capacity scheduler ● Federation ○ Restrict users to sub-clusters based on policy ○ Scalability to 100k nodes and beyond ○ Independent cluster scheduling
  • 44. 44© Cloudera, Inc. All rights reserved. YARN Federation Router Resource Manager Node Manager Node Manager Node Manager Node Manager Resource Manager Node Manager Node Manager Node Manager Node Manager Policy Admin
  • 45. 45© Cloudera, Inc. All rights reserved. Opportunistic Containers ● Scheduler’s job is to keep all resources busy ● Scheduling gaps ○ Nothing to run ○ Resource contention ○ Resource reservations ● Opportunistic containers fill those gaps ○ Requested explicitly ○ Dedicated scheduler ○ Queued at the node managers ○ Scheduled locally when resources are available ○ Preempted when guaranteed containers need to run ● Coming in 2.9 and 3.0
  • 46. 46© Cloudera, Inc. All rights reserved. Oversubscription ● Resource utilization is typically low in most clusters (20-50%) ○ Provision for peak usage ● Usage < Allocation ○ Mean Usage = ½ Peak Usage
  • 47. 47© Cloudera, Inc. All rights reserved. Oversubscription ● Oversubscription ○ Allocate opportunistic containers to use allocated-but-unused resources ○ Jobs automatically use these unless they opt-out ○ Threshold to control aggressiveness of oversubscription ○ Threshold to trigger preemption ● Currently in progress
  • 48. 48© Cloudera, Inc. All rights reserved. ● Long Running Services ○ Slider merging into YARN ○ Docker support ● Scheduler improvements ○ Capacity scheduler ■ Performance and preemption improvements ■ Online scheduling (“global scheduler”) ■ Queue management ○ Fair scheduler ■ Performance and preemption improvements ● High availability improvements ○ Better handling of transient network issues ○ ZK-store scalability: Limit number of children under a znode ● MapReduce Native Collector (MAPREDUCE-2841) ○ Native implementation of the map output collector ○ Up to 30% faster for shuffle-intensive jobs Other YARN Improvements
  • 49. 49© Cloudera, Inc. All rights reserved. Summary: What’s new in Hadoop 3.0? ● Storage Optimization ○ HDFS: Erasure codes ● Improved Visibility into Cluster Operations ○ YARN: ATSv2 ○ YARN: New UI ● Scalability & Multi-tenancy ○ YARN: Federation ● Improved Utilization ○ YARN: Opportunistic Containers ○ YARN: Oversubscription ● Refactor Base ○ Lots of Trunk content ○ JDK8 and newer dependent libraries
  • 50. 50© Cloudera, Inc. All rights reserved. Compatibility and Testing
  • 51. 51© Cloudera, Inc. All rights reserved. Compatibility ● Strong feedback from large users on the need for compatibility ● Preserves wire-compatibility with Hadoop 2 clients ○ Impossible to coordinate upgrading off-cluster Hadoop clients ● Will support rolling upgrade from Hadoop 2 to Hadoop 3 ○ Can’t take downtime to upgrade a business-critical cluster ● Not fully preserving API compatibility! ○ Dependency version bumps ○ Removal of deprecated APIs and tools ○ Shell script rewrite, rework of Hadoop tools scripts ○ Incompatible bug fixes
  • 52. 52© Cloudera, Inc. All rights reserved. Testing and Validation ● Cloudera CDH 6 is based on upstream Hadoop 3.0.0 ○ Running full test suite ○ Integration of Hadoop 3 with all components in CDH stack ○ Same integration tests used to validate CDH5 ● Plans for extensive HDFS EC testing by Cloudera and Intel ● Happy synergy between 2.8.x and 3.0.x lines ○ Shares much of the same code, fixes flow into both ○ Yahoo! doing scale testing of 2.8.0
  • 53. 53© Cloudera, Inc. All rights reserved. Conclusion ● Hadoop 3.0.0 GA is out! ● Shiny new features ○ HDFS erasure coding ○ Client classpath isolation ○ YARN ATSv2 ○ YARN Federation ○ Opportunistic containers and oversubscription ● Great time to get involved in testing and validation
  • 54. 54© Cloudera, Inc. All rights reserved. Thank you Andrew Wang Daniel Templeton andrew.wang@cloudera.com daniel@cloudera.com