SlideShare une entreprise Scribd logo
1  sur  47
Télécharger pour lire hors ligne
Apache Hadoop 3.x State of
The Union and Upgrade
Guidance
Wei-Chiu Chuang
@Cloudera,
Apache Hadoop PMC
Wangda Tan
Sr. Manager, Compute Platform
@Cloudera,
Apache Hadoop PMC
Agenda
❏ Hadoop Community Updates & Overview
❏ Updates from YARN, Submarine, HDFS, Ozone
❏ Upcoming releases
❏ Upgrade guidance
Community Updates
Resolved Issues by Top 10 ASF Projects
Resolved Issues within Hadoop by Subproject (Monthly)
Resolved issue in Hadoop (Monthly)
Number of Unique #Contributors of Hadoop (Monthly)
(All pictures credits to Marton Elek)
Hadoop 3.x Overview
Big Data/Long Running Services With Hadoop 3
BATCH
WORKLOADS
DEEP LEARNING
APPS
PUBLIC
CLOUD
STORAGE
STORAGECOMPUTE (on-prem/on-cloud)
HIVE on LLAP
SERVICES
Hadoop
Ozone
Themes of Hadoop 3.x
Scalability Containerization
Cloud-native Machine
Learning
Cost-efficiency
YARN
❏ Production-ready Docker container support on YARN.
❏ Containerized Spark
❏ Package/Dependency Isolation
❏ Interactive Docker Shell support ( YARN-8762 )
❏ OCI/squashfs (Like runc) container runtime.
Containerization
Available since 3.3.0
Available since 3.1.0
Target 3.3.0
❏ Autoscaling
❏ Scaling recommendations
❏ Smarter scheduling
❏ Bin-packing Pack containers as opposed to spreading
them around to downscale nodes better
❏ Account for speculative nodes like spot instances
❏ Downscaling nodes
❏ Improved Decommissioning
❏ Consider shuffle/auxiliary services data
YARN in a cloud-native environment YARN-9548
Ongoing Effort
Global Scheduling Framework YARN-5139
Scheduler Capabilities enhancements
❏ Look at several nodes at one time.
❏ Fine grained locks.
❏ Multiple allocation threads.
❏ 5-10x allocation throughput gains.
Available since 3.0.0
❏ Node Attributes: Tagging node with attribute and schedule containers based
on that. (3.2.0)
❏ Placement Constraint: Affinity, Anti-Affinity, etc. (3.1.0)
❏ Dynamic Auto Queue Creation (Capacity Scheduler) (3.1.0)
❏ Scheduling Activity Troubleshooter. (3.3.0)
Other Enhancements
Submarine
❏ Started since Aug 2018.
❏ Benefit from Hadoop’s feature like
GPU/Docker on YARN support.
❏ Enables Infra engineers / data scientists to run
deep learning apps
❏ Tensorflow, Pytorch, MXNet.. on
YARN/K8s
❏ Supports Hadoop 2.7+.
❏ LinkedIn TonY joined Submarine family
Machine Learning – Hadoop Submarine
❏ Lots of new stuff in upcoming releases (0.3.0).
❏ Mini-submarine for easy trying Submarine
from single node.
❏ Brand-new Submarine web interface for
end-to-end user Experiences.
❏ Tensorflow/PyTorch on K8s.
❏ 15+ Contributors and community is fast
growing..
Machine Learning – Hadoop Submarine
NetEase:
• One of the largest online
game/news/music provider in
China.
• 245 GPU Cluster runs
Submarine.
• One of the model built is
music recommendation model
which invoked 1B+/days.
LinkedIn:
• 250+ GPU machines
• 500+ TensorFlow trainings/day.
• Serves applications in
recommendation systems and
NLP.
• Collaboration on
Submarine/TonY runtime and
SDK development.
Ke.com:
• 50+ GPU machines (includes 19
multi-v100 GPU machines),
based on Hadoop trunk (3.3.0).
• Serves applications like
image/voice recognition, etc.
Machine Learning – Hadoop Submarine Prod Use cases
And many users are evaluating
Submarine…
Machine Learning – Submarine new UI demo
New Submarine UI
Storage
HDFS Updates - Consistent Read from Standby
❏ Offload reads to non-active
NameNodes to improve overall file
system performance.
❏ Consistency: if a client can report
the last transaction ID seen by it,
then a standby can allow a read if it
has caught up to that transaction
ID seen by the client.
❏ Used in production at Uber and
LinkedIn.
HDFS Updates - Router Based Federation
❏ Router based Federation Supports Security.
❏ Lots of work on scalability and the ability to handle slower sub-clusters.
❏ We are seeing usage across the industry
And many more HDFS features
❏ Selective Wire Encryption
❏ Cost based Fair call queue
❏ Dynamometer
❏ Storage Policy Satisfier
❏ Support Non-volatile storage class memory in HDFS cache directives
Ongoing development
❏ RPC support for TLS
❏ KMSv2
❏ OpenTracing integration
❏ JDK11 support
Cloud Connector Updates - S3A/S3Guard
S3Guard is no longer considered experimental
❏ Maintain consistency through corner cases involving partial failure of rename/delete
operations.
❏ Out of band support - detecting and adapting to other applications overwriting files.
❏ Tracking of etag and version Ids for stricter consistency when you want to defend against
OOB changes.
❏ “authoritative mode” improves performance dramatically.
S3A File system supports Delegation Tokens.
❏ Full user + secret + encryption keys: simplest, but secrets do not leave your system.
❏ Generated session tokens + encryption keys: keeps the long lived secrets locally; life of non-
renewable tokens limited
ABFS: “Azure Datalake Gen 2” Connector
❏ A high performance cloud store & filesystem for Azure
❏ Added in Hadoop 3.2.0;
❏ Stabilization in trunk with all fixes backported to 3.2.1
❏ Has a similar extension point for Delegation Token plugins as S3A.
(though implementing DTs is “left as an exercise”. Contributions
welcome)
Credit to Thomas Marquardt and Da Zhou @Microsoft for their work
—and welcome to the Hadoop Committer Team!
Hadoop Common
On going Effort
❏ RPC support for TLS
❏ KMSv2
❏ OpenTracing integration
❏ JDK11 support
Ozone
Ozone
❏ Object Store made for Big Data
workloads.
❏ A long term successor of
HDFS.
❏ In-place upgrade from HDFS
(roadmap)
❏ Contribution from
Hortonworks/Cloudera/Tencent
…
❏ Tremendous progress over past
year
❏ Three Alpha Releases so far.
❏ 0.2: basic object store.
❏ 0.3: s3 protocol.
❏ 0.4: Security and Ranger support.
❏ 0.4.1 release (Native ACLs) coming out soon (December-
ish).
❏ 0.5.0 will be the beta release.
❏ Reliability and performance improvement.
❏ HA
Ozone Upcoming releases
Releases
Release Plan - Core Hadoop
2018 2.6.5 2.7.5 - 2.7.7 2.8.3 - 2.8.5
Stabilization, Maintenance,
Bug fix
Stabilization, Maintenance, Bug fix Stabilization, Maintenance, Bug fix
2.9.0 - 2.9.2 3.0.0 - 3.0.3 3.1.0 - 3.1.1
YARN Federation,
Opportunistic Containers
EC, Global scheduling, multiple resource
types, Timeline Service V2, RBF for HDFS
GPU/FPGA, Long Running Services,
Placement Constraints, Docker on
YARN GA
2019 3.1.2 3.2.0 2.10 ( Planned)
Stabilization, Maintenance,
Bug fix
Node Attributes, Submarine, Storage
Policy Satisfier, ABFS connector
❏ YARN resource types/GPU
support (YARN-8200 )
❏ Selective wire encryption
(HDFS-13541)
❏ HDFS Rolling upgrades from
2.x to 3.x(HDFS-14509)
3.1.3 (RC0, Target Sep 2019) 3.2.1 (Sep 2019, released). 3.3.0 (Planned)
Stabilization, Maintenance,
Bug fix
Stabilization, Maintenance, Bug fix (GA of
3.2)
❏ OCI/SquashFS
❏ NEC Vector Engine
❏ Consistent reads from Standby
❏ NVMe for HDFS cache
Release Plan - Submarine
2018 0.1.0
2019 0.2.0 0.3.0 ( Planned )
❏ Support for other runtimes
❏ Pytorch
❏ Linkedin’s TonY
❏ Zeppelin Notebook support
❏ Support K8s runtimes
❏ Mini-submarine
❏ Submarine-workbench
❏ Submarine SDK
❏ Voted to become a seperate Apache project
❏ No longer part of Core Hadoop releases
❏ YARN
❏ Distributed Tensorflow
❏ MXNet
End of Life Policy
❏ EOL of Releases with no maintenance release in long term (1.5+ yrs)
❏ Security-only releases on EOL versions if requested.
❏ EOLed Versions
❏ Hadoop 2.7.x (and lower)
❏ Hadoop 3.0.x
Upgrades
(Hadoop 2 -> Hadoop 3)
Express/Rolling Upgrades
❏ Stop the world Upgrades
❏ Cluster downtime
❏ Less stringent prerequisites
❏ Process
❏ Upgrade masters and workers in one
shot
Express Upgrades
❏ Preserve cluster operation
❏ Minimizes Service impact and downtime
❏ Can take longer to complete
❏ Process
❏ Upgrades masters and workers in batches
Rolling Upgrades
Recommendation - Express or Rolling?
❏Major version upgrade
❏ Challenges and issues in supporting Rolling Upgrades
❏Technical challenges with rolling upgrade
❏ Lot of work done/WIP by Hadoop community to support upgrades without
Downtime. Should be part of releases soon.
❏ Backward incompatible changes blocks rolling upgrade.
❏Recommended
❏ Express Upgrade from Hadoop 2 to 3
Wire compatibility
❏ Preserves compatibility with Hadoop 2 clients
❏ Distcp/WebHDFS compatibility preserved
API compatibility
Not fully, but minimal impact.
❏ Dependency version bumps
❏ Removal of deprecated APIs and tools
❏ Shell script rewrite, rework of Hadoop tools/scripts.
Compatibility
Source & Target Versions
Upgrades Validated with
Why 2.8.x release?
● Most of production deployments are close to 2.8.x
What should users of 2.6.x and 2.7.x do?
● Do more validations before upgrading, we do see some users directly upgrade from
2.7.x to 3.x.
Hadoop 2 Base version Hadoop 3 Base version
Apache Hadoop 2.8.x Apache Hadoop 3.1.x
Upgrade Process/Details
Refer to our earlier talk for further details
Migrating Hadoop cluster and workloads from Hadoop 2 to
Hadoop 3
Many successful use cases for
Hadoop 3.x (New And
Upgrade) in Production
Summary of upgrade
❏ Hadoop 3
Eagerly awaited release with lots of new features and optimizations !
❏ Lots of large clusters already on Hadoop 3 at enterprises
❏ Express Upgrades are recommended
❏ If you haven't upgraded yet, NOW is the best time!
Questions?
Rate today’s session
Session page on conference website O’Reilly Events App

Contenu connexe

Tendances

Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...Spark Summit
 
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Community
 
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...Chris Fregly
 
A fun cup of joe with open liberty
A fun cup of joe with open libertyA fun cup of joe with open liberty
A fun cup of joe with open libertyAndy Mauer
 
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environmentLessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environmentDataWorks Summit
 
Understanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And ProfitUnderstanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And ProfitSpark Summit
 
Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Dave Holland
 
One-click Hadoop Cluster Deployment on OpenPOWER Systems
One-click Hadoop Cluster Deployment on OpenPOWER SystemsOne-click Hadoop Cluster Deployment on OpenPOWER Systems
One-click Hadoop Cluster Deployment on OpenPOWER SystemsPradeep Kumar
 
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014Amazon Web Services
 
On The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL ClusterOn The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL ClusterSrihari Sriraman
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache StormP. Taylor Goetz
 
Hadoop engineering bo_f_final
Hadoop engineering bo_f_finalHadoop engineering bo_f_final
Hadoop engineering bo_f_finalRamya Sunil
 
Performance Tuning - Memory leaks, Thread deadlocks, JDK tools
Performance Tuning -  Memory leaks, Thread deadlocks, JDK toolsPerformance Tuning -  Memory leaks, Thread deadlocks, JDK tools
Performance Tuning - Memory leaks, Thread deadlocks, JDK toolsHaribabu Nandyal Padmanaban
 
OpenStack Summit Vancouver: Lessons learned on upgrades
OpenStack Summit Vancouver:  Lessons learned on upgradesOpenStack Summit Vancouver:  Lessons learned on upgrades
OpenStack Summit Vancouver: Lessons learned on upgradesFrédéric Lepied
 
Orchestration tool roundup - OpenStack Israel summit - kubernetes vs. docker...
Orchestration tool roundup  - OpenStack Israel summit - kubernetes vs. docker...Orchestration tool roundup  - OpenStack Israel summit - kubernetes vs. docker...
Orchestration tool roundup - OpenStack Israel summit - kubernetes vs. docker...Uri Cohen
 
Beyond unit tests: Deployment and testing for Hadoop/Spark workflows
Beyond unit tests: Deployment and testing for Hadoop/Spark workflowsBeyond unit tests: Deployment and testing for Hadoop/Spark workflows
Beyond unit tests: Deployment and testing for Hadoop/Spark workflowsDataWorks Summit
 

Tendances (20)

Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos a...
 
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
 
Inferno Scalable Deep Learning on Spark
Inferno Scalable Deep Learning on SparkInferno Scalable Deep Learning on Spark
Inferno Scalable Deep Learning on Spark
 
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
 
A fun cup of joe with open liberty
A fun cup of joe with open libertyA fun cup of joe with open liberty
A fun cup of joe with open liberty
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environmentLessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
 
ha_module5
ha_module5ha_module5
ha_module5
 
Understanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And ProfitUnderstanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And Profit
 
Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017
 
One-click Hadoop Cluster Deployment on OpenPOWER Systems
One-click Hadoop Cluster Deployment on OpenPOWER SystemsOne-click Hadoop Cluster Deployment on OpenPOWER Systems
One-click Hadoop Cluster Deployment on OpenPOWER Systems
 
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
 
On The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL ClusterOn The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL Cluster
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
Hadoop engineering bo_f_final
Hadoop engineering bo_f_finalHadoop engineering bo_f_final
Hadoop engineering bo_f_final
 
Performance Tuning - Memory leaks, Thread deadlocks, JDK tools
Performance Tuning -  Memory leaks, Thread deadlocks, JDK toolsPerformance Tuning -  Memory leaks, Thread deadlocks, JDK tools
Performance Tuning - Memory leaks, Thread deadlocks, JDK tools
 
OpenStack Summit Vancouver: Lessons learned on upgrades
OpenStack Summit Vancouver:  Lessons learned on upgradesOpenStack Summit Vancouver:  Lessons learned on upgrades
OpenStack Summit Vancouver: Lessons learned on upgrades
 
Orchestration tool roundup - OpenStack Israel summit - kubernetes vs. docker...
Orchestration tool roundup  - OpenStack Israel summit - kubernetes vs. docker...Orchestration tool roundup  - OpenStack Israel summit - kubernetes vs. docker...
Orchestration tool roundup - OpenStack Israel summit - kubernetes vs. docker...
 
Beyond unit tests: Deployment and testing for Hadoop/Spark workflows
Beyond unit tests: Deployment and testing for Hadoop/Spark workflowsBeyond unit tests: Deployment and testing for Hadoop/Spark workflows
Beyond unit tests: Deployment and testing for Hadoop/Spark workflows
 
Cobbler, Func and Puppet: Tools for Large Scale Environments
Cobbler, Func and Puppet: Tools for Large Scale EnvironmentsCobbler, Func and Puppet: Tools for Large Scale Environments
Cobbler, Func and Puppet: Tools for Large Scale Environments
 

Similaire à Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY

Hadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of OzoneHadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of OzoneErik Krogen
 
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph Ceph Community
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weitingWei Ting Chen
 
Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Junping Du
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateDataWorks Summit
 
Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...
Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...
Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...HostedbyConfluent
 
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...HostedbyConfluent
 
Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...
Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...
Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...Frank Munz
 
Drizzle Keynote at the MySQL User's Conference
Drizzle Keynote at the MySQL User's ConferenceDrizzle Keynote at the MySQL User's Conference
Drizzle Keynote at the MySQL User's ConferenceBrian Aker
 
Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Wei-Chiu Chuang
 
Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin  Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin Kuberton
 
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo ScaleManaging Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo ScaleDataWorks Summit/Hadoop Summit
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInDataWorks Summit
 
Pacemaker+DRBD
Pacemaker+DRBDPacemaker+DRBD
Pacemaker+DRBDDan Frincu
 
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...OpenStack
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1Thanh Nguyen
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on dockerWei Ting Chen
 
Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Chris Nauroth
 

Similaire à Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY (20)

Hadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of OzoneHadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of Ozone
 
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
 
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DMUpgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting
 
Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...
Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...
Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...
 
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
 
Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...
Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...
Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...
 
Drizzle Keynote at the MySQL User's Conference
Drizzle Keynote at the MySQL User's ConferenceDrizzle Keynote at the MySQL User's Conference
Drizzle Keynote at the MySQL User's Conference
 
Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)
 
Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin  Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin
 
Apache Hadoop 3
Apache Hadoop 3Apache Hadoop 3
Apache Hadoop 3
 
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo ScaleManaging Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedIn
 
Pacemaker+DRBD
Pacemaker+DRBDPacemaker+DRBD
Pacemaker+DRBD
 
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker
 
Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5
 

Plus de Wangda Tan

Running Tensorflow In Production: Challenges and Solutions on YARN 3.x
Running Tensorflow In Production: Challenges and Solutions on YARN 3.x Running Tensorflow In Production: Challenges and Solutions on YARN 3.x
Running Tensorflow In Production: Challenges and Solutions on YARN 3.x Wangda Tan
 
Dataworks Berlin Summit 18' - Deep learning On YARN - Running Distributed Te...
Dataworks Berlin Summit 18' - Deep learning On YARN -  Running Distributed Te...Dataworks Berlin Summit 18' - Deep learning On YARN -  Running Distributed Te...
Dataworks Berlin Summit 18' - Deep learning On YARN - Running Distributed Te...Wangda Tan
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionWangda Tan
 
Hadoop Summit - Scheduling policies in YARN - San Jose 2016
Hadoop Summit - Scheduling policies in YARN - San Jose 2016Hadoop Summit - Scheduling policies in YARN - San Jose 2016
Hadoop Summit - Scheduling policies in YARN - San Jose 2016Wangda Tan
 
Node labels in YARN
Node labels in YARNNode labels in YARN
Node labels in YARNWangda Tan
 
Hadoop summit-diverse-workload
Hadoop summit-diverse-workloadHadoop summit-diverse-workload
Hadoop summit-diverse-workloadWangda Tan
 

Plus de Wangda Tan (6)

Running Tensorflow In Production: Challenges and Solutions on YARN 3.x
Running Tensorflow In Production: Challenges and Solutions on YARN 3.x Running Tensorflow In Production: Challenges and Solutions on YARN 3.x
Running Tensorflow In Production: Challenges and Solutions on YARN 3.x
 
Dataworks Berlin Summit 18' - Deep learning On YARN - Running Distributed Te...
Dataworks Berlin Summit 18' - Deep learning On YARN -  Running Distributed Te...Dataworks Berlin Summit 18' - Deep learning On YARN -  Running Distributed Te...
Dataworks Berlin Summit 18' - Deep learning On YARN - Running Distributed Te...
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
 
Hadoop Summit - Scheduling policies in YARN - San Jose 2016
Hadoop Summit - Scheduling policies in YARN - San Jose 2016Hadoop Summit - Scheduling policies in YARN - San Jose 2016
Hadoop Summit - Scheduling policies in YARN - San Jose 2016
 
Node labels in YARN
Node labels in YARNNode labels in YARN
Node labels in YARN
 
Hadoop summit-diverse-workload
Hadoop summit-diverse-workloadHadoop summit-diverse-workload
Hadoop summit-diverse-workload
 

Dernier

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 

Dernier (20)

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 

Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY

  • 1. Apache Hadoop 3.x State of The Union and Upgrade Guidance Wei-Chiu Chuang @Cloudera, Apache Hadoop PMC Wangda Tan Sr. Manager, Compute Platform @Cloudera, Apache Hadoop PMC
  • 2. Agenda ❏ Hadoop Community Updates & Overview ❏ Updates from YARN, Submarine, HDFS, Ozone ❏ Upcoming releases ❏ Upgrade guidance
  • 4.
  • 5. Resolved Issues by Top 10 ASF Projects
  • 6. Resolved Issues within Hadoop by Subproject (Monthly)
  • 7. Resolved issue in Hadoop (Monthly)
  • 8. Number of Unique #Contributors of Hadoop (Monthly) (All pictures credits to Marton Elek)
  • 10. Big Data/Long Running Services With Hadoop 3 BATCH WORKLOADS DEEP LEARNING APPS PUBLIC CLOUD STORAGE STORAGECOMPUTE (on-prem/on-cloud) HIVE on LLAP SERVICES Hadoop Ozone
  • 11. Themes of Hadoop 3.x Scalability Containerization Cloud-native Machine Learning Cost-efficiency
  • 12. YARN
  • 13. ❏ Production-ready Docker container support on YARN. ❏ Containerized Spark ❏ Package/Dependency Isolation ❏ Interactive Docker Shell support ( YARN-8762 ) ❏ OCI/squashfs (Like runc) container runtime. Containerization Available since 3.3.0 Available since 3.1.0 Target 3.3.0
  • 14. ❏ Autoscaling ❏ Scaling recommendations ❏ Smarter scheduling ❏ Bin-packing Pack containers as opposed to spreading them around to downscale nodes better ❏ Account for speculative nodes like spot instances ❏ Downscaling nodes ❏ Improved Decommissioning ❏ Consider shuffle/auxiliary services data YARN in a cloud-native environment YARN-9548 Ongoing Effort
  • 15. Global Scheduling Framework YARN-5139 Scheduler Capabilities enhancements ❏ Look at several nodes at one time. ❏ Fine grained locks. ❏ Multiple allocation threads. ❏ 5-10x allocation throughput gains. Available since 3.0.0
  • 16. ❏ Node Attributes: Tagging node with attribute and schedule containers based on that. (3.2.0) ❏ Placement Constraint: Affinity, Anti-Affinity, etc. (3.1.0) ❏ Dynamic Auto Queue Creation (Capacity Scheduler) (3.1.0) ❏ Scheduling Activity Troubleshooter. (3.3.0) Other Enhancements
  • 18. ❏ Started since Aug 2018. ❏ Benefit from Hadoop’s feature like GPU/Docker on YARN support. ❏ Enables Infra engineers / data scientists to run deep learning apps ❏ Tensorflow, Pytorch, MXNet.. on YARN/K8s ❏ Supports Hadoop 2.7+. ❏ LinkedIn TonY joined Submarine family Machine Learning – Hadoop Submarine
  • 19. ❏ Lots of new stuff in upcoming releases (0.3.0). ❏ Mini-submarine for easy trying Submarine from single node. ❏ Brand-new Submarine web interface for end-to-end user Experiences. ❏ Tensorflow/PyTorch on K8s. ❏ 15+ Contributors and community is fast growing.. Machine Learning – Hadoop Submarine
  • 20. NetEase: • One of the largest online game/news/music provider in China. • 245 GPU Cluster runs Submarine. • One of the model built is music recommendation model which invoked 1B+/days. LinkedIn: • 250+ GPU machines • 500+ TensorFlow trainings/day. • Serves applications in recommendation systems and NLP. • Collaboration on Submarine/TonY runtime and SDK development. Ke.com: • 50+ GPU machines (includes 19 multi-v100 GPU machines), based on Hadoop trunk (3.3.0). • Serves applications like image/voice recognition, etc. Machine Learning – Hadoop Submarine Prod Use cases And many users are evaluating Submarine…
  • 21. Machine Learning – Submarine new UI demo
  • 24. HDFS Updates - Consistent Read from Standby ❏ Offload reads to non-active NameNodes to improve overall file system performance. ❏ Consistency: if a client can report the last transaction ID seen by it, then a standby can allow a read if it has caught up to that transaction ID seen by the client. ❏ Used in production at Uber and LinkedIn.
  • 25. HDFS Updates - Router Based Federation ❏ Router based Federation Supports Security. ❏ Lots of work on scalability and the ability to handle slower sub-clusters. ❏ We are seeing usage across the industry
  • 26. And many more HDFS features ❏ Selective Wire Encryption ❏ Cost based Fair call queue ❏ Dynamometer ❏ Storage Policy Satisfier ❏ Support Non-volatile storage class memory in HDFS cache directives Ongoing development ❏ RPC support for TLS ❏ KMSv2 ❏ OpenTracing integration ❏ JDK11 support
  • 27. Cloud Connector Updates - S3A/S3Guard S3Guard is no longer considered experimental ❏ Maintain consistency through corner cases involving partial failure of rename/delete operations. ❏ Out of band support - detecting and adapting to other applications overwriting files. ❏ Tracking of etag and version Ids for stricter consistency when you want to defend against OOB changes. ❏ “authoritative mode” improves performance dramatically. S3A File system supports Delegation Tokens. ❏ Full user + secret + encryption keys: simplest, but secrets do not leave your system. ❏ Generated session tokens + encryption keys: keeps the long lived secrets locally; life of non- renewable tokens limited
  • 28. ABFS: “Azure Datalake Gen 2” Connector ❏ A high performance cloud store & filesystem for Azure ❏ Added in Hadoop 3.2.0; ❏ Stabilization in trunk with all fixes backported to 3.2.1 ❏ Has a similar extension point for Delegation Token plugins as S3A. (though implementing DTs is “left as an exercise”. Contributions welcome) Credit to Thomas Marquardt and Da Zhou @Microsoft for their work —and welcome to the Hadoop Committer Team!
  • 30. On going Effort ❏ RPC support for TLS ❏ KMSv2 ❏ OpenTracing integration ❏ JDK11 support
  • 31. Ozone
  • 32. Ozone ❏ Object Store made for Big Data workloads. ❏ A long term successor of HDFS. ❏ In-place upgrade from HDFS (roadmap) ❏ Contribution from Hortonworks/Cloudera/Tencent … ❏ Tremendous progress over past year
  • 33. ❏ Three Alpha Releases so far. ❏ 0.2: basic object store. ❏ 0.3: s3 protocol. ❏ 0.4: Security and Ranger support. ❏ 0.4.1 release (Native ACLs) coming out soon (December- ish). ❏ 0.5.0 will be the beta release. ❏ Reliability and performance improvement. ❏ HA Ozone Upcoming releases
  • 35. Release Plan - Core Hadoop 2018 2.6.5 2.7.5 - 2.7.7 2.8.3 - 2.8.5 Stabilization, Maintenance, Bug fix Stabilization, Maintenance, Bug fix Stabilization, Maintenance, Bug fix 2.9.0 - 2.9.2 3.0.0 - 3.0.3 3.1.0 - 3.1.1 YARN Federation, Opportunistic Containers EC, Global scheduling, multiple resource types, Timeline Service V2, RBF for HDFS GPU/FPGA, Long Running Services, Placement Constraints, Docker on YARN GA 2019 3.1.2 3.2.0 2.10 ( Planned) Stabilization, Maintenance, Bug fix Node Attributes, Submarine, Storage Policy Satisfier, ABFS connector ❏ YARN resource types/GPU support (YARN-8200 ) ❏ Selective wire encryption (HDFS-13541) ❏ HDFS Rolling upgrades from 2.x to 3.x(HDFS-14509) 3.1.3 (RC0, Target Sep 2019) 3.2.1 (Sep 2019, released). 3.3.0 (Planned) Stabilization, Maintenance, Bug fix Stabilization, Maintenance, Bug fix (GA of 3.2) ❏ OCI/SquashFS ❏ NEC Vector Engine ❏ Consistent reads from Standby ❏ NVMe for HDFS cache
  • 36. Release Plan - Submarine 2018 0.1.0 2019 0.2.0 0.3.0 ( Planned ) ❏ Support for other runtimes ❏ Pytorch ❏ Linkedin’s TonY ❏ Zeppelin Notebook support ❏ Support K8s runtimes ❏ Mini-submarine ❏ Submarine-workbench ❏ Submarine SDK ❏ Voted to become a seperate Apache project ❏ No longer part of Core Hadoop releases ❏ YARN ❏ Distributed Tensorflow ❏ MXNet
  • 37. End of Life Policy ❏ EOL of Releases with no maintenance release in long term (1.5+ yrs) ❏ Security-only releases on EOL versions if requested. ❏ EOLed Versions ❏ Hadoop 2.7.x (and lower) ❏ Hadoop 3.0.x
  • 39. Express/Rolling Upgrades ❏ Stop the world Upgrades ❏ Cluster downtime ❏ Less stringent prerequisites ❏ Process ❏ Upgrade masters and workers in one shot Express Upgrades ❏ Preserve cluster operation ❏ Minimizes Service impact and downtime ❏ Can take longer to complete ❏ Process ❏ Upgrades masters and workers in batches Rolling Upgrades
  • 40. Recommendation - Express or Rolling? ❏Major version upgrade ❏ Challenges and issues in supporting Rolling Upgrades ❏Technical challenges with rolling upgrade ❏ Lot of work done/WIP by Hadoop community to support upgrades without Downtime. Should be part of releases soon. ❏ Backward incompatible changes blocks rolling upgrade. ❏Recommended ❏ Express Upgrade from Hadoop 2 to 3
  • 41. Wire compatibility ❏ Preserves compatibility with Hadoop 2 clients ❏ Distcp/WebHDFS compatibility preserved API compatibility Not fully, but minimal impact. ❏ Dependency version bumps ❏ Removal of deprecated APIs and tools ❏ Shell script rewrite, rework of Hadoop tools/scripts. Compatibility
  • 42. Source & Target Versions Upgrades Validated with Why 2.8.x release? ● Most of production deployments are close to 2.8.x What should users of 2.6.x and 2.7.x do? ● Do more validations before upgrading, we do see some users directly upgrade from 2.7.x to 3.x. Hadoop 2 Base version Hadoop 3 Base version Apache Hadoop 2.8.x Apache Hadoop 3.1.x
  • 43. Upgrade Process/Details Refer to our earlier talk for further details Migrating Hadoop cluster and workloads from Hadoop 2 to Hadoop 3
  • 44. Many successful use cases for Hadoop 3.x (New And Upgrade) in Production
  • 45. Summary of upgrade ❏ Hadoop 3 Eagerly awaited release with lots of new features and optimizations ! ❏ Lots of large clusters already on Hadoop 3 at enterprises ❏ Express Upgrades are recommended ❏ If you haven't upgraded yet, NOW is the best time!
  • 47. Rate today’s session Session page on conference website O’Reilly Events App