SlideShare une entreprise Scribd logo
1  sur  16
Télécharger pour lire hors ligne
Moving the Oath
Grid to Docker
1
Eric Badger, Oath
● RHEL 7 adoption
● Security
● Isolation
2
Motivations
*Note: Running arbitrary, user-supplied Docker images on YARN
was not a motivation
Architecture
3
● Tasks run in Docker containers
● Daemons run on bare-metal RHEL 7
4
● Seccomp profile
● Linux capabilities
● Run with --no-new-privileges
● Read-only containers and limited mounts
● Enter container as user UID:GID
5
Security
6
Bind-mounts
Bind-mount RO/RW Reason
Hadoop release RO Hadoop jars and confs
Usercache/Filecache directories RO Distributed cache
NSCD socket RW Use host cache for user lookups
HDFS short-circuit socket RW HDFS short-circuit reads/writes
Container log directories RW Write logs where nodemanager can see for
aggregation
Application local directories RW Application-specific temporary space, /tmp, /var/tmp
Grid Migration
7
● No cluster downtime
● No changes to current jobs
● No more than 5% performance degradation
8
Requirements
● Docker or Linux runtime chosen per node
- Based on RHEL version
- See YARN-6456
● During migration jobs will run as mix of bare-metal
processes and Docker containers
● User-transparent
9
Node Specific Runtime
● Preload images on nodes
- Avoid thundering herd
- Avoid task timeouts
● Force jobs to use a Docker image
● Allow users to pick a different image from a small set of
allowed images
10
Image Management
Challenges
11
● System call overhead due to seccomp
● Docker losing track of containers in high-memory situations
● System PID reuse issue causing Docker to restart
● Debugging tasks inside of containers
● Tasks can’t talk to each other through /tmp anymore
12
Challenges
Future
Improvements
13
● User-specific layers
- Harder than it seems
● Podman
● Kata containers
14
Future Improvements
Phase 1: YARN-3611 - Support Docker Containers in
LinuxContainerExecutor
- 92 subtasks, all resolved
Phase 2: YARN-8274 - YARN Container Phase 2
- 26 subtasks, 7 resolved
15
Docker in Apache Hadoop
Many thanks to those who contributed to this work, including the
following:
Shane Kumpf, Eric Yang, Miklos Szegedi, Billie Rinaldi, Chandni
Singh, Craig Condit, Jason Lowe, Jim Brennan, Nathan Roberts,
Dheeraj Kapur
16
Acknowledgements

Contenu connexe

Tendances

Tendances (20)

OpenVZ, Virtuozzo and Docker
OpenVZ, Virtuozzo and DockerOpenVZ, Virtuozzo and Docker
OpenVZ, Virtuozzo and Docker
 
Docker architecture (version modified)
Docker architecture (version modified)Docker architecture (version modified)
Docker architecture (version modified)
 
Revolutionizing WSO2 PaaS with Kubernetes & App Factory
Revolutionizing WSO2 PaaS with Kubernetes & App FactoryRevolutionizing WSO2 PaaS with Kubernetes & App Factory
Revolutionizing WSO2 PaaS with Kubernetes & App Factory
 
Docker Architecture (v1.3)
Docker Architecture (v1.3)Docker Architecture (v1.3)
Docker Architecture (v1.3)
 
Docker quick start
Docker quick startDocker quick start
Docker quick start
 
Docker Global Hack Day #3
Docker Global Hack Day #3 Docker Global Hack Day #3
Docker Global Hack Day #3
 
Containers orchestrators: Docker vs. Kubernetes
Containers orchestrators: Docker vs. KubernetesContainers orchestrators: Docker vs. Kubernetes
Containers orchestrators: Docker vs. Kubernetes
 
Docker internals
Docker internalsDocker internals
Docker internals
 
LXD Container Hypervisor
LXD Container HypervisorLXD Container Hypervisor
LXD Container Hypervisor
 
Linux containers – next gen virtualization for cloud (atl summit) ar4 3 - copy
Linux containers – next gen virtualization for cloud (atl summit) ar4 3 - copyLinux containers – next gen virtualization for cloud (atl summit) ar4 3 - copy
Linux containers – next gen virtualization for cloud (atl summit) ar4 3 - copy
 
Docker Swarm & Machine
Docker Swarm & MachineDocker Swarm & Machine
Docker Swarm & Machine
 
Orchestrating Docker containers at scale
Orchestrating Docker containers at scaleOrchestrating Docker containers at scale
Orchestrating Docker containers at scale
 
LXC – NextGen Virtualization for Cloud benefit realization (cloudexpo)
LXC – NextGen Virtualization for Cloud benefit realization (cloudexpo)LXC – NextGen Virtualization for Cloud benefit realization (cloudexpo)
LXC – NextGen Virtualization for Cloud benefit realization (cloudexpo)
 
Linux High Availability Overview - openSUSE.Asia Summit 2015
Linux High Availability Overview - openSUSE.Asia Summit 2015 Linux High Availability Overview - openSUSE.Asia Summit 2015
Linux High Availability Overview - openSUSE.Asia Summit 2015
 
Performance comparison between Linux Containers and Virtual Machines
Performance comparison between Linux Containers and Virtual MachinesPerformance comparison between Linux Containers and Virtual Machines
Performance comparison between Linux Containers and Virtual Machines
 
Shifter: Containers in HPC Environments
Shifter: Containers in HPC EnvironmentsShifter: Containers in HPC Environments
Shifter: Containers in HPC Environments
 
Docker Architecture
Docker ArchitectureDocker Architecture
Docker Architecture
 
Containers in the Cloud
Containers in the CloudContainers in the Cloud
Containers in the Cloud
 
Lxd the proper way of runing containers
Lxd   the proper way of runing containersLxd   the proper way of runing containers
Lxd the proper way of runing containers
 
Docker 1.11 @ Docker SF Meetup
Docker 1.11 @ Docker SF MeetupDocker 1.11 @ Docker SF Meetup
Docker 1.11 @ Docker SF Meetup
 

Similaire à Moving the Oath Grid to Docker, Eric Badger, Oath

A Gentle Introduction to Docker and Containers
A Gentle Introduction to Docker and ContainersA Gentle Introduction to Docker and Containers
A Gentle Introduction to Docker and Containers
Docker, Inc.
 

Similaire à Moving the Oath Grid to Docker, Eric Badger, Oath (20)

The internals and the latest trends of container runtimes
The internals and the latest trends of container runtimesThe internals and the latest trends of container runtimes
The internals and the latest trends of container runtimes
 
Docker+java
Docker+javaDocker+java
Docker+java
 
DockerCC.pdf
DockerCC.pdfDockerCC.pdf
DockerCC.pdf
 
Docker primer and tips
Docker primer and tipsDocker primer and tips
Docker primer and tips
 
Super powered Drupal development with docker
Super powered Drupal development with dockerSuper powered Drupal development with docker
Super powered Drupal development with docker
 
Introducing docker
Introducing dockerIntroducing docker
Introducing docker
 
Containers: from development to production at DevNation 2015
Containers: from development to production at DevNation 2015Containers: from development to production at DevNation 2015
Containers: from development to production at DevNation 2015
 
Docker up and Running For Web Developers
Docker up and Running For Web DevelopersDocker up and Running For Web Developers
Docker up and Running For Web Developers
 
Docker Up and Running for Web Developers
Docker Up and Running for Web DevelopersDocker Up and Running for Web Developers
Docker Up and Running for Web Developers
 
Best Practices for Developing & Deploying Java Applications with Docker
Best Practices for Developing & Deploying Java Applications with DockerBest Practices for Developing & Deploying Java Applications with Docker
Best Practices for Developing & Deploying Java Applications with Docker
 
Using Docker to boost your development experience with Drupal
Using Docker to boost your development experience with DrupalUsing Docker to boost your development experience with Drupal
Using Docker to boost your development experience with Drupal
 
Powercoders · Docker · Fall 2021.pptx
Powercoders · Docker · Fall 2021.pptxPowercoders · Docker · Fall 2021.pptx
Powercoders · Docker · Fall 2021.pptx
 
JOSA TechTalk: Introduction to docker
JOSA TechTalk: Introduction to dockerJOSA TechTalk: Introduction to docker
JOSA TechTalk: Introduction to docker
 
A Gentle Introduction to Docker and Containers
A Gentle Introduction to Docker and ContainersA Gentle Introduction to Docker and Containers
A Gentle Introduction to Docker and Containers
 
Janus & docker: friends or foe
Janus & docker: friends or foe Janus & docker: friends or foe
Janus & docker: friends or foe
 
State of Containers and the Convergence of HPC and BigData
State of Containers and the Convergence of HPC and BigDataState of Containers and the Convergence of HPC and BigData
State of Containers and the Convergence of HPC and BigData
 
Docker from A to Z, including Swarm and OCCS
Docker from A to Z, including Swarm and OCCSDocker from A to Z, including Swarm and OCCS
Docker from A to Z, including Swarm and OCCS
 
Java and Container - Make it Awesome !
Java and Container - Make it Awesome !Java and Container - Make it Awesome !
Java and Container - Make it Awesome !
 
OSDC 2013 | Distributed Storage with GlusterFS by Dr. Udo Seidel
OSDC 2013 | Distributed Storage with GlusterFS by Dr. Udo SeidelOSDC 2013 | Distributed Storage with GlusterFS by Dr. Udo Seidel
OSDC 2013 | Distributed Storage with GlusterFS by Dr. Udo Seidel
 
Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introduction
 

Plus de Yahoo Developer Network

Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Yahoo Developer Network
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
Yahoo Developer Network
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
Yahoo Developer Network
 

Plus de Yahoo Developer Network (20)

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
 
CICD at Oath using Screwdriver
CICD at Oath using ScrewdriverCICD at Oath using Screwdriver
CICD at Oath using Screwdriver
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI Applications
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step Beyond
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
 

Dernier

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Dernier (20)

DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

Moving the Oath Grid to Docker, Eric Badger, Oath

  • 1. Moving the Oath Grid to Docker 1 Eric Badger, Oath
  • 2. ● RHEL 7 adoption ● Security ● Isolation 2 Motivations *Note: Running arbitrary, user-supplied Docker images on YARN was not a motivation
  • 4. ● Tasks run in Docker containers ● Daemons run on bare-metal RHEL 7 4
  • 5. ● Seccomp profile ● Linux capabilities ● Run with --no-new-privileges ● Read-only containers and limited mounts ● Enter container as user UID:GID 5 Security
  • 6. 6 Bind-mounts Bind-mount RO/RW Reason Hadoop release RO Hadoop jars and confs Usercache/Filecache directories RO Distributed cache NSCD socket RW Use host cache for user lookups HDFS short-circuit socket RW HDFS short-circuit reads/writes Container log directories RW Write logs where nodemanager can see for aggregation Application local directories RW Application-specific temporary space, /tmp, /var/tmp
  • 8. ● No cluster downtime ● No changes to current jobs ● No more than 5% performance degradation 8 Requirements
  • 9. ● Docker or Linux runtime chosen per node - Based on RHEL version - See YARN-6456 ● During migration jobs will run as mix of bare-metal processes and Docker containers ● User-transparent 9 Node Specific Runtime
  • 10. ● Preload images on nodes - Avoid thundering herd - Avoid task timeouts ● Force jobs to use a Docker image ● Allow users to pick a different image from a small set of allowed images 10 Image Management
  • 12. ● System call overhead due to seccomp ● Docker losing track of containers in high-memory situations ● System PID reuse issue causing Docker to restart ● Debugging tasks inside of containers ● Tasks can’t talk to each other through /tmp anymore 12 Challenges
  • 14. ● User-specific layers - Harder than it seems ● Podman ● Kata containers 14 Future Improvements
  • 15. Phase 1: YARN-3611 - Support Docker Containers in LinuxContainerExecutor - 92 subtasks, all resolved Phase 2: YARN-8274 - YARN Container Phase 2 - 26 subtasks, 7 resolved 15 Docker in Apache Hadoop
  • 16. Many thanks to those who contributed to this work, including the following: Shane Kumpf, Eric Yang, Miklos Szegedi, Billie Rinaldi, Chandni Singh, Craig Condit, Jason Lowe, Jim Brennan, Nathan Roberts, Dheeraj Kapur 16 Acknowledgements