Scalable On-Demand Hadoop Clusters with Docker and Mesos
Andrew Nelson, Nutanix
@vmwnelson http://virtual-hiking.blogspot.com
Chris Mutchler, VMware
@chrismutchler http://virtualelephant.com
Agenda
• New Approach for Hadoop Ops
• Infrastructure Resource Considerations
• Docker as the new “Unit of Work”
• Future Work
2
Last Year’s State of the Art
• Self-service and multi-tenant Hadoop
• Elastic and decoupled infrastructure
• Extensible blueprinting
3
New Goals
• Operationalize multiple frameworks
• Decoupled service architecture
• Flexible and developer-friendly form factor
4
Apache Mesos Introduction
• Started at Berkeley
• Graduated to top-level Apache project, 2013
• Commercial entity is Mesosphere
• https://github.com/apache/mesos/
5
Mesos Architecture
6
Source: http://mesos.apache.org/assets/img/documentation/architecture3.jpg
Mesos as a Multi-Tenant Resource Pool
7
Source: https://github.com/mesos/myriad/blob/phase1/docs/how-it-works.md
Tools to Build and Scale
• Serengeti, VMware: https://github.com/vmware-serengeti
• BOSH, Pivotal: https://github.com/cloudfoundry/bosh
• Cloudify, GigaSpaces: https://github.com/CloudifySource/cloudify
• Cloudbreak, SequenceIQ: https://github.com/sequenceiq/cloudbreak
8
Advantages for Ops
• Mesos as a Resource Pool
• Multiple concurrent frameworks
• Decouple frameworks from resource pools
9
Compute Partitions on Mesos
10
[Figure: shared versus siloed compute partitions on Mesos. Labels: Hadoop, Storm, Spark, Kafka, Cassandra, Marathon]
HDFS as a Service
11
[Figure: HDFS (Namenode, Standby Namenode, Secondary Namenode) offered as a service to MapReduce, Spark, Hive, Storm, …]
Networking Services
• Service Discovery
• Handled per framework
• Port range resource managed by Mesos slave
• For example, Marathon uses HAProxy for request routing
• Per-container network monitoring
• Egress rate-limiting
12
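To make the "port range resource" bullet concrete, here is a minimal sketch of how a framework might pick host ports out of a range offered by a Mesos slave. The range string format follows the way Mesos writes range resources (for example `[31000-32000]`); the function names `parse_port_ranges` and `take_ports` are hypothetical helpers for illustration, not part of any Mesos API.

```python
import re
from typing import List, Tuple


def parse_port_ranges(text: str) -> List[Tuple[int, int]]:
    """Parse a ranges string such as "[31000-31001, 31100-31101]"."""
    return [(int(lo), int(hi)) for lo, hi in re.findall(r"(\d+)-(\d+)", text)]


def take_ports(ranges: List[Tuple[int, int]], count: int) -> List[int]:
    """Pick the first `count` ports from the offered (inclusive) ranges."""
    picked: List[int] = []
    for lo, hi in ranges:
        for port in range(lo, hi + 1):
            if len(picked) == count:
                return picked
            picked.append(port)
    if len(picked) < count:
        raise ValueError("offer does not contain enough ports")
    return picked


# Hypothetical offer: two small port ranges on one slave.
offered = parse_port_ranges("[31000-31001, 31100-31101]")
ports = take_ports(offered, 3)
```

A request router such as HAProxy would then map a stable service port to whichever host ports the tasks were actually given.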
Scheduling Options
• Mesos scheduling
• Capacity Scheduler
• Fair Scheduler
• Tenant scheduling examples
• Hadoop on Mesos
• Myriad (YARN) on Mesos
13
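As background for "Mesos scheduling": the default Mesos allocator is based on Dominant Resource Fairness (DRF), which offers resources next to the framework whose largest fractional share of any one resource is smallest. The sketch below illustrates only that selection rule. The cluster totals, framework names, and usage numbers are made-up assumptions, and the real allocator also handles roles, weights, and reservations, none of which appear here.

```python
from typing import Dict


def dominant_share(used: Dict[str, float], total: Dict[str, float]) -> float:
    """Largest fraction of any single cluster resource that a framework holds."""
    return max(used.get(name, 0.0) / total[name] for name in total)


def next_framework(usage: Dict[str, Dict[str, float]], total: Dict[str, float]) -> str:
    """Return the framework with the smallest dominant share (ties broken by name)."""
    return min(sorted(usage), key=lambda name: dominant_share(usage[name], total))


# Hypothetical cluster and current allocations.
total = {"cpus": 9.0, "mem": 18.0}
usage = {
    "hadoop": {"cpus": 3.0, "mem": 3.0},  # dominant resource: cpus (3/9)
    "spark": {"cpus": 1.0, "mem": 8.0},   # dominant resource: mem (8/18)
}
winner = next_framework(usage, total)
```

Here `hadoop` would receive the next offer because its dominant share (one third) is below that of `spark` (four ninths).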
Dev Workflow
• Code Repo / Registry
• Pull / Push / Commit / Run
• Automated Builds
• Version tagging
• Marathon CI / CD
• Dependencies
• Rolling restarts
14
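The workflow above ends with Marathon running a tagged image. As a rough sketch, an application definition for a Docker container might be built like this. The field names (`container.docker.image`, `upgradeStrategy.minimumHealthCapacity`, `maximumOverCapacity`) follow the Marathon app JSON of that era as best I recall, and newer Marathon versions describe networking differently; the app id, image name, registry host, and sizing are invented for illustration.

```python
import json
from typing import Any, Dict


def marathon_app(app_id: str, image: str, instances: int,
                 cpus: float = 1.0, mem: int = 1024) -> Dict[str, Any]:
    """Build a Marathon-style app definition for a Docker container."""
    return {
        "id": app_id,
        "instances": instances,
        "cpus": cpus,
        "mem": mem,
        "container": {
            "type": "DOCKER",
            "docker": {"image": image, "network": "BRIDGE"},
        },
        # Controls how aggressively a rolling restart replaces instances.
        "upgradeStrategy": {
            "minimumHealthCapacity": 0.5,
            "maximumOverCapacity": 0.2,
        },
    }


# Hypothetical image pushed by an automated build with a version tag.
app = marathon_app("/hadoop/datanode",
                   "registry.example.com/hadoop-datanode:2.7.1", 4)
payload = json.dumps(app, indent=2)
```

A CI job could send `payload` to Marathon's `/v2/apps` REST endpoint; changing only the image tag in a later deployment is what triggers a rolling restart.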
Registry Services
• Pluggable storage
• Webhooks
• Image control
• Security
• Logging
15
[Figure: a registry holds repositories; each repository holds images]
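The registry, repository, and image hierarchy shows up directly in image names. The following simplified sketch splits a reference such as `registry.example.com:5000/analytics/hadoop:2.7.1` into those parts. It assumes the common convention that a first path component containing a dot or colon (or equal to `localhost`) is a registry host; it does not handle digests (`@sha256:...`) or the `library/` prefix that Docker adds for official images. `ImageRef` and `parse_image_ref` are hypothetical names.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: str


def parse_image_ref(ref: str, default_registry: str = "docker.io") -> ImageRef:
    """Split an image reference into registry, repository, and tag."""
    first, sep, rest = ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = default_registry, ref
    repository, colon, tag = remainder.rpartition(":")
    if not colon:
        # No explicit tag: fall back to the conventional default.
        repository, tag = remainder, "latest"
    return ImageRef(registry, repository, tag)


private = parse_image_ref("registry.example.com:5000/analytics/hadoop:2.7.1")
public = parse_image_ref("hadoop")
```

Pinning an explicit version tag rather than relying on `latest` is what makes the "Version tagging" step in the dev workflow reproducible.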
Advantages for Developers
• Interchangeable verbs for code<->containers
• Choice of framework to use as their PaaS
• Adopt microservices approach to app pipeline
16
Recommendations for Success
• Start small, scale fast
• Use most appropriate framework for the job
• Think ahead, decouple
• Plan for rolling restart capacity up front
17
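"Plan for rolling restart capacity up front" can be turned into a quick back-of-the-envelope calculation. The sketch below estimates how many instances must stay up and how much spare capacity a rolling restart may need, using two fractions modeled on Marathon's upgrade strategy settings. The rounding choices and the function name `rolling_restart_headroom` are assumptions for illustration, not Marathon's exact behavior.

```python
import math
from typing import Dict


def rolling_restart_headroom(instances: int, cpus_per_instance: float,
                             min_health: float = 0.5,
                             max_over: float = 0.2) -> Dict[str, float]:
    """Estimate the capacity needed to restart an app without an outage."""
    # Instances that must keep serving while others are replaced.
    must_stay_up = math.ceil(instances * min_health)
    # Extra instances that may be started before old ones are stopped.
    extra_instances = math.floor(instances * max_over)
    return {
        "must_stay_up": must_stay_up,
        "extra_instances": extra_instances,
        "extra_cpus": extra_instances * cpus_per_instance,
    }


# Hypothetical app: 10 instances at 2 CPUs each.
plan = rolling_restart_headroom(instances=10, cpus_per_instance=2.0)
```

If the resource pool cannot spare `extra_cpus` beyond steady state, the restart has to proceed by stopping instances first, which is why the headroom is worth reserving in advance.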
Gap Analysis
• Be prepared to “look under the hood”
• Variable maturity and resiliency of the layers
• Networking
• Security
18
Where Are We Going Next
• Scale and learn
• Container-focused OS
• Software-defined networking services
• Discover key performance and availability metrics
19
Wrapping up
• Mesos allows for choice of framework
• Devs utilize Docker with familiar workflow
• Portable, flexible, and scalable architecture
20

Editor's notes

  1. I'm going to discuss some new opportunities to change the operational model of Hadoop, how to accommodate new services, and how to work toward better integration and end-to-end testing of modern application pipelines. This has everything to do with how ops can give devs the most flexible building environment without stretching too far trying to support everything. Key takeaways: Hadoop plus Docker gives you lightweight self-service on your laptop or in your cloud. Building modern app pipelines takes CI/CD; to iterate faster, devs need a self-service, customizable framework to build what they want to build. Evaluate whether YARN or Mesos fits your needs. Just pick a physical form factor or pick a cloud and move on, with portability in mind; we are in a unique situation where the many software choices will affect your ultimate product more than the hardware will. Test and iterate, scale and learn.
  2. Last year, Chris and I talked about how Adobe was virtualizing their Hadoop clusters in order to emulate a public cloud environment. Developers wanted to be able to be more flexible in what kind of Hadoop cluster was deployed, sizing, which templates, and which distro they wanted to work with. All of these things could be customized and were enabled for self-service. Potentially each developer could utilize their own private, dedicated cluster for experimentation and not have to worry about dedicated hardware. The automation and blueprints necessary were shared via catalog and extended to accommodate more than just Hadoop to include other distributed systems such as Storm, Kafka, Mesos, etc.
  3. One key realization is that you can't get there with just one framework. There are a ton of different solutions out there for cluster management and for different frameworks, different building blocks that devs can use to build their app and its data pipeline. So we needed to be more flexible in giving developers options for building their desired service. Should they be building realtime or batch workloads? How will they scale? What if parameters need to be changed as they scale? There are so many questions and so much new code to look at, and devs need to be just as quick about evaluating which tools are helpful and worth including as about the code they are adding themselves. With all of these different frameworks, and to retain the element of flexibility once they go down a road, the devs need to ensure they remain loosely coupled. Otherwise all this flexibility was kind of pointless. What's flexible about having to go back and start from scratch? You could do that before, and in a much simpler system, right? Now we're all platform-building, even if we're using someone else's services to bootstrap basic functionality. We need to deliver reliability somewhere before we get to the top of the stack. That's basically what CI and CD are about, in my opinion. So what do we need that is relatively portable, easily resizable across these different frameworks, and reasonably self-contained, so that we can pick it up and move it around when we need to? Last year the currency was VMs. We could resize, repurpose, share hardware, and blueprint. I have worked with VMs in high-performance settings and I don't think performance is the issue. However, they are not developer-friendly. Dev-friendly to me is basically infrastructure as code, or even infra as text files. As an architect I want devs to feel free to customize, do it themselves, and be able to interact with the system in a form factor that is consistent with their processes. A key part of self-service is choice.
  4. Users: Twitter, Airbnb, Apple, eBay. Frameworks: Aurora, Marathon, Chronos.
  5. http://mesos.apache.org/documentation/latest/mesos-architecture/ http://mesos.apache.org/assets/img/documentation/architecture3.jpg So from an infra perspective, why not just work on YARN? Well, YARN is not a hierarchical scheduler framework. It’s a framework for writing scalable analytics jobs and it does that really well. But how do we encapsulate infra for jobs that don't fit that model? Maybe next year YARN will have a completely different set of capabilities, but for now we have devs with a diverse set of job characteristics. Mesos allows for multiple executors, multiple independent schedulers, and multiple frameworks / toolsets, and it has a highly available master. The master enables fine-grained sharing of resources (CPU, RAM, …) across applications by making them resource offers. Each resource offer contains a list of the resources, and their amounts, available on a slave. The master decides how many resources to offer to each framework according to a given organizational policy, such as fair sharing or strict priority. To support a diverse set of policies, the master employs a modular architecture that makes it easy to add new allocation modules via a plugin mechanism. A framework running on top of Mesos consists of two components: a scheduler that registers with the master to be offered resources, and an executor process that is launched on slave nodes to run the framework’s tasks (see the App/Framework development guide for more details about application schedulers and executors). While the master determines how many resources are offered to each framework, the frameworks' schedulers select which of the offered resources to use. When a framework accepts offered resources, it passes to Mesos a description of the tasks it wants to run on them. In turn, Mesos launches the tasks on the corresponding slaves.
  6. https://github.com/mesos/myriad/blob/phase1/docs/how-it-works.md https://github.com/mesos/myriad/raw/phase1/docs/images/how-it-works.png Each tenant has their own framework. Each tenant can derive their own scheduling. Each tenant can leverage services in a decoupled fashion.
  7. This list will probably keep growing before it becomes consolidated. This is about blueprinting the distributed systems. There will typically be an infrastructure layer and a configuration management layer. VMware's Serengeti is a solution based on VMware vCenter and Chef. There is the flexibility of creating your own roles and recipes, but it depends on VMware licensing, which is based on sockets. There is only a single template at any given time, and calls are blocking, meaning only one cluster can be in any stage of creation at any given time. BOSH is its own animal, originally conceived as a way to stand up Cloud Foundry, because Cloud Foundry is itself a distributed system that can't instantiate itself. There is a director-based version, or bosh-init as a quick and less heavyweight CLI. BOSH uses YAML as its configuration format of choice. It can handle any cloud platform with a known CPI, or cloud provider interface. Its templates are called stemcells. It has an async queue and key-value store with multiple workers that can build in parallel. Networking and DNS are fully declared in the manifest but have to be much more explicit. Cloudbreak is a relatively new cloud-agnostic framework that uses cloud-specific APIs for building out components, for example AWS CloudFormation. For Hadoop blueprints it uses Ambari, and at the guest-image level everything is Docker, with Swarm for clustering and Consul for communication and service management. Cloudify uses open source TOSCA blueprints, which are YAML files that contain service definitions, tiers, and dependencies. Cloudify determines the infra compatibility layer, and config management is Chef or Puppet.
  8. Mesos is fundamentally a framework for accommodating different frameworks on the same hardware, using cgroups and Docker.
  9. http://mesos.apache.org/documentation/latest/mesos-frameworks/ Compute is determined by resource offers. Instead of trying to fit a workload on what's left of a host, the host or worker advertises some resources, and it's up to the framework whether to accept and provision, or wait.
  10. You have HA, checkpointing, and a common durable and resilient storage layer that can support the ecosystem of compute platforms: MapReduce (batch), Spark (in-memory), Hive (SQL), Storm (streaming), Solr (Lucene search), Flume, Kafka (with Camus).
  11. In my opinion, this is the most immature portion of the tenant services of Mesos, but it is still headed in the right direction. Frameworks don’t want to manage ports or physical networking. Mesos allows for per-container monitoring and logging, which is good for debugging.
  12. These are the top-level scheduling algorithms that Mesos can use. Remember that it’s a hierarchy. When a job request comes into the YARN resource manager, YARN evaluates all the resources available, and it places the job. It’s the one making the decision where jobs should go… YARN is optimized for scheduling Hadoop jobs, which are historically (and still typically) batch jobs with long run times. This means that YARN was not designed for long-running services, nor for short-lived interactive queries…, and while it’s possible to have it schedule other kinds of workloads, this is not an ideal model. … uses a two-level scheduling mechanism where resource offers are made to frameworks (applications that run on top of Mesos). The Mesos master node decides how many resources to offer each framework, while each framework determines the resources it accepts and what application to execute on those resources. This method of resource allocation allows near-optimal data locality when sharing a cluster of nodes amongst diverse frameworks. This open source software project is both a Mesos framework and a YARN scheduler that enables Mesos to manage YARN resource requests. When a job comes into YARN, it will schedule it via the Myriad Scheduler, which will match the request to incoming Mesos resource offers. Mesos, in turn, will pass it on to the Mesos worker nodes. The Mesos nodes will then communicate the request to a Myriad executor which is running the YARN node manager. Myriad launches YARN node managers on Mesos resources, which then communicate to the YARN resource manager what resources are available to them. YARN can then consume the resources as it sees fit. Myriad provides a seamless bridge from the pool of resources available in Mesos to the YARN tasks that want those resources.
  13. Developers can push their code and Dockerfile to Git, as they usually do. From there, Jenkins can build a container from the Dockerfile and then publish it to a registry.
  14. As usual, will there be template creep? Container creep? Image curation and testing are necessary, but hopefully this fits into your CI/CD methodology.
  15. Working with Docker should feel very familiar for developers: docker push, pull, commit; version dependency and tag-based search verbs. They can choose from Marathon or YARN 2.7.0, and do CI/CD with CloudBees, Shippable, Drone, Jenkins, on and on.
  16. Logging is key, of course. It is best to test and iterate, since stuff will break, and to pick a method that allows you to revert easily. Decouple! Be ready to pull in network teams and security teams early and often. The SDN decoupling is in progress, but for now infra should be ready to be explicit so devs don’t have to be. Don’t just shift complexity, abstract it. Security, SDLC and infrastructure and ops and…
  17. We often need to change as we scale. Remove the guest OS as much as possible; the options are multiplying: CoreOS, LXD, Microsoft Nano Server, Red Hat Atomic, VMware Photon. We don’t know which will work better, so we need to test and iterate; ultimately we want things decoupled so it doesn’t, or shouldn’t, matter. There is a lot of maturation ahead in the SDN space. Controllers are just reaching scalability of thousands of VMs; what happens when I throw a million containers at them? Test and iterate.
  18. YARN can be a first-class citizen, which avoids siloing the datacenter. Avoid siloing devs into specific frameworks. Docker is the new currency for continuous test and deployment of code, an infrastructure-as-text form factor for CI/CD.
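
The two-level scheduling described in the notes on Mesos architecture and resource offers can be illustrated with a toy model. This sketch does not use the Mesos API: `Master`, `Framework`, `Offer`, and `Task` are hypothetical names, and a simple round-robin rotation stands in for a real allocation policy such as fair sharing. It shows only the division of labor: the master decides which framework receives each offer, and the framework's scheduler decides which tasks, if any, to launch on it.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Resources advertised by one slave (hypothetical, simplified shape)."""
    slave_id: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int


class Framework:
    """Second level: the framework's scheduler picks what to run on an offer."""

    def __init__(self, name, pending):
        self.name = name
        self.pending = list(pending)

    def resource_offer(self, offer):
        """Return the tasks to launch on this offer; an empty list declines it."""
        launched = []
        cpus, mem_mb = offer.cpus, offer.mem_mb
        for task in list(self.pending):
            if task.cpus <= cpus and task.mem_mb <= mem_mb:
                launched.append(task)
                self.pending.remove(task)
                cpus -= task.cpus
                mem_mb -= task.mem_mb
        return launched


class Master:
    """First level: the master decides which framework is offered which resources."""

    def __init__(self, free_resources):
        self.free = list(free_resources)
        self.frameworks = []
        self.round = 0

    def register(self, framework):
        self.frameworks.append(framework)

    def offer_round(self):
        """Offer each slave's free resources to one framework, rotating each round."""
        placements = []
        for i, offer in enumerate(self.free):
            framework = self.frameworks[(self.round + i) % len(self.frameworks)]
            for task in framework.resource_offer(offer):
                # Accepted resources are deducted from what the slave has free.
                offer.cpus -= task.cpus
                offer.mem_mb -= task.mem_mb
                placements.append((framework.name, task.name, offer.slave_id))
        self.round += 1
        return placements


if __name__ == "__main__":
    master = Master([Offer("s1", 4.0, 8192), Offer("s2", 2.0, 4096)])
    master.register(Framework("hadoop", [Task("map-1", 2.0, 4096), Task("map-2", 2.0, 4096)]))
    master.register(Framework("storm", [Task("bolt-1", 1.0, 1024)]))
    for placement in master.offer_round():
        print(placement)
```

A framework whose pending tasks do not fit simply returns an empty list, which models waiting for a later offer instead of squeezing the workload onto whatever is left of a host.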