Introduction to Apache Mesos
What is Apache Mesos and how to use it. A short introduction to distributed fault-tolerant systems using ZooKeeper and Mesos. #installfest Prague 2014


  1. Tomas Barton (@barton_tomas)
  2. Load balancing • one server is not enough
  3. Time to launch new instance • new server up and running in a few seconds
  4. Job allocation problem • run important jobs first • how many servers do we need?
  5. Load trends • various load patterns
  6. Goals • effective usage of resources
  7. Goals • scalable
  8. Goals • fault tolerant
  9. Infrastructure 1 scalable 2 fault-tolerant 3 load balancing 4 high utilization
  10. Someone must have done it before . . .
  11. Someone must have done it before . . . Yes, it was Google
  12. Google Research 2004 – MapReduce paper • MapReduce: Simplified Data Processing on Large Clusters by Jeffrey Dean and Sanjay Ghemawat from Google ⇒ 2005 – Hadoop • by Doug Cutting and Mike Cafarella
  13. Google’s secret weapon “I prefer to call it the system that will not be named.” John Wilkes
  14. NOT THIS ONE
  15. “Borg” unofficial name distributes jobs between computers saved the cost of building at least one entire data center − centralized, possible bottleneck − hard to adjust for different job types
  16. “Borg” 200x – no Borg paper at all 2011 – Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center • Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. Berkeley, CA, USA
  17. The future? 2013 – Omega: flexible, scalable schedulers for large compute clusters • Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes (SIGOPS European Conference on Computer Systems (EuroSys), ACM, Prague, Czech Republic (2013), pp. 351–364) • compares different schedulers • monolithic (Borg) • Mesos (first public release, version 0.9) • Omega (next generation of Borg)
  18. Benjamin Hindman • PhD at UC Berkeley • research on multi-core processors “Sixty-four cores or 128 cores on a single chip looks a lot like 64 machines or 128 machines in a data center”
  19. “We wanted people to be able to program for the data center just like they program for their laptop.”
  20. Evolution of computing Datacenter • low utilization of nodes • long time to start a new node (30 min – 2 h)
  21. Evolution of computing Datacenter Virtual Machines • even more machines to manage • high virtualization costs • VM licensing
  22. Evolution of computing Datacenter Virtual Machines • sharing resources • fault-tolerant Mesos
  23. Supercomputers • supercomputers aren’t affordable • single point of failure?
  24. IaaC Infrastructure as a computer • built on commodity hardware
  25. Scalability
  26. First Rule of Distributed Computing “Everything fails all the time.” — Werner Vogels, CTO of Amazon
  27. Slave failure • no problem
  28. Master failure • big problem • single point of failure
  29. Master failure – solution • ok, let’s add a secondary master
  30. Secondary master failure × not resistant to secondary master failure × masters’ IPs are stored in the slaves’ config • will survive just one server failure
  31. Leader election in case of master failure a new one is elected
  33. Leader election • you might think of the system like this:
  34. Mesos scheme • usually 4–6 standby masters are enough • tolerant to |m| failures, |s| ≫ |m| (|m| . . . number of standby masters, |s| . . . number of slaves)
  35. Mesos scheme • each slave obtains the master’s IP from ZooKeeper • e.g. zk://192.168.1.1:2181/mesos
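The standby-master scheme above can be sketched as a toy leader election in the ZooKeeper style: each master registers an ephemeral sequential node, the lowest sequence number leads, and when the leader's node disappears the next lowest takes over. This is a self-contained illustration, not the real Mesos or ZooKeeper API; the class and method names are invented for the sketch.

```python
# Toy ZooKeeper-style leader election: each master registers an
# ephemeral sequential node; the lowest sequence number is the leader.
# Illustrative sketch only -- not the real Mesos/ZooKeeper API.

class ToyElection:
    def __init__(self):
        self._seq = 0
        self._nodes = {}  # sequence number -> master address

    def register(self, address):
        """A master joins the election; returns its sequence number."""
        self._seq += 1
        self._nodes[self._seq] = address
        return self._seq

    def leader(self):
        """The master holding the lowest sequence number leads."""
        return self._nodes[min(self._nodes)] if self._nodes else None

    def fail(self, seq):
        """Simulate a master crash: its ephemeral node disappears."""
        self._nodes.pop(seq, None)

election = ToyElection()
a = election.register("192.168.1.1:5050")
b = election.register("192.168.1.2:5050")
c = election.register("192.168.1.3:5050")
print(election.leader())   # 192.168.1.1:5050
election.fail(a)           # the leader crashes ...
print(election.leader())   # ... and 192.168.1.2:5050 takes over
```

In the real system the slaves never hardcode master IPs: they watch the zk://… path from the previous slide and follow whichever master currently holds leadership.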
  36. Common delusions 1 The network is reliable 2 Latency is zero 3 Transport cost is zero 4 The network is homogeneous
  37. Cloud Computing design your application for failure • split the application into multiple components • every component must have redundancy • no common points of failure
  38. “If I have seen further than others, it is by standing upon the shoulders of giants.” Sir Isaac Newton
  39. ZooKeeper
  42. ZooKeeper • not a key-value storage • not a filesystem • 1 MB item limit
  43. Frameworks • framework – application running on Mesos two components: 1 scheduler 2 executor
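The two-component model on this slide can be sketched as follows: the master sends a resource offer, the framework's scheduler picks the pending tasks that fit, and an executor runs them on the slave. All class and task names here are invented for illustration; the real Mesos API works through callbacks and protobuf messages, not these methods.

```python
# Sketch of the framework model: a *scheduler* decides which resource
# offers to accept, an *executor* runs the launched tasks on a slave.
# Names are illustrative, not the real Mesos API.

class Scheduler:
    """Framework-side component: decides what to launch from an offer."""
    def __init__(self, tasks):
        self.pending = list(tasks)  # (task_name, cpus_needed), in order

    def resource_offer(self, offer_cpus):
        """Accept as many pending tasks as fit into the offered CPUs."""
        launched = []
        while self.pending and self.pending[0][1] <= offer_cpus:
            name, cpus = self.pending.pop(0)
            offer_cpus -= cpus
            launched.append(name)
        return launched

class Executor:
    """Slave-side component: actually runs the launched tasks."""
    def run(self, task_name):
        return f"{task_name}: done"

sched = Scheduler([("index", 2), ("crawl", 2), ("report", 4)])
execu = Executor()
# The master offers 4 CPUs on some slave:
results = [execu.run(t) for t in sched.resource_offer(4)]
print(results)        # ['index: done', 'crawl: done']
print(sched.pending)  # [('report', 4)] waits for a bigger offer
```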
  44. Scheduling • two-level scheduler • Dominant Resource Fairness
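Dominant Resource Fairness can be sketched in a few lines: a user's dominant share is the largest fraction of any single resource they hold, and the allocator always offers resources to the user with the smallest dominant share. The totals and per-task demands below are the example used in the original DRF work; the function names and the fixed step count are simplifications for the sketch (a real allocator would also check remaining capacity).

```python
# Minimal Dominant Resource Fairness (DRF) sketch, assuming two
# resources and fixed per-task demands. Illustrative only.

TOTAL = {"cpus": 9, "mem": 18}  # cluster capacity: 9 CPUs, 18 GB

def dominant_share(alloc):
    """Largest fraction of any single resource this user holds."""
    return max(alloc[r] / TOTAL[r] for r in TOTAL)

def drf_allocate(demands, steps):
    """demands: user -> per-task demand; run `steps` allocation rounds."""
    alloc = {u: {r: 0 for r in TOTAL} for u in demands}
    for _ in range(steps):
        # offer to the user with the lowest dominant share
        user = min(alloc, key=lambda u: dominant_share(alloc[u]))
        for r in TOTAL:
            alloc[user][r] += demands[user][r]
    return alloc

# User A needs <1 CPU, 4 GB> per task (memory is dominant),
# user B needs <3 CPUs, 1 GB> (CPU is dominant).
demands = {"A": {"cpus": 1, "mem": 4}, "B": {"cpus": 3, "mem": 1}}
alloc = drf_allocate(demands, steps=5)
print(alloc)  # A ends with 3 tasks, B with 2: equal dominant shares
```

After five rounds A holds <3 CPUs, 12 GB> and B holds <6 CPUs, 2 GB>, so both users have a dominant share of 2/3, which is the fairness property DRF aims for.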
  45. Mesos architecture • fault tolerant • scalable
  46. Isolation
  47. • using cgroups • isolates: • CPU • memory • I/O • network
  48. Example usage – GitLab CI
  49. AirBnB
  50. Any web application can run on Mesos
  51. Distributed cron
  52. YARN Alternative to Mesos • Yet Another Resource Negotiator
  53. Where to get Mesos? • tarball – http://mesos.apache.org • deb, rpm – http://mesosphere.io/downloads/ • custom package – https://github.com/deric/mesos-deb-packaging • AWS instance in a few seconds – https://elastic.mesosphere.io/
  54. Configuration management • automatic server configuration • portable • should work on most Linux distributions (currently Debian, Ubuntu) • https://github.com/deric/puppet-mesos
  55. Thank you for your attention! tomas.barton@fit.cvut.cz
  56. Resources • Containers – Not Virtual Machines – Are the Future Cloud • Managing Twitter clusters with Mesos • Return of the Borg: How Twitter Rebuilt Google’s Secret Weapon • Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center; Hindman, Benjamin; Konwinski, Andrew; Zaharia, Matei; Ghodsi, Ali; Joseph, Anthony D.; Katz, Randy H.; Shenker, Scott; Stoica, Ion; EECS Department, University of California, Berkeley, 2010