This presentation is part of my work for the course 'Big Data Seminar' at TU Berlin within the IT4BI (Information Technology for Business Intelligence) master programme.
2. Introduction
YARN
Mesos
Omega
Related work
Conclusions
Table of contents
1 Introduction
  The problem
  Solutions
2 YARN
  Architecture
  Advantages
  Drawbacks
  Performance
3 Mesos
  Architecture
  Advantages
  Drawbacks
  Performance
4 Omega
  Architecture
  Advantages
  Drawbacks
  Performance
5 Related work
  Resource managers
  Scheduling techniques
6 Conclusions
Jose Luis Lopez Pino
Scheduling and sharing resources in Data Clusters
Scheduling techniques
Lottery scheduling[11]
Dynamic Proportional Share Scheduling[7]
Calibration: how does a particular task perform in a particular node?[5]
Stragglers and speculative relaunch[13]
Delay scheduling: achieve locality, relax fairness[12]
Rich resource-requests[2]
Optimize short jobs[3]
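As an illustration of the first technique in the list, here is a minimal sketch of the core draw in lottery scheduling[11]: each client holds a number of tickets, and the next resource grant goes to the holder of a randomly drawn ticket, so the expected share is proportional to the ticket count. The function name `lottery_pick`, the job names, and the ticket counts are hypothetical and chosen only for this example; ticket counts are assumed to be non-negative integers.

```python
import random
from collections import Counter


def lottery_pick(tickets, rng):
    """Return one client, chosen with probability proportional to its tickets.

    `tickets` maps a client name to a non-negative integer ticket count
    (an assumption of this sketch).
    """
    total = sum(tickets.values())
    if total <= 0:
        raise ValueError("at least one ticket is required")
    winner = rng.randrange(total)  # winning ticket number in [0, total)
    # Walk the clients until the winning ticket falls inside one's range.
    for client, count in tickets.items():
        if winner < count:
            return client
        winner -= count
    raise AssertionError("unreachable: winner is always below total")


# Hypothetical allocation: job_a holds 75% of the tickets.
tickets = {"job_a": 75, "job_b": 20, "job_c": 5}
rng = random.Random(42)  # fixed seed so repeated runs are reproducible
wins = Counter(lottery_pick(tickets, rng) for _ in range(10_000))
```

Over many draws the counts in `wins` should approach the 75/20/5 split, and a client holding zero tickets is never selected. Changing a client's share only requires changing its ticket count.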
References I
[1] Ronnie Chaiken, Bob Jenkins, Per-Åke Larson, Bill Ramsey,
Darren Shakib, Simon Weaver, and Jingren Zhou.
SCOPE: easy and efficient parallel processing of massive data
sets.
Proceedings of the VLDB Endowment, 1(2):1265–1276, 2008.
[2] Carlo Curino, Djellel Difallah, Chris Douglas, Raghu
Ramakrishnan, and Sriram Rao.
Reservation-based scheduling: If you're late don't blame us!
[3] Khaled Elmeleegy.
Piranha: Optimizing short jobs in Hadoop.
Proceedings of the VLDB Endowment, 6(11):985–996, 2013.
References II
[4] Michael Isard, Vijayan Prabhakaran, Jon Currey, Udi Wieder,
Kunal Talwar, and Andrew Goldberg.
Quincy: fair scheduling for distributed computing clusters.
In Proceedings of the ACM SIGOPS 22nd Symposium on
Operating Systems Principles, pages 261–276. ACM, 2009.
[5] Gunho Lee, Byung-Gon Chun, and Randy H Katz.
Heterogeneity-aware resource allocation and scheduling in the
cloud.
In Proceedings of the 3rd USENIX Workshop on Hot Topics
in Cloud Computing, HotCloud, volume 11, 2011.
References III
[6] Kyong-Ha Lee, Yoon-Joon Lee, Hyunsik Choi, Yon Dohn
Chung, and Bongki Moon.
Parallel data processing with MapReduce: a survey.
ACM SIGMOD Record, 40(4):11–20, 2012.
[7] Thomas Sandholm and Kevin Lai.
Dynamic proportional share scheduling in Hadoop.
In Job Scheduling Strategies for Parallel Processing, pages
110–131. Springer, 2010.
References IV
[8] Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek,
and John Wilkes.
Omega: Flexible, scalable schedulers for large compute
clusters.
In Proceedings of the 8th ACM European Conference on
Computer Systems, EuroSys '13, pages 351–364, New York,
NY, USA, 2013. ACM.
[9] Facebook Engineering Team.
Under the hood: Scheduling MapReduce jobs more efficiently
with Corona.
References V
[10] Vinod K. Vavilapalli.
Apache Hadoop YARN: Yet Another Resource Negotiator.
In Proc. SOCC, 2013.
[11] Carl A Waldspurger and William E Weihl.
Lottery scheduling: Flexible proportional-share resource
management.
In Proceedings of the 1st USENIX conference on Operating
Systems Design and Implementation, page 1. USENIX
Association, 1994.
References VI
[12] Matei Zaharia, Dhruba Borthakur, Joydeep Sen Sarma,
Khaled Elmeleegy, Scott Shenker, and Ion Stoica.
Delay scheduling: a simple technique for achieving locality
and fairness in cluster scheduling.
In Proceedings of the 5th European conference on Computer
systems, pages 265–278. ACM, 2010.
[13] Matei Zaharia, Andy Konwinski, Anthony D Joseph, Randy H
Katz, and Ion Stoica.
Improving MapReduce performance in heterogeneous
environments.
In OSDI, volume 8, page 7, 2008.