Scheduling In Distributed Systems
          Candidacy exam


                              Andrii Vozniuk
                              EPFL
                              July 4, 2012
Big Data
       Data explosion
       Processing gets more complicated




          Generates: 25 TB/day       Generates: 40 TB/day
          Stores:    10 PB/year      Stores:    20 PB/year

            Resources of many computers should be used
    2
Typical Data Processing Pipeline


    [Figure: typical data processing pipeline]
        Log data -> Clean data (ETL-like batch processing) -> Query data (efficient query execution) -> User model
        Sensor data -> Analyze data (using resources of many organizations) -> Particle found!

           No one-size-fits-all system currently exists
 3
Outline
    Ɣ Gamma - parallel database
        MapReduce - data-intensive system

        Condor - compute-intensive system

 Conclusions
 Future Research




4
Scheduling In Distributed Systems
       Scheduling
           Policy: setting an ordering of tasks
           Assigning resources to tasks

       How to match resources and tasks?




              Scheduling is challenging in distributed systems
    5
Matching Tasks With Resources
       Perspectives
           Data model
           Execution model


             System      Data model      Execution model
             Gamma       Relational      Multioperator
             MapReduce   Unconstrained   MapReduce
             Condor      Unconstrained   Unconstrained




            How is scheduling influenced by the data and execution models?
    6
Gamma                                                Ɣ
       Pioneering parallel database
       Data model: constrained
           Relational data model
           Relations are horizontally partitioned
       Execution model: constrained
           Multioperator queries
           Operators employ hash-based algorithms




    7
Gamma: Scheduler                                                         Ɣ
    [Figure: Gamma scheduler architecture]
        Query: SELECT r FROM R WHERE r < 'k'
        Host machine: Query Manager (optimizes the query, compiles the plan), with access to the Catalog
        Gamma database: Scheduler Process (schedules operators)
        Node 1 (a-m) and Node 2 (n-z): one Operator Process each
        Execution on relevant nodes



          Scheduling is done at the operator level
 8
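As a rough illustration of operator-level scheduling over a horizontally partitioned relation, the sketch below assumes range partitioning on the first letter (a-m on Node 1, n-z on Node 2, as in the figure) and starts a selection operator only on nodes whose range can contain rows matching `r < 'k'`. The names `PARTITIONS`, `relevant_nodes` and `schedule_selection` are hypothetical and are not Gamma's actual interface.

```python
# Hypothetical range partitioning taken from the figure: node -> (low, high) key range.
PARTITIONS = {"Node 1": ("a", "m"), "Node 2": ("n", "z")}


def relevant_nodes(upper_bound, partitions=PARTITIONS):
    """Return the nodes whose key range can hold values strictly below upper_bound."""
    return [node for node, (low, _high) in partitions.items() if low < upper_bound]


def schedule_selection(upper_bound):
    """Start one selection operator per relevant node (modelled here as a plain dict)."""
    return [
        {"node": node, "operator": "select", "predicate": f"r < '{upper_bound}'"}
        for node in relevant_nodes(upper_bound)
    ]


plan = schedule_selection("k")
print([op["node"] for op in plan])
```

With the predicate `r < 'k'`, only the node holding the a-m range receives an operator; the n-z node is left free for other work.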
Gamma: Batch Scheduling                                           Ɣ
       Exploit sharing by scheduling in a batch
       Example of selection sharing


       [Figure: left, σ1 and σ2 each scan A separately; right, σ1 and σ2 are applied over one shared scan of A]



       Reads of A can be shared by applying the predicates in turn
       Shared relation A is scanned only once


              Batch scheduling trades latency for throughput
    9
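The selection-sharing idea can be sketched in a few lines of Python. This is a minimal in-memory model, assuming a relation is a list of rows and each selection is a predicate function; `separate_scans` and `shared_scan` are illustrative names, and the read counter stands in for I/O.

```python
def separate_scans(relation, predicates):
    """Baseline: every selection scans the relation on its own."""
    reads = 0
    results = []
    for predicate in predicates:
        matched = []
        for row in relation:
            reads += 1
            if predicate(row):
                matched.append(row)
        results.append(matched)
    return results, reads


def shared_scan(relation, predicates):
    """Scan the relation once and apply each query's predicate in turn to every row."""
    reads = 0
    results = [[] for _ in predicates]
    for row in relation:
        reads += 1
        for matched, predicate in zip(results, predicates):
            if predicate(row):
                matched.append(row)
    return results, reads


A = list(range(10))
sigma1 = lambda row: row < 3
sigma2 = lambda row: row % 2 == 0

separate, separate_reads = separate_scans(A, [sigma1, sigma2])
shared, shared_reads = shared_scan(A, [sigma1, sigma2])
print(separate_reads, shared_reads)
```

Both variants return the same rows per query; the shared variant touches each row of A once, at the cost of holding the first query until the whole batch is ready to run.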
Gamma: Batch Scheduling Joins                                           Ɣ
    Several hash-joins in a batch of queries
    Hash table for the same relation can be shared
    Example assumes 100% selectivity of σ
    [Figure: left, two hash joins, σ(A) ⋈ σ(B) and σ(A) ⋈ σ(C), each reading A; right, both joins use a shared hash table for A]


    Sharing reduces I/O and memory usage

             Sharing among joins reduces total execution time
    10
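A minimal sketch of hash-table sharing between two joins, assuming in-memory relations of dict rows joined on a hypothetical `id` column. The helper names `build_hash_table` and `probe` are illustrative, and the selections are omitted (100% selectivity, as in the example).

```python
from collections import defaultdict


def build_hash_table(relation, key):
    """Build phase of a hash join: group the rows of the build relation by join key."""
    table = defaultdict(list)
    for row in relation:
        table[row[key]].append(row)
    return table


def probe(table, relation, key):
    """Probe phase: emit one merged row per matching (build row, probe row) pair."""
    return [
        {**build_row, **probe_row}
        for probe_row in relation
        for build_row in table.get(probe_row[key], [])
    ]


A = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
B = [{"id": 1, "b": "p"}, {"id": 3, "b": "q"}]
C = [{"id": 2, "c": "r"}]

shared_table = build_hash_table(A, "id")  # A is read and hashed only once
a_join_b = probe(shared_table, B, "id")
a_join_c = probe(shared_table, C, "id")
print(a_join_b, a_join_c)
```

Because both joins probe the same table, A is read once and only one hash table is kept in memory, which is where the I/O and memory savings come from.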
Limitations Of Gamma                                           Ɣ
    Gamma offers
        Efficient query execution
        Sharing in a batch of queries
    Gamma operates on structured data
    Gamma is not suitable for
        Unstructured data processing
        ETL-type workloads
        Running at large scale




             A different system for ETL processing is needed
    11
MapReduce
    System for data-intensive applications
    Execution model: constrained
        Job is a set of map and reduce tasks
        Tasks are independent
    Data model: unconstrained
        Arbitrary data format
        Files are partitioned into chunks
        Each chunk is replicated several times




    12
MapReduce: Scheduling
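    Example: MapReduce job with 4 map tasks and 2 reduce tasks
    [Figure: four nodes, each storing one chunk and running the map task for it]
        Node storing Chunk1: Map 1 and a Reduce task; holds Temp1 and Result1
        Node storing Chunk2: Map 2; holds Temp2
        Node storing Chunk3: Map 3; holds Temp3
        Node storing Chunk4: Map 4 and a Reduce task; holds Temp4 and Result2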
    Tasks are scheduled close to data
    Execution is scalable and fault-tolerant
    Execution is elastic
           Fine-grained scheduling improves fault tolerance and elasticity
    13
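Scheduling tasks close to data can be sketched as a greedy assignment. This is a simplified model under stated assumptions, not the actual MapReduce scheduler: each chunk has a list of replica nodes, each node has a number of free task slots, and `assign_map_tasks` is a hypothetical helper.

```python
def assign_map_tasks(chunk_replicas, free_slots):
    """Greedy sketch: place each map task on a node that stores its chunk if possible.

    chunk_replicas: chunk name -> list of nodes holding a replica
    free_slots: node name -> number of free task slots (a copy is modified)
    Returns chunk name -> (node, is_local).
    """
    slots = dict(free_slots)
    assignment = {}
    for chunk, replicas in chunk_replicas.items():
        local = [node for node in replicas if slots.get(node, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fall back to any node with a free slot; the chunk is then read remotely.
            candidates = [node for node, free in slots.items() if free > 0]
            if not candidates:
                raise RuntimeError("no free slots left")
            node, is_local = candidates[0], False
        slots[node] -= 1
        assignment[chunk] = (node, is_local)
    return assignment


replicas = {
    "Chunk1": ["node1", "node2"],
    "Chunk2": ["node2", "node3"],
    "Chunk3": ["node3", "node4"],
    "Chunk4": ["node4", "node1"],
}
assignment = assign_map_tasks(replicas, {"node1": 1, "node2": 1, "node3": 1, "node4": 1})
print(assignment)
```

Replication is what makes this work: with several replicas per chunk, the scheduler usually finds a free node that already stores the input.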
MapReduce: Speculative Execution
    Nodes may become slow
    Speculative execution minimizes a job's response time
    Launch a backup task if a task's progress is 20% less than the average
    [Figure: a normal node and a temporarily slow node; the straggler task on the slow node gets a backup copy]




         Speculative execution works well in a homogeneous environment
    14
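The launch rule can be sketched as follows. The sketch assumes that "20% less than average" means a progress score more than 0.2 below the average score of the running tasks; that reading, and the name `stragglers`, are assumptions rather than something the slides state.

```python
def stragglers(progress, gap=0.2):
    """Return the tasks whose progress score is more than `gap` below the average.

    progress: task id -> progress score in [0, 1]
    """
    average = sum(progress.values()) / len(progress)
    return [task for task, score in progress.items() if score < average - gap]


running = {"t1": 0.9, "t2": 0.85, "t3": 0.3}
print(stragglers(running))
```

In a homogeneous cluster, tasks progress at similar rates, so a task far below the average is very likely on a faulty or overloaded node.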
Emerging Heterogeneous Infrastructures
    Replacement of failed components
    Extending existing cluster with new machines
    Virtualized data centers of cloud providers
        CPU and RAM are isolated
        Contention for disk and network
    [Figure: I/O performance per VM (MB/s), axis from 0 to 60, versus number of VMs on a physical host, from 1 to 7]

In many real-life cases the infrastructure is heterogeneous
    15
MapReduce: Heterogeneous Cluster
    [Figure: task timelines on a fast node and a slow node]



    Performance degrades on heterogeneous cluster
        Slow nodes are wasted
        Backup tasks on slow nodes
        All straggling tasks are treated equally
        Thrashing due to excessive speculative execution

     Speculative execution should be improved for heterogeneous clusters
    16
MapReduce: LATE Scheduler
    Idea: back up the task with the largest estimated finish
     time (Longest Approximate Time to End)
        progress rate = progress score / execution time
        estimated time left = (1 - progress score) / progress rate
    Thresholds
        Limit the number of backup tasks
        Launch backup tasks on fast nodes
        Backup only sufficiently slow tasks
         LATE looks forward to prioritize tasks to speculate
    17
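The two LATE estimates translate directly into code. The sketch below implements only the ranking step and leaves out the three thresholds; the function names and the sample task data are illustrative assumptions.

```python
def estimated_time_left(progress_score, execution_time):
    """(1 - progress score) / progress rate, where progress rate = progress score / execution time."""
    progress_rate = progress_score / execution_time
    return (1.0 - progress_score) / progress_rate


def pick_task_to_back_up(tasks):
    """Return the id of the running task with the longest approximate time to end.

    tasks: task id -> (progress score in (0, 1], execution time so far in minutes)
    """
    return max(tasks, key=lambda task: estimated_time_left(*tasks[task]))


# Illustrative numbers: one task is 66% done after 2 min, another 5.3% done after 0.1 min.
tasks = {"task2": (0.66, 2.0), "task3": (0.053, 0.1)}
print(pick_task_to_back_up(tasks))
```

The task that has made the most progress so far is not backed up here: the ranking looks at how long each task still needs, not at how far behind it currently is.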
MapReduce: LATE Example
   Back up the task with the Longest Approximate Time to End
   [Figure: timeline (min) for three nodes with a 2 min mark, showing the improvement from the backup task]
   Node 1: 1 task/min
   Node 2: 3x slower, running task at progress = 66%
       Estimated time left: (1 - 0.66) / (1/3) = 1
   Node 3: 1.9x slower, running task at progress = 5.3%
       Estimated time left: (1 - 0.053) / (1/1.9) = 1.8

LATE correctly identifies the task that hurts the response time the most
18
Limitations Of MapReduce
    MapReduce offers
        High scalability
        Good fault tolerance
        Handling of unstructured data
    MapReduce is not suitable for
        Running on multi-organization infrastructure
        Harvesting idle resources in an organization




     A different system for multi-organization infrastructure is needed
    19
Condor
    Compute-intensive system harvesting idle resources
    Data model: arbitrary
    Execution model: arbitrary
    How to increase utilization and respect the owners?
    [Figure: jobs placed on idle machines]

       Increase resource utilization by scheduling jobs on idle machines
    20
Condor Scheduler: Centralized?
     [Figure: a single central scheduler places all jobs on the machines]
     Efficient but not reliable, possible bottleneck
21
Condor Scheduler: Distributed?
     [Figure: several independent schedulers place jobs on the machines]
                 Reliable but inefficient
22
Condor Scheduler: Hybrid!

     [Figure: a Matchmaker receives information about tasks from the schedulers and information about nodes from the machines; the interaction steps are numbered 1 to 4]
            Hybrid approach has the best of both worlds
 23
ClassAds: Describing Jobs and Resources
    Job description:
          [MyType = "Job"
          TargetType = "Machine"
          Department = "CompSci"
          Requirements = (other.OpSys == "LINUX" && other.Disk > 10000000)
          Rank = Memory]

    Machine description:
          [MyType = "Machine"
          TargetType = "Job"
          Machine = "nostos.cs.wisc.edu"
          OpSys = "LINUX"
          Disk = 3076077
          Requirements = (LoadAvg <= 0.3) && (KeyboardIdle > (15*60))
          Rank = other.Department == self.Department]
    Requirements should be satisfied
    Candidate with the highest rank is returned
         Matchmaker is suitable for heterogeneous shared clusters
    24
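The matching rule (both sides' requirements hold, then pick the highest rank) can be sketched in Python. This is a toy model, not the ClassAd language or Condor's API: ads are plain dicts, `Requirements` and `Rank` are functions of `(self, other)`, and every machine value except the name, OpSys and Disk of `nostos.cs.wisc.edu` is made up for illustration.

```python
def matches(job, machine):
    """A pair matches only if the Requirements of both ads are satisfied."""
    return job["Requirements"](job, machine) and machine["Requirements"](machine, job)


def matchmake(job, machines):
    """Return the matching machine that the job ranks highest, or None if nothing matches."""
    candidates = [m for m in machines if matches(job, m)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: job["Rank"](job, m))


def machine_ad(name, disk, memory, load_avg, keyboard_idle):
    """Build a machine ad that accepts jobs only when the machine is idle."""
    return {
        "Machine": name, "OpSys": "LINUX", "Disk": disk, "Memory": memory,
        "LoadAvg": load_avg, "KeyboardIdle": keyboard_idle,
        "Requirements": lambda self, other: (
            self["LoadAvg"] <= 0.3 and self["KeyboardIdle"] > 15 * 60
        ),
    }


job = {
    "Department": "CompSci",
    "Requirements": lambda self, other: other["OpSys"] == "LINUX" and other["Disk"] > 10000000,
    "Rank": lambda self, other: other["Memory"],
}

machines = [
    machine_ad("nostos.cs.wisc.edu", 3076077, 64, 0.1, 1432),  # disk too small for this job
    machine_ad("hypothetical-a", 20000000, 128, 0.1, 2000),
    machine_ad("hypothetical-b", 20000000, 256, 0.9, 2000),    # busy: its owner's rule rejects jobs
    machine_ad("hypothetical-c", 30000000, 96, 0.2, 5000),
]

best = matchmake(job, machines)
print(best["Machine"] if best else None)
```

The sketch ranks candidates by the job's `Rank` only; in the slide's machine ad, the machine also ranks jobs (preferring its own department), which a fuller matchmaker would take into account.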
Conclusions
    Scheduling is done at different levels
        Gamma: operator-level scheduling enables sharing
        MapReduce and Condor: arbitrary code => sharing is hard
        Condor: matchmaking gives control over job placement

    Hybrid approaches are promising for big data processing
    Scheduling in heterogeneous deployments is challenging




    25
Thank you for your attention!

        Feedback & Questions?
        Andrii.Vozniuk@epfl.ch




26
References
    Matchmaking: Distributed Resource Management for
     High Throughput Computing by Rajesh Raman, Miron
     Livny and Marvin Solomon.
    Batch Scheduling in Parallel Database Systems by Manish
     Mehta, Valery Soloviev and David J. DeWitt.
    Improving MapReduce performance in heterogeneous
     environments by Matei Zaharia, Andy Konwinski, Anthony
     D. Joseph, Randy Katz and Ion Stoica
    Slides 14 and 18 exploit presentation ideas from the LATE
     slides for OSDI 2008 by Matei Zaharia


    27

Contenu connexe

Tendances

Distributed & parallel system
Distributed & parallel systemDistributed & parallel system
Distributed & parallel system
Manish Singh
 
Clock Synchronization (Distributed computing)
Clock Synchronization (Distributed computing)Clock Synchronization (Distributed computing)
Clock Synchronization (Distributed computing)
Sri Prasanna
 

Tendances (20)

Parallel processing (simd and mimd)
Parallel processing (simd and mimd)Parallel processing (simd and mimd)
Parallel processing (simd and mimd)
 
Communications is distributed systems
Communications is distributed systemsCommunications is distributed systems
Communications is distributed systems
 
Parallel programming model
Parallel programming modelParallel programming model
Parallel programming model
 
Lecture 1 introduction to parallel and distributed computing
Lecture 1   introduction to parallel and distributed computingLecture 1   introduction to parallel and distributed computing
Lecture 1 introduction to parallel and distributed computing
 
multiprocessors and multicomputers
 multiprocessors and multicomputers multiprocessors and multicomputers
multiprocessors and multicomputers
 
Aca2 01 new
Aca2 01 newAca2 01 new
Aca2 01 new
 
Computability - Tractable, Intractable and Non-computable Function
Computability - Tractable, Intractable and Non-computable FunctionComputability - Tractable, Intractable and Non-computable Function
Computability - Tractable, Intractable and Non-computable Function
 
Distributed & parallel system
Distributed & parallel systemDistributed & parallel system
Distributed & parallel system
 
Interconnection Network
Interconnection NetworkInterconnection Network
Interconnection Network
 
Parallel and distributed Computing
Parallel and distributed Computing Parallel and distributed Computing
Parallel and distributed Computing
 
The medium access sublayer
 The medium  access sublayer The medium  access sublayer
The medium access sublayer
 
Building the Internet of Things with Thingsquare and Contiki - day 2 part 1
Building the Internet of Things with Thingsquare and Contiki - day 2 part 1Building the Internet of Things with Thingsquare and Contiki - day 2 part 1
Building the Internet of Things with Thingsquare and Contiki - day 2 part 1
 
Clock Synchronization in Distributed Systems
Clock Synchronization in Distributed SystemsClock Synchronization in Distributed Systems
Clock Synchronization in Distributed Systems
 
8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating Systems8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating Systems
 
Message and Stream Oriented Communication
Message and Stream Oriented CommunicationMessage and Stream Oriented Communication
Message and Stream Oriented Communication
 
Replication in Distributed Systems
Replication in Distributed SystemsReplication in Distributed Systems
Replication in Distributed Systems
 
Clock Synchronization (Distributed computing)
Clock Synchronization (Distributed computing)Clock Synchronization (Distributed computing)
Clock Synchronization (Distributed computing)
 
Randomized Algorithm
Randomized AlgorithmRandomized Algorithm
Randomized Algorithm
 
message passing
 message passing message passing
message passing
 
Parallel computing and its applications
Parallel computing and its applicationsParallel computing and its applications
Parallel computing and its applications
 

Similaire à Scheduling in distributed systems - Andrii Vozniuk

Hadoop Network Performance profile
Hadoop Network Performance profileHadoop Network Performance profile
Hadoop Network Performance profile
pramodbiligiri
 
The Performance of MapReduce: An In-depth Study
The Performance of MapReduce: An In-depth StudyThe Performance of MapReduce: An In-depth Study
The Performance of MapReduce: An In-depth Study
Kevin Tong
 
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Yahoo Developer Network
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
Dilip Reddy
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
Dilip Reddy
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
Varun Narang
 

Similaire à Scheduling in distributed systems - Andrii Vozniuk (20)

Hadoop Network Performance profile
Hadoop Network Performance profileHadoop Network Performance profile
Hadoop Network Performance profile
 
The Performance of MapReduce: An In-depth Study
The Performance of MapReduce: An In-depth StudyThe Performance of MapReduce: An In-depth Study
The Performance of MapReduce: An In-depth Study
 
Spark
SparkSpark
Spark
 
Next generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labNext generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph lab
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scale
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 
Hanborq Optimizations on Hadoop MapReduce
Hanborq Optimizations on Hadoop MapReduceHanborq Optimizations on Hadoop MapReduce
Hanborq Optimizations on Hadoop MapReduce
 
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
 
Hanborq optimizations on hadoop map reduce 20120221a
Hanborq optimizations on hadoop map reduce 20120221aHanborq optimizations on hadoop map reduce 20120221a
Hanborq optimizations on hadoop map reduce 20120221a
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
 
Взгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCВзгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPC
 
Hadoop at JavaZone 2010
Hadoop at JavaZone 2010Hadoop at JavaZone 2010
Hadoop at JavaZone 2010
 
Strata + Hadoop World 2012: Knitting Boar
Strata + Hadoop World 2012: Knitting BoarStrata + Hadoop World 2012: Knitting Boar
Strata + Hadoop World 2012: Knitting Boar
 
Hadoop Summit EU 2013: Parallel Linear Regression, IterativeReduce, and YARN
Hadoop Summit EU 2013: Parallel Linear Regression, IterativeReduce, and YARNHadoop Summit EU 2013: Parallel Linear Regression, IterativeReduce, and YARN
Hadoop Summit EU 2013: Parallel Linear Regression, IterativeReduce, and YARN
 
High Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of ViewHigh Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of View
 
Parallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyParallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A Survey
 
Ling liu part 02:big graph processing
Ling liu part 02:big graph processingLing liu part 02:big graph processing
Ling liu part 02:big graph processing
 
MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas...
MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...
MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas...
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 

Plus de Andrii Vozniuk

Enhancing Social Media Platforms for Educational and Humanitarian Knowledge S...
Enhancing Social Media Platforms for Educational and Humanitarian Knowledge S...Enhancing Social Media Platforms for Educational and Humanitarian Knowledge S...
Enhancing Social Media Platforms for Educational and Humanitarian Knowledge S...
Andrii Vozniuk
 
Embedded interactive learning analytics dashboards with Elasticsearch and Kib...
Embedded interactive learning analytics dashboards with Elasticsearch and Kib...Embedded interactive learning analytics dashboards with Elasticsearch and Kib...
Embedded interactive learning analytics dashboards with Elasticsearch and Kib...
Andrii Vozniuk
 
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
Andrii Vozniuk
 

Plus de Andrii Vozniuk (11)

Enhancing Social Media Platforms for Educational and Humanitarian Knowledge S...
Enhancing Social Media Platforms for Educational and Humanitarian Knowledge S...Enhancing Social Media Platforms for Educational and Humanitarian Knowledge S...
Enhancing Social Media Platforms for Educational and Humanitarian Knowledge S...
 
Embedded interactive learning analytics dashboards with Elasticsearch and Kib...
Embedded interactive learning analytics dashboards with Elasticsearch and Kib...Embedded interactive learning analytics dashboards with Elasticsearch and Kib...
Embedded interactive learning analytics dashboards with Elasticsearch and Kib...
 
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
 
Combining content analytics and activity tracking to mine user interests and ...
Combining content analytics and activity tracking to mine user interests and ...Combining content analytics and activity tracking to mine user interests and ...
Combining content analytics and activity tracking to mine user interests and ...
 
TPC-DS performance evaluation for JAQL and PIG queries - Andrii Vozniuk, Serg...
TPC-DS performance evaluation for JAQL and PIG queries - Andrii Vozniuk, Serg...TPC-DS performance evaluation for JAQL and PIG queries - Andrii Vozniuk, Serg...
TPC-DS performance evaluation for JAQL and PIG queries - Andrii Vozniuk, Serg...
 
Contextual learning analytics apps to create awareness in blended inquiry lea...
Contextual learning analytics apps to create awareness in blended inquiry lea...Contextual learning analytics apps to create awareness in blended inquiry lea...
Contextual learning analytics apps to create awareness in blended inquiry lea...
 
Graspeo: a Social Media Platform for Knowledge Management in NGOs - Andrii Vo...
Graspeo: a Social Media Platform for Knowledge Management in NGOs - Andrii Vo...Graspeo: a Social Media Platform for Knowledge Management in NGOs - Andrii Vo...
Graspeo: a Social Media Platform for Knowledge Management in NGOs - Andrii Vo...
 
Towards portable learning analytics dashboards - Andrii Vozniuk, Sten Govaert...
Towards portable learning analytics dashboards - Andrii Vozniuk, Sten Govaert...Towards portable learning analytics dashboards - Andrii Vozniuk, Sten Govaert...
Towards portable learning analytics dashboards - Andrii Vozniuk, Sten Govaert...
 
AngeLA: Putting the teacher in control of student privacy in the online class...
AngeLA: Putting the teacher in control of student privacy in the online class...AngeLA: Putting the teacher in control of student privacy in the online class...
AngeLA: Putting the teacher in control of student privacy in the online class...
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
 
Symbolic Reasoning and Concrete Execution - Andrii Vozniuk
Symbolic Reasoning and Concrete Execution - Andrii Vozniuk Symbolic Reasoning and Concrete Execution - Andrii Vozniuk
Symbolic Reasoning and Concrete Execution - Andrii Vozniuk
 

Dernier

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Dernier (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

Scheduling in distributed systems - Andrii Vozniuk

  • 1. Scheduling In Distributed Systems Candidacy exam  Andrii Vozniuk  EPFL  July 4, 2012
  • 2. Big Data  Data explosion  Processing gets more complicated Generates: 25 TB/day Generates: 40 TB/day Stores: 10 PB/year Stores: 20 PB/year Resources of many computers should be used 2
  • 3. Typical Data Processing Pipeline Log Sensor data data ETL-like batch Clean Analyze Using resources of processing data data many organizations Particle found! Efficient query Query execution data User model No one-size-fits-all system currently exists 3
  • 4. Outline Ɣ Gamma - parallel database MapReduce - data-intensive system Condor - compute-intensive system Conclusions Future Research 4
  • 5. Scheduling In Distributed Systems  Scheduling  Policy: setting an ordering of tasks task task  Assigning resources to tasks task task How to match resources and tasks? Scheduling is challenging in distributed systems 5
  • 6. Matching Tasks With Resources  Perspectives  Data model  Execution model System/Perspecti Data model Execution model ve Gamma Relational Multioperator MapReduce Unconstrained MapReduce Condor Unconstrained Unconstrained How scheduling is influenced by data and execution 6 models?
  • 7. Gamma. Pioneering parallel database. Data model: constrained (relational data model; relations are horizontally partitioned). Execution model: constrained (multioperator queries; operators employ hash-based algorithms).
  • 8. Gamma: Scheduler. Example query: SELECT r FROM R WHERE r < 'k'. [Figure: on the host machine, the Query Manager (with the Catalog) optimizes the query and compiles the plan; the Scheduler Process schedules operators; Operator Processes execute on the relevant nodes of the Gamma database, where Node 1 holds a-m and Node 2 holds n-z.] Scheduling is done at the operator level.
  • 9. Gamma: Batch Scheduling. Exploit sharing by scheduling queries in a batch. Example of selection sharing: [Figure: selections σ1 and σ2, each with its own scan of A, are replaced by one shared scan of A that feeds both.] Reads of A can be shared by applying the predicates in turn, so the shared relation A is scanned only once. Batch scheduling trades latency for throughput.
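The selection sharing on slide 9 can be illustrated with a small sketch. This is not Gamma's implementation: the function name `shared_scan`, the relation `A`, and the two predicates are hypothetical, chosen only to show one pass over a relation serving several queries.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


def shared_scan(relation: Iterable[Row], predicates: List[Predicate]) -> List[List[Row]]:
    """Scan `relation` once and evaluate every query's predicate on each row."""
    results: List[List[Row]] = [[] for _ in predicates]
    for row in relation:                      # the only pass over the relation
        for i, predicate in enumerate(predicates):
            if predicate(row):                # predicates are applied in turn
                results[i].append(row)
    return results


# Hypothetical relation A and two selections standing in for sigma1 and sigma2.
A = [{"r": value} for value in range(10)]
sigma1 = lambda row: row["r"] < 3
sigma2 = lambda row: row["r"] % 2 == 0

out1, out2 = shared_scan(A, [sigma1, sigma2])
```

Each query has to wait until the batch is assembled, which is the latency cost the slide mentions; in exchange the relation is read once instead of once per query.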
  • 10. Gamma: Batch Scheduling Joins. Several hash joins in a batch of queries. The hash table for the same relation can be shared. The example assumes 100% selectivity of σ. [Figure: joins A ⋈ B and A ⋈ C, first each with its own hash table for A, then with a shared hash table for A.] Sharing reduces I/O and memory usage. Sharing among joins reduces total execution time.
  • 11. Limitations Of Gamma. Gamma offers efficient query execution and sharing in a batch of queries. Gamma operates on structured data. Gamma is not suitable for unstructured data processing, ETL-type workloads, or running at large scale. A different system for ETL processing is needed.
  • 12. MapReduce. System for data-intensive applications. Execution model: constrained (a job is a set of map and reduce tasks; tasks are independent). Data model: unconstrained (arbitrary data format; files are partitioned into chunks; each chunk is replicated several times).
  • 13. MapReduce: Scheduling. Example: a MapReduce job with 4 map tasks and 2 reduce tasks. [Figure: Map 1 to Map 4 read Chunk1 to Chunk4 and write Temp1 to Temp4; the two reduce tasks produce Result1 and Result2.] Tasks are scheduled close to data. Execution is scalable and fault-tolerant. Execution is elastic. Fine-grained scheduling improves fault tolerance and elasticity.
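A minimal sketch of "tasks are scheduled close to data", under simplifying assumptions: the function `pick_map_task`, the node names, and the replica placement are hypothetical, and rack-level locality is ignored. When a node becomes idle, it is given a pending chunk that it already stores, and only falls back to a remote chunk if none is local.

```python
from typing import Dict, List, Optional, Set


def pick_map_task(node: str, pending: List[str], replicas: Dict[str, Set[str]]) -> Optional[str]:
    """Pick a pending chunk for an idle node, preferring chunks stored on that node."""
    for chunk in pending:
        if node in replicas[chunk]:
            return chunk                      # data-local task: no network read
    return pending[0] if pending else None    # otherwise fall back to a remote read


# Hypothetical placement of replicated chunks on nodes n1..n3.
replicas = {
    "Chunk1": {"n1", "n2"},
    "Chunk2": {"n2", "n3"},
    "Chunk3": {"n1", "n3"},
    "Chunk4": {"n3"},
}
pending = ["Chunk1", "Chunk2", "Chunk3", "Chunk4"]

local_choice = pick_map_task("n3", pending, replicas)   # n3 stores Chunk2
remote_choice = pick_map_task("n4", pending, replicas)  # n4 stores nothing
```

Because each task covers a single chunk, a failed or slow task can be rerun elsewhere without restarting the whole job.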
  • 14. MapReduce: Speculative Execution. Nodes may become slow. Speculative execution minimizes a job's response time: a backup task is launched if a task's progress is 20% less than the average. [Figure labels: normal node, temporarily slow node (straggler), backup.] Speculative execution works well in a homogeneous environment.
  • 15. Emerging Heterogeneous Infrastructures. Replacement of failed components. Extending an existing cluster with new machines. Virtualized data centers of cloud providers: CPU and RAM are isolated, but there is contention for disk and network I/O. [Chart: performance per VM (MB/s, axis 0 to 60) against the number of VMs on a physical host (1 to 7).] In many real-life cases the infrastructure is heterogeneous.
  • 16. MapReduce: Heterogeneous Cluster. [Figure labels: fast node, slow node.] Performance degrades on a heterogeneous cluster: slow nodes are wasted; backup tasks are launched on slow nodes; all straggling tasks are treated equally; excessive speculative execution causes thrashing. Speculative execution should be improved for heterogeneous clusters.
  • 17. MapReduce: LATE Scheduler. Idea: back up the task with the largest estimated finish time (Longest Approximate Time to End). $\text{progress rate} = \frac{\text{progress score}}{\text{execution time}}$ and $\text{estimated time left} = \frac{1 - \text{progress score}}{\text{progress rate}}$. Thresholds: limit the number of backup tasks; launch backup tasks on fast nodes; back up only sufficiently slow tasks. LATE looks forward to prioritize tasks to speculate.
  • 18. MapReduce: LATE Example. Back up the task with the Longest Approximate Time to End. [Figure, at time 2 min: node 1 runs 1 task/min; node 2 is 3x slower, its task has progress = 66%, estimated time left: (1-0.66) / (1/3) = 1; node 3 is 1.9x slower, its task has progress = 5.3%, estimated time left: (1-0.05) / (1/1.9) = 1.8; time axis in minutes.] LATE correctly identifies the task which hurts the response time the most.
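The arithmetic on slides 17 and 18 can be checked with a short sketch. The `Task` class and `pick_task_to_back_up` are hypothetical names, and the thresholds from slide 17 (cap on backup tasks, fast nodes only, sufficiently slow tasks only) are left out. The elapsed time of 0.1 min for the task on node 3 is an assumption derived from its 5.3% progress on a node that needs 1.9 min per task.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    progress_score: float   # fraction of work done, between 0 and 1
    elapsed_min: float      # execution time so far, in minutes

    @property
    def progress_rate(self) -> float:
        # progress rate = progress score / execution time
        return self.progress_score / self.elapsed_min

    @property
    def estimated_time_left(self) -> float:
        # estimated time left = (1 - progress score) / progress rate
        return (1.0 - self.progress_score) / self.progress_rate


def pick_task_to_back_up(tasks: List[Task]) -> Optional[Task]:
    """Return the running task with the longest approximate time to end."""
    running = [t for t in tasks if 0.0 < t.progress_score < 1.0 and t.elapsed_min > 0.0]
    return max(running, key=lambda t: t.estimated_time_left, default=None)


# Values from the slide; the 0.1 min elapsed time is an assumption (see above).
on_node2 = Task("task on node 2 (3x slower)", progress_score=0.66, elapsed_min=2.0)
on_node3 = Task("task on node 3 (1.9x slower)", progress_score=0.053, elapsed_min=0.1)

chosen = pick_task_to_back_up([on_node2, on_node3])
```

The task on node 3 is selected even though node 2 is the slower machine: its task is already two-thirds done, so it will finish sooner than the task that has just started on node 3.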
  • 19. Limitations Of MapReduce. MapReduce offers high scalability, good fault tolerance, and handling of unstructured data. MapReduce is not suitable for running on multi-organization infrastructure or for harvesting idle resources in an organization. A different system for multi-organization infrastructure is needed.
  • 20. Condor. Compute-intensive system harvesting idle resources. Data model: arbitrary. Execution model: arbitrary. How to increase utilization and respect the owners? Increase resource utilization by scheduling jobs on idle machines.
  • 21. Condor Scheduler: Centralized? [Figure: a single scheduler serving all jobs.] Efficient but not reliable; a possible bottleneck.
  • 22. Condor Scheduler: Distributed? [Figure: one scheduler per job.] Reliable but inefficient.
  • 23. Condor Scheduler: Hybrid! [Figure: a Matchmaker receives information about tasks and information about nodes; schedulers and jobs interact in numbered steps 1 to 4.] The hybrid approach has the best of both worlds.
  • 24. ClassAds: Describing Jobs and Resources. Job description: [MyType = "Job"; TargetType = "Machine"; Department = "CompSci"; Requirements = (other.OpSys == "LINUX" && other.Disk > 10000000); Rank = Memory]. Machine description: [MyType = "Machine"; TargetType = "Job"; Machine = "nostos.cs.wisc.edu"; OpSys = "LINUX"; Disk = 3076077; Requirements = (LoadAvg <= 0.3) && (KeyboardIdle > (15*60)); Rank = other.Department == self.Department]. Requirements should be satisfied. The candidate with the highest rank is returned. The matchmaker is suitable for heterogeneous shared clusters.
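The two matchmaking rules on slide 24 (requirements must hold, then the highest-ranked candidate is returned) can be sketched as follows. This is not the Condor ClassAd language or API: ads are plain Python dictionaries, `Requirements` and `Rank` are functions of `(self, other)`, and the machine names and attribute values are hypothetical. Only the job's rank is used to order candidates, which is a simplification.

```python
from typing import Any, Dict, List, Optional

Ad = Dict[str, Any]


def satisfied(ad: Ad, other: Ad) -> bool:
    """Evaluate ad's Requirements against the other party's ad."""
    return bool(ad["Requirements"](ad, other))


def matchmake(job: Ad, machines: List[Ad]) -> Optional[Ad]:
    """Return the machine with the highest job Rank among mutually acceptable ones."""
    candidates = [m for m in machines if satisfied(job, m) and satisfied(m, job)]
    return max(candidates, key=lambda m: job["Rank"](job, m), default=None)


def make_machine(name: str, disk: int, memory: int, load_avg: float, keyboard_idle: int) -> Ad:
    """Build a hypothetical machine ad that only accepts jobs while it is idle."""
    return {
        "MyType": "Machine", "Machine": name, "OpSys": "LINUX",
        "Disk": disk, "Memory": memory,
        "LoadAvg": load_avg, "KeyboardIdle": keyboard_idle,
        "Requirements": lambda self, other:
            self["LoadAvg"] <= 0.3 and self["KeyboardIdle"] > 15 * 60,
    }


job: Ad = {
    "MyType": "Job", "Department": "CompSci",
    "Requirements": lambda self, other:
        other["OpSys"] == "LINUX" and other["Disk"] > 10_000_000,
    "Rank": lambda self, other: other["Memory"],   # prefer more memory
}

machines = [
    make_machine("node-a", disk=20_000_000, memory=64, load_avg=0.1, keyboard_idle=3600),
    make_machine("node-b", disk=50_000_000, memory=128, load_avg=0.9, keyboard_idle=3600),  # busy
    make_machine("node-c", disk=3_076_077, memory=256, load_avg=0.0, keyboard_idle=7200),   # disk too small
    make_machine("node-d", disk=30_000_000, memory=96, load_avg=0.2, keyboard_idle=1000),
]

best = matchmake(job, machines)
```

Both sides state their own constraints, so a machine owner can keep jobs away while the machine is in use, and a job can refuse machines that lack the resources it needs.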
  • 25. Conclusions. Scheduling is done at different levels. Gamma: operator-level scheduling enables sharing. MapReduce and Condor: arbitrary code makes sharing hard. Condor: matchmaking gives control over job placement. Hybrid approaches are promising for big data processing. Scheduling in heterogeneous deployments is challenging.
  • 26. Thank you for your attention! Feedback & Questions? Andrii.Vozniuk@epfl.ch
  • 27. References. (1) Rajesh Raman, Miron Livny and Marvin Solomon, "Matchmaking: Distributed Resource Management for High Throughput Computing". (2) Manish Mehta, Valery Soloviev and David J. DeWitt, "Batch Scheduling in Parallel Database Systems". (3) Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz and Ion Stoica, "Improving MapReduce Performance in Heterogeneous Environments". Slides 14 and 18 exploit presentation ideas from the LATE slides for OSDI 2008 by Matei Zaharia.