Adaptive Execution Support for
   Malleable Computation
         Speaker: LIN Qian
http://www.comp.nus.edu.sg/~linqian
Outline
• Introduce the key ideas of 3 selected papers
• Discussion
FORMLESS
• FORMLESS: Scalable Utilization of Embedded
  Manycores in Streaming Applications
  [LCTES’12]
  – Functionally-cOnsistent stRucturally-MalLEable
    Streaming Specification
  – Actor-oriented specification models
  – Design space exploration scheme
     • customizes the application specification to better fit
       the target platform.
FORMLESS (cont.)
• Design space exploration for platform-driven
  instantiation of a FORMLESS specification
FORMLESS (cont.)
• Example:
Dynamic Load Balancing
• A Distributed and Adaptive Dynamic Load
  Balancing Scheme for Parallel Processing of
  Medium-Grain Tasks
  [IEEE Journal, 1990]
  – Challenge: Allocate and distribute tasks
    dynamically with minimal run-time overhead.
  – Design: A distributed and adaptive load balancing
    scheme for medium-grain tasks
Dynamic Load Balancing (cont.)
• Key idea 1: Neighborhood average strategy
  – Attempts to balance load within a neighborhood
    by distributing tasks
     • such that all neighbors have loads close to the
       neighborhood average.
  – The decision when to balance load is based on the
    neighborhood state information that is checked
    periodically.
     • Each processor maintains status information of all its
       neighbors.
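The neighborhood average strategy can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: the function names, the dictionary of neighbor loads, and the `threshold` tolerance are all assumptions made for the example, and load is measured simply as a task count.

```python
def neighborhood_average(own_load, neighbor_loads):
    """Average load over this processor and its immediate neighbors."""
    loads = [own_load] + list(neighbor_loads.values())
    return sum(loads) / len(loads)


def plan_transfers(own_load, neighbor_loads, threshold=1.0):
    """Return {neighbor: tasks_to_send} for an overloaded processor.

    threshold is a hypothetical tolerance: nothing is moved unless this
    processor exceeds the neighborhood average by more than that amount.
    """
    avg = neighborhood_average(own_load, neighbor_loads)
    surplus = own_load - avg
    if surplus <= threshold:
        return {}

    # Only neighbors below the average receive work.
    deficits = {n: avg - load for n, load in neighbor_loads.items() if load < avg}

    transfers = {}
    remaining = int(surplus)
    # Serve the most underloaded neighbors first.
    for name, deficit in sorted(deficits.items(), key=lambda kv: -kv[1]):
        if remaining == 0:
            break
        send = min(int(deficit), remaining)
        if send > 0:
            transfers[name] = send
            remaining -= send
    return transfers


if __name__ == "__main__":
    # Processor with 10 tasks; neighbors a, b, c hold 2, 6 and 2 tasks.
    print(plan_transfers(10, {"a": 2, "b": 6, "c": 2}))
```

In a real system each processor would run a check like this periodically against the status information it keeps for its neighbors.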
Dynamic Load Balancing (cont.)
• Key idea 2: Grain Size Control
  – If the cost of making work available to another
    processor exceeds the cost of executing it at the
    local processor, then it does not make sense to
    decompose and parallelize work beyond a certain
    size or granularity of work.
  – Granularity control: To determine when to stop
    breaking down a computation into parallel
    computations at a frontier node, treating it as a
    leaf node and executing it sequentially.
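As a rough illustration of granularity control, the sketch below stops splitting a range once it is no larger than a chosen grain size and executes each resulting leaf sequentially. The summation workload, the `grain_size` parameter, and the use of a thread pool are assumptions for the example; the paper's own cost model is not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_leaves(lo, hi, grain_size):
    """Recursively halve [lo, hi) until a piece is at most grain_size long."""
    if grain_size < 1:
        raise ValueError("grain_size must be at least 1")
    if hi - lo <= grain_size:
        # Frontier node: treat it as a leaf and stop decomposing.
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return (split_into_leaves(lo, mid, grain_size)
            + split_into_leaves(mid, hi, grain_size))


def sum_leaf(data, lo, hi):
    """Sequential work performed at a leaf."""
    return sum(data[lo:hi])


def parallel_sum(data, grain_size, workers=4):
    """Sum data by running each leaf as a separate task."""
    leaves = split_into_leaves(0, len(data), grain_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda r: sum_leaf(data, *r), leaves)
        return sum(partials)


if __name__ == "__main__":
    print(parallel_sum(list(range(100)), grain_size=25))
```

A larger `grain_size` produces fewer, coarser tasks and less distribution overhead; a smaller one exposes more parallelism at a higher per-task cost.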
Adaptive Load Balancing
• Compiler and Run-Time Support for Adaptive
  Load Balancing in Software Distributed Shared
  Memory Systems
  [1998]
  – Use information provided by the compiler to help
    the run-time system distribute the work of the
    parallel loops
     • according to the relative power of the processors
     • while minimizing communication and page sharing
Adaptive Load Balancing (cont.)
• Compile-Time Support for Load Balancing
    – The compiler is built on the SUIF system, which is
      organized as a set of compiler passes.
    – The SUIF pass extracts the shared data access
      patterns in each of the SPMD regions, and feeds
      this information to the run-time system.
        • It is also responsible for adding hooks in the parallelized
          code to allow the run-time library to change the load
          distribution.

--------
SUIF: Stanford University Intermediate Format
SPMD: Single-Program Multiple-Data
Adaptive Load Balancing (cont.)
– Access pattern extraction
   • SUIF pass walks through the program looking for
     accesses to shared memory.
– Prefetching
   • Use the access pattern information to prefetch data
     through prefetching calls.
– Load balancing interface and strategy
   • The compiler can direct the run-time to choose
     between two partitioning strategies for distributing the
     parallel loops.
      1.   Shifting of loop boundaries
      2.   Multiple loop bounds
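The slides do not spell out either partitioning strategy. One plausible reading of "shifting of loop boundaries", offered here only as an assumption, is that each processor keeps a single contiguous block of iterations and the block boundaries move according to relative processor power. The function name and its parameters below are hypothetical.

```python
def shifted_loop_bounds(n_iterations, relative_power):
    """Split range(n_iterations) into one contiguous block per processor.

    Block sizes are proportional to relative_power (any positive weights).
    Returns a list of half-open (start, end) iteration ranges.
    """
    total = sum(relative_power)
    bounds = []
    start = 0
    cumulative = 0.0
    last = len(relative_power) - 1
    for i, power in enumerate(relative_power):
        cumulative += power
        # The last block always ends at n_iterations so nothing is dropped.
        if i == last:
            end = n_iterations
        else:
            end = round(n_iterations * cumulative / total)
        bounds.append((start, end))
        start = end
    return bounds


if __name__ == "__main__":
    # Third processor is assumed to be twice as powerful as the other two.
    print(shifted_loop_bounds(100, [1, 1, 2]))
```

Keeping blocks contiguous means a rebalance only moves data near the boundaries, which fits the stated goal of limiting communication and page sharing.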
Adaptive Load Balancing (cont.)
• Run-Time Load Balancing Support
  – The run-time library is responsible for keeping
    track of the progress of each process
     • collects statistics about the execution time of each
       parallel task, and
     • adjusts the load accordingly
  – Load balancing vs. Locality management
     • need to avoid unnecessary movement of data and
       minimize page sharing
     • Locality-conscious load balancing: the run-time library
       uses the information supplied by the compiler about
       what loop distribution strategy to use.
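A minimal sketch of the run-time side, assuming the library records how many loop iterations each process completed and how long they took. The function names and the `tolerance` value are hypothetical, and the sketch ignores the locality cost of moving data, which the real run-time must weigh.

```python
def relative_power(iterations_done, elapsed_seconds):
    """Estimate each process's relative power as its share of total throughput."""
    rates = [n / t for n, t in zip(iterations_done, elapsed_seconds)]
    total = sum(rates)
    return [r / total for r in rates]


def should_rebalance(elapsed_seconds, tolerance=0.1):
    """Rebalance only if the slowest and fastest processes differ noticeably.

    tolerance is an assumed fraction of the slowest process's time; small
    imbalances are left alone to avoid needless data movement.
    """
    slowest = max(elapsed_seconds)
    fastest = min(elapsed_seconds)
    return (slowest - fastest) / slowest > tolerance


if __name__ == "__main__":
    done = [100, 100]
    times = [1.0, 2.0]
    if should_rebalance(times):
        print(relative_power(done, times))
```

The resulting shares could then drive whichever loop distribution strategy the compiler selected.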
Algorithms for Scheduling
• Scheduling Malleable Parallel Tasks: An
  Asymptotic Fully Polynomial-Time
  Approximation Scheme [2002]
• Mapping and Scheduling Heterogeneous Tasks
  using Genetic Algorithms [1995]

Editor's notes

  1. Design space exploration for platform-driven instantiation of a FORMLESS specification.
  2. FORMLESS specification of the sort example: A) Actor specifications. B-D) Example instantiations.
  3. The scheme attempts to balance load within a neighborhood by distributing tasks such that all neighbors have loads close to the neighborhood average.
  4. In terms of processing time, the average grain size is defined as (Total Sequential Execution Time / Total Number of Messages Processed).
  5. The goal is to minimize execution time by considering both communication and the computation components.
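The average grain size defined in note 4 can be computed directly. The function and argument names below are illustrative only, and the sample figures are made up for the example.

```python
def average_grain_size(total_sequential_time, messages_processed):
    """Average grain size: total sequential execution time per message processed."""
    if messages_processed <= 0:
        raise ValueError("messages_processed must be positive")
    return total_sequential_time / messages_processed


if __name__ == "__main__":
    # Hypothetical run: 12 seconds of sequential work spread over 4 messages.
    print(average_grain_size(12.0, 4))
```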