Adaptive Execution Support for Malleable Computation

  1. Adaptive Execution Support for Malleable Computation
     Speaker: LIN Qian, http://www.comp.nus.edu.sg/~linqian
  2. Outline
     • Introduce the key ideas of 3 selected papers
     • Discussion
  3. FORMLESS
     • FORMLESS: Scalable Utilization of Embedded Manycores in Streaming Applications [LCTES’12]
       – Functionally-cOnsistent stRucturally-MalLEable Streaming Specification
       – Actor-oriented specification models
       – Space exploration scheme
         • customizes the application specification to better fit the target platform
  4. FORMLESS (cont.)
     • Space exploration for platform-driven instantiation
  5. FORMLESS (cont.)
     • Example:
  6. Dynamic Load Balancing
     • A Distributed and Adaptive Dynamic Load Balancing Scheme for Parallel Processing of Medium-Grain Tasks [IEEE Journal, 1990]
       – Challenge: allocate and distribute tasks dynamically with minimum run-time overhead.
       – Design: a distributed and adaptive load balancing scheme for medium-grain tasks.
  7. Dynamic Load Balancing (cont.)
     • Key idea 1: Neighborhood average strategy
       – Attempts to balance load within a neighborhood by distributing tasks
         • such that all neighbors have loads close to the neighborhood average.
       – The decision of when to balance load is based on neighborhood state information, which is checked periodically.
         • Each processor maintains status information for all its neighbors.
  8. Dynamic Load Balancing (cont.)
     • Key idea 2: Grain size control
       – If the cost of making work available to another processor exceeds the cost of executing it on the local processor, it does not make sense to decompose and parallelize work below a certain size (granularity).
       – Granularity control: determine when to stop breaking a computation down into parallel computations at a frontier node, treating that node as a leaf and executing it sequentially.
  9. Adaptive Load Balancing
     • Compiler and Run-Time Support for Adaptive Load Balancing in Software Distributed Shared Memory Systems [1998]
       – Uses information provided by the compiler to help the run-time system distribute the work of the parallel loops
         • according to the relative power of the processors
         • while minimizing communication and page sharing
  10. Adaptive Load Balancing (cont.)
     • Compile-time support for load balancing
       – The compiler is built on the SUIF system, which is organized as a set of compiler passes.
       – The SUIF pass extracts the shared data access patterns in each of the SPMD regions and feeds this information to the run-time system.
         • It is also responsible for adding hooks to the parallelized code so that the run-time library can change the load distribution.
     Abbreviations: SUIF = Stanford University Intermediate Format; SPMD = Single-Program Multiple-Data
  11. Adaptive Load Balancing (cont.)
       – Access pattern extraction
         • The SUIF pass walks through the program looking for accesses to shared memory.
       – Prefetching
         • The access pattern information is used to prefetch data through prefetching calls.
       – Load balancing interface and strategy
         • The compiler can direct the run-time to choose between two partitioning strategies for distributing the parallel loops:
           1. Shifting of loop boundaries
           2. Multiple loop bounds
  12. Adaptive Load Balancing (cont.)
     • Run-time load balancing support
       – The run-time library is responsible for keeping track of the progress of each process:
         • it collects statistics about the execution time of each parallel task, and
         • adjusts the load accordingly.
       – Load balancing vs. locality management
         • Unnecessary movement of data must be avoided and page sharing minimized.
         • Locality-conscious load balancing: the run-time library uses the information supplied by the compiler about which loop distribution strategy to use.
  13. Algorithms for Scheduling
     • Scheduling Malleable Parallel Tasks: An Asymptotic Fully Polynomial-Time Approximation Scheme [2002]
     • Mapping and Scheduling Heterogeneous Tasks using Genetic Algorithms [1995]
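
The "shifting of loop boundaries" strategy from the Adaptive Load Balancing slides can be illustrated with a minimal sketch. It assumes the run-time has measured one positive execution time per process for the previous run of a parallel loop, and it estimates each process's relative power as iterations completed per unit time. The function name `shift_loop_bounds` and its input format are hypothetical and not taken from the paper.

```python
def shift_loop_bounds(bounds, times):
    """Recompute contiguous loop bounds so that each process receives a
    share of the iterations proportional to its measured speed.

    bounds: list of (start, end) half-open iteration ranges, one per process,
            contiguous and in order.
    times:  measured execution time (> 0) of each process on its range.
    """
    first, last = bounds[0][0], bounds[-1][1]
    total_iters = last - first

    # Assumed speed estimate: iterations completed per unit time.
    speeds = [(end - start) / t for (start, end), t in zip(bounds, times)]
    total_speed = sum(speeds)

    new_bounds = []
    start = first
    cumulative = 0.0
    for i, speed in enumerate(speeds):
        cumulative += speed
        if i == len(speeds) - 1:
            end = last  # the final range always ends at the loop's upper bound
        else:
            end = first + round(total_iters * cumulative / total_speed)
        new_bounds.append((start, end))
        start = end
    return new_bounds


# A process that took twice as long gives up iterations to the faster one.
print(shift_loop_bounds([(0, 50), (50, 100)], [1.0, 2.0]))
```

Each process keeps a single contiguous block in this sketch, so only the iterations near a boundary change owner. That is one plausible way to limit data movement and page sharing, which the slides name as the goal of locality-conscious load balancing.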

Editor's notes

  • Design space exploration for platform-driven instantiation of a FORMLESS specification.
  • FORMLESS specification of the sort example: A) Actor specifications. B-D) Example instantiations.
  • The scheme attempts to balance load within a neighborhood by distributing tasks such that all neighbors have loads close to the neighborhood average.
  • In terms of processing time, the average grain size is defined as (Total Sequential Execution Time / Total Number of Messages Processed).
  • The goal is to minimize execution time by considering both communication and the computation components.
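
The neighborhood average strategy described in the notes can be sketched as follows. This is an illustration under stated assumptions, not the scheme from the paper: load is measured as a task count, a processor acts only when its surplus reaches a `threshold`, and the surplus is split among under-loaded neighbors in proportion to how far each one is below the average. The name `plan_transfers` and the proportional split are hypothetical choices.

```python
def plan_transfers(my_load, neighbor_loads, threshold=1):
    """Return {neighbor_id: tasks_to_send} moving this processor's surplus
    toward the neighborhood average.

    my_load:        number of tasks queued locally.
    neighbor_loads: dict of neighbor id -> last known task count
                    (the periodically refreshed status information).
    threshold:      minimum surplus, in tasks, that triggers balancing (>= 1).
    """
    loads = [my_load] + list(neighbor_loads.values())
    average = sum(loads) / len(loads)

    surplus = int(my_load - average)  # whole tasks above the average
    if surplus < threshold:
        return {}

    # Only neighbors below the neighborhood average receive work.
    deficits = {n: average - load
                for n, load in neighbor_loads.items() if load < average}
    total_deficit = sum(deficits.values())

    transfers = {}
    for neighbor, deficit in deficits.items():
        share = int(surplus * deficit / total_deficit)
        if share > 0:
            transfers[neighbor] = share
    return transfers


print(plan_transfers(12, {"a": 0, "b": 3}))
```

In this example the neighborhood average is 5, so the processor sends 5 tasks to `a` and 2 to `b`, after which all three hold 5 tasks.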
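
The grain size definition and the granularity control idea can be made concrete with a small sketch. It assumes a divide-and-conquer sum as the workload and uses a fixed element-count `cutoff` (at least 1) as the point where decomposition stops. Both are illustrative choices, and the names `average_grain_size` and `split_sum` are hypothetical. Leaves run inline here; in a real system each leaf would be a task that could be shipped to another processor.

```python
def average_grain_size(total_sequential_time, messages_processed):
    """Average grain size = total sequential execution time divided by the
    total number of messages processed."""
    return total_sequential_time / messages_processed


def split_sum(values, cutoff, leaves=None):
    """Sum `values` by recursive halving, stopping at `cutoff` elements.

    Returns (total, leaves), where `leaves` lists the size of every piece
    that was executed sequentially. Requires cutoff >= 1.
    """
    if leaves is None:
        leaves = []
    if len(values) <= cutoff:
        # Frontier node treated as a leaf: no further decomposition.
        leaves.append(len(values))
        return sum(values), leaves
    mid = len(values) // 2
    left, _ = split_sum(values[:mid], cutoff, leaves)
    right, _ = split_sum(values[mid:], cutoff, leaves)
    return left + right, leaves


total, leaves = split_sum(list(range(16)), cutoff=4)
print(total, leaves)
print(average_grain_size(12.0, 4))
```

Raising `cutoff` yields fewer, larger leaves and therefore a larger average grain size, which is the trade-off the grain size control slide describes: decomposition should stop once handing work to another processor costs more than running it locally.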
