Many-Task Computing

   By: Harry Sunarsa
Defining MTC
• We want to enable the use of large-scale distributed
  systems for task-parallel applications.
• These applications are linked into useful workflows through
  the looser-coupling model of passing data via files
  between dependent tasks.
• This potentially large class of task-parallel applications is
  precluded from leveraging the increasing power of
  modern parallel systems such as supercomputers (e.g.
  IBM Blue Gene/L [1] and IBM Blue Gene/P [2])
• because those systems lack efficient support
  for the “scripting” programming model [3].
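This loosely coupled, file-linked model can be made concrete with a minimal sketch. Here each task is a plain Python function standing in for an independent executable, and all task and file names are illustrative:

```python
# Minimal sketch of a file-linked workflow: each "task" stands in for an
# independent executable; the only coupling between tasks is the files
# they read and write. All task and file names here are illustrative.
import os
import tempfile

def make_task(fn):
    """Wrap a pure function as a task that reads input files and
    writes a single output file."""
    def run(inputs, output):
        data = [open(p).read() for p in inputs]
        with open(output, "w") as f:
            f.write(fn(data))
        return output
    return run

generate = make_task(lambda _: "3\n1\n2\n")                             # produces raw data
clean    = make_task(lambda d: "".join(sorted(d[0].splitlines(True))))  # sorts it
analyze  = make_task(lambda d: f"lines={len(d[0].splitlines())}\n")     # summarizes it

workdir = tempfile.mkdtemp()
raw     = generate([], os.path.join(workdir, "raw.dat"))
cleaned = clean([raw], os.path.join(workdir, "clean.dat"))
result  = analyze([cleaned], os.path.join(workdir, "result.dat"))
print(open(result).read())   # -> lines=3
```

Because every dependency is just a file, any task in this chain could be rescheduled onto a different machine without changing the workflow logic.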
Various forms of scripting are used to automate end-to-end application processes:
• task coordination
• provenance tracking
• bookkeeping

The approaches (via a database or XML, via files, or a combination of these) are typically based on a model of loosely coupled computation in which data is exchanged between tasks.
Vast increases in data volume, in the complexity of data analysis procedures, and in algorithms make traditional manual processing and exploration unfavorable compared with HPC processes automated by scientific workflow systems.

This presentation focuses on defining many-task computing and on the challenges that arise as datasets and computing systems grow exponentially.
• High-performance computing can be
  considered part of the first category
  (denoted by the white area).
• High-throughput computing [12] can
  be considered a subset of the third
  category (denoted by the yellow area).
• Many-task computing can be
  considered part of categories three
  and four (denoted by the yellow and
  green areas).
many-task computing
Various properties of emerging applications lead to a new class definition, Many-Task Computing:

• a large number of tasks (i.e. millions or more)
• relatively short per-task execution times (i.e. seconds to minutes long)
• data-intensive tasks (i.e. tens of MB of I/O per CPU second of compute)
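To make these three properties concrete, here is a back-of-envelope sketch; the specific values are assumptions drawn from the stated ranges, not measurements:

```python
# Back-of-envelope arithmetic for the three MTC properties above.
# The specific values are illustrative assumptions within the stated
# ranges, not measurements.
n_tasks      = 1_000_000   # "millions or more"
task_seconds = 60          # "seconds to minutes long"
io_mb_per_s  = 10          # "tens of MB of I/O per CPU second"

cpu_seconds = n_tasks * task_seconds            # 6.0e7 CPU-seconds
total_io_tb = cpu_seconds * io_mb_per_s / 1e6   # MB -> TB: 600 TB

cores        = 10_000                           # assumed machine size
wall_minutes = cpu_seconds / cores / 60         # ~100 minutes
agg_io_gb_s  = cores * io_mb_per_s / 1e3        # ~100 GB/s sustained

print(f"{cpu_seconds:.1e} CPU-s, {total_io_tb:.0f} TB I/O, "
      f"{wall_minutes:.0f} min on {cores} cores, {agg_io_gb_s:.0f} GB/s")
```

Even at this modest assumed scale, the compute finishes in under two hours while the sustained I/O demand already rivals the aggregate bandwidth of a large parallel file system, which previews the storage/compute gap discussed later.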
• MTC emphasizes using large numbers of
  computing resources over short periods of
  time to accomplish many computational
  tasks, where the primary metrics are
  measured in seconds.
• HTC requires large amounts of computing for
  long periods of time, with the primary metrics
  being operations per month [12].
• MTC denotes high-performance computations
  comprising multiple distinct activities, coupled
  via file system operations or message passing.
• MTC is composed of many tasks (both
  independent and dependent) that can
  be individually scheduled on many different
  computing resources across multiple
  administrative boundaries to achieve a
  larger application goal.
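The scheduling side of this definition can be sketched minimally as a pool of workers pulling many short, individually submitted tasks; a local thread pool stands in here for distributed resources, and the task body is a placeholder:

```python
# Minimal sketch of individually scheduling many short tasks over a
# pool of workers. A local thread pool stands in for distributed
# compute resources; the task body is a placeholder for real work.
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def short_task(i):
    time.sleep(0.01)   # placeholder for a task that runs seconds to minutes
    return i * i

with ThreadPoolExecutor(max_workers=64) as pool:
    futures = [pool.submit(short_task, i) for i in range(10_000)]
    done = sum(1 for _ in as_completed(futures))

print(done, "tasks completed")   # -> 10000 tasks completed
```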
MTC covers tasks and sets of tasks along several dimensions:

• Task: small/large; uni/multi-processor; compute/data-intensive
• Set of tasks: static/dynamic; homogeneous/heterogeneous; loosely/tightly coupled

The aggregate number of tasks, quantity of computing, and volumes of data may be extremely large.
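One illustrative way to encode these axes is as a pair of task descriptors; this is only a sketch, with field names that simply mirror the slide:

```python
# A sketch encoding the MTC axes above as data types; names mirror the
# slide and are illustrative only.
from dataclasses import dataclass, field
from enum import Enum

class Size(Enum):
    SMALL = "small"
    LARGE = "large"

class Intensity(Enum):
    COMPUTE = "compute-intensive"
    DATA = "data-intensive"

@dataclass
class Task:
    size: Size
    multiprocessor: bool      # uni- vs multi-processor
    intensity: Intensity

@dataclass
class TaskSet:
    dynamic: bool             # static vs dynamic
    homogeneous: bool         # homogeneous vs heterogeneous
    tightly_coupled: bool     # loosely vs tightly coupled
    tasks: list = field(default_factory=list)

batch = TaskSet(dynamic=True, homogeneous=False, tightly_coupled=False)
batch.tasks.append(Task(Size.SMALL, multiprocessor=False,
                        intensity=Intensity.DATA))
```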
MTC for Clusters, Grids, and Supercomputers
• Clusters and Grids have been the preferred
  platforms for the loosely coupled applications that
  make up HTC, which are managed
  and executed through workflow systems or
  parallel programming systems.
• As computing and storage scales increase,
  the set of problems that must be overcome to
  make MTC practical (ensuring good efficiency
  and utilization at large scale) is exacerbated.
The challenges include:

• local resource manager scalability and granularity
• efficient utilization of the raw hardware
• shared file system contention and scalability
• application scalability
• reliability at scale
• understanding the limitations of the HPC systems in order to identify promising and scientifically valuable MTC applications
• MTC applies not only to traditional HTC
  environments such as clusters and Grids,
  assuming appropriate support in the middleware,
  but also to supercomputers.
• Emerging petascale computing systems, such as
  IBM’s Blue Gene/P [2], incorporate high-speed,
  low-latency interconnects and other features
  designed to support tightly coupled parallel
  computations.
• Petascale systems are more than just many
  processors with large peak petaflop ratings.
• They normally come well balanced, with
  proprietary, high-speed, low-latency
  network interconnects that give tightly coupled
  applications good opportunities to scale well
  at full system scale.
Four factors motivate the support of MTC applications on petascale HPC systems:

1. The I/O subsystems of petascale systems offer unique capabilities needed by MTC applications.
2. The cost to manage and run on petascale systems like the Blue Gene/P is less than that of conventional clusters or Grids [15].
3. Large-scale systems inevitably have utilization issues.
4. Perhaps most importantly, some applications are so demanding that only petascale systems have enough compute power to get results in a reasonable timeframe, or to exploit new opportunities in such applications.
The Data Deluge Challenge and the Growing Storage/Compute Gap
• Within the science domain, the data that
  needs to be processed generally grows faster
  than computational resources and their
  speed.
• The scientific community is facing an
  imminent flood of data expected from the
  next generation of experiments, simulations,
  sensors, and satellites.
Analyzing this large class of applications over such large data would in turn require that data and computations be distributed over many hundreds or thousands of nodes in order to achieve rapid turnaround times.




• Many applications in scientific computing
  generally use a shared infrastructure, such as the
  TeraGrid [23] or the Open Science Grid [24], where
  data movement relies on shared or parallel file systems.

• The rate of increase in the number of
  processors per system is outgrowing the rate
  of performance increase of parallel file
  systems, which requires rethinking existing
  data management techniques.
• Data locality is critical to the successful and efficient use
  of large distributed systems for data-intensive
  applications [26, 27] in the face of a growing gap
  between compute power and storage performance.
• Large-scale data management needs to be a primary
  objective for any middleware aiming to support MTC
  workloads.
• Such middleware should ensure that data movement is
  minimized by intelligent data-aware scheduling (sketched below)
   – among distributed computing sites (assuming that each site
     has a local-area-network shared storage infrastructure),
   – and among compute nodes (assuming that data can be
     stored on compute nodes’ local disk and/or memory).
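A minimal sketch of that data-aware scheduling idea, with hypothetical node, file, and task names: place each task on a node that already holds its input file, and fall back to the least-loaded node (which must then fetch the data) only when no local copy exists:

```python
# Minimal sketch of data-aware scheduling: place each task on a node
# that already holds its input file if possible, otherwise on the
# least-loaded node (which must then fetch the data over the network).
# Node, file, and task names are hypothetical.
def schedule(tasks, node_files, load):
    """tasks: {task: input_file}; node_files: {node: set of files};
    load: {node: queued-task count}. Returns {task: node}."""
    placement = {}
    for task, infile in tasks.items():
        local = [n for n in node_files if infile in node_files[n]]
        candidates = local or list(load)        # no local copy: any node
        node = min(candidates, key=lambda n: load[n])
        placement[task] = node
        load[node] += 1
    return placement

node_files = {"n1": {"a.dat"}, "n2": {"b.dat"}, "n3": set()}
load = {"n1": 0, "n2": 0, "n3": 0}
print(schedule({"t1": "a.dat", "t2": "b.dat", "t3": "c.dat"},
               node_files, load))
# -> {'t1': 'n1', 't2': 'n2', 't3': 'n3'}: t1 and t2 run where their
#    data already lives; only t3 forces a transfer.
```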
MTC Applications

• We have found many applications that are a
  better fit for MTC than HTC or HPC. Their
  characteristics include having a large number
  of small parallel jobs, a common pattern
  observed in many scientific applications [6].
• They also use files (instead of messages, as in
  MPI) for inter-process communication,
  which tends to make these applications data-intensive.
• In addition to streamlined task dispatching,
  scalable data management techniques are
  also required in order to support MTC
  applications.
• MTC applications are often data- and/or metadata-intensive,
  as each job requires at least one input file
  and one output file, and can sometimes involve many
  files per job.
• These data management techniques need to make
  good use of the full network bandwidth of large-scale
  systems, which is a function of the number of
  nodes and the networking technology employed, as
  opposed to the relatively small number of storage
  servers behind a parallel file system or GridFTP
  server.

• There are various applications from many disciplines
  that demonstrate the characteristics of MTC applications.
  These applications cover a wide range of domains, such as
  astronomy, physics, astrophysics, pharmaceuticals,
  bioinformatics, biometrics, neuroscience, medical imaging,
  chemistry, climate modeling, and economics and data analytics.
• They often involve many tasks, ranging from
  tens of thousands to billions, and have
  a large variance of task execution times,
  ranging from hundreds of milliseconds to
  hours.
• Furthermore, each task performs multiple
  reads and writes to and from files, which can
  range in size from kilobytes to gigabytes.

• These characteristics make traditional resource
  management techniques found in HTC inefficient.
• Although some of these applications could be
  coded as HPC applications, an HPC implementation
  would also yield poor utilization due to the wide
  variance of the arrival rate of tasks from many
  users.
• Furthermore, the data-intensive nature of these
  applications can quickly saturate parallel file
  systems at even modest computing scales.
Conclusion

• MTC aims to bridge the gap between two
  computing paradigms, HTC and HPC.
• MTC applications are typically loosely coupled
  applications that are communication-intensive but not
  naturally expressed using the standard message
  passing interface commonly found in high-performance
  computing, drawing attention to
  the many computations that are
  heterogeneous but not “happily” parallel.

• MTC is a broader definition than HTC, allowing
  for finer-grained tasks, both independent tasks
  and tasks with dependencies, and allowing
  tightly coupled and loosely coupled applications
  to coexist on the same system.




• Furthermore, having native support for data-intensive
  applications is central to MTC, as there
  is a growing gap between the storage performance of
  parallel file systems and the amount of
  processing power.
• As the size of scientific data sets and the
  resources required for analysis increase, data
  locality becomes crucial to the efficient use of
  large-scale distributed systems for scientific and
  data-intensive applications [62].



Editor's notes

  1. Clusters with 62K processor cores (e.g., the TACC Sun Constellation System, Ranger), Grids (e.g., TeraGrid) with over a dozen sites and 161K processors, and supercomputers with 160K processors (e.g., IBM Blue Gene/P) are now available to the scientific community. These large HPC systems are considered efficient at executing tightly coupled parallel jobs within a particular machine, using MPI for inter-process communication. We proposed using HPC systems for loosely coupled applications, which involve the execution of independent, sequential jobs that can be individually scheduled, using files for inter-process communication.
  2. Petascale = a computer system capable of reaching performance in excess of one petaflop. Petaflop = 10^15 flops. Teraflop = 10^12 flops.
  3. IBM has proposed in their internal project Kittyhawk [15] that Blue Gene/P can be used to run non-traditional workloads (e.g. HTC).
  4. For instance, in the astronomy domain the Sloan Digital Sky Survey [19] has datasets that exceed 10 terabytes in size. They can reach up to 100 terabytes or even petabytes if we consider multiple surveys and the time dimension. In physics, the CMS detector being built to run at CERN's Large Hadron Collider [20] is expected to generate over a petabyte of data per year. In the bioinformatics domain, the rate of growth of DNA databases such as GenBank [21] and the European Molecular Biology Laboratory (EMBL) [22] has been following an exponential trend, with a doubling time estimated to be 9-12 months.
  5. For example, a cluster that was placed in service in 2002 with 316 processors has a parallel file system (i.e. GPFS [25]) rated at 1 GB/s, yielding 3.2 MB/s per processor of bandwidth. The second largest open science supercomputer, the IBM Blue Gene/P from Argonne National Laboratory, has 160K processors and a parallel file system (also GPFS) rated at 8 GB/s, yielding a mere 0.05 MB/s per processor. That is a 65X reduction in bandwidth between a system from 2002 and one from 2008. Unfortunately, this trend is not bound to stop, as advances in multi-core and many-core processors will increase the number of processor cores by one to two orders of magnitude over the next decade. [4]
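The per-processor bandwidth arithmetic in note 5 can be spelled out directly; binary units (1 GB = 1024 MB, 160K = 160 × 1024 processors) are assumed here so the results reproduce the note's figures:

```python
# Per-processor file system bandwidth from note 5. Binary units
# (1 GB = 1024 MB, 160K = 160 * 1024 processors) are assumed so the
# results match the note's 3.2 MB/s, 0.05 MB/s, and ~65X figures.
GB = 1024  # MB

cluster_2002 = 1 * GB / 316           # 1 GB/s GPFS, 316 processors
bluegene_p   = 8 * GB / (160 * 1024)  # 8 GB/s GPFS, 160K processors

print(f"2002 cluster: {cluster_2002:.1f} MB/s per processor")  # 3.2
print(f"Blue Gene/P:  {bluegene_p:.2f} MB/s per processor")    # 0.05
print(f"reduction:    {cluster_2002 / bluegene_p:.0f}X")       # 65
```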