Introduction to
Parallel Computing



Presented by Supasit Kajkamhaeng
Computational Problem

[Diagram: a computational problem is broken into a discrete series of instructions]
Serial Computing


[Diagram: a single CPU executes the problem's instructions one after another over time]
What is Parallel Computing?
   A form of computation in which many calculations are
    carried out simultaneously, operating on the principle
    that large problems can often be divided into smaller
    ones, which are then solved concurrently ("in parallel").
    [Almasi and Gottlieb, 1989]

[Diagram: the problem is divided into tasks; each task's instructions run simultaneously on its own CPU]
Patterns of Parallelism
 Data parallelism [Quinn, 2003]




  There are independent tasks applying the same
   operation to different elements of a data set.
            for i ← 0 to 99 do
                  a[i] = b[i] + c[i]
            endfor


 Functional parallelism [Quinn, 2003]




  There are independent tasks applying different
   operations to different data elements.
             a = 2, b = 3
             m = (a + b) / 2
             n = a^2 + b^2
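To make the two patterns concrete, the following minimal C sketch (not part of the original slides) expresses both using OpenMP, the shared-memory counterpart of the message-passing model discussed later in the deck; the variable names follow the slide's pseudocode, and an OpenMP-capable compiler is assumed (e.g. gcc -fopenmp).

    #include <stdio.h>

    #define N 100

    int main(void) {
        double a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { b[i] = i; c[i] = 2 * i; }

        /* Data parallelism: the same operation (addition) is applied to
           different elements of the data set by independent iterations. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = b[i] + c[i];

        /* Functional parallelism: two different operations on the same
           data run in separate sections, potentially on different cores. */
        double x = 2.0, y = 3.0, m = 0.0, n = 0.0;
        #pragma omp parallel sections
        {
            #pragma omp section
            m = (x + y) / 2.0;       /* mean of the two values       */
            #pragma omp section
            n = x * x + y * y;       /* sum of squares of the values */
        }

        printf("a[99] = %.1f, m = %.1f, n = %.1f\n", a[99], m, n);
        return 0;
    }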
Data Communications

[Diagram: Task 1 and Task 2 exchange data during execution]
Why use Parallel Computing?
 Reduce computing time
   More processors
Why use Parallel Computing? (1)

 Solve larger problems
   More memory

[Diagram: the problem is divided into tasks; each task's instructions use its own RAM]
Parallel Computing Systems
      • A single machine with multi-core processors

[Diagram: a multithreaded process on a single machine with two multi-core processors (P) sharing one memory; each processor has several cores (C)]

Problem: limits of a single machine (performance, available memory)
What is a Cluster?
 A group of linked computers, working together
  closely so that in many respects they form a single
  computer
 To improve performance and/or availability over
  that provided by a single computer
  [Webopedia computer dictionary, 2007]




[Images: a High-Performance cluster and a High-Availability cluster]
Cluster Architecture

[Diagram: cluster architecture]
Message-Passing Model
 The system is assumed to be a collection of processors,
  each with its own local memory (a distributed-memory system)
 A processor has direct access only to the instructions
  and data stored in its local memory
 An interconnection network supports message passing
  between processors
 The Message Passing Interface (MPI) is the standard for
  programming this model
 [Quinn, 2003]
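As a minimal illustration of the message-passing model (my sketch, not from the slides), the following C/MPI program passes a token around a ring of processes; each process touches only data in its own local memory and communicates through explicit send/receive calls. It assumes a working MPI installation and at least two processes, e.g. mpicc ring.c -o ring and mpirun -np 4 ./ring.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size, token;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id         */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

        if (rank == 0) {
            token = 42;   /* the value exists only in rank 0's local memory */
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("token travelled through %d processes\n", size);
        } else {
            /* receive from the left neighbour, pass to the right neighbour */
            MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }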
Performance Metrics for Parallel Computing
 • Speedup [Kumar et al., 1994]



 How much performance gain is achieved by
  parallelizing a given application over a sequential
  implementation

        Sp - speedup with P processors

                Sp = Ts / Tp

        where
                Ts - sequential execution time
                Tp - parallel execution time with P processors
                P  - number of processors

        Example:   P    Ts    Tp    Sp
                   4    40    15    2.67
Speedup




[Figure: speedup; Eijkhout, 2011]
Efficiency
 A measure of processor utilization [Quinn, 2003]

        Ep - efficiency with P processors

                Ep = Sp / P

        Example:   P    Sp    Ep
                   4    2     0.5
                   8    3     0.375

 In practice, speedup is less than P and efficiency is
  between zero and one, depending on how effectively
  the processors are utilized
  [Eijkhout, 2011]
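A short C sketch (mine, not from the slides) that evaluates both metrics, using the example values from the speedup slide (Ts = 40, Tp = 15, P = 4):

    #include <stdio.h>

    int main(void) {
        double Ts = 40.0;   /* sequential execution time (speedup slide example) */
        double Tp = 15.0;   /* parallel execution time with P processors         */
        int    P  = 4;

        double Sp = Ts / Tp;   /* speedup:    40 / 15  = 2.67 */
        double Ep = Sp / P;    /* efficiency: 2.67 / 4 = 0.67 */

        printf("Sp = %.2f  Ep = %.2f\n", Sp, Ep);
        return 0;
    }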
Factors Affecting Parallel Performance
 • Portion of computation [Quinn, 2003]




   Computations that must be performed sequentially
   Computations that can be performed in parallel

       fs - Serial fraction of computation
       fp - Parallel fraction of computation

        Sp = Ts / Tp
           = Ts / (fs*Ts + fp*Ts/P)
           = 1 / (fs + fp/P)

        Example:   Ts     fs     fp     fs*Ts   fp*Ts
                   100    10%    90%    10      90
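The following C sketch (not from the slides) evaluates this formula for the table's values; the Toverhead argument anticipates the parallel-overhead term introduced on the next slides and is left at zero here.

    #include <stdio.h>

    /* Sp = Ts / (fs*Ts + fp*Ts/P + Toverhead), with fp = 1 - fs */
    double speedup(double Ts, double fs, int P, double Toverhead) {
        double fp = 1.0 - fs;
        return Ts / (fs * Ts + fp * Ts / P + Toverhead);
    }

    int main(void) {
        /* table values: Ts = 100, fs = 10%, fp = 90%, no overhead */
        for (int P = 1; P <= 16; P *= 2)
            printf("P = %2d  Sp = %.2f\n", P, speedup(100.0, 0.10, P, 0.0));
        /* As P grows, Sp approaches 1/fs = 10: the serial fraction
           limits the achievable speedup.                            */
        return 0;
    }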
Factors Affecting Parallel Performance (1)
 • Parallel Overhead [Barney, 2011]




   The amount of time required to coordinate
    parallel tasks, as opposed to doing useful
    work
    o Task start-up time
    o Synchronizations
    o Data communications
    o Task termination time

 • Load balancing, etc.

Factors Affecting Parallel Performance (2)

        Tp = fs*Ts + (1 - fs)*Ts/P + Toverhead

        Sp = Ts / Tp = Ts / (fs*Ts + (1 - fs)*Ts/P + Toverhead)
Factors Affecting Parallel Performance (3)

        Fixed problem size (Ts is fixed):

        Sp = Ts / Tp = Ts / (fs*Ts + (1 - fs)*Ts/P + Toverhead)
Factors Affecting Parallel Performance (4)
 Fixed P; as the problem size grows, speedup improves

        Sp = Ts / Tp = Ts / (fs*Ts + (1 - fs)*Ts/P + Toverhead)

                                small problem       large problem
        2D grid calculations    85 mins  (85%)      680 mins (97.84%)
        Serial fraction         15 mins  (15%)      15 mins  (2.16%)
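To check the table's percentages and see why a larger problem helps, this short C sketch (mine, not from the slides) computes the serial fraction for both problem sizes and the corresponding upper bound on speedup, assuming no parallel overhead:

    #include <stdio.h>

    int main(void) {
        /* minutes taken from the slide's table */
        double grid_small = 85.0,  serial_small = 15.0;
        double grid_large = 680.0, serial_large = 15.0;

        double fs_small = serial_small / (grid_small + serial_small);  /* 0.15   */
        double fs_large = serial_large / (grid_large + serial_large);  /* 0.0216 */

        /* with no overhead, speedup is bounded by 1/fs as P grows */
        printf("small problem: fs = %.2f%%  max Sp ~ %.1f\n",
               100.0 * fs_small, 1.0 / fs_small);   /* 15.00%  ~6.7  */
        printf("large problem: fs = %.2f%%  max Sp ~ %.1f\n",
               100.0 * fs_large, 1.0 / fs_large);   /* 2.16%   ~46.3 */
        return 0;
    }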
Case Study
 Hardware Configuration
     Linux Cluster (4 compute nodes)
      Details of each compute node
       o   2x Intel Xeon 2.80 GHz (Single core)
       o   4 GB RAM
       o   Gigabit Ethernet
       o   CentOS 4.3



Case Study - CFD
 Parallel Fluent Processing [Junhong, 2004]




  Run Fluent solver on two or more CPUs
   simultaneously to calculate a computational
   fluid dynamics (CFD) job




Case Study – CFD (1)
   Case Test #1




Case Study – CFD (2)
   Case Test #1 – Runtime




Case Study – CFD (3)
   Case Test #1 – Speedup




Case Study – CFD (4)
   Case Test #1 – Efficiency




Conclusion
 Parallel computing helps reduce computation time and
  solve larger problems than a single computer
  (sequential computing) can handle
 To use parallel computers, software must be developed
  with a parallel programming model
 The performance of parallel computing is measured
  with speedup and efficiency
References
1.   G.S. Almasi and A. Gottlieb. 1989. Highly Parallel Computing. The
     Benjamin-Cummings publishers, Redwood City, CA.
2.   M.J. Quinn. 2003. Parallel Programming in C with MPI and
     OpenMP. The McGraw-Hill Companies, Inc. NY.
3.   What is clustering?. Webopedia computer dictionary. Retrieved on
     November 7, 2007.
4.   V. Kumar, A. Grama, A. Gupta, and G. Karypis. 1994. Introduction
     to parallel computing: design and analysis of parallel algorithms.
     The Benjamin-Cummings publishers, Redwood City, CA.
5.   V. Eijkhout. 2011. Introduction to Parallel Computing. Texas
     Advanced Computing Center (TACC), The University of Texas at
     Austin.
6.   B. Barney. 2011. Introduction to Parallel Computing. Lawrence
     Livermore National Laboratory.
7.   Junhong, W. 2004. Parallel Fluent Processing. SVU/Academic
     Computing, Computer Centre, National University of Singapore.


Editor's notes

  1. Serial computation: to be run on a single computer having a single Central Processing Unit (CPU). A problem is broken into a discrete series of instructions; instructions are executed one after another, and only one instruction may execute at any moment in time.
  2. Multithreading as a widespread programming and execution model allows multiple threads to exist within the context of a single process. These threads share the process's resources but are able to execute independently. The threaded programming model provides developers with a useful abstraction of concurrent execution; perhaps the most interesting application of the technology is when it is applied to a single process to enable parallel execution on a multiprocessor system. Shared-memory systems (SMPs, cc-NUMAs) have a single address space. OpenMP is the standard for shared-memory programming (compiler directives).
  3. Clusters vs. MPPs: The key differences between a cluster and an MPP system are: In a cluster, various components or layers can change relatively independently of each other, whereas components in MPP systems are much more tightly integrated. For example, a cluster administrator can choose to upgrade the interconnect, say from Fast Ethernet to Gigabit Ethernet, just by adding new network interface cards (NICs) and switches to the cluster; in most cases the administrator of an MPP system cannot do such upgrades without upgrading the whole machine. A cluster decouples the development of system software from innovations in the underlying hardware. Cluster management tools and parallel programming libraries can be optimized independently of changes in the node hardware itself. This results in more mature and reliable cluster middleware software compared to the system software layer in an MPP-class system, which requires at least a major rewrite with each generation of the system hardware. An MPP usually has a single system serial number used for software licensing and support tracking; clusters and NOWs have multiple serial numbers, one for each of their constituent nodes.
  4. MPI is the standard for distributed-memory programming (a library of subprogram calls). In computer hardware, shared memory refers to a (typically) large block of random access memory (RAM) that can be accessed by several different central processing units (CPUs) in a multiple-processor computer system. Shared-memory systems (SMPs, cc-NUMAs) have a single address space; distributed-memory systems have separate address spaces for each processor. Message Passing Interface (MPI) is a standardized and portable message-passing system designed by a group of researchers from academia and industry to function on a wide variety of parallel computers. The standard defines the syntax and semantics of a core of library routines useful to a wide range of users writing portable message-passing programs in Fortran 77 or the C programming language. Several well-tested and efficient implementations of MPI are available, including some that are free and in the public domain. These fostered the development of a parallel software industry and encouraged the development of portable and scalable large-scale parallel applications. MPI is a library specification for message passing, proposed as a standard by a broadly based committee of vendors, implementors, and users. From a programming perspective, message-passing implementations usually comprise a library of subroutines; calls to these subroutines are embedded in source code, and the programmer is responsible for determining all parallelism. Historically, a variety of message-passing libraries have been available since the 1980s; these implementations differed substantially from each other, making it difficult for programmers to develop portable applications. In 1992, the MPI Forum was formed with the primary goal of establishing a standard interface for message-passing implementations.