4. What is Parallel Computing?
A form of computation in which many calculations are
carried out simultaneously, operating on the principle
that large problems can often be divided into smaller
ones, which are then solved concurrently ("in parallel").
[Almasi and Gottlieb, 1989]
[Figure: a problem divided into tasks; each task's stream of instructions executes on a separate CPU]
5. Patterns of Parallelism
Data parallelism [Quinn, 2003]
There are independent tasks applying the same
operation to different elements of a data set.
for i ← 0 to 99 do
    a[i] = b[i] + c[i]
endfor
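A minimal sketch of how this data-parallel loop could be written in C with OpenMP (one shared-memory option mentioned in the notes; the sample values and array size are illustrative):

/* Data parallelism: the same operation (addition) applied to independent
   elements of a data set, split across threads.
   A sketch only; compile with, e.g.:  gcc -fopenmp vec_add.c */
#include <stdio.h>

#define N 100

int main(void)
{
    double a[N], b[N], c[N];

    /* Initialize input arrays with some sample values. */
    for (int i = 0; i < N; i++) {
        b[i] = i;
        c[i] = 2 * i;
    }

    /* Iterations are independent, so the loop can be divided
       among threads safely. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = b[i] + c[i];

    printf("a[99] = %f\n", a[99]);
    return 0;
}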
Functional Parallelism [Quinn, 2003]
There are independent tasks applying different
operations to different data elements.
a = 2, b = 3
m = (a + b) / 2
n = a² + b²
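A minimal sketch of the same two independent computations expressed as OpenMP sections in C (one possible way to express functional parallelism; variable names follow the example above):

/* Functional parallelism: two different operations (average and sum of
   squares) on the same inputs, run as independent tasks.
   A sketch only; compile with, e.g.:  gcc -fopenmp func_par.c */
#include <stdio.h>

int main(void)
{
    double a = 2.0, b = 3.0;
    double m, n;

    #pragma omp parallel sections
    {
        #pragma omp section
        m = (a + b) / 2.0;      /* task 1: average         */

        #pragma omp section
        n = a * a + b * b;      /* task 2: sum of squares  */
    }

    printf("m = %f, n = %f\n", m, n);
    return 0;
}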
7. Why use Parallel Computing?
Reduce computing time
More processors
8. Why use Parallel Computing? (1)
Solve larger problems
More Memory
[Figure: a problem divided into tasks distributed across multiple machines, each contributing its own RAM]
9. Parallel Computing Systems
• A single machine with multi-core processors
[Figure: a multithreaded process whose threads share one memory and run on the cores (C) of the machine's processors (P)]
Limits of a single machine (performance, available memory)
10. What is a Cluster?
A group of linked computers, working together
closely so that in many respects they form a single
computer
To improve performance and/or availability over
that provided by a single computer
[Webopedia computer dictionary, 2007]
High-Performance High-Availability
12. Message-Passing model
The system is assumed to be a collection of processors,
each with its own local memory (Distributed memory
system)
A processor has direct access only to the instructions
and data stored in its local memory
An interconnection network supports message passing
between processors
MPI Standard
[Quinn, 2003]
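A minimal sketch of the message-passing model in C with MPI, assuming at least two processes; the value sent and the message tag are illustrative:

/* Message passing: each process has its own local memory; data moves
   only via explicit send/receive calls over the interconnect.
   A sketch only; compile with mpicc and run with, e.g.:  mpirun -np 2 ./a.out */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id     */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */

    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    if (rank == 0) {
        value = 42;   /* data that exists only in rank 0's local memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0 (of %d ranks)\n", value, size);
    }

    MPI_Finalize();
    return 0;
}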
13. Performance Metrics for Parallel Computing
• Speedup [Kumar et al., 1994]
How much performance gain is achieved by parallelizing a given
application, relative to a sequential implementation
Sp - speedup with P processors

Sp = Ts / Tp

where
Ts - sequential execution time
Tp - parallel execution time with P processors
P - number of processors

Example:
P    Ts   Tp   Sp
4    40   15   2.67
15. Efficiency
A measure of processor utilization [Quinn, 2003]
Ep - efficiency with P processors

Ep = Sp / P

Example:
P    Sp   Ep
4    2    0.5
8    3    0.375
In practice, speedup is less than P and efficiency is
between zero and one, depending on the degree of
effectiveness with which the processors are utilized
[Eijkhout, 2011]
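To make both metrics concrete, a small C sketch that evaluates Sp = Ts / Tp and Ep = Sp / P using the figures from the speedup table above (the efficiency table on this slide uses its own example numbers):

/* Speedup and efficiency as defined above: Sp = Ts / Tp, Ep = Sp / P.
   Illustrative only; the times come from the speedup example table. */
#include <stdio.h>

int main(void)
{
    double Ts = 40.0;   /* sequential execution time           */
    double Tp = 15.0;   /* parallel execution time on P procs  */
    int    P  = 4;      /* number of processors                */

    double Sp = Ts / Tp;   /* speedup    -> 2.67 */
    double Ep = Sp / P;    /* efficiency -> 0.67 */

    printf("P = %d: speedup = %.2f, efficiency = %.2f\n", P, Sp, Ep);
    return 0;
}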
16. Factors Affecting Parallel Performance
• Portion of computation [Quinn, 2003]
Computations that must be performed sequentially
Computations that can be performed in parallel
fs - Serial fraction of computation
fp - Parallel fraction of computation
Sp = Ts / Tp = Ts / ( fs(Ts) + fp(Ts)/P ) = 1 / ( fs + fp/P )

Example:
Ts     fs    fp    fs(Ts)   fp(Ts)
100    10%   90%   10       90
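A short C sketch that evaluates Sp = 1 / ( fs + fp/P ) for the 10% serial / 90% parallel example at several processor counts, showing the speedup flattening toward 1/fs = 10 (the processor counts chosen are illustrative):

/* Speedup limited by the serial fraction: Sp = 1 / (fs + fp / P). */
#include <stdio.h>

int main(void)
{
    double fs = 0.10;           /* serial fraction   */
    double fp = 1.0 - fs;       /* parallel fraction */
    int counts[] = {1, 2, 4, 8, 16, 1024};

    for (int i = 0; i < 6; i++) {
        int P = counts[i];
        double Sp = 1.0 / (fs + fp / P);
        printf("P = %4d  ->  Sp = %.2f\n", P, Sp);
    }
    /* As P grows, Sp approaches 1/fs = 10: the serial fraction caps the speedup. */
    return 0;
}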
17. Factors Affecting Parallel Performance (1)
• Parallel Overhead [Barney, 2011]
The amount of time required to coordinate
parallel tasks, as opposed to doing useful
work
o Task start-up time
o Synchronizations
o Data communications
o Task termination time
• Load balancing, etc.
19. Factors Affecting Parallel Performance (3)
Fixed problem size (Ts is fixed):

Sp = Ts / Tp = Ts / ( fs(Ts) + (1 - fs)Ts/P + Toverhead )
20. Factors Affecting Parallel Performance (4)
Fixed P; as the problem size grows, the speedup grows

Sp = Ts / Tp = Ts / ( fs(Ts) + (1 - fs)Ts/P + Toverhead )

When the problem is scaled up, the serial part stays roughly constant while the parallel part grows, so the serial fraction fs shrinks toward zero and Sp approaches P:

                        Time      Fraction   Time (scaled)   Fraction
2D grid calculations    85 mins   85%        680 mins        97.84%
Serial fraction         15 mins   15%        15 mins         2.16%
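A small C sketch that reproduces the serial-fraction figures in the table above from the raw times (15 minutes of serial work against 85 and then 680 minutes of grid work):

/* Scaled problem size with P fixed: the serial part stays at 15 minutes
   while the 2D grid work grows from 85 to 680 minutes, so the serial
   fraction drops from 15% to about 2.16%. */
#include <stdio.h>

int main(void)
{
    double serial = 15.0;
    double grid_small = 85.0, grid_large = 680.0;

    double fs_small = serial / (serial + grid_small);  /* 0.1500 -> 15%   */
    double fs_large = serial / (serial + grid_large);  /* 0.0216 -> 2.16% */

    printf("small problem: serial fraction = %.2f%%\n", 100.0 * fs_small);
    printf("large problem: serial fraction = %.2f%%\n", 100.0 * fs_large);
    return 0;
}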
21. Case Study
Hardware Configuration
Linux Cluster (4 compute nodes)
Details of each compute node
o 2x Intel Xeon 2.80 GHz (Single core)
o 4 GB RAM
o Gigabit Ethernet
o CentOS 4.3
22. Case Study - CFD
Parallel Fluent Processing [Junhong, 2004]
Run the Fluent solver on two or more CPUs
simultaneously to compute a computational
fluid dynamics (CFD) job
24. Case Study – CFD (2)
Case Test #1 – Runtime
25. Case Study – CFD (3)
Case Test #1 – Speedup
26. Case Study – CFD (4)
Case Test #1 – Efficiency
27. Conclusion
Parallel computing helps to reduce computation time and
to solve larger problems than a single computer
(sequential computing) can handle
To use parallel computers, software is developed with a
parallel programming model
The performance of parallel computing is measured with
speedup and efficiency
28. References
1. G.S. Almasi and A. Gottlieb. 1989. Highly Parallel Computing. The
Benjamin-Cummings publishers, Redwood City, CA.
2. M.J. Quinn. 2003. Parallel Programming in C with MPI and
OpenMP. The McGraw-Hill Companies, Inc. NY.
3. What is clustering? Webopedia Computer Dictionary. Retrieved
November 7, 2007.
4. V. Kumar, A. Grama, A. Gupta, and G. Karypis. 1994. Introduction
to parallel computing: design and analysis of parallel algorithms.
The Benjamin-Cummings publishers, Redwood City, CA.
5. V. Eijkhout. 2011. Introduction to Parallel Computing. Texas
Advanced Computing Center (TACC), The University of Texas at
Austin.
6. B. Barney. 2011. Introduction to Parallel Computing. Lawrence
Livermore National Laboratory.
7. W. Junhong. 2004. Parallel Fluent Processing. SVU/Academic
Computing, Computer Centre, National University of Singapore.
Speaker notes
Serial computation: a program runs on a single computer having a single Central Processing Unit (CPU). A problem is broken into a discrete series of instructions. Instructions are executed one after another; only one instruction may execute at any moment in time.
Multithreading as a widespread programming and execution model allows multiple threads to exist within the context of a single process. These threads share the process's resources but are able to execute independently. The threaded programming model provides developers with a useful abstraction of concurrent execution. However, perhaps the most interesting application of the technology is when it is applied to a single process to enable parallel execution on a multiprocessor system.
Shared memory systems (SMPs, cc-NUMAs) have a single address space.
OpenMP is the standard for shared memory programming (compiler directives).
Clusters vs. MPPs. The key differences between a cluster and an MPP system are:
In a cluster, various components or layers can change relatively independently of each other, whereas components in MPP systems are much more tightly integrated. For example, a cluster administrator can choose to upgrade the interconnect, say from fast Ethernet to gigabit Ethernet, just by adding new network interface cards (NICs) and switches to the cluster. On the other hand, in most cases the administrator of an MPP system cannot do such upgrades without upgrading the whole machine.
A cluster decouples the development of system software from innovations in underlying hardware. Cluster management tools and parallel programming libraries can be optimized independently of changes in the node hardware itself. This results in more mature and reliable cluster middleware software compared to the system software layer in an MPP-class system, which requires at least a major rewrite with each generation of the system hardware.
An MPP usually has a single system serial number used for software licensing and support tracking. Clusters and NOWs have multiple serial numbers, one for each of their constituent nodes.
MPI is the standard for distributed memory programming (a library of subprogram calls).
In computer hardware, shared memory refers to a (typically) large block of random access memory (RAM) that can be accessed by several different central processing units (CPUs) in a multiple-processor computer system.
Shared memory systems (SMPs, cc-NUMAs) have a single address space; distributed memory systems have separate address spaces for each processor.
Message Passing Interface (MPI) is a standardized and portable message-passing system designed by a group of researchers from academia and industry to function on a wide variety of parallel computers. The standard defines the syntax and semantics of a core of library routines useful to a wide range of users writing portable message-passing programs in Fortran 77 or the C programming language. Several well-tested and efficient implementations of MPI exist, including some that are free and in the public domain. These fostered the development of a parallel software industry and encouraged the development of portable and scalable large-scale parallel applications. MPI is a library specification for message passing, proposed as a standard by a broadly based committee of vendors, implementors, and users.
From a programming perspective, message-passing implementations usually comprise a library of subroutines. Calls to these subroutines are embedded in source code. The programmer is responsible for determining all parallelism. Historically, a variety of message-passing libraries have been available since the 1980s. These implementations differed substantially from each other, making it difficult for programmers to develop portable applications. In 1992, the MPI Forum was formed with the primary goal of establishing a standard interface for message-passing implementations.