Parallel Concepts
Dr. C.V. Suresh Babu

The Goal of Parallelization
• Reduction of elapsed time of a program
• Reduction in turnaround time of jobs
[Figure: total cpu time on 1 processor vs. 4 processors; the 4-processor run adds communication overhead to the total cpu time.]
• Overhead:
– total increase in cpu time
– communication
– synchronization
– additional work in algorithm
– non-parallel part of the program (one processor works, others spin idle)

[Figure: elapsed time between start and finish shrinks as the same work is spread over 1, 2, 4, and 8 processors; this reduction in elapsed time is the goal of parallelization.]

Speedup and Efficiency
Both measure the parallelization properties of a program
• Let T(p) be the elapsed time on p processors
• The Speedup S(p) and the Efficiency E(p) are defined as:
S(p) = T(1)/T(p)
E(p) = S(p)/p
• for ideal parallel speedup we get:
T(p) = T(1)/p
S(p) = T(1)/T(p) = p
E(p) = S(p)/p = 1 or 100%

[Figure: Speedup and Efficiency vs. number of processors, showing the ideal curve and the super-linear, saturation, and disaster regimes.]
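As a worked illustration of these definitions, the short C sketch below computes S(p) and E(p) from a set of elapsed times; the timings are made-up illustrative values, not measurements from the course examples.

#include <stdio.h>

/* Hypothetical elapsed times T(p); the numbers are illustrative only. */
int main(void)
{
    int    p[]  = {1, 2, 4, 8};
    double tp[] = {100.0, 55.0, 30.0, 18.0};   /* seconds */
    double t1   = tp[0];

    for (int i = 0; i < 4; i++) {
        double s = t1 / tp[i];     /* S(p) = T(1)/T(p) */
        double e = s / p[i];       /* E(p) = S(p)/p    */
        printf("p = %d   S(p) = %.2f   E(p) = %.0f%%\n", p[i], s, 100.0 * e);
    }
    return 0;
}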
Amdahl’s Law
This rule states the following for parallel programs:
The non-parallel fraction of the code (i.e. overhead)
imposes the upper limit on the scalability of the code
• the non-parallel (serial) fraction s of the program includes the communication and synchronization overhead

(1) 1 = s + f                          ! the program has serial and parallel fractions
(2) T(1) = T(parallel) + T(serial)
         = T(1)*(f + s)
         = T(1)*(f + (1-f))
(3) T(p) = T(1)*(f/p + (1-f))
(4) S(p) = T(1)/T(p)
         = 1/(f/p + (1-f))
         < 1/(1-f)                     ! for p -> inf.
(5) S(p) < 1/(1-f)
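A minimal numerical sketch of this bound in C (the parallel fractions f, processor counts and T(1) below are illustrative, not from the slides): it tabulates S(p) = 1/(f/p + (1-f)), the limiting value 1/(1-f), and the resulting time to solution T(p) = T(1)/S(p).

#include <stdio.h>

/* Amdahl's Law: S(p) = 1/(f/p + (1-f)), bounded above by 1/(1-f). */
int main(void)
{
    double f[] = {0.50, 0.90, 0.99};           /* parallel fractions (illustrative) */
    int    p[] = {2, 4, 16, 256};
    double t1  = 100.0;                        /* assumed single-processor run time */

    for (int i = 0; i < 3; i++) {
        printf("f = %.2f   limit 1/(1-f) = %.0f\n", f[i], 1.0 / (1.0 - f[i]));
        for (int j = 0; j < 4; j++) {
            double s = 1.0 / (f[i] / p[j] + (1.0 - f[i]));   /* speedup          */
            printf("   p = %3d   S(p) = %6.2f   T(p) = %6.2f\n",
                   p[j], s, t1 / s);                         /* T(p) = T(1)/S(p) */
        }
    }
    return 0;
}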

Amdahl’s Law: Time to Solution

T(p) = T(1)/S(p)
S(p) = 1/(f/p + (1-f))

Hypothetical program run time as a function of the number of processors, for several parallel fractions f (note the log-log plot).


Fine-Grained vs. Coarse-Grained
• Fine-grain parallelism (typically loop level)
– can be done incrementally, one loop at a time
– does not require deep knowledge of the code
– a lot of loops have to be parallel for decent speedup
– potentially many synchronization points (at the end of each parallel loop)
• Coarse-grain parallelism (contrasted with fine-grain in the sketch below)
– make larger loops parallel at a higher call-tree level, potentially enclosing many small loops
– more code is parallel at once
– fewer synchronization points, reducing overhead
– requires deeper knowledge of the code

[Figure: call tree from MAIN down to small leaf routines; coarse-grained parallelism is applied near the top of the tree, fine-grained parallelism at the leaf-level loops.]
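The contrast can be made concrete with OpenMP; this is a hedged sketch in which the arrays, loop bounds and the nowait choice are illustrative and not taken from the slides.

#define N 1000
double a[N], b[N], c[N];

/* Fine-grained: each loop gets its own parallel region, so threads are
   spawned and synchronized (implicit barrier) at every loop. */
void fine_grained(void)
{
    #pragma omp parallel for
    for (int i = 0; i < N; i++) a[i] = 1.0;

    #pragma omp parallel for
    for (int i = 0; i < N; i++) b[i] = 2.0 * a[i];
}

/* Coarse-grained: one parallel region encloses several work-shared loops;
   threads are created once, and the barrier after the first loop can be
   dropped (nowait) because the second loop does not read a[]. */
void coarse_grained(void)
{
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (int i = 0; i < N; i++) a[i] = 1.0;

        #pragma omp for
        for (int i = 0; i < N; i++) c[i] = 3.0;
    }
}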

Other Impediments to Scalability
Load imbalance (see the sketch below):
• the time to complete a parallel execution of a code segment is determined by the longest-running thread
• unequal work load distribution leads to some processors being idle, while others work too much
• with coarse-grain parallelization, more opportunities for load imbalance exist

[Figure: threads p0–p3 between start and finish; the elapsed time is set by the longest-running thread.]

Too many synchronization points:
• the compiler will put synchronization points at the start and exit of each parallel region
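For the load-imbalance problem above, a common remedy is dynamic scheduling, sketched here with OpenMP; the work() routine, array and chunk size are hypothetical, chosen only to mimic an uneven per-iteration cost.

#include <math.h>
#define N 10000
double result[N];

/* Hypothetical uneven workload: the cost of iteration i grows with i. */
static double work(int i)
{
    double s = 0.0;
    for (int k = 0; k < i; k++) s += sin((double)k);
    return s;
}

void balanced(void)
{
    /* schedule(dynamic,16): chunks of 16 iterations are handed out at run
       time, so threads that draw cheap chunks simply take more of them. */
    #pragma omp parallel for schedule(dynamic, 16)
    for (int i = 0; i < N; i++)
        result[i] = work(i);
}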
Computing π with DPL
π = ∫[0,1] 4/(1+x²) dx ≈ Σ_{0≤i<N} 4 / (N·(1 + ((i+0.5)/N)²))

PROGRAM PIPROG
INTEGER, PARAMETER :: N = 1000000
INTEGER :: I
REAL (KIND=8) :: PI, W = 1.0D0/N
! midpoint-rule sum over i = 0 .. N-1
PI = SUM( (/ (4.0*W/(1.0+((I+0.5)*W)**2), I=0,N-1) /) )
PRINT *, PI
END

Notes:
– essentially sequential form
– automatic detection of parallelism
– automatic work sharing
– all variables shared by default
– number of processors specified outside of the code

compile with:

Computing π with Shared Memory
π = ∫[0,1] 4/(1+x²) dx ≈ Σ_{0≤i<N} 4 / (N·(1 + ((i+0.5)/N)²))

#include <stdio.h>
#define n 1000000

int main(void)
{
  double pi, l, ls = 0.0, w = 1.0/n;
  int i;

  #pragma omp parallel for private(i,l) reduction(+:ls)
  for(i = 0; i < n; i++) {
    l = (i+0.5)*w;            /* midpoint of interval i */
    ls += 4.0/(1.0+l*l);
  }
  pi = ls*w;
  printf("pi is %f\n", pi);
  return 0;
}

Notes:
– essentially sequential form
– automatic work sharing
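– a possible way to build and run it (a sketch assuming GCC; the source file name pi_omp.c is illustrative):

  gcc -fopenmp pi_omp.c -o pi_omp
  OMP_NUM_THREADS=4 ./pi_omp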

Computing π with Message Passing
π = ∫[0,1] 4/(1+x²) dx ≈ Σ_{0≤i<N} 4 / (N·(1 + ((i+0.5)/N)²))

#include <stdio.h>
#include <mpi.h>
#define N 1000000

int main(int argc, char **argv)
{
  double pi, l, ls = 0.0, w = 1.0/N;
  int i, mid, nth;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &mid);   /* my rank         */
  MPI_Comm_size(MPI_COMM_WORLD, &nth);   /* number of ranks */

  /* cyclic distribution of the loop iterations over the ranks */
  for(i = mid; i < N; i += nth) {
    l = (i+0.5)*w;
    ls += 4.0/(1.0+l*l);
  }
  MPI_Reduce(&ls, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
  if(mid == 0) printf("pi is %f\n", pi*w);
  MPI_Finalize();
  return 0;
}

Notes:
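– explicit work distribution over the ranks; the partial sums are combined with MPI_Reduce
– a possible way to build and run it (a sketch assuming an MPI installation such as Open MPI or MPICH; the source file name pi_mpi.c is illustrative):

  mpicc pi_mpi.c -o pi_mpi
  mpirun -np 4 ./pi_mpi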

Comparing Parallel Paradigms
• Automatic parallelization combined with explicit Shared Variable
programming (compiler directives) used on machines with global
memory

– Symmetric Multi-Processors, CC-NUMA, PVP
– These methods are collectively known as Shared Memory Programming (SMP)
– SMP programming model works at loop level, and coarse level parallelism:
• the coarse level parallelism has to be specified explicitly
• loop level parallelism can be found by the compiler (implicitly)

– Explicit Message Passing Methods are necessary with machines that
have no global memory addressability:
• clusters of all sorts, NOW & COW

– Message Passing Methods require coarse level parallelism to be scalable

• Choosing a programming model is largely a matter of the application, personal preference and the target machine.
• It has nothing to do with scalability.

Scalability limitations:
– communication overhead
– process synchronization
• scalability is mainly a function of the hardware and (your) implementation of the parallelism

Summary


• The serial part or the communication overhead of the code limits the
scalability of the code (Amdahl's Law)
• programs have to be >99% parallel to use large (>30 proc) machines
• several Programming Models are in use today:
– Shared Memory programming (SMP) (with Automatic Compiler
parallelization, Data-Parallel and explicit Shared Memory models)
– Message Passing model
• Choosing a Programming Model is largely a matter of the application,
personal choice and target machine. It has nothing to do with scalability.
– Don’t confuse Algorithm and implementation
• machines with a global address space can run applications based on
both SMP and Message Passing programming models
