Introduction to OpenMP
OpenMP: What is it?
OpenMP is an Application Program Interface (API) for
• explicit,
• portable,
• shared-memory parallel programming
• in C/C++ and Fortran.
OpenMP consists of
• compiler directives,
• runtime library calls, and
• environment variables.
It is supported by all major compilers on Unix and
Windows platforms:
GNU, IBM, Oracle, Intel, PGI, Absoft, Lahey/Fujitsu,
PathScale, HP, MS, Cray
OpenMP Programming Model
➢ Designed for multi-processor/core, shared
memory machines.
➢ OpenMP programs accomplish parallelism
exclusively through the use of threads.
➢ Programmer has full control over
parallelization.
➢ Consists of a set of #pragmas (compiler
instructions/directives) that control how the
program works.
OpenMP: Core Elements
❖ Directives & Pragmas
▪ Forking Threads (parallel region)
▪ Work Sharing
▪ Synchronization
▪ Data Environment
❖ User level runtime functions & Env. variables
Thread Creation/Fork-Join
All OpenMP programs begin as a single process: the master
thread.
The master thread executes sequentially until the
first parallel region construct is encountered.
FORK: the master thread then creates a team of
parallel threads.
The statements in the program that are enclosed by the
parallel region construct are then executed in parallel
among the various team threads.
JOIN: When the team threads complete the statements in
the parallel region construct, they synchronize and
terminate, leaving only the master thread.
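A minimal sketch of this fork-join model (the messages and the
variable name tid are illustrative, not from the slides):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    printf("sequential: master thread only\n");
    #pragma omp parallel            // FORK: a team of threads is created
    {
        int tid = omp_get_thread_num();
        printf("parallel: hello from thread %d\n", tid);
    }                               // JOIN: implicit barrier, team ends
    printf("sequential: master thread only\n");
    return 0;
}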
Thread Creation/Fork-Join
Master thread spawns a team of threads as needed.
Parallelism is added incrementally until performance goals are
met, i.e. the sequential program evolves into a parallel
program.
OpenMP Run Time Variables
❖Modify/check/get info about the number of threads
omp_get_num_threads() //number of threads in use
omp_get_thread_num() //tells which thread you are
omp_get_max_threads() //max threads that can be used
❖Are we in a parallel region?
omp_in_parallel()
❖How many processors in the system?
omp_get_num_procs()
❖Explicit locks
omp_[set|unset]_lock()
And several more...
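A short sketch exercising these calls (the printed wording is
illustrative):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    printf("procs=%d max_threads=%d in_parallel=%d\n",
           omp_get_num_procs(), omp_get_max_threads(),
           omp_in_parallel());          // 0: outside any parallel region
    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0)  // let one thread report
            printf("team of %d threads, in_parallel=%d\n",
                   omp_get_num_threads(), omp_in_parallel());
    }
    return 0;
}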
OpenMP: Few Syntax Details
❖Most of the constructs in OpenMP are compiler directives or
pragmas
For C/C++ the pragmas take the form
#pragma omp construct [clause [clause]…]
For Fortran, the directives take one of the forms
C$OMP construct [clause [clause]…]
!$OMP construct [clause [clause]…]
*$OMP construct [clause [clause]…]
❖Header file or Fortran 90 module
#include <omp.h>
use omp_lib
Parallel Region and basic functions
Compiling OpenMP code
❖Same code can run on single-core or multi-core machines
❖Compiler directives are picked up ONLY when the
program is compiled in OpenMP mode.
❖The method depends on the compiler
G++
$ g++ -o foo foo.c -fopenmp
ICC
$ icc -o foo foo.c -fopenmp
Running OpenMP code
❖Controlling the number of threads at runtime
▪ The default number of threads = number of online
processors on the machine.
▪ C shell: setenv OMP_NUM_THREADS number
▪ Bash shell: export OMP_NUM_THREADS=number
▪ Runtime OpenMP function: omp_set_num_threads(4)
▪ Clause in the #pragma for the parallel region
❖Execution timing (#include <omp.h>):
double stime, etime, total;
stime = omp_get_wtime();
longfunction();
etime = omp_get_wtime();
total = etime - stime;
To create a 4-thread parallel region:
each thread calls pooh(ID, A) for ID = 0 to 3.
Thread Creation/Fork-Join
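A sketch of the code behind this slide; pooh(ID, A) is the slide's
placeholder work function, assumed to be defined elsewhere:

double A[1000];
omp_set_num_threads(4);            // request a team of 4 threads
#pragma omp parallel
{
    int ID = omp_get_thread_num(); // 0, 1, 2 or 3
    pooh(ID, A);                   // each thread calls pooh with its own ID
}                                  // threads join; only the master continues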
OpenMP: Core Elements
❖ Directives & Pragmas
▪ Forking Threads (parallel region)
▪ Work Sharing
▪ Synchronization
▪ Data Environment
❖ User level runtime functions & Env. variables
Data vs. Task Parallelism
Data parallelism
A large number of data elements, where each element
(or possibly a subset of elements) needs to be processed
to produce a result. When this processing can be done in
parallel, we have data parallelism.
Task parallelism
A collection of tasks that need to be completed. If
these tasks can be performed in parallel, you are faced
with a task-parallel job.
OpenMP: Work Sharing
A work-sharing construct divides the
execution of the enclosed code region among
the different threads.
The two categories of work sharing in OpenMP:
• omp for
• omp sections
Threads are assigned
independent sets of iterations.
Threads must wait at the end
of the work sharing construct.
#pragma omp for
#pragma omp parallel for
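For example, a sketch of a work-shared loop (a, b, c and N are
illustrative):

#pragma omp parallel for
for (int i = 0; i < N; i++)
    a[i] = b[i] + c[i];    // iterations are divided among the threads
                           // implicit barrier at the end of the loop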
Work Sharing: omp for
(Slide figure: an annotated omp for loop, highlighting the
schedule clause and the data sharing/scope clauses.)
Schedule Clause
How is the work divided among threads?
Directives for work distribution
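A sketch of the common schedule kinds (work() and N are
illustrative; the chunk size 4 is arbitrary):

#pragma omp parallel for schedule(static)     // contiguous blocks,
for (int i = 0; i < N; i++) work(i);          // decided at loop entry

#pragma omp parallel for schedule(static, 4)  // round-robin chunks of 4
for (int i = 0; i < N; i++) work(i);

#pragma omp parallel for schedule(dynamic, 4) // threads grab chunks of 4
for (int i = 0; i < N; i++) work(i);          // as they finish: good for
                                              // uneven iteration costs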
OpenMP for Parallelization
for (int i = 2; i < 10; i++)
{
    x[i] = a * x[i-1] + b;
}
Can all loops be parallelized?
Loop iterations have to be independent.
Simple test: if the results differ when the code is executed
backwards, the loop cannot be parallelized!
Between two synchronization points, if at least one thread
writes to a memory location that at least one other thread
reads from, the result is non-deterministic.
Work Sharing: sections
The SECTIONS directive is a non-iterative work-sharing
construct.
➢ It specifies that the enclosed section(s) of code are to be
divided among the threads in the team.
➢ Each SECTION is executed ONCE by a thread in the
team.
Work Sharing: sections
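A sketch of the construct (do_x, do_y and do_z are hypothetical
independent tasks):

#pragma omp parallel sections
{
    #pragma omp section
    do_x();    // one thread of the team executes this section
    #pragma omp section
    do_y();    // another (or the same) thread executes this one
    #pragma omp section
    do_z();
}              // implicit barrier at the end of sections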
OpenMP: Core Elements
❖ Directives & Pragmas
▪ Forking Threads (parallel region)
▪ Work Sharing
▪ Synchronization
▪ Data Environment
❖ User level runtime functions & Env. variables
Synchronization Constructs
Synchronization is achieved by:
1) Barriers (Task Dependencies)
Implicit: synchronization points exist at the end of
parallel – a necessary barrier – cannot be removed
for – can be removed by using the nowait clause
sections – can be removed by using the nowait clause
single – can be removed by using the nowait clause
Explicit: must be used when ordering is required
#pragma omp barrier
Each thread waits until all threads arrive at the barrier.
Synchronization: Barrier
(Slide figure: an explicit #pragma omp barrier; the implicit
barrier at the end of a parallel region; and a work-sharing loop
where the nowait clause cancels barrier creation.)
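A sketch combining the cases above (phase1 and phase2 are
hypothetical; N is illustrative):

#pragma omp parallel
{
    phase1();
    #pragma omp barrier        // explicit: all threads wait here
    #pragma omp for nowait     // nowait removes this loop's implicit barrier
    for (int i = 0; i < N; i++)
        phase2(i);
    // threads continue here without waiting for the whole loop
}                              // implicit barrier of the parallel region:
                               // this one cannot be removed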
Data Dependencies
OpenMP assumes that there is NO data
dependency across jobs running in parallel.
When the omp parallel directive is placed around
a code block, it is the programmer’s
responsibility to make sure data dependency is
ruled out.
Race Condition
Non-deterministic behaviour: two or more threads access a
shared variable at the same time, and at least one access is a
write.
(Slide figure: Threads A and B both executing an update of the
same shared variable.)
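The textbook case, as a sketch: each ++ is a read-modify-write, so
two threads can read the same old value and one update is lost.

int count = 0;                 // shared
#pragma omp parallel
{
    count++;                   // unsynchronized read-modify-write
}
// count can end up anywhere between 1 and the number of threads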
Synchronization Constructs
2) Mutual Exclusion (Data Dependencies)
Critical sections: protect access to shared & modifiable data,
allowing ONLY ONE thread to enter at a given time
#pragma omp critical
#pragma omp atomic – special case of critical, less overhead
Locks – omp_set_lock() / omp_unset_lock()
(Slide figure: only one thread updates the shared variable at a
time.)
Synchronization Constructs
A section of code can be executed
by only one thread at a time.
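A sketch of both forms (compute_item and N are illustrative):

double best = 0.0;
#pragma omp parallel for
for (int i = 0; i < N; i++) {
    double v = compute_item(i);
    #pragma omp critical       // the whole block: one thread at a time
    {
        if (v > best) best = v;    // compound update needs critical
    }
}

int count = 0;
#pragma omp parallel for
for (int i = 0; i < N; i++) {
    #pragma omp atomic         // a single memory update: less overhead
    count++;
}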
OpenMP: Core Elements
❖ Directives & Pragmas
▪ Forking Threads (parallel region)
▪ Work Sharing
▪ Synchronization
▪ Data Environment
❖ User level runtime functions & Env. variables
OpenMP: Data Scoping
The challenge in shared memory parallelization => managing the
data environment (scoping).
OpenMP shared variable: can be read/written by all threads in the team.
OpenMP private variable: each thread has its own local copy of this
variable.
int i;
int j;
#pragma omp parallel private(j)
{
    int k;
    i = …    // shared (declared outside the region)
    j = …    // private (via the private clause)
    k = …    // private (local to the region)
}
Loop variables in an omp for are private;
Local variables in the parallel region are private.
Alter the default behaviour with the default clause:
#pragma omp parallel default(shared) private(x)
{ ... }
#pragma omp parallel default(none) shared(matrix) private(x)
{ ... }
(In C/C++, default accepts only shared or none; default(private)
exists only in Fortran. default(none) forces every variable's
scope to be declared explicitly.)
OpenMP: private Clause
• Creates a separate copy of the variable for each thread.
• The copies are not initialized.
• The value that Thread 1 stores in x is independent of
the value Thread 2 stores in x.
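A sketch of the consequence (assumes #include <omp.h>):

int x = 42;
#pragma omp parallel private(x)
{
    // x here is a fresh, UNINITIALIZED per-thread copy; reading it
    // before assigning is undefined
    x = omp_get_thread_num();  // each thread stores its own value
}
// after the region, the original x is still 42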
OpenMP Parallel Programming
➢ Start with a parallelizable algorithm
Loop level parallelism
➢ Implement Serially : Optimized Serial Program
➢ Test, Debug & Time to solution
➢ Annotate the code with parallelization and
Synchronization directives
➢ Remove race conditions and false sharing
➢ Test and Debug
➢ Measure speed-up
Problem: Count the number of times each ASCII character occurs in a page of text.
Input: ASCII text, stored as an ARRAY of characters; number of bins (128).
Output: Histogram with 128 buckets – one for each ASCII character.
➢ Start with a parallelizable algorithm
▪ Loop level parallelism?
void compute_histogram_st(char *page, int page_size, int *histogram)
{
    for (int i = 0; i < page_size; i++) {
        char read_character = page[i];
        histogram[read_character]++;
    }
}
Can this loop be parallelized?
Annotate the code with parallelization and
Synchronization directives
void compute_histogram_st(char *page, int page_size, int *histogram)
{
    #pragma omp parallel for
    for (int i = 0; i < page_size; i++) {
        char read_character = page[i];
        histogram[read_character]++;   // racy update of a shared array
    }
}
This will not work! Why?
histogram is shared, so concurrent increments of the same bucket race.
The fix is either mutual exclusion (a critical section around the
update) or a private copy of the histogram per thread.
Problem: Count the number of times each ASCII character occurs in a page of text.
Input: ASCII text, stored as an ARRAY of characters; number of bins (128).
Output: Histogram with 128 buckets – one for each ASCII character.
Could be slower than the serial code:
overhead = critical section + parallelization.
void compute_histogram_st(char *page, int page_size, int *histogram)
{
    #pragma omp parallel for
    for (int i = 0; i < page_size; i++) {
        char read_character = page[i];
        #pragma omp atomic
        histogram[read_character]++;
    }
}
void compute_histogram(char *page, int page_size, int *histogram, int num_bins)
{
    #pragma omp parallel
    {
        // Each thread updates its own local copy.
        // num_bins is a runtime value, so this variable-length array
        // cannot take an = {0} initializer; zero it with memset
        // (requires <string.h>).
        int local_histogram[num_bins];
        memset(local_histogram, 0, num_bins * sizeof(int));
        #pragma omp for
        for (int i = 0; i < page_size; i++) {
            char read_character = page[i];
            local_histogram[read_character]++;
        }
        // Combine the thread-local counts into the shared histogram.
        #pragma omp critical
        for (int i = 0; i < num_bins; i++) {
            histogram[i] += local_histogram[i];
        }
    }
}
(Slide figure: Thread0, Thread1 and Thread2 each hold their own
local_histogram over bins 1..num_bins.)
OpenMP: Reduction
One or more variables that are private to each thread are the subject of a
reduction operation at the end of the parallel region.
#pragma omp for reduction(operator : var)
Operators: + , * , - , & , | , && , || , ^
Combines the multiple local copies of var from the threads into a single
copy at the master.
sum = 0;
#pragma omp parallel for
for (int i = 0; i < 9; i++)
{
    sum += a[i];   // race: sum is shared and updated without a reduction
}
OpenMP: Reduction
sum = 0;
#pragma omp parallel for shared(a) reduction(+: sum)
for (int i = 0; i < 9; i++)
{
    sum += a[i];
}
(A reduction variable must not also be listed in a shared clause
on the same construct.)
With 3 threads:
sum_loc1 = a[0] + a[1] + a[2]
sum_loc2 = a[3] + a[4] + a[5]
sum_loc3 = a[6] + a[7] + a[8]
sum = sum_loc1 + sum_loc2 + sum_loc3
Computing π by the Method of Numerical Integration
(π = ∫₀¹ 4/(1+x²) dx, approximated by a midpoint sum over
num_steps intervals)
static long num_steps = 100000;
double step;
void main ()
{
    int i; double x, pi, sum = 0.0;
    step = 1.0 / (double) num_steps;
    for (i = 0; i < num_steps; i++)
    {
        x = (i + 0.5) * step;
        sum = sum + 4.0 / (1.0 + x*x);
    }
    pi = step * sum;
}
Serial Code
Computing π by the Method of Numerical Integration
#include <omp.h>
#define NUM_THREADS 4
static long num_steps = 100000;
double step;
void main ()
{
    int i; double x, pi, sum = 0.0;
    step = 1.0 / (double) num_steps;
    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel for reduction(+:sum) private(x)
    for (i = 0; i < num_steps; i++)
    {
        x = (i + 0.5) * step;
        sum = sum + 4.0 / (1.0 + x*x);
    }
    pi = step * sum;
}
Parallel Code
Thank You