Introduction to OpenMP
OpenMP: What is it?
OpenMP is an Application Program Interface (API) for
• explicit
• portable
• shared-memory parallel programming
• in C/C++ and Fortran.
OpenMP consists of
• compiler directives,
• runtime calls and
• environment variables.
It is supported by all major compilers on Unix and
Windows platforms: GNU, IBM, Oracle, Intel, PGI, Absoft,
Lahey/Fujitsu, PathScale, HP, MS, Cray.
OpenMP Programming Model
➢ Designed for multi-processor/core, shared
memory machines.
➢ OpenMP programs accomplish parallelism
exclusively through the use of threads.
➢ Programmer has full control over
parallelization.
➢ Consists of a set of #pragmas (Compiler
Instructions/ Directives) that control how the
program works.
OpenMP: Core Elements
 Directives & Pragmas
▪ Forking Threads (parallel region)
▪ Work Sharing
▪ Synchronization
▪ Data Environment
 User level runtime functions & Env. variables
Thread Creation/Fork-Join
All OpenMP programs begin as a single process: the master
thread.
The master thread executes sequentially until the
first parallel region construct is encountered.
FORK: the master thread then creates a team of
parallel threads.
The statements in the program that are enclosed by the
parallel region construct are then executed in parallel
among the various team threads.
JOIN: When the team threads complete the statements in
the parallel region construct, they synchronize and
terminate, leaving only the master thread.
Thread Creation/Fork-Join
Master thread spawns a team of threads as needed.
Parallelism added incrementally until performance goals are
met: i.e. the sequential program evolves into a parallel
program
OpenMP Run Time Variables
❖Modify/check/get info about the number of threads
omp_get_num_threads() //number of threads in use
omp_get_thread_num() //tells which thread you are
omp_get_max_threads() //max threads that can be used
❖Are we in a parallel region?
omp_in_parallel()
❖How many processors in the system?
omp_get_num_procs()
❖Explicit locks
omp_[set|unset]_lock()
And several more...
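A minimal sketch exercising these calls (compile with -fopenmp; the printed counts depend on your machine):
#include <stdio.h>
#include <omp.h>
int main(void)
{
    printf("Processors: %d, max threads: %d\n",
           omp_get_num_procs(), omp_get_max_threads());
    #pragma omp parallel
    {
        /* each thread reports its own ID within the team */
        printf("Thread %d of %d (in parallel region: %d)\n",
               omp_get_thread_num(), omp_get_num_threads(),
               omp_in_parallel());
    }
    return 0;
}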
OpenMP: Few Syntax Details
❖Most of the constructs in OpenMP are compiler directives or
pragmas
For C/C++ the pragmas take the form
#pragma omp construct [clause [clause]…]
For Fortran, the directives take one of the forms
C$OMP construct [clause [clause]…]
!$OMP construct [clause [clause]…]
*$OMP construct [clause [clause]…]
❖Header File or Fortran 90 module
#include <omp.h>
use omp_lib
Parallel Region and basic functions
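The body of this slide is an image in the original; a minimal sketch of a parallel region using the basic functions, assuming nothing beyond the standard API, might be:
#include <stdio.h>
#include <omp.h>
int main(void)
{
    #pragma omp parallel          /* FORK: create a team of threads */
    {
        int id = omp_get_thread_num();
        printf("Hello from thread %d of %d\n",
               id, omp_get_num_threads());
    }                             /* JOIN: implicit barrier; master continues */
    return 0;
}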
Compiling OpenMP code
❖Same code can run on single-core or multi-core machines
❖Compiler directives are picked up ONLY when the
program is instructed to be compiled in OpenMP mode.
❖Method depends on the compiler
G++
$ g++ -o foo foo.c -fopenmp
ICC
$ icc -o foo foo.c -fopenmp
Running OpenMP code
❖Controlling the number of threads at runtime
 The default number of threads = number of online
processors on the machine.
 C shell : setenv OMP_NUM_THREADS number
 Bash shell: export OMP_NUM_THREADS=number
 Runtime OpenMP function omp_set_num_threads(4)
 Clause in #pragma for parallel region
❖Execution Timing (#include <omp.h>)
double stime = omp_get_wtime();
longfunction();                  // the code being timed
double etime = omp_get_wtime();
double total = etime - stime;    // elapsed wall-clock seconds
Thread Creation/Fork-Join
To create a 4-thread parallel region where each thread
calls pooh(ID,A) for ID = 0 to 3 (sketch below):
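The code on this slide is an image in the original; a reconstruction matching its description (pooh stands for any per-thread function) could be:
double A[1000];
omp_set_num_threads(4);             /* request a team of 4 threads */
#pragma omp parallel
{
    int ID = omp_get_thread_num();  /* 0, 1, 2, 3 */
    pooh(ID, A);                    /* each thread calls pooh with its own ID */
}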
OpenMP: Core Elements
 Directives & Pragmas
▪ Forking Threads (parallel region)
▪ Work Sharing
▪ Synchronization
▪ Data Environment
 User level runtime functions & Env. variables
Data vs. Task Parallelism
Data parallelism
A large number of data elements, where each element
(or possibly a subset of elements) needs to be processed
to produce a result. When this processing can be done in
parallel, we have data parallelism.
Task parallelism
A collection of tasks that need to be completed. If
these tasks can be performed in parallel, you are faced
with a task-parallel job.
OpenMP: Work Sharing
A work-sharing construct divides the execution of the
enclosed code region among different threads.
Categories of work sharing in OpenMP:
• omp for
• omp sections
Threads are assigned
independent sets of iterations.
Threads must wait at the end
of the work-sharing construct (see the sketch below).
#pragma omp for
#pragma omp parallel for
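A sketch of both forms, assuming placeholder arrays x and y, a scalar a, and a length N:
#pragma omp parallel
{
    #pragma omp for              /* iterations divided among the team */
    for (int i = 0; i < N; i++)
        y[i] = a * x[i];
}                                /* implicit barrier */
/* Equivalent combined form: */
#pragma omp parallel for
for (int i = 0; i < N; i++)
    y[i] = a * x[i];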
Work Sharing: omp for
(Figure: the omp for construct, annotated with its
schedule clause and data sharing/scope clauses.)
Schedule Clause
How is the work divided among threads?
Directives for work distribution (sketch below).
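A sketch of the common schedule kinds; process() and N are placeholders:
/* schedule(static[,chunk]) : iterations dealt out in fixed blocks up front */
/* schedule(dynamic[,chunk]): threads grab the next chunk as they finish    */
/* schedule(guided[,chunk]) : dynamic, with chunk sizes that shrink         */
#pragma omp parallel for schedule(dynamic, 4)
for (int i = 0; i < N; i++)
    process(i);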
OpenMP for Parallelization
for (int i = 2; i < 10; i++)
{
x[i] = a * x[i-1] + b;
}
Can all loops be parallelized?
Loop iterations have to be independent.
Simple Test: If the results differ when the code is executed
backwards, the loop cannot be parallelized!
Between 2 synchronization points, if at least 1 thread
writes to a memory location that at least 1 other thread
reads from => the result is non-deterministic.
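The loop above carries a dependence: x[i] needs the x[i-1] computed by the previous iteration, so its iterations cannot run concurrently. By contrast, a loop whose iterations touch disjoint data is safe; a sketch with placeholder arrays x and y:
#pragma omp parallel for
for (int i = 2; i < 10; i++)
    y[i] = a * x[i] + b;    /* each iteration reads and writes its own slots */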
Work Sharing: sections
The SECTIONS directive is a non-iterative work-sharing
construct.
➢ It specifies that the enclosed section(s) of code are to be
divided among the threads in the team.
➢ Each SECTION is executed ONCE by a thread in the
team.
Work Sharing: sections
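The example on this slide is an image in the original; a sketch, with work_on_part_A/B standing for any two independent tasks:
#pragma omp parallel sections
{
    #pragma omp section
    work_on_part_A();        /* executed once, by some thread in the team */
    #pragma omp section
    work_on_part_B();        /* runs concurrently with the section above  */
}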
OpenMP: Core Elements
 Directives & Pragmas
▪ Forking Threads (parallel region)
▪ Work Sharing
▪ Synchronization
▪ Data Environment
 User level runtime functions & Env. variables
Synchronization Constructs
Synchronization is achieved by
1) Barriers (Task Dependencies)
Implicit: sync points exist at the end of
parallel – necessary barrier – can't be removed
for – can be removed by using the nowait clause
sections – can be removed by using the nowait clause
single – can be removed by using the nowait clause
Explicit: must be used when ordering is required
#pragma omp barrier
Each thread waits until all threads arrive at the barrier.
Synchronization: Barrier
(Figure: an explicit #pragma omp barrier; the implicit barrier
at the end of a parallel region; no barrier where a nowait
clause cancels barrier creation. Sketch below.)
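A sketch combining both cases, with placeholder functions and arrays:
#pragma omp parallel
{
    compute_phase_one();
    #pragma omp barrier          /* explicit: wait for every thread */
    #pragma omp for nowait       /* removes this loop's implicit barrier */
    for (int i = 0; i < N; i++)
        b[i] = f(a[i]);
    compute_phase_two();         /* may start before others finish the loop */
}                                /* implicit barrier: cannot be removed */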
Data Dependencies
OpenMP assumes that there is NO data dependency
across jobs running in parallel.
When the omp parallel directive is placed around
a code block, it is the programmer's
responsibility to make sure data dependency is
ruled out.
Race Condition
Non-deterministic behaviour: two or more threads access a
shared variable at the same time, with at least one writing
(e.g. Threads A and B both executing an unsynchronized
update of the same variable).
Synchronization Constructs
2) Mutual Exclusion (Data Dependencies)
Critical Sections: protect access to shared & modifiable data,
allowing ONLY ONE thread to enter at a given time
#pragma omp critical
#pragma omp atomic – special case of critical, less overhead
Locks – only one thread updates the protected data at a time
Synchronization Constructs
A section of code can only be executed
by one thread at a time (sketch below).
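A sketch contrasting critical and atomic; count and expensive_update are placeholders:
int count = 0;                   /* shared */
#pragma omp parallel
{
    #pragma omp critical         /* general: protects an arbitrary block */
    {
        count += expensive_update();
    }
    #pragma omp atomic           /* cheaper: a single memory update only */
    count++;
}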
OpenMP: Core Elements
 Directives & Pragmas
▪ Forking Threads (parallel region)
▪ Work Sharing
▪ Synchronization
▪ Data Environment
 User level runtime functions & Env. variables
OpenMP: Data Scoping
Challenge in Shared Memory Parallelization => Managing Data Environment
Scoping
OpenMP Shared variable : Can be Read/Written by all Threads in the team.
OpenMP Private variable : Each Thread has its own local copy of this variable
int i;                              // declared before the region => shared
int j;                              // listed in private(j) => private
#pragma omp parallel private(j)
{
int k;                              // declared inside the region => private
i = …                               // shared: one copy, visible to all threads
j = …                               // private: one copy per thread
k = …                               // private
}
Loop variables in an omp for are private;
local variables in the parallel region are private.
Alter default behaviour with the default clause:
#pragma omp parallel default(shared) private(x)
{ ... }
#pragma omp parallel default(private) shared(matrix)
{ ... }
(Note: default(private) is a Fortran feature; in C/C++ it requires
OpenMP 5.0+, and older compilers accept only default(shared)
and default(none).)
OpenMP: private Clause
• Each thread gets its own copy of the private variable.
• Private copies are not initialized (use firstprivate to
copy in the value from before the region).
• The value that Thread 1 stores in x is different from
the value Thread 2 stores in x (sketch below).
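A minimal sketch of this behaviour:
int x = 10;
#pragma omp parallel private(x)
{
    /* x here is a fresh, uninitialized per-thread copy; the 10 is not seen */
    x = omp_get_thread_num();    /* each thread writes only its own x */
}
/* after the region, the original x still holds 10 */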
OpenMP Parallel Programming
➢ Start with a parallelizable algorithm
Loop level parallelism
➢ Implement Serially : Optimized Serial Program
➢ Test, Debug & Time to solution
➢ Annotate the code with parallelization and
Synchronization directives
➢ Remove Race Conditions, False Sharing***
➢ Test and Debug
➢ Measure speed-up
Problem: Count the number of times each ASCII character occurs in a page of text
Input: ASCII text, stored as an ARRAY of characters; Number of bins (128)
Output: Histogram with 128 buckets – one for each ASCII character
➢Start with a parallelizable algorithm
▪Loop level parallelism?
void compute_histogram_st(char *page, int page_size, int *histogram)
{
for(int i = 0; i < page_size; i++){
char read_character = page[i];
histogram[read_character]++;
}
}
Can this loop be
parallelized?
Annotate the code with parallelization and
Synchronization directives
void compute_histogram_st(char *page, int page_size, int *histogram)
{
#pragma omp parallel for
for(int i = 0; i < page_size; i++) {
char read_character = page[i];
histogram[read_character]++;
}
}
omp parallel for
This will not work! Why?
histogram is shared, so concurrent increments of
histogram[read_character] race with each other. The fixes:
give each thread a private copy, or protect the update with
mutual exclusion (a critical section).
Problem: Count the number of times each ASCII character occurs in a page of text
Input: ASCII text, stored as an ARRAY of characters; Number of bins (128)
Output: Histogram with 128 buckets – one for each ASCII character
Could be slower than the Serial Code.
Overhead = Critical Section + Parallelization
void compute_histogram_st(char *page, int page_size, int *histogram)
{
#pragma omp parallel for
for(int i = 0; i < page_size; i++){
char read_character = page[i];
#pragma omp atomic
histogram[read_character]++;
}
}
void compute_histogram(char *page, int page_size, int *histogram, int num_bins)
{
int num_threads = omp_get_max_threads();
#pragma omp parallel
{
int local_histogram[num_bins];
memset(local_histogram, 0, num_bins * sizeof(int)); // a VLA can't take = {0}; needs <string.h>
#pragma omp for
for(int i = 0; i < page_size; i++){
char read_character = page[i];
local_histogram[read_character]++;
}
#pragma omp critical
for(int i = 0; i < num_bins; i++){
histogram[i] += local_histogram[i];
}
}
}
Each Thread Updates
its local copy
Combine from thread locals
to shared variable
(Figure: each thread's own local_histogram (Thread0, Thread1,
Thread2) over bins 1,2,3,…,num_bins, combined into the shared
histogram.)
OpenMP: Reduction
One or more variables that are private to each thread are the
subject of a reduction operation at the end of the parallel region.
#pragma omp for reduction(operator : var)
Operator: + , * , - , & , | , && , || , ^
Combines the threads' local copies of var into a single
copy at the master.
sum = 0;
#pragma omp parallel for
for (int i = 0; i < 9; i++)
{
sum += a[i];
}
OpenMP: Reduction
sum = 0;
#pragma omp parallel for shared(a) reduction(+: sum)
for (int i = 0; i < 9; i++)
{
sum += a[i];
}
With 3 threads:
sum_loc1 = a[0] + a[1] + a[2]
sum_loc2 = a[3] + a[4] + a[5]
sum_loc3 = a[6] + a[7] + a[8]
sum = sum_loc1 + sum_loc2 + sum_loc3
Computing π by Numerical Integration: ∫₀¹ 4/(1+x²) dx = π, approximated with the midpoint rule
static long num_steps = 100000;
double step;
void main ()
{
int i; double x, pi, sum = 0.0;
step = 1.0 / (double) num_steps;
for (i = 0; i < num_steps; i++)
{
x = (i + 0.5) * step;
sum = sum + 4.0 / (1.0 + x*x);
}
pi = step * sum;
}
Serial Code
Loop
static long num_steps = 100000;
double step;
void main ()
{
int i; double x, pi, sum = 0.0;
step = 1.0 / (double) num_steps;
for (i = 0; i < num_steps; i++) {
x = (i + 0.5) * step;
sum = sum + 4.0 / (1.0 + x*x);
}
pi = step * sum;
}
Computing π by Numerical Integration
#include <omp.h>
#define NUM_THREADS 4
static long num_steps = 100000;
double step;
void main ()
{
int i; double x, pi, sum = 0.0;
step = 1.0 / (double) num_steps;
omp_set_num_threads(NUM_THREADS);
#pragma omp parallel for reduction(+:sum) private(x)
for (i = 0; i < num_steps; i++) {
x = (i + 0.5) * step;
sum = sum + 4.0 / (1.0 + x*x);
}
pi = step * sum;
}
Serial Code Parallel Code
Thank You