Introduction to Parallel
      Computing
         Part IIb
What is MPI?
Message Passing Interface (MPI) is a
standardised interface; several
implementations of this standard exist.
The MPI standard specifies three forms of
subroutine interfaces:
(1) Language-independent notation;
(2) Fortran notation;
(3) C notation.
MPI Features
MPI implementations provide:

•   Abstraction of hardware implementation
•   Synchronous communication
•   Asynchronous communication
•   File operations
•   Time measurement operations
Implementations

MPICH       Unix / Windows NT
MPICH-T3E   Cray T3E
LAM         Unix/SGI Irix/IBM AIX
Chimp       SunOS/AIX/Irix/HP-UX
WinMPI      Windows 3.1 (no network req.)
Programming with MPI
How does programming with MPI differ from
the traditional approach? Three things change:

1. Use of MPI library
2. Compiling
3. Running
Compiling (1)
Once a program is written, it has to be
compiled a little differently from the
normal situation. Although details differ for
various MPI implementations, there are
two frequently used approaches.
Compiling (2)
First approach
 $ gcc myprogram.c -o myexecutable -lmpi



Second approach
  $ mpicc myprogram.c -o myexecutable
Running (1)
In order to run an MPI-enabled application
we should generally use the command
‘mpirun’:
 $ mpirun -np x myexecutable <parameters>


Where x is the number of processes to use,
and <parameters> are the arguments to the
executable, if any.
Running (2)
The ‘mpirun’ program will take care of the
creation of processes on selected processors.
By default, ‘mpirun’ decides which
processors to use; this is usually determined
by a global configuration file. It is possible
to specify processors explicitly, but the
specification may be treated only as a hint.
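With MPICH, for example, the hosts to use can be listed in a machine file that is passed to ‘mpirun’. The flag name varies between implementations, so treat the following as an illustrative sketch rather than a portable recipe; ‘machines’ and the node names are made up:
 $ cat machines
 node01
 node02
 node03
 node04
 $ mpirun -np 4 -machinefile machines myexecutable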
MPI Programming (1)
Implementations of MPI support Fortran, C,
or both. Here we only consider programming
using the C libraries. The first step in writing
a program using MPI is to include the correct
header:
               #include "mpi.h"
MPI Programming (2)

#include "mpi.h"

int main (int argc, char *argv[])
{ …
   MPI_Init(&argc, &argv);
   …
   MPI_Finalize();
   return …;
}
MPI_Init
int MPI_Init (int *argc, char ***argv)

The MPI_Init procedure should be called
before any other MPI procedure (except
MPI_Initialized). It must be called exactly
once, at program initialisation. It removes
the arguments that are used by MPI from the
argument array.
MPI_Finalize
int MPI_Finalize (void)

This routine cleans up all MPI state. It should
be the last MPI routine to be called in a
program; no other MPI routine may be called
after MPI_Finalize. Pending communication
should be completed before finalisation.
Using multiple processes
When running an MPI-enabled program using
multiple processes, each process runs an
identical copy of the program, so each process
needs a way to find out which one it is.
This situation is comparable to programming
with the ‘fork’ statement. MPI defines two
subroutines that can be used for this.
MPI_Comm_size
int MPI_Comm_size (MPI_Comm comm, int *size)


This call returns the number of processes
involved in a communicator. To find out how
many processes are used in total, call this
function with the predefined global
communicator MPI_COMM_WORLD.
MPI_Comm_rank
int MPI_Comm_rank (MPI_Comm comm, int *rank)


This procedure determines the rank (index) of
the calling process in the communicator. Each
process is assigned a unique number within a
communicator.
MPI_COMM_WORLD
MPI communicators specify which processes a
communication applies to.
A communicator is shared by a group of
processes. The predefined MPI_COMM_WORLD
applies to all processes. Communicators can
be duplicated, created and deleted. For most
applications, use of MPI_COMM_WORLD
suffices.
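As an illustration of the last point, a new communicator can be created as a duplicate of an existing one with MPI_Comm_dup and released again with MPI_Comm_free. This is a minimal sketch, not part of the original slides:

 MPI_Comm myComm;

 // Give e.g. a library its own private copy of the world communicator.
 MPI_Comm_dup (MPI_COMM_WORLD, &myComm);
 /* ... communication using myComm ... */
 MPI_Comm_free (&myComm);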
Example ‘Hello World!’
#include <stdio.h>
#include "mpi.h"

int main (int argc, char *argv[])
{ int size, rank;

    MPI_Init (&argc, &argv);
    MPI_Comm_size (MPI_COMM_WORLD, &size);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);

    printf ("Hello world! from processor (%d/%d)n", rank+1, size);

    MPI_Finalize();

    return 0;
}
Running ‘Hello World!’
$ mpicc -o hello hello.c
$ mpirun -np 3 hello
Hello world! from processor (1/3)
Hello world! from processor (2/3)
Hello world! from processor (3/3)
$ _
MPI_Send
int MPI_Send (void *buf, int count, MPI_Datatype datatype,
              int dest, int tag, MPI_Comm comm )


Performs a blocking send of a message to dest.
The data is found in buf, which contains count
elements of datatype. To identify the send, a
tag has to be specified. The destination dest is
the rank of the receiving process in communicator comm.
MPI_Recv
int MPI_Recv (void *buf, int count, MPI_Datatype datatype,
              int source, int tag, MPI_Comm comm,
              MPI_Status *status)


Performs a blocking receive of a message from
source. The buffer must be able to hold count
elements of datatype. The status field is filled
with status information. Matching MPI_Send and
MPI_Recv calls must agree on tag and datatype,
and the receive count must be large enough.
Datatypes
MPI_CHAR             signed char
MPI_SHORT            signed short int
MPI_INT              signed int
MPI_LONG             signed long int
MPI_UNSIGNED_CHAR    unsigned char
MPI_UNSIGNED_SHORT   unsigned short int
MPI_UNSIGNED         unsigned int
MPI_UNSIGNED_LONG    unsigned long int
MPI_FLOAT            float
MPI_DOUBLE           double
MPI_LONG_DOUBLE      long double

(http://www-jics.cs.utk.edu/MPI/MPIguide/MPIguide.html)
Example send / receive
#include <stdio.h>
#include "mpi.h"

int main (int argc, char *argv[])
{ MPI_Status s;
   int        size, rank, i, j;


    MPI_Init (&argc, &argv);
    MPI_Comm_size (MPI_COMM_WORLD, &size);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);

    if (rank == 0) // Master process
    { printf ("Receiving data . . .n");
       for (i = 1; i < size; i++)
       { MPI_Recv ((void *)&j, 1, MPI_INT, i, 0xACE5, MPI_COMM_WORLD, &s);
          printf ("[%d] sent %dn", i, j);
       }
    }
    else
    { j = rank * rank;
       MPI_Send ((void *)&j, 1, MPI_INT, 0, 0xACE5, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
Running send / receive
$ mpicc -o sendrecv sendrecv.c
$ mpirun -np 4 sendrecv
Receiving data . . .
[1] sent 1
[2] sent 4
[3] sent 9
$ _
MPI_Bcast
int MPI_Bcast (void *buffer, int count, MPI_Datatype datatype,
               int root, MPI_Comm comm)


Broadcasts a message from root to all
processes in communicator comm (including
root itself). buffer is used as the source on
the root process and as the destination on all others.
MPI_Barrier
int MPI_Barrier (MPI_Comm comm)

Blocks until all processes defined in comm
have reached this routine. Use this routine to
synchronize processes.
Example broadcast / barrier
#include <stdio.h>
#include "mpi.h"

int main (int argc, char *argv[])
{ int rank, i;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);

    if (rank == 0) i = 27;
    MPI_Bcast ((void *)&i, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf ("[%d] i = %dn", rank, i);

    // Wait for every process to reach this code
    MPI_Barrier (MPI_COMM_WORLD);

    MPI_Finalize();

    return 0;
}
Running broadcast / barrier
$ mpicc -o broadcast broadcast.c
$ mpirun -np 3 broadcast
[0] i = 27
[1] i = 27
[2] i = 27
$ _
MPI_Sendrecv
int MPI_Sendrecv (void *sendbuf, int sendcount, MPI_Datatype sendtype,
                  int dest, int sendtag,
                  void *recvbuf, int recvcount, MPI_Datatype recvtype,
                  int source, int recvtag, MPI_Comm comm, MPI_Status *status)


int MPI_Sendrecv_replace( void *buf, int count, MPI_Datatype datatype,
                          int dest, int sendtag, int source, int recvtag,
                          MPI_Comm comm, MPI_Status *status )



Combined send and receive in a single call; the second
form, MPI_Sendrecv_replace, uses only one buffer for
both the outgoing and the incoming data.
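A common use of MPI_Sendrecv is a shift along a ring of processes, where every process sends to its right neighbour and receives from its left neighbour in one call, avoiding the deadlock that two plain sends could cause. A hedged sketch, assuming rank and size have been obtained as in the earlier examples:

 int left    = (rank + size - 1) % size;
 int right   = (rank + 1) % size;
 int sendval = rank, recvval;
 MPI_Status  status;

 // Send our rank to the right neighbour, receive the left neighbour's rank.
 MPI_Sendrecv ((void *)&sendval, 1, MPI_INT, right, 0,
               (void *)&recvval, 1, MPI_INT, left,  0,
               MPI_COMM_WORLD, &status);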
Other useful routines
•   MPI_Scatter
•   MPI_Gather
•   MPI_Type_vector
•   MPI_Type_commit
•   MPI_Reduce / MPI_Allreduce
•   MPI_Op_create
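MPI_Type_vector and MPI_Type_commit, for instance, describe strided data such as a matrix column so that it can be sent in a single call. A minimal sketch, assuming a row-major N x N matrix of doubles (the names are illustrative, not from the slides):

 #define N 8
 double       matrix[N][N];
 MPI_Datatype columnType;

 // N blocks of 1 element each, N elements apart: one column of the matrix.
 MPI_Type_vector (N, 1, N, MPI_DOUBLE, &columnType);
 MPI_Type_commit (&columnType);

 // Send column 2 of the matrix to process 1 with tag 0.
 MPI_Send ((void *)&matrix[0][2], 1, columnType, 1, 0, MPI_COMM_WORLD);

 MPI_Type_free (&columnType);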
Example scatter / reduce
#include <stdio.h>
#include "mpi.h"

int main (int argc, char *argv[])
{ int data[] = {1, 2, 3, 4, 5, 6, 7}; // Size must be >= #processors
   int rank, i = -1, j = -1;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);

    MPI_Scatter ((void *)data, 1, MPI_INT,
                 (void *)&i , 1, MPI_INT,
                 0, MPI_COMM_WORLD);

    printf ("[%d] Received i = %dn", rank, i);

    MPI_Reduce ((void *)&i, (void *)&j, 1, MPI_INT,
                MPI_PROD, 0, MPI_COMM_WORLD);

    printf ("[%d] j = %dn", rank, j);

    MPI_Finalize();

    return 0;
}
Running scatter / reduce
$ mpicc -o scatterreduce scatterreduce.c
$ mpirun -np 4 scatterreduce
[0] Received i = 1
[0] j = 24
[1] Received i = 2
[1] j = -1
[2] Received i = 3
[2] j = -1
[3] Received i = 4
[3] j = -1
$ _
Some reduce operations
MPI_MAX     Maximum value
MPI_MIN     Minimum value
MPI_SUM     Sum of values
MPI_PROD    Product of values
MPI_LAND    Logical AND
MPI_BAND    Bitwise AND
MPI_LOR     Logical OR
MPI_BOR     Bitwise OR
MPI_LXOR    Logical Exclusive OR
MPI_BXOR    Bitwise Exclusive OR
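MPI_Allreduce, listed earlier among the other useful routines, applies the same operations but returns the result on every process rather than only on root. A minimal sketch (not from the slides) that sums the ranks of all processes:

 int rank, sum;

 MPI_Comm_rank (MPI_COMM_WORLD, &rank);
 MPI_Allreduce ((void *)&rank, (void *)&sum, 1, MPI_INT,
                MPI_SUM, MPI_COMM_WORLD);
 // Every process now holds sum = 0 + 1 + ... + (size - 1).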
Measuring running time
double MPI_Wtime (void);

 double timeStart, timeEnd;
 ...
 timeStart = MPI_Wtime();
     // Code to measure time for goes here.
 timeEnd = MPI_Wtime();
 ...
 printf ("Running time = %f seconds\n",
           timeEnd - timeStart);
Parallel sorting (1)
Sorting a sequence of numbers using the
binary-sort method. This method divides
a given sequence into two halves (until
only one element remains) and sorts both
halves recursively. The two halves are then
merged together to form a sorted sequence.
Binary sort pseudo-code
sorted-sequence BinarySort (sequence)
{ if (# elements in sequence > 1)
   { seqA = first half of sequence
      seqB = second half of sequence
      BinarySort (seqA);
      BinarySort (seqB);
      sorted-sequence = merge (seqA, seqB);
   }
   else sorted-sequence = sequence
}
Merge two sorted sequences
  1 2 5 7   +   3 4 6 8   →   1 2 3 4 5 6 7 8
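A straightforward C version of this merge step might look like the following sketch (an assumed helper, not shown in the slides):

 // Merge sorted arrays a[0..na-1] and b[0..nb-1] into out[0..na+nb-1].
 void merge (const int *a, int na, const int *b, int nb, int *out)
 { int i = 0, j = 0, k = 0;

   while (i < na && j < nb)
     out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
   while (i < na) out[k++] = a[i++];
   while (j < nb) out[k++] = b[j++];
 }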
Example binary sort
[Diagram: the sequence 1 7 5 2 8 4 6 3 is recursively split into halves
down to single elements and then merged pairwise back into sorted order.]
Parallel sorting (2)
This way of dividing the work and gathering the
results lends itself naturally to a parallel
implementation. Divide the work in two and give
one half to each of two processors. Have each of
these processors divide its work again, until
either no data can be split further or no
processors are available anymore.
Implementation problems
•   Number of processors may not be a power of two
•   Number of elements may not be a power of two
•   How to achieve an even workload?
•   Data size is less than number of processors
Parallel matrix multiplication
We use the following partitioning of data (p=4)

[Diagram: two matrices, each partitioned row-wise into four bands P1, P2, P3, P4.]
Implementation
1. Master (process 0) reads data
2. Master sends size of data to slaves
3. Slaves allocate memory
4. Master broadcasts second matrix to all other
   processes
5. Master sends respective parts of first matrix to
   all other processes
6. Every process performs its local multiplication
7. All slave processes send back their result (a sketch follows below).
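A heavily simplified, self-contained sketch of these steps, with the matrix size fixed at compile time (so steps 2-3 collapse into a constant and local allocation) and the master computing band 0 itself; this is an assumed illustration, not the code behind the measurements that follow:

 #include <stdio.h>
 #include <stdlib.h>
 #include "mpi.h"

 #define N 8   /* assumed matrix size; N must be divisible by the number of processes */

 int main (int argc, char *argv[])
 { MPI_Status s;
   int    size, rank, rows, i, j, k;
   double *a, *b, *c;             /* local band of A, full B, local band of the result */
   double *A = NULL, *C = NULL;   /* complete matrices, master only */

   MPI_Init (&argc, &argv);
   MPI_Comm_size (MPI_COMM_WORLD, &size);
   MPI_Comm_rank (MPI_COMM_WORLD, &rank);

   rows = N / size;
   a = malloc (rows * N * sizeof (double));
   b = malloc (N * N * sizeof (double));
   c = malloc (rows * N * sizeof (double));

   if (rank == 0)                        /* step 1: master creates the data */
   { A = malloc (N * N * sizeof (double));
     C = malloc (N * N * sizeof (double));
     for (i = 0; i < N * N; i++) { A[i] = 1.0; b[i] = 2.0; }
   }

   /* step 4: broadcast the second matrix to all processes */
   MPI_Bcast ((void *)b, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

   /* step 5: distribute the row bands of the first matrix (master keeps band 0) */
   if (rank == 0)
   { for (i = 1; i < size; i++)
       MPI_Send ((void *)&A[i * rows * N], rows * N, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
     for (i = 0; i < rows * N; i++) a[i] = A[i];
   }
   else
     MPI_Recv ((void *)a, rows * N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &s);

   /* step 6: every process multiplies its own band */
   for (i = 0; i < rows; i++)
     for (j = 0; j < N; j++)
     { c[i * N + j] = 0.0;
       for (k = 0; k < N; k++) c[i * N + j] += a[i * N + k] * b[k * N + j];
     }

   /* step 7: collect the result bands on the master */
   if (rank != 0)
     MPI_Send ((void *)c, rows * N, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
   else
   { for (i = 0; i < rows * N; i++) C[i] = c[i];
     for (i = 1; i < size; i++)
       MPI_Recv ((void *)&C[i * rows * N], rows * N, MPI_DOUBLE, i, 1, MPI_COMM_WORLD, &s);
     printf ("C[0][0] = %f\n", C[0]);   /* frees omitted for brevity */
   }

   MPI_Finalize ();
   return 0;
 }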
Multiplication 1000 x 1000
[Plot: running time in seconds versus number of processors (0 to 60) for a
1000 x 1000 matrix multiplication, comparing the measured time Tp with the
ideal scaling curve T1 / p.]
Multiplication 5000 x 5000
[Plot: running time in seconds versus number of processors (0 to 35) for a
5000 x 5000 matrix multiplication, comparing the measured time Tp with the
ideal scaling curve T1 / p.]
Gaussian elimination
We use the following partitioning of data (p=4)

[Diagram: both matrices partitioned row-wise into four bands P1, P2, P3, P4.]
Implementation (1)
1. Master reads both matrices
2. Master sends size of matrices to slaves
3. Slaves calculate their part and allocate
   memory
4. Master sends each slave its respective part
5. Set sweeping row to 0 in all processes
6. Sweep matrix (see next sheet)
7. Slaves send back their results
Implementation (2)
While sweeping row not past final row do
A. Have every process decide whether they
   own the current sweeping row
B. The owner sends a copy of the row to
   every other process
C. All processes sweep their part of the
   matrix using the current row
D. The sweeping row is incremented (a sketch of this loop follows below)
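A hedged sketch of this sweep loop for one process, assuming each process holds a contiguous band of myRows rows of the n x (n+1) augmented matrix in row-major order (equal band sizes; the function and variable names are illustrative, not from the slides):

 #include <stdlib.h>
 #include "mpi.h"

 /* Sweep an n x (n+1) augmented system stored as a contiguous row band.
    Each process holds myRows rows starting at global row rank * myRows. */
 void sweep_band (double *band, int n, int myRows, int rank)
 { double *pivotRow = malloc ((n + 1) * sizeof (double));
   int firstRow = rank * myRows, sweep, r, c, owner;

   for (sweep = 0; sweep < n; sweep++)
   { owner = sweep / myRows;                        /* A. who owns the sweeping row? */

     if (rank == owner)                             /* B. owner publishes its row    */
       for (c = 0; c <= n; c++)
         pivotRow[c] = band[(sweep - firstRow) * (n + 1) + c];
     MPI_Bcast ((void *)pivotRow, n + 1, MPI_DOUBLE, owner, MPI_COMM_WORLD);

     for (r = 0; r < myRows; r++)                   /* C. eliminate below the pivot  */
       if (firstRow + r > sweep)
       { double factor = band[r * (n + 1) + sweep] / pivotRow[sweep];
         for (c = sweep; c <= n; c++)
           band[r * (n + 1) + c] -= factor * pivotRow[c];
       }
   }                                                /* D. loop increments sweep row  */
   free (pivotRow);
 }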
Programming hints
• Keep it simple!
• Avoid deadlocks
• Write robust code even at cost of speed
• Design in advance; debugging is more
  difficult (printed output behaves differently)
• Error handling requires synchronisation, you
  can’t just exit the program.
References (1)
MPI Forum Home Page
  http://www.mpi-forum.org/index.html


Beginner's guide to MPI (see also /MPI/)
  http://www-jics.cs.utk.edu/MPI/MPIguide/MPIguide.html


MPICH
  http://www-unix.mcs.anl.gov/mpi/mpich/
References (2)
Miscellaneous

http://www.erc.msstate.edu/labs/hpcl/projects/mpi/
http://nexus.cs.usfca.edu/mpi/
http://www-unix.mcs.anl.gov/~gropp/
http://www.epm.ornl.gov/~walker/mpitutorial/
http://www.lam-mpi.org/
http://epcc.ed.ac.uk/chimp/
http://www-unix.mcs.anl.gov/mpi/www/www3/