Parallel Programming

       By Roman Okolovich
Overview
   Traditionally, computer software has been written for serial
    computation. To solve a problem, an algorithm is constructed
    and implemented as a serial stream of instructions. These
    instructions are executed on a central processing unit on one
    computer. Only one instruction may execute at a time—after
    that instruction is finished, the next is executed.
   Nowadays a single machine (PC) can have a multi-core and/or
    multi-processor architecture.
   In a multiprocessor architecture, two or more identical processors
    connect to a single shared main memory. Most multiprocessor systems
    today use SMP (symmetric multiprocessing). In the case of multi-core
    processors, the SMP architecture applies to the cores, treating them
    as separate processors.
Speedup
   The amount of performance gained by the use of a multi-core
    processor is strongly dependent on the software algorithms and
    implementation. In particular, the possible gains are limited by the
    fraction of the software that can be "parallelized" to run on
    multiple cores simultaneously; this effect is described by Amdahl's
    law. In the best case, so-called embarrassingly parallel problems may
    realize speedup factors near the number of cores. Many typical
    applications, however, do not realize such large speedup factors, and
    thus the parallelization of software is a significant ongoing topic
    of research.
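   Amdahl's law makes the bound explicit: if a fraction P of a
    program's work can be parallelized and n cores are available, the
    overall speedup is

        S(n) = \frac{1}{(1 - P) + P/n}

    so even with unlimited cores the speedup cannot exceed 1/(1 - P);
    with P = 0.9, for example, the speedup is capped at 10x.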
Intel Atom
   Nokia Booklet 3G - Intel® Atom™ Z530, 1.6 GHz
   Intel Atom is the brand name for a line of ultra-low-voltage x86
    and x86-64 CPUs (or microprocessors) from Intel, designed in 45
    nm CMOS and used mainly in Netbooks, Nettops and MIDs.
   Intel Atom can execute up to two instructions per cycle. The
    performance of a single-core Atom is roughly half that of an
    equivalent Celeron.
   Hyper-threading (officially termed Hyper-Threading Technology
    or HTT) is an Intel-proprietary technology used to improve
    parallelization of computations (doing multiple tasks at once)
    performed on PC microprocessors.
   A processor with hyper-threading enabled is treated by the
    operating system as two processors instead of one. This means
    that only one processor is physically present but the operating
    system sees two virtual processors, and shares the workload
    between them.
   The advantages of hyper-threading include improved support for
    multi-threaded code, allowing multiple threads to run simultaneously,
    and improved reaction and response times.
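   The logical processor count the operating system exposes is visible
    from application code; a minimal QtCore sketch:

    #include <QThread>
    #include <QDebug>

    int main()
    {
        // On a hyper-threaded, single-core CPU such as the Atom Z530
        // this typically prints 2: the OS exposes each hardware thread
        // as a separate logical processor.
        qDebug() << "Logical processors:" << QThread::idealThreadCount();
        return 0;
    }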
Instruction level parallelism
   Instruction-level parallelism (ILP) is a measure of how
    many of the operations in a computer program can be
    performed simultaneously. Consider the following
    program:
   1. e = a + b
    2. f = c + d
    3. g = e * f
   Operation 3 depends on the results of operations 1 and
    2, so it cannot be calculated until both of them are
    completed. However, operations 1 and 2 do not depend
    on any other operation, so they can be calculated
    simultaneously. (See also: Data dependency) If we
    assume that each operation can be completed in one unit
    of time then these three instructions can be completed in
    a total of two units of time, giving an ILP of 3/2.
Qt 4's Multithreading
   Qt provides thread support in the form of platform-independent threading
    classes, a thread-safe way of posting events, and signal-slot connections
    across threads. This makes it easy to develop portable multithreaded Qt
    applications and take advantage of multiprocessor machines.
       QThread provides the means to start a new thread.
       QThreadStorage provides per-thread data storage.
       QThreadPool manages a pool of threads that run QRunnable objects.
       QRunnable is an abstract class representing a runnable object.
       QMutex provides a mutual exclusion lock, or mutex.
       QMutexLocker is a convenience class that automatically locks and unlocks a
        QMutex.
       QReadWriteLock provides a lock that allows simultaneous read access.
       QReadLocker and QWriteLocker are convenience classes that automatically lock
        and unlock a QReadWriteLock.
       QSemaphore provides an integer semaphore (a generalization of a mutex).
       QWaitCondition provides a way for threads to go to sleep until woken up by
        another thread.
       QAtomicInt provides atomic operations on integers.
       QAtomicPointer provides atomic operations on pointers.
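   A minimal sketch of these classes working together (the Worker
    class and the shared counter are illustrative): QThread is subclassed
    to reimplement run(), and QMutexLocker protects the shared data.

    #include <QThread>
    #include <QMutex>
    #include <QMutexLocker>
    #include <QDebug>

    QMutex mutex;      // guards the shared counter
    int counter = 0;

    class Worker : public QThread
    {
    protected:
        void run()     // executed in the new thread after start()
        {
            for (int i = 0; i < 1000; ++i) {
                QMutexLocker locker(&mutex); // locks; unlocks at scope exit
                ++counter;
            }
        }
    };

    int main()
    {
        Worker w1, w2;
        w1.start();    // spawn the two threads
        w2.start();
        w1.wait();     // block until each thread finishes
        w2.wait();
        qDebug() << "counter =" << counter; // always 2000, thanks to the mutex
        return 0;
    }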
OpenMP
   The OpenMP Application Program Interface (API) supports multi-platform
    shared-memory parallel programming in C/C++ and Fortran on all
    architectures, including Unix platforms and Windows NT platforms.
   OpenMP is a portable, scalable model that gives shared-memory parallel
    programmers a simple and flexible interface for developing parallel
    applications for platforms ranging from the desktop to the supercomputer.
   The designers of OpenMP wanted to provide an easy method to thread
    applications without requiring that the programmer know how to create,
    synchronize, and destroy threads or even requiring him or her to determine
    how many threads to create. To achieve these ends, the OpenMP designers
    developed a platform-independent set of compiler pragmas, directives,
    function calls, and environment variables that explicitly instruct the compiler
    how and where to insert threads into the application.
   Most loops can be threaded by inserting only one pragma right before the
    loop. Further, by leaving the nitty-gritty details to the compiler and OpenMP,
    you can spend more time determining which loops should be threaded and
    how to best restructure the algorithms for maximum performance.
OpenMP Example
 • OpenMP places the following five restrictions on which loops can be threaded:
    • The loop variable must be of type signed integer. Unsigned integers,
      such as DWORDs, will not work.
    • The comparison operation must be in the form
      loop_variable <, <=, >, or >= loop_invariant_integer.
    • The third expression (the increment portion) of the for loop must be
      either integer addition or integer subtraction by a loop-invariant value.
    • If the comparison operation is < or <=, the loop variable must increment
      on every iteration; conversely, if the comparison operation is > or >=,
      the loop variable must decrement on every iteration.
    • The loop must be a basic block, meaning no jumps from the inside of the
      loop to the outside are permitted, with the exception of the exit
      statement, which terminates the whole application. If goto or break is
      used, it must jump within the loop, not outside it. The same goes for
      exception handling; exceptions must be caught within the loop.

#include <omp.h>
#include <stdio.h>

int main() {
#pragma omp parallel
  printf("Hello from thread %d, nthreads %d\n",
         omp_get_thread_num(), omp_get_num_threads());
  return 0;
}

//-------------------------------------------

#pragma omp parallel shared(n,a,b)
{
  #pragma omp for
  for (int i = 0; i < n; i++)
  {
    a[i] = i + 1;
    #pragma omp parallel for
    /*-- Okay - This is a parallel region --*/
    for (int j = 0; j < n; j++)
      b[i][j] = a[i];
  }
} /*-- End of parallel region --*/

//-------------------------------------------

#pragma omp parallel for
for (i = 0; i < numPixels; i++)
{
  pGrayScaleBitmap[i] = (unsigned char)
          (pRGBBitmap[i].red   * 0.299 +
           pRGBBitmap[i].green * 0.587 +
           pRGBBitmap[i].blue  * 0.114);
}
OpenMP and Visual Studio
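   OpenMP support must be switched on in the compiler for the
    #pragma omp directives to take effect. In Visual C++ (2005 and later)
    this is the /openmp option: Project Properties > C/C++ > Language >
    OpenMP Support.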
Intel Threading Building Blocks (TBB)
   Intel® Threading Building Blocks (Intel® TBB) is an award-winning C++ template
    library that abstracts threads to tasks to create reliable, portable, and scalable
    parallel applications. Just as the C++ Standard Template Library (STL) extends the
    core language, Intel TBB offers C++ users a higher level abstraction for parallelism.
    To implement Intel TBB, developers use familiar C++ templates and coding style,
    leaving low-level threading details to the library. It is also portable between
    architectures and operating systems.
   Intel® TBB for Windows (Linux, Mac OS) costs $299 per seat.


     #include <iostream>
     #include <string>
     #include "tbb/parallel_for.h"
     #include "tbb/blocked_range.h"
     using namespace tbb;
     using namespace std;

     int main() {
       //... (build the string to_scan and the max/pos result arrays)
       // SubStringFinder is a user-defined body class whose
       // operator()(const blocked_range<size_t>&) scans its sub-range.
       parallel_for(blocked_range<size_t>(0, to_scan.size()),
                    SubStringFinder(to_scan, max, pos));
       //...
       return 0;
     }
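
   The snippet above relies on a SubStringFinder body class defined
    elsewhere in the tutorial it comes from. A self-contained sketch of
    the same pattern (SquareAll is a hypothetical name): the body's
    operator() processes one sub-range, and parallel_for splits the full
    range across worker threads.

    #include "tbb/parallel_for.h"
    #include "tbb/blocked_range.h"
    #include "tbb/task_scheduler_init.h"

    // Body class: squares each element of the sub-range it is given.
    class SquareAll {
        double* my_a;
    public:
        SquareAll(double* a) : my_a(a) {}
        void operator()(const tbb::blocked_range<size_t>& r) const {
            for (size_t i = r.begin(); i != r.end(); ++i)
                my_a[i] *= my_a[i];
        }
    };

    int main() {
        tbb::task_scheduler_init init; // explicit init; optional since TBB 2.2
        double a[1000];
        for (size_t i = 0; i < 1000; ++i) a[i] = double(i);
        // parallel_for chops [0, 1000) into chunks and applies SquareAll
        // to each chunk, potentially on different worker threads.
        tbb::parallel_for(tbb::blocked_range<size_t>(0, 1000), SquareAll(a));
        return 0;
    }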
Parallel Pattern Library (PPL)
   The Concurrency Runtime is a concurrent programming framework for C++.
    The Concurrency Runtime simplifies parallel programming and helps you
    write robust, scalable, and responsive parallel applications.
   The features that the Concurrency Runtime provides are unified by a
    common work scheduler. This work scheduler implements a work-stealing
    algorithm that enables your application to scale as the number of available
    processors increases.
   The Concurrency Runtime enables the following programming patterns and
    concepts:
       Imperative data parallelism: Parallel algorithms distribute computations on
        collections or on sets of data across multiple processors.
       Task parallelism: Task objects distribute multiple independent operations across
        processors.
       Declarative data parallelism: Asynchronous agents and message passing enable
        you to declare what computation has to be performed, but not how it is performed.
       Asynchrony: Asynchronous agents make productive use of latency by doing work
        while waiting for data.
   The Concurrency Runtime is provided as part of the C Runtime Library
    (CRT).
   Only Visual Studio 2010 supports PPL
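   A minimal PPL sketch of the first two patterns (the array and the
    two tasks are illustrative): Concurrency::parallel_for distributes
    loop iterations across cores, and a task_group runs independent
    operations concurrently.

    #include <ppl.h>
    #include <iostream>

    int main()
    {
        using namespace Concurrency;

        int squares[100];

        // Imperative data parallelism: iterations are spread across cores.
        parallel_for(0, 100, [&](int i) {
            squares[i] = i * i;
        });

        // Task parallelism: two independent operations run concurrently.
        task_group tasks;
        tasks.run([] { std::cout << "task A" << std::endl; });
        tasks.run([] { std::cout << "task B" << std::endl; });
        tasks.wait();  // block until both tasks complete

        return 0;
    }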
Concurrency Runtime Architecture
   The Concurrency Runtime is divided into four components: the
    Parallel Patterns Library (PPL), the Asynchronous Agents Library,
    the work scheduler, and the resource manager. These components
     reside between the operating system and applications. The example
     below uses the Asynchronous Agents Library: a call message block
     runs a long operation for each message posted to it with asend.
                               struct LongRunningOperationMsg
                               {
                                   LongRunningOperationMsg(int x, int y)
                                       : m_x(x), m_y(y) {}
                                   int m_x;
                                   int m_y;
                               };

                               // A call message block invokes the given
                               // function object for every message sent to it.
                               call<LongRunningOperationMsg>* LongRunningOperationCall =
                                   new call<LongRunningOperationMsg>(
                                       [](LongRunningOperationMsg msg)
                                       {
                                           LongRunningOperation(msg.m_x, msg.m_y);
                                       });

                               void SomeFunction(int x, int y)
                               {
                                   // asend posts the message asynchronously
                                   // and returns immediately.
                                   asend(LongRunningOperationCall,
                                         LongRunningOperationMsg(x, y));
                               }
References
   Parallel computing
   Superscalar
   Simultaneous multithreading
   Hyper-threading
   Thread Support in Qt
   OpenMP
   Intel: Getting Started with OpenMP
   Intel® Threading Building Blocks (Intel® TBB)
   Intel® Threading Building Blocks 2.2 for Open Source
   Concurrency Runtime Library
   Four Ways to Use the Concurrency Runtime in Your C++
    Projects
   Parallel Programming in Native Code blog
