OpenCL – The Open Standard for Heterogeneous
            Parallel Programming

            Dr.-Ing. Achim Basermann
          Deutsches Zentrum für Luft- und Raumfahrt e.V.
 Abteilung Verteilte Systeme und Komponentensoftware, Köln-Porz


                                                                                     Slide 1
                                                   20090714-1 TechTalk OpenCL SC-VK Basermann
Survey

  Motivation
       Trends in parallel processing
       Role of OpenCL

  Khronos' OpenCL
       Platform model
       Execution model
       Memory model
       Programming model

  Conclusions


      Material used: OpenCL survey and specification from http://www.khronos.org/opencl/


Motivation: Trends in Parallel Processing, HW

Trend #1: Multicore processor chips
     Maintain (or even reduce) frequency while replicating cores

Trend #2: Accelerators (GPGPUs, e.g.)
     Previously, processors would "catch up" with accelerator functionality in the
     next generation
          Accelerator design expense not amortized well
     New accelerator designs more likely to maintain performance advantage
          And will maintain an enormous power advantage for target workloads

Trend #2b: Heterogeneous multicore in general (IBM Cell, e.g.)
     Mixes of powerful cores, smaller cores, and accelerators potentially offer
     the most efficient nodes
     The challenge is harnessing them efficiently



Motivation: Programming Issues

Many cores per node, and accelerators/heterogeneity

Future performance gains will come via massive parallelism (not clock speed)
     An unwelcome situation for HPC apps!

Need new programming models to exploit this parallelism

At the system/cluster level:
     Message-passing to connect node-level languages, or
     Global addressing to make communication implicit?




Motivation: Classic Programming Models




Motivation: OpenCL
New open standard that specifically addresses parallel compute accelerators

Extension to C

Provides data parallel and task parallel models

Facilitates natural transition from the growing number of CUDA (Compute
Unified Device Architecture, NVIDIA) programs

Porting of Cell applications to a standard model

Plays well with MPI

Can interoperate with Fortran and OpenMP


Motivation: Roles of OpenCL




Motivation: Before OpenCL




Motivation: The promise of OpenCL




OpenCL

   CPUs: Multiple cores driving performance increases
   GPUs: Increasingly general purpose data-parallel computing; improving numerical precision
   Emerging intersection: OpenCL, heterogeneous computing
   Neighbouring approaches: Multi-processor programming (e.g. OpenMP); graphics APIs and shading languages

   OpenCL: Open Computing Language
   Open, royalty-free standard for portable, parallel programming of heterogeneous
   parallel computing CPUs, GPUs, and other processors
OpenCL Working Group
  Diverse industry participation
    Processor vendors, system OEMs, middleware vendors, application developers
  Many industry-leading experts involved in OpenCL’s design
    A healthy diversity of industry perspectives
  Apple initially proposed and is very active in the working group
    Serving as specification editor

Here are some of the other companies in the OpenCL working group




OpenCL Timeline
      Six months from proposal to released specification
         Due to a strong initial proposal and a shared commercial incentive to work quickly
      Apple’s Mac OS X Snow Leopard will include OpenCL
         Improving speed and responsiveness for a wide spectrum of applications
      Multiple OpenCL implementations expected in the next 12 months
         On diverse platforms
      Timeline:
         Before Jun08: Apple works with AMD, Intel, NVIDIA and others on draft proposal
         Jun08: Apple proposes OpenCL working group and contributes draft specification to Khronos
         Jun08 to Oct08: OpenCL working group develops draft into cross-vendor specification
         Oct08: Working Group sends completed draft to Khronos Board for ratification
         Dec08: Khronos publicly releases OpenCL as royalty-free specification
         May09: Khronos to release conformance tests to ensure high-quality implementations
OpenCL: Part of the Khronos API Ecosystem

  OpenCL is at the center of an emerging visual computing ecosystem that includes 3D
  graphics, video and image processing on desktop, embedded and mobile systems

  Umbrella specifications define coherent acceleration stacks for mobile application portability

  [Figure: Khronos API ecosystem between the silicon community and the software community.
   Labels: 3D Asset Interchange Format; Cross platform desktop 3D; Desktop 3D Ecosystem;
   Parallel computing and visualization in scientific and consumer applications;
   OpenCL (Heterogeneous Parallel Computing); Streamlined APIs for mobile and embedded
   graphics, media and compute acceleration; Embedded 3D; Streaming Media and Image
   Processing; Vector 2D; Enhanced Audio; Surface and synch abstraction; Integrated
   Mixed-media Stack; Mobile OS Abstraction]

OpenCL: Platform Model




  One Host + one or more Compute Devices
     Each Compute Device is composed of one or more Compute Units
        Each Compute Unit is further divided into one or more Processing Elements


OpenCL: Execution Model
OpenCL Program:
    Kernels
         Basic unit of executable code — similar to C functions, CUDA kernels, etc.
         Data-parallel or task-parallel

    Host Program
         Collection of compute kernels and internal functions
         Analogous to a dynamic library

Kernel Execution
    The host program invokes a kernel over an index space called an NDRange
         NDRange, “N-Dimensional Range”, can be a 1D, 2D, or 3D space
    A single kernel instance at a point in the index space is called a work-item
          Work-items have unique global IDs from the index space
          CUDA: thread IDs
    Work-items are further grouped into work-groups
         Work-groups have a unique work-group ID
         Work-items have a unique local ID within a work-group
         CUDA: Block IDs
OpenCL: Execution Model, Example 2D NDRange




     Total number of work-items = Gx * Gy
     Size of each work-group = Sx * Sy
     Global ID can be computed from work-group ID and local ID
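
The relation between the three IDs can be sketched in plain C. This is a minimal sketch for one dimension, assuming no global offset; the function name global_id is made up for illustration and is not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Global ID along one dimension, assuming no global offset:
   work-group ID times work-group size, plus the local ID.
   Hypothetical helper for illustration, not an OpenCL function. */
static size_t global_id(size_t group_id, size_t group_size, size_t local_id)
{
    assert(local_id < group_size);   /* local IDs run from 0 to group_size - 1 */
    return group_id * group_size + local_id;
}
```

For a work-group size of Sx = 4, Sy = 2, the work-item with local ID (1, 1) in work-group (1, 1) gets global ID (global_id(1, 4, 1), global_id(1, 2, 1)) = (5, 3).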

OpenCL: Execution Model
   Contexts are used to contain and manage the state of the “world”

   Kernels are executed in contexts defined and manipulated by the host
      Devices
      Kernels - OpenCL functions
      Program objects - kernel source and executable
      Memory objects

   Command-queue - coordinates execution of kernels
      Kernel execution commands
      Memory commands: Transfer or map memory object data
      Synchronization commands: Constrain the order of commands

   Execution order of commands
      Launched in-order
      Executed in-order or out-of-order
      Events are used to implement appropriate synchronization of execution instances

OpenCL: Memory Model
 Shared memory
    Relaxed consistency (similar to CUDA)
 Multiple distinct address spaces (which can be collapsed)
    Private memory (to a work-item)
       Qualifier: __private; Ex.: __private char *px;
       Local memory in CUDA
    Local memory (to a work-group)
       Qualifier: __local
       Shared memory in CUDA
    Global memory
       Qualifier: __global; Ex.: __global float4 *p;
       Global memory in CUDA
    Constant memory
       Qualifier: __constant
       Constant memory in CUDA

 [Figure: Compute Device with processing elements (PE), one Local Memory per group of PEs,
  a Global / Constant Memory Data Cache, and Global Memory in the Compute Device Memory]

OpenCL: Memory Consistency


  A relaxed consistency memory model

         Across work-items (CUDA: threads) no consistency

         Within a work-item (CUDA: thread) load/store consistency

         Consistency of memory shared between commands is enforced
         through synchronization




OpenCL: Programming Model

Data-Parallel Model

   Must be implemented by all OpenCL compute devices

   Define N-Dimensional computation domain
      Each independent element of execution in an N-Dimensional domain is called a
     work-item
      N-Dimensional domain defines total # of work-items that execute in parallel
     = global work size

   Work-items can be grouped together — work-group
      Work-items in group can communicate with each other
      Can synchronize execution among work-items in group to coordinate memory access

   Execute multiple work-groups in parallel
      Mapping of global work size to work-group can be implicit or explicit


OpenCL: Programming Model


Task-Parallel Model

   Some compute devices can also execute task-parallel compute kernels


   Execute as a single work-item

   Users express parallelism by
      using vector data types implemented by the device,
      enqueuing multiple tasks (compute kernels written in OpenCL), and/or
      enqueuing native kernels developed using a programming model
     orthogonal to OpenCL (native C / C++ functions, e.g.).




OpenCL: Programming Model, Synchronization


   Work-items in a single work-group (work-group barrier)
       Similar to __syncthreads() in CUDA

   No mechanism for synchronization between work-groups

   Synchronization points between commands in command-queues
       Similar to multiple kernels in CUDA but more generalized
       Command-queue barrier
       Waiting on an event
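
Why a work-group barrier matters can be sketched with a sum reduction, emulated sequentially in plain C. This is only an illustration under stated assumptions: the array local_mem stands in for __local memory, the name workgroup_sum is made up, and the comment marks where a kernel would need a work-group barrier between steps.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential emulation of a sum reduction inside one work-group.
   local_mem plays the role of __local memory shared by the work-group.
   group_size must be a power of two. Hypothetical helper, not OpenCL API. */
static float workgroup_sum(float *local_mem, size_t group_size)
{
    assert(group_size > 0 && (group_size & (group_size - 1)) == 0);

    for (size_t stride = group_size / 2; stride > 0; stride /= 2) {
        /* One pass of the inner loop = all active work-items doing one step.
           Reads (index >= stride) and writes (index < stride) do not overlap,
           so the sequential order here matches a parallel step. */
        for (size_t lid = 0; lid < stride; ++lid)
            local_mem[lid] += local_mem[lid + stride];
        /* On a device, a work-group barrier would be required at this point,
           so that every work-item sees the partial sums of this step before
           the next step reads them. */
    }
    return local_mem[0];
}
```

Because there is no synchronization between work-groups, a reduction over several work-groups would typically finish in a second kernel or on the host.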




OpenCL C for Compute Kernels
  Derived from ISO C99
     A few restrictions: recursion, function pointers, functions in C99 standard headers ...
     Preprocessing directives defined by C99 are supported

  Built-in Data Types
     Scalar and vector data types, Pointers
     Data-type conversion functions: convert_type<_sat><_roundingmode>
     Image types: image2d_t, image3d_t and sampler_t

  Built-in Functions — Required
      Work-item functions, math.h, read and write image
      Relational, geometric functions, synchronization functions

  Built-in Functions — Optional
     double precision (latest CUDA supports this), atomics to global and local memory
     selection of rounding mode, writes to image3d_t surface
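
What the _sat suffix of the conversion functions means can be sketched in plain C for one case, convert_uchar_sat applied to a float. This is a stand-in with a made-up name, assuming the default round-toward-zero behaviour for conversions to integer types; it is not the built-in itself.

```c
#include <assert.h>

/* Plain C stand-in for the OpenCL C built-in convert_uchar_sat(float).
   Out-of-range inputs are clamped to [0, 255] instead of wrapping.
   Hypothetical helper for illustration only. */
static unsigned char convert_uchar_sat_emul(float x)
{
    if (x != x)      return 0;    /* NaN is mapped to 0 */
    if (x <= 0.0f)   return 0;    /* clamp at the lower bound */
    if (x >= 255.0f) return 255;  /* clamp at the upper bound */
    return (unsigned char)x;      /* in range: a C cast truncates toward zero */
}
```

For example, convert_uchar_sat_emul(300.0f) gives 255 and convert_uchar_sat_emul(-5.0f) gives 0, where a plain cast of an out-of-range float would be undefined in C.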


OpenCL C Language Highlights
  Function qualifiers
       “__kernel” qualifier declares a function as a kernel
       Kernels can call other kernel functions
  Address space qualifiers
       __global, __local, __constant, __private
       Pointer kernel arguments must be declared with an address space qualifier
  Work-item functions
       Query work-item identifiers
       get_work_dim()
       get_global_id(), get_local_id(), get_group_id()
  Image functions
       Images must be accessed through built-in functions
       Reads/writes performed through sampler objects from host or defined in source
  Synchronization functions
       Barriers - all work-items within a work-group must execute the barrier function before
       any work-item can continue
       Memory fences - provides ordering between memory operations

OpenCL C Language Restrictions


    Pointers to functions not allowed
    Pointers to pointers allowed within a kernel, but not as an argument
    Bit-fields not supported
    Variable-length arrays and structures not supported
    Recursion not supported
    Writes to a pointer of types less than 32-bit not supported
    Double types not supported, but reserved
    3D Image writes not supported


    Some restrictions are addressed through extensions




Basic OpenCL Program Structure




OpenCL: Kernel Code Example



Simple element-by-element vector addition: for all i, C(i) = A(i) + B(i)

__kernel void VectorAdd(
    __global const float* a,
    __global const float* b,
    __global float* c)
{
    int iGID = get_global_id(0);
    c[iGID] = a[iGID] + b[iGID];
}
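
The semantics of launching this kernel over a 1D index space can be emulated in plain C. This is a sketch with made-up function names: the loop index stands in for get_global_id(0), and on a device the iterations may run in parallel and in any order.

```c
#include <assert.h>
#include <stddef.h>

/* Body of the kernel for a single work-item.
   gid stands in for the value returned by get_global_id(0). */
static void vector_add_item(size_t gid, const float *a, const float *b, float *c)
{
    c[gid] = a[gid] + b[gid];
}

/* Emulates a 1D NDRange launch: one call per point of the index space.
   Hypothetical helper; a real launch goes through the command-queue. */
static void vector_add_ndrange(size_t global_size,
                               const float *a, const float *b, float *c)
{
    assert(a && b && c);
    for (size_t gid = 0; gid < global_size; ++gid)
        vector_add_item(gid, a, b, c);
}
```

Each work-item touches only its own element, so no synchronization between work-items is needed for this kernel.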




OpenCL VectorAdd: Contexts and Queues

cl_context cxMainContext;           // OpenCL context
cl_command_queue cqCommandQue;      // OpenCL command queue
cl_device_id* cdDevices;            // OpenCL device list
size_t szParmDataBytes;             // byte length of parameter storage


// create the OpenCL context on a GPU device
cxMainContext = clCreateContextFromType (0, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);

// get the list of GPU devices associated with context
clGetContextInfo (cxMainContext, CL_CONTEXT_DEVICES, 0, NULL, &szParmDataBytes);
cdDevices = (cl_device_id*)malloc(szParmDataBytes);
clGetContextInfo (cxMainContext, CL_CONTEXT_DEVICES, szParmDataBytes, cdDevices, NULL);

// create a command-queue
cqCommandQue = clCreateCommandQueue (cxMainContext, cdDevices[0], 0, NULL);



OpenCL VectorAdd: Create Memory Objects,
                  Program and Kernel
    Create Memory Objects
    // allocate the first source buffer memory object… source data, so read only
    …
    // allocate the second source buffer memory object … source data, so read only
    …
    // allocate the destination buffer memory object … result data, so write only
    …
    Create Program and Kernel
    // create the program
    …
    // build the program
    …
    // create the kernel
    ...
    // set the kernel Argument values
    …

OpenCL VectorAdd: Launch Kernel
cl_command_queue cqCommandQue;          // OpenCL command queue
cl_kernel ckKernel;                     // OpenCL kernel "VectorAdd"
cl_int ciErrNum;                        // OpenCL error code
size_t szGlobalWorkSize[1];             // Global # of work items
size_t szLocalWorkSize[1];              // # of Work Items in Work Group
int iTestN = 10000;                     // Length of demo test vectors

// set work-item dimensions
szGlobalWorkSize[0] = iTestN;
szLocalWorkSize[0]= 1;

// execute kernel
ciErrNum = clEnqueueNDRangeKernel (cqCommandQue, ckKernel, 1, NULL,
                                   szGlobalWorkSize, szLocalWorkSize,
                                   0, NULL, NULL);


// Cleanup: release kernel, program, and memory objects
…
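
A work-group size of 1, as set above, is always valid but usually leaves most of a device idle. With a larger explicit work-group size, OpenCL 1.x requires the global work size to be a multiple of it. A minimal plain C sketch of the usual rounding follows; the helper name round_up and the work-group size 256 are assumptions for illustration, not values from the talk.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest multiple of local_size that is greater than or equal to n.
   Hypothetical helper for choosing a global work size. */
static size_t round_up(size_t n, size_t local_size)
{
    assert(local_size > 0);
    return ((n + local_size - 1) / local_size) * local_size;
}
```

With iTestN = 10000 and an assumed work-group size of 256, round_up(10000, 256) is 10240. The kernel would then need the vector length as an extra argument and a guard such as if (iGID < n), so that the 240 surplus work-items do not access memory out of bounds.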

Summary: OpenCL versus CUDA

                     OpenCL                    CUDA

Execution model      Work-groups/work-items    Blocks/threads
Memory model         Global/constant/local/    Global/constant/shared/
                     private                   local + texture
Memory consistency   Weak consistency          Weak consistency
Synchronization      Work-group barrier        __syncthreads()
                     (between work-items)      (between threads)


Conclusions
 OpenCL meets the trends in HPC

   Supports multicore SMPs
   Supports accelerators (in particular GPGPUs)
   Allows programming of heterogeneous processors
   Supports vectorization (SIMD operations)
   Explicitly supports parallel image processing
   Suitable for massive parallelism: OpenCL + MPI


   OpenCL is a low-level parallel language, and it is complicated.

   If you master it you master (heterogeneous) parallel programming.

   It might be the basis for new high-level parallel languages.


Questions?




Next TechTalk on CUDA: Wednesday, July 22, 2009
(15:00-16:00, Room KP-2b-06, Funktional)




  CUDA - The Compute Unified Device Architecture
                 from NVIDIA

                  Jens Rühmkorf




Additional material: OpenCL Demos


NVIDIA: http://www.youtube.com/watch?v=PJ1jydg8mLg




AMD (1): http://www.youtube.com/watch?v=MCaGb40Bz58&feature=related




AMD (2): http://www.youtube.com/watch?v=mcU89Td53Gg




PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
 
"OpenCV for Embedded: Lessons Learned," a Presentation from itseez
"OpenCV for Embedded: Lessons Learned," a Presentation from itseez"OpenCV for Embedded: Lessons Learned," a Presentation from itseez
"OpenCV for Embedded: Lessons Learned," a Presentation from itseez
 
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA  by Ben Sanders, AMDBolt C++ Standard Template Libary for HSA  by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMD
 
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
 
CC-4010, Bringing Spatial Love to your Java Application, by Steven Citron-Pousty
CC-4010, Bringing Spatial Love to your Java Application, by Steven Citron-PoustyCC-4010, Bringing Spatial Love to your Java Application, by Steven Citron-Pousty
CC-4010, Bringing Spatial Love to your Java Application, by Steven Citron-Pousty
 

Similaire à OpenCL - The Open Standard for Heterogeneous Parallel Programming

OpenCL Overview Japan Virtual Open House Feb 2021
OpenCL Overview Japan Virtual Open House Feb 2021OpenCL Overview Japan Virtual Open House Feb 2021
OpenCL Overview Japan Virtual Open House Feb 2021The Khronos Group Inc.
 
[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...
[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...
[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...npinto
 
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre..."APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...Edge AI and Vision Alliance
 
OpenACC Monthly Highlights: June 2020
OpenACC Monthly Highlights: June 2020OpenACC Monthly Highlights: June 2020
OpenACC Monthly Highlights: June 2020OpenACC
 
FUSION APU & TRENDS/ CHALLENGES IN FUTURE SoC DESIGN
FUSION APU & TRENDS/ CHALLENGES IN FUTURE SoC DESIGNFUSION APU & TRENDS/ CHALLENGES IN FUTURE SoC DESIGN
FUSION APU & TRENDS/ CHALLENGES IN FUTURE SoC DESIGNPankaj Singh
 
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...Pradeep Singh
 
Synthesis of Platform Architectures from OpenCL Programs
Synthesis of Platform Architectures from OpenCL ProgramsSynthesis of Platform Architectures from OpenCL Programs
Synthesis of Platform Architectures from OpenCL ProgramsNikos Bellas
 
Learn more about the tremendous value Open Data Plane brings to NFV
Learn more about the tremendous value Open Data Plane brings to NFVLearn more about the tremendous value Open Data Plane brings to NFV
Learn more about the tremendous value Open Data Plane brings to NFVGhodhbane Mohamed Amine
 
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono..."The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...Edge AI and Vision Alliance
 
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...Edge AI and Vision Alliance
 
"An Update on Open Standard APIs for Vision Processing," a Presentation from ...
"An Update on Open Standard APIs for Vision Processing," a Presentation from ..."An Update on Open Standard APIs for Vision Processing," a Presentation from ...
"An Update on Open Standard APIs for Vision Processing," a Presentation from ...Edge AI and Vision Alliance
 
OpenACC Monthly Highlights September 2019
OpenACC Monthly Highlights September 2019OpenACC Monthly Highlights September 2019
OpenACC Monthly Highlights September 2019OpenACC
 
Programming the Network Data Plane
Programming the Network Data PlaneProgramming the Network Data Plane
Programming the Network Data PlaneC4Media
 
Architecting Solutions for the Manycore Future
Architecting Solutions for the Manycore FutureArchitecting Solutions for the Manycore Future
Architecting Solutions for the Manycore FutureTalbott Crowell
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning
 
OpenACC Monthly Highlights: November 2020
OpenACC Monthly Highlights: November 2020OpenACC Monthly Highlights: November 2020
OpenACC Monthly Highlights: November 2020OpenACC
 
UPC and OpenMP Parallel Programming and Analysis in PTP with CDT
UPC and OpenMP Parallel Programming and Analysis in PTP with CDTUPC and OpenMP Parallel Programming and Analysis in PTP with CDT
UPC and OpenMP Parallel Programming and Analysis in PTP with CDTbethtib
 

Similaire à OpenCL - The Open Standard for Heterogeneous Parallel Programming (20)

OpenCL Overview Japan Virtual Open House Feb 2021
OpenCL Overview Japan Virtual Open House Feb 2021OpenCL Overview Japan Virtual Open House Feb 2021
OpenCL Overview Japan Virtual Open House Feb 2021
 
[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...
[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...
[Harvard CS264] 10a - Easy, Effective, Efficient: GPU Programming in Python w...
 
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre..."APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
 
OpenACC Monthly Highlights: June 2020
OpenACC Monthly Highlights: June 2020OpenACC Monthly Highlights: June 2020
OpenACC Monthly Highlights: June 2020
 
FUSION APU & TRENDS/ CHALLENGES IN FUTURE SoC DESIGN
FUSION APU & TRENDS/ CHALLENGES IN FUTURE SoC DESIGNFUSION APU & TRENDS/ CHALLENGES IN FUTURE SoC DESIGN
FUSION APU & TRENDS/ CHALLENGES IN FUTURE SoC DESIGN
 
Japan's post K Computer
Japan's post K ComputerJapan's post K Computer
Japan's post K Computer
 
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
 
Synthesis of Platform Architectures from OpenCL Programs
Synthesis of Platform Architectures from OpenCL ProgramsSynthesis of Platform Architectures from OpenCL Programs
Synthesis of Platform Architectures from OpenCL Programs
 
Learn more about the tremendous value Open Data Plane brings to NFV
Learn more about the tremendous value Open Data Plane brings to NFVLearn more about the tremendous value Open Data Plane brings to NFV
Learn more about the tremendous value Open Data Plane brings to NFV
 
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono..."The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
 
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...
 
openCL Paper
openCL PaperopenCL Paper
openCL Paper
 
"An Update on Open Standard APIs for Vision Processing," a Presentation from ...
"An Update on Open Standard APIs for Vision Processing," a Presentation from ..."An Update on Open Standard APIs for Vision Processing," a Presentation from ...
"An Update on Open Standard APIs for Vision Processing," a Presentation from ...
 
OpenACC Monthly Highlights September 2019
OpenACC Monthly Highlights September 2019OpenACC Monthly Highlights September 2019
OpenACC Monthly Highlights September 2019
 
Programming the Network Data Plane
Programming the Network Data PlaneProgramming the Network Data Plane
Programming the Network Data Plane
 
Architecting Solutions for the Manycore Future
Architecting Solutions for the Manycore FutureArchitecting Solutions for the Manycore Future
Architecting Solutions for the Manycore Future
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use Case
 
OpenACC Monthly Highlights: November 2020
OpenACC Monthly Highlights: November 2020OpenACC Monthly Highlights: November 2020
OpenACC Monthly Highlights: November 2020
 
UPC and OpenMP Parallel Programming and Analysis in PTP with CDT
UPC and OpenMP Parallel Programming and Analysis in PTP with CDTUPC and OpenMP Parallel Programming and Analysis in PTP with CDT
UPC and OpenMP Parallel Programming and Analysis in PTP with CDT
 
HPC Workbench Presentation
HPC Workbench PresentationHPC Workbench Presentation
HPC Workbench Presentation
 

Plus de Andreas Schreiber

Provenance-based Security Audits and its Application to COVID-19 Contact Trac...
Provenance-based Security Audits and its Application to COVID-19 Contact Trac...Provenance-based Security Audits and its Application to COVID-19 Contact Trac...
Provenance-based Security Audits and its Application to COVID-19 Contact Trac...Andreas Schreiber
 
Visualization of Software Architectures in Virtual Reality and Augmented Reality
Visualization of Software Architectures in Virtual Reality and Augmented RealityVisualization of Software Architectures in Virtual Reality and Augmented Reality
Visualization of Software Architectures in Virtual Reality and Augmented RealityAndreas Schreiber
 
Provenance as a building block for an open science infrastructure
Provenance as a building block for an open science infrastructureProvenance as a building block for an open science infrastructure
Provenance as a building block for an open science infrastructureAndreas Schreiber
 
Raising Awareness about Open Source Licensing at the German Aerospace Center
Raising Awareness about Open Source Licensing at the German Aerospace CenterRaising Awareness about Open Source Licensing at the German Aerospace Center
Raising Awareness about Open Source Licensing at the German Aerospace CenterAndreas Schreiber
 
Open Source Licensing for Rocket Scientists
Open Source Licensing for Rocket ScientistsOpen Source Licensing for Rocket Scientists
Open Source Licensing for Rocket ScientistsAndreas Schreiber
 
Interactive Visualization of Software Components with Virtual Reality Headsets
Interactive Visualization of Software Components with Virtual Reality HeadsetsInteractive Visualization of Software Components with Virtual Reality Headsets
Interactive Visualization of Software Components with Virtual Reality HeadsetsAndreas Schreiber
 
Provenance for Reproducible Data Science
Provenance for Reproducible Data ScienceProvenance for Reproducible Data Science
Provenance for Reproducible Data ScienceAndreas Schreiber
 
Visualizing Provenance using Comics
Visualizing Provenance using ComicsVisualizing Provenance using Comics
Visualizing Provenance using ComicsAndreas Schreiber
 
Nachvollziehbarkeit mit Hinblick auf Privacy-Verletzungen
Nachvollziehbarkeit mit Hinblick auf Privacy-VerletzungenNachvollziehbarkeit mit Hinblick auf Privacy-Verletzungen
Nachvollziehbarkeit mit Hinblick auf Privacy-VerletzungenAndreas Schreiber
 
Reproducible Science with Python
Reproducible Science with PythonReproducible Science with Python
Reproducible Science with PythonAndreas Schreiber
 
A Provenance Model for Quantified Self Data
A Provenance Model for Quantified Self DataA Provenance Model for Quantified Self Data
A Provenance Model for Quantified Self DataAndreas Schreiber
 
Tracking after Stroke: Doctors, Dogs and All The Rest
Tracking after Stroke: Doctors, Dogs and All The RestTracking after Stroke: Doctors, Dogs and All The Rest
Tracking after Stroke: Doctors, Dogs and All The RestAndreas Schreiber
 
High Throughput Processing of Space Debris Data
High Throughput Processing of Space Debris DataHigh Throughput Processing of Space Debris Data
High Throughput Processing of Space Debris DataAndreas Schreiber
 
Bericht von der QS15 Conference & Exposition
Bericht von der QS15 Conference & ExpositionBericht von der QS15 Conference & Exposition
Bericht von der QS15 Conference & ExpositionAndreas Schreiber
 
Telemedizin: Gesundheit, messbar für jedermann
Telemedizin: Gesundheit, messbar für jedermannTelemedizin: Gesundheit, messbar für jedermann
Telemedizin: Gesundheit, messbar für jedermannAndreas Schreiber
 
Quantified Self mit Wearable Devices und Smartphone-Sensoren
Quantified Self mit Wearable Devices und Smartphone-SensorenQuantified Self mit Wearable Devices und Smartphone-Sensoren
Quantified Self mit Wearable Devices und Smartphone-SensorenAndreas Schreiber
 

Plus de Andreas Schreiber (20)

Provenance-based Security Audits and its Application to COVID-19 Contact Trac...
Provenance-based Security Audits and its Application to COVID-19 Contact Trac...Provenance-based Security Audits and its Application to COVID-19 Contact Trac...
Provenance-based Security Audits and its Application to COVID-19 Contact Trac...
 
Visualization of Software Architectures in Virtual Reality and Augmented Reality
Visualization of Software Architectures in Virtual Reality and Augmented RealityVisualization of Software Architectures in Virtual Reality and Augmented Reality
Visualization of Software Architectures in Virtual Reality and Augmented Reality
 
Provenance as a building block for an open science infrastructure
Provenance as a building block for an open science infrastructureProvenance as a building block for an open science infrastructure
Provenance as a building block for an open science infrastructure
 
Raising Awareness about Open Source Licensing at the German Aerospace Center
Raising Awareness about Open Source Licensing at the German Aerospace CenterRaising Awareness about Open Source Licensing at the German Aerospace Center
Raising Awareness about Open Source Licensing at the German Aerospace Center
 
Open Source Licensing for Rocket Scientists
Open Source Licensing for Rocket ScientistsOpen Source Licensing for Rocket Scientists
Open Source Licensing for Rocket Scientists
 
Interactive Visualization of Software Components with Virtual Reality Headsets
Interactive Visualization of Software Components with Virtual Reality HeadsetsInteractive Visualization of Software Components with Virtual Reality Headsets
Interactive Visualization of Software Components with Virtual Reality Headsets
 
Provenance for Reproducible Data Science
Provenance for Reproducible Data ScienceProvenance for Reproducible Data Science
Provenance for Reproducible Data Science
 
Visualizing Provenance using Comics
Visualizing Provenance using ComicsVisualizing Provenance using Comics
Visualizing Provenance using Comics
 
Quantified Self Comics
Quantified Self ComicsQuantified Self Comics
Quantified Self Comics
 
Nachvollziehbarkeit mit Hinblick auf Privacy-Verletzungen
Nachvollziehbarkeit mit Hinblick auf Privacy-VerletzungenNachvollziehbarkeit mit Hinblick auf Privacy-Verletzungen
Nachvollziehbarkeit mit Hinblick auf Privacy-Verletzungen
 
Reproducible Science with Python
Reproducible Science with PythonReproducible Science with Python
Reproducible Science with Python
 
Python at Warp Speed
Python at Warp SpeedPython at Warp Speed
Python at Warp Speed
 
A Provenance Model for Quantified Self Data
A Provenance Model for Quantified Self DataA Provenance Model for Quantified Self Data
A Provenance Model for Quantified Self Data
 
Open Source im DLR
Open Source im DLROpen Source im DLR
Open Source im DLR
 
Tracking after Stroke: Doctors, Dogs and All The Rest
Tracking after Stroke: Doctors, Dogs and All The RestTracking after Stroke: Doctors, Dogs and All The Rest
Tracking after Stroke: Doctors, Dogs and All The Rest
 
High Throughput Processing of Space Debris Data
High Throughput Processing of Space Debris DataHigh Throughput Processing of Space Debris Data
High Throughput Processing of Space Debris Data
 
Bericht von der QS15 Conference & Exposition
Bericht von der QS15 Conference & ExpositionBericht von der QS15 Conference & Exposition
Bericht von der QS15 Conference & Exposition
 
Telemedizin: Gesundheit, messbar für jedermann
Telemedizin: Gesundheit, messbar für jedermannTelemedizin: Gesundheit, messbar für jedermann
Telemedizin: Gesundheit, messbar für jedermann
 
Big Python
Big PythonBig Python
Big Python
 
Quantified Self mit Wearable Devices und Smartphone-Sensoren
Quantified Self mit Wearable Devices und Smartphone-SensorenQuantified Self mit Wearable Devices und Smartphone-Sensoren
Quantified Self mit Wearable Devices und Smartphone-Sensoren
 

Dernier

How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Karmanjay Verma
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 

Dernier (20)

How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 

OpenCL - The Open Standard for Heterogeneous Parallel Programming

  • 1. OpenCL – The Open Standard for Heterogeneous Parallel Programming
       Dr.-Ing. Achim Basermann
       Deutsches Zentrum für Luft- und Raumfahrt e.V. (German Aerospace Center)
       Department of Distributed Systems and Component Software, Köln-Porz
       Slide 1, 20090714-1 TechTalk OpenCL SC-VK Basermann
  • 2. Survey
       Motivation
            Trends in parallel processing
            Role of OpenCL
       Khronos' OpenCL
            Platform model
            Execution model
            Memory model
            Programming model
       Conclusions
       Material used: OpenCL survey and specification from http://www.khronos.org/opencl/
  • 3. Motivation: Trends in Parallel Processing, HW
       Trend #1: Multicore processor chips
            Maintain (or even reduce) frequency while replicating cores
       Trend #2: Accelerators (e.g., GPGPUs)
            Previously, processors would catch up with accelerator function in the next generation
                 Accelerator design expense was not amortized well
            New accelerator designs are more likely to maintain their performance advantage
                 And will maintain an enormous power advantage for target workloads
       Trend #2b: Heterogeneous multicore in general (e.g., IBM Cell)
            Mixes of powerful cores, smaller cores, and accelerators potentially offer the most efficient nodes
            The challenge is harnessing them efficiently
  • 4. Motivation: Programming Issues
       Many cores per node, and accelerators/heterogeneity
            Future performance gains will come via massive parallelism (not clock speed)
            An unwelcome situation for HPC apps!
            New programming models are needed to exploit this parallelism
       At the system/cluster level:
            Message passing to connect node-level languages, or
            Global addressing to make communication implicit?
  • 5. Motivation: Classic Programming Models
  • 6. Motivation: OpenCL. New open standard that specifically addresses parallel compute accelerators. Extension to C. Provides data-parallel and task-parallel models. Facilitates a natural transition from the growing number of CUDA (Compute Unified Device Architecture, NVIDIA) programs. Porting of Cell applications to a standard model. Plays well with MPI. Can interoperate with Fortran and OpenMP.
  • 7. Motivation: Roles of OpenCL
  • 8. Motivation: Before OpenCL
  • 9. Motivation: The promise of OpenCL
  • 10. OpenCL. CPUs: multiple cores driving performance increases. GPUs: increasingly general-purpose data-parallel computing; improving numerical precision. Emerging intersection: OpenCL, heterogeneous computing, between multi-processor programming (e.g. OpenMP) and graphics APIs and shading languages. OpenCL (Open Computing Language): open, royalty-free standard for portable, parallel programming of heterogeneous parallel computing CPUs, GPUs, and other processors.
  • 11. OpenCL Working Group. Diverse industry participation: processor vendors, system OEMs, middleware vendors, application developers. Many industry-leading experts involved in OpenCL’s design; a healthy diversity of industry perspectives. Apple initially proposed OpenCL and is very active in the working group, serving as specification editor. Here are some of the other companies in the OpenCL working group.
  • 12. OpenCL Timeline. Six months from proposal to released specification, due to a strong initial proposal and a shared commercial incentive to work quickly. Apple’s Mac OS X Snow Leopard will include OpenCL, improving speed and responsiveness for a wide spectrum of applications. Multiple OpenCL implementations are expected in the next 12 months, on diverse platforms. Timeline (milestones Jun08, Oct08, Dec08, May09): Apple works with AMD, Intel, NVIDIA and others on draft proposal; Apple proposes OpenCL working group and contributes draft specification to Khronos; OpenCL working group develops draft into cross-vendor specification; working group sends completed draft to Khronos Board for ratification; Khronos publicly releases OpenCL as royalty-free specification; Khronos to release conformance tests to ensure high-quality implementations.
  • 13. OpenCL: Part of the Khronos API Ecosystem. Desktop 3D ecosystem: parallel computing and visualization in scientific and consumer applications. Streamlined APIs for mobile and embedded graphics, media and compute acceleration. OpenCL is at the center of an emerging visual computing ecosystem that includes 3D graphics, video and image processing on desktop, embedded and mobile systems. Umbrella specifications define coherent acceleration stacks for mobile application portability. [Diagram labels: Silicon Community; Software Community; 3D Asset Interchange Format; Cross platform desktop 3D; Heterogeneous Parallel Computing; Embedded 3D; Streaming Media and Image Processing; Vector 2D; Enhanced Audio; Surface and synch abstraction; Integrated Mixed-media Stack; Mobile OS Abstraction]
  • 14. OpenCL: Platform Model. One Host plus one or more Compute Devices. Each Compute Device is composed of one or more Compute Units. Each Compute Unit is further divided into one or more Processing Elements.
  • 15. OpenCL: Execution Model. OpenCL Program. Kernels: basic unit of executable code, similar to C functions, CUDA kernels, etc.; data-parallel or task-parallel. Host Program: collection of compute kernels and internal functions; analogous to a dynamic library. Kernel Execution: the host program invokes a kernel over an index space called an NDRange. An NDRange (“N-Dimensional Range”) can be a 1D, 2D, or 3D space. A single kernel instance at a point in the index space is called a work-item. Work-items have unique global IDs from the index space (CUDA: thread IDs). Work-items are further grouped into work-groups. Work-groups have a unique work-group ID (CUDA: block IDs). Work-items have a unique local ID within a work-group.
  • 16. OpenCL: Execution Model, example 2D NDRange. Total number of work-items = Gx * Gy. Size of each work-group = Sx * Sy. Global ID can be computed from work-group ID and local ID.
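
The index arithmetic on this slide can be sketched in plain host-side C. This is only an illustration: the function names are hypothetical (they are not OpenCL API calls), the sketch assumes no global offset, and it assumes the global size is an exact multiple of the work-group size in each dimension.

```c
#include <assert.h>
#include <stddef.h>

/* Global ID along one dimension:
   work-group ID times work-group size, plus the local ID. */
size_t global_id_1d(size_t group_id, size_t local_size, size_t local_id)
{
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}

/* Total number of work-items in a 2D NDRange of global size Gx by Gy. */
size_t total_work_items_2d(size_t gx, size_t gy)
{
    return gx * gy;
}

/* Number of work-groups in a 2D NDRange with work-group size Sx by Sy.
   Assumes Sx divides Gx and Sy divides Gy. */
size_t total_work_groups_2d(size_t gx, size_t gy, size_t sx, size_t sy)
{
    assert(sx > 0 && sy > 0 && gx % sx == 0 && gy % sy == 0);
    return (gx / sx) * (gy / sy);
}
```

For example, with Gx = 8, Gy = 4, Sx = 4, Sy = 2 there are 32 work-items in 4 work-groups, and the work-item with local ID 3 in work-group 1 along x has global x-ID 1 * 4 + 3 = 7.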
  • 17. OpenCL: Execution Model. Contexts are used to contain and manage the state of the “world”. Kernels are executed in contexts defined and manipulated by the host. A context comprises: devices; kernels (OpenCL functions); program objects (kernel source and executable); memory objects. Command-queue: coordinates execution of kernels. Kernel execution commands. Memory commands: transfer or map memory object data. Synchronization commands: constrain the order of commands. Execution order of commands: launched in-order; executed in-order or out-of-order. Events are used to implement appropriate synchronization of execution instances.
  • 18. OpenCL: Memory Model. Shared memory with relaxed consistency (similar to CUDA). Multiple distinct address spaces (which can be collapsed). Private memory (to a work-item): qualifier __private, e.g. __private char *px; local memory in CUDA. Local memory (to a work-group): qualifier __local; shared memory in CUDA. Global memory: qualifier __global, e.g. __global float4 *p; global memory in CUDA. Constant memory: qualifier __constant; constant memory in CUDA. [Diagram: a Compute Device with processing elements (PE), Local Memory per work-group, and a Global / Constant Memory Data Cache; Compute Device Memory holding Global Memory]
  • 19. OpenCL: Memory Consistency. A relaxed-consistency memory model. Across work-items (CUDA: threads): no consistency. Within a work-item (CUDA: thread): load/store consistency. Consistency of memory shared between commands is enforced through synchronization.
  • 20. OpenCL: Programming Model, Data-Parallel Model. Must be implemented by all OpenCL compute devices. Define an N-dimensional computation domain: each independent element of execution in the N-dimensional domain is called a work-item, and the domain defines the total number of work-items that execute in parallel (the global work size). Work-items can be grouped together into a work-group: work-items in a group can communicate with each other and can synchronize execution among themselves to coordinate memory access. Multiple work-groups execute in parallel. The mapping of global work size to work-groups can be implicit or explicit.
  • 21. OpenCL: Programming Model, Task-Parallel Model. Some compute devices can also execute task-parallel compute kernels, which execute as a single work-item. Users express parallelism by using vector data types implemented by the device, enqueuing multiple tasks (compute kernels written in OpenCL), and/or enqueuing native kernels developed using a programming model orthogonal to OpenCL (e.g., native C / C++ functions).
  • 22. OpenCL: Programming Model, Synchronization. Between work-items in a single work-group (work-group barrier): similar to __syncthreads() in CUDA; there is no mechanism for synchronization between work-groups. Synchronization points between commands in command-queues: similar to multiple kernels in CUDA but more generalized; command-queue barrier; waiting on an event.
  • 23. OpenCL C for Compute Kernels. Derived from ISO C99. A few restrictions: recursion, function pointers, functions in C99 standard headers ... Preprocessing directives defined by C99 are supported. Built-in data types: scalar and vector data types, pointers; data-type conversion functions convert_type<_sat><_roundingmode>; image types image2d_t, image3d_t and sampler_t. Built-in functions, required: work-item functions, math.h, read and write image; relational, geometric functions, synchronization functions. Built-in functions, optional: double precision (latest CUDA supports this), atomics to global and local memory, selection of rounding mode, writes to image3d_t surface.
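
To make the optional _sat suffix of the conversion functions more concrete, here is a minimal plain-C sketch of a saturating conversion next to an ordinary C cast. The helper names are hypothetical and this is host-side C, not OpenCL C; the saturating helper only mimics the clamping that a call such as convert_uchar_sat(int) performs, and the sketch assumes an 8-bit unsigned char.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: saturating int -> unsigned char conversion.
   Out-of-range inputs are clamped to the nearest representable value. */
unsigned char sat_uchar_from_int(int x)
{
    if (x < 0)
        return 0;
    if (x > UCHAR_MAX)
        return UCHAR_MAX;
    return (unsigned char)x;
}

/* Ordinary C cast for comparison: out-of-range inputs wrap
   modulo UCHAR_MAX + 1 instead of being clamped. */
unsigned char wrap_uchar_from_int(int x)
{
    return (unsigned char)x;
}
```

With an 8-bit unsigned char, sat_uchar_from_int(300) yields 255, while wrap_uchar_from_int(300) yields 44.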
  • 24. OpenCL C Language Highlights. Function qualifiers: the “__kernel” qualifier declares a function as a kernel; kernels can call other kernel functions. Address space qualifiers: __global, __local, __constant, __private; pointer kernel arguments must be declared with an address space qualifier. Work-item functions: query work-item identifiers with get_work_dim(), get_global_id(), get_local_id(), get_group_id(). Image functions: images must be accessed through built-in functions; reads/writes are performed through sampler objects from the host or defined in source. Synchronization functions: barriers (all work-items within a work-group must execute the barrier function before any work-item can continue); memory fences (provide ordering between memory operations).
  • 25. OpenCL C Language Restrictions. Pointers to functions not allowed. Pointers to pointers allowed within a kernel, but not as an argument. Bit-fields not supported. Variable-length arrays and structures not supported. Recursion not supported. Writes to a pointer of types less than 32-bit not supported. Double types not supported, but reserved. 3D image writes not supported. Some restrictions are addressed through extensions.
  • 26. Basic OpenCL Program Structure
  • 27. OpenCL: Kernel Code Example. Simple element-by-element vector addition: for all i, C(i) = A(i) + B(i).

        __kernel void VectorAdd(__global const float* a,
                                __global const float* b,
                                __global float* c)
        {
            int iGID = get_global_id(0);
            c[iGID] = a[iGID] + b[iGID];
        }
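
One way to read the VectorAdd kernel is as the body of a loop whose iterations the device may run in parallel. The following plain-C sketch emulates a 1D NDRange sequentially on the host; the function names are hypothetical, the loop index stands in for get_global_id(0), and nothing here calls the OpenCL runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Work done by one work-item; gid plays the role of get_global_id(0). */
void vector_add_work_item(size_t gid, const float *a, const float *b, float *c)
{
    c[gid] = a[gid] + b[gid];
}

/* Sequential stand-in for launching the kernel over a 1D NDRange:
   one call per global ID. On a real device these calls may run in
   parallel and in any order, which is safe here because each
   work-item writes only its own element of c. */
void run_vector_add(size_t global_work_size,
                    const float *a, const float *b, float *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t gid = 0; gid < global_work_size; ++gid)
        vector_add_work_item(gid, a, b, c);
}
```

Calling run_vector_add(4, a, b, c) with a = {1, 2, 3, 4} and b = {10, 20, 30, 40} fills c with {11, 22, 33, 44}.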
  • 28. OpenCL VectorAdd: Contexts and Queues

        cl_context cxMainContext;        // OpenCL context
        cl_command_queue cqCommandQue;   // OpenCL command queue
        cl_device_id* cdDevices;         // OpenCL device list
        size_t szParmDataBytes;          // byte length of parameter storage

        // create the OpenCL context on a GPU device
        cxMainContext = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);

        // get the list of GPU devices associated with context
        clGetContextInfo(cxMainContext, CL_CONTEXT_DEVICES, 0, NULL, &szParmDataBytes);
        cdDevices = (cl_device_id*)malloc(szParmDataBytes);
        clGetContextInfo(cxMainContext, CL_CONTEXT_DEVICES, szParmDataBytes, cdDevices, NULL);

        // create a command-queue
        cqCommandQue = clCreateCommandQueue(cxMainContext, cdDevices[0], 0, NULL);
  • 29. OpenCL VectorAdd: Create Memory Objects, Program and Kernel

        Create Memory Objects
        // allocate the first source buffer memory object … source data, so read only …
        // allocate the second source buffer memory object … source data, so read only …
        // allocate the destination buffer memory object … result data, so write only …

        Create Program and Kernel
        // create the program …
        // build the program …
        // create the kernel ...
        // set the kernel Argument values …
  • 30. OpenCL VectorAdd: Launch Kernel

        cl_command_queue cqCommandQue;   // OpenCL command queue
        cl_kernel ckKernel;              // OpenCL kernel "VectorAdd"
        cl_int ciErrNum;                 // OpenCL error code
        size_t szGlobalWorkSize[1];      // Global # of work items
        size_t szLocalWorkSize[1];       // # of Work Items in Work Group
        int iTestN = 10000;              // Length of demo test vectors

        // set work-item dimensions
        szGlobalWorkSize[0] = iTestN;
        szLocalWorkSize[0] = 1;

        // execute kernel
        ciErrNum = clEnqueueNDRangeKernel(cqCommandQue, ckKernel, 1, NULL,
                                          szGlobalWorkSize, szLocalWorkSize,
                                          0, NULL, NULL);

        // Cleanup: release kernel, program, and memory objects …
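
The launch on this slide uses a work-group size of 1, which trivially divides the global size of 10000. With a larger work-group size, OpenCL 1.x expects the global work size to be evenly divisible by the local work size, so a common approach is to round the global size up on the host. The helpers below are a hypothetical plain-C sketch of that arithmetic and are not part of the OpenCL API; a kernel launched over the padded range would additionally need a bounds check (for example, skipping global IDs at or beyond the vector length), which the slide's kernel does not have.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of local_size, so that the global
   work size is evenly divisible by the work-group size. */
size_t round_up_global_size(size_t n, size_t local_size)
{
    assert(local_size > 0);
    size_t remainder = n % local_size;
    return remainder == 0 ? n : n + (local_size - remainder);
}

/* Number of work-groups for a 1D launch with an evenly divisible size. */
size_t num_work_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}
```

For the 10000-element vectors with a work-group size of 256, the padded global size is 10240, which gives 40 work-groups; with a work-group size of 1 the size stays at 10000.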
  • 31. Summary: OpenCL versus CUDA

                              OpenCL                             CUDA
        Execution model       Work-groups/work-items             Block/Thread
        Memory model          Global/constant/local/private      Global/constant/shared/local + texture
        Memory consistency    Weak consistency                   Weak consistency
        Synchronization       Work-group barrier                 __syncthreads()
                              (between work-items)               (between threads)
  • 32. Conclusions. OpenCL meets the trends in HPC: supports multicore SMPs; supports accelerators (in particular GPGPUs); allows programming of heterogeneous processors; supports vectorization (SIMD operations); explicitly supports parallel image processing; suitable for massive parallelism (OpenCL + MPI). OpenCL is a low-level and complicated parallel language. If you master it, you master (heterogeneous) parallel programming. It might be the basis for new high-level parallel languages.
  • 33. Questions?
  • 34. Next TechTalk on CUDA: Wednesday, July 22, 2009 (15:00-16:00, Raum-KP-2b-06, Funktional). CUDA - The Compute Unified Device Architecture from NVIDIA, Jens Rühmkorf.
  • 35. Additional material: OpenCL Demos. NVIDIA: http://www.youtube.com/watch?v=PJ1jydg8mLg ; AMD (1): http://www.youtube.com/watch?v=MCaGb40Bz58&feature=related ; AMD (2): http://www.youtube.com/watch?v=mcU89Td53Gg