OpenCL
Host Programming



   Fast Forward Your Development   www.dsp-ip.com
OPENCL™ EXECUTION MODEL




OpenCL™ Execution Model
•Kernel
  ▫ Basic unit of executable code - similar to a C function
  ▫ Data-parallel or task-parallel
  ▫ A whole algorithm such as H.264Encode is not a kernel: it is too large
  ▫ A kernel should be a small, self-contained function, e.g. SAD
    (sum of absolute differences)
•Program
  ▫ Collection of kernels and other functions
  ▫ Analogous to a dynamic library
•Applications queue kernel execution instances
  ▫ Queued in-order
  ▫ Executed in-order or out-of-order


Data-Parallelism in OpenCL™
  •Define N-dimensional computation domain (N = 1, 2 or 3)
     ▫ Each independent element of execution in the N-D
       domain is called a work-item
     ▫ The N-D domain defines the total number of
       work-items that execute in parallel
  •Example: 1024 x 1024 image
     ▫ Problem dimensions: 1024 x 1024
     ▫ 1 kernel execution per pixel: 1,048,576 total executions

Scalar:

    void scalar_mul(int n,
                    const float *a,
                    const float *b,
                    float *result)
    {
        int i;
        for (i=0; i<n; i++)
            result[i] = a[i] * b[i];
    }

Data-Parallel:

    kernel void dp_mul(global const float *a,
                       global const float *b,
                       global float *result)
    {
        int id = get_global_id(0);
        result[id] = a[id] * b[id];
    }
    // execute dp_mul over "n" work-items
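The relationship between the scalar loop and the per-work-item kernel can be sketched on the host in plain C++, without an OpenCL runtime. This is only an emulation under stated assumptions: `dp_mul_item`, `emulate_ndrange_1d`, `work_item_count` and `results_match` are hypothetical names, the `id` parameter stands in for `get_global_id(0)`, and the work-items run one after another here rather than in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar version from the slide: one call loops over all n elements.
void scalar_mul(int n, const float *a, const float *b, float *result)
{
    for (int i = 0; i < n; i++)
        result[i] = a[i] * b[i];
}

// Body of the data-parallel kernel as a plain C++ function.
// `id` stands in for the value get_global_id(0) would return on a device.
void dp_mul_item(std::size_t id, const float *a, const float *b, float *result)
{
    result[id] = a[id] * b[id];
}

// Hypothetical helper: emulates a 1-D domain of `global_size` work-items
// by calling the kernel body once per global ID, sequentially.
void emulate_ndrange_1d(std::size_t global_size,
                        const float *a, const float *b, float *result)
{
    for (std::size_t id = 0; id < global_size; ++id)
        dp_mul_item(id, a, b, result);
}

// Total number of work-items in an N-D domain: the product of its dimensions.
std::size_t work_item_count(const std::vector<std::size_t> &dims)
{
    std::size_t total = 1;
    for (std::size_t d = 0; d < dims.size(); ++d)
        total *= dims[d];
    return total;
}

// Hypothetical check: both versions produce the same output for n elements.
bool results_match(std::size_t n)
{
    std::vector<float> a(n), b(n), r1(n), r2(n);
    for (std::size_t i = 0; i < n; ++i) { a[i] = float(i); b[i] = 0.5f; }
    scalar_mul(int(n), a.data(), b.data(), r1.data());
    emulate_ndrange_1d(n, a.data(), b.data(), r2.data());
    return r1 == r2;
}
```

For the 1024 x 1024 image on the slide, `work_item_count` applied to the dimensions {1024, 1024} gives the 1,048,576 executions quoted above.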


Compiling Kernels
• Create a program
  ▫ Input: String (source code) or precompiled binary
  ▫ Analogous to a dynamic library: A collection of
    kernels
• Compile the program
  ▫ Specify the devices for which kernels should be
    compiled
  ▫ Pass in compiler flags
  ▫ Check for compilation/build errors
• Create the kernels
  ▫ Returns a kernel object used to hold arguments for
    a given execution
EX-1: OPENCL "HELLO WORLD"




BASIC Program structure
         Include
         Get Platform Info
         Create Context
         Load & compile program
         Create Queue
         Load and Run Kernel
Includes
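• Make sure to include all the header files the
  host program needs


#include   <cstdio>
#include   <cstdlib>
#include   <cstring>    // strcmp, used when matching the platform vendor
#include   <iostream>
#include   <vector>     // std::vector of platforms and devices
#include   <SDKFile.hpp>
#include   <SDKCommon.hpp>
#include   <SDKApplication.hpp>
#include   <CL/cl.hpp>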

GetPlatformInfo
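• Enumerates the OpenCL platforms in the system; each platform
  exposes its "Devices":
   ▫ CPUs, GPUs & DSPs
cl_int err;
std::vector<cl::Platform> platforms;
err = cl::Platform::get(&platforms);
if(err != CL_SUCCESS)
{   std::cerr << "Platform::get() failed (" << err << ")" << std::endl;
    return SDK_FAILURE;
}
// Search for the AMD platform by its vendor string.
std::vector<cl::Platform>::iterator i;
for(i = platforms.begin(); i != platforms.end(); ++i)
{
   if(!strcmp((*i).getInfo<CL_PLATFORM_VENDOR>(&err).c_str(),
              "Advanced Micro Devices, Inc."))
   { break;}
}
// Stop if no platform matched; *i is dereferenced when creating the context.
if(i == platforms.end())
{   std::cerr << "No matching platform found" << std::endl;
    return SDK_FAILURE;
}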
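The vendor search can be sketched independently of the OpenCL API. The following is a minimal illustration, assuming the vendor strings have already been read from each platform into a vector; `find_vendor` is a hypothetical helper, not part of any SDK, and it makes the "not found" case explicit by returning -1.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the first vendor string equal to
// `wanted`, or -1 if no platform matches.
int find_vendor(const std::vector<std::string> &vendors,
                const std::string &wanted)
{
    for (std::size_t k = 0; k < vendors.size(); ++k)
        if (vendors[k] == wanted)
            return static_cast<int>(k);
    return -1;
}
```

Checking the result against -1 before using it plays the same role as comparing the iterator against `platforms.end()` after the loop in the host code.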


Create Context
• A context is the environment in which command queues are
  created and memory is shared between devices



cl_context_properties cps[3] =
{ CL_CONTEXT_PLATFORM, (cl_context_properties)(*i)(), 0 };
std::cout<<"Creating a context on the AMD platform\n";
cl::Context context(CL_DEVICE_TYPE_CPU, cps, NULL, NULL, &err);
if (err != CL_SUCCESS)
{
       std::cerr << "Context::Context() failed (" << err << ")\n";
       return SDK_FAILURE;
}


Load Program
• Reads the kernel source file (*.cl) and creates a program
  object from it

std::cout<<"Loading and compiling CL source\n";
streamsdk::SDKFile file;
if (!file.open("HelloCL_Kernels.cl"))
{   std::cerr << "We couldn't load CL source code\n";
    return SDK_FAILURE;}
// One (pointer, length) pair per source string.
cl::Program::Sources
sources(1, std::make_pair(file.source().data(),
                          file.source().size()));
cl::Program program = cl::Program(context, sources, &err);
if (err != CL_SUCCESS)
{   std::cerr << "Program::Program() failed (" << err << ")\n";
    return SDK_FAILURE;
}
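The slide relies on the SDK's `streamsdk::SDKFile` to read the kernel source. Where that helper is not available, the same step could be sketched with the standard library alone; `load_text_file` below is a hypothetical function written for illustration, not an SDK or OpenCL call.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads the whole file at `path` into `out`.
// Returns false if the file cannot be opened.
bool load_text_file(const std::string &path, std::string &out)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in)
        return false;
    std::ostringstream buffer;
    buffer << in.rdbuf();   // copy the entire stream into the buffer
    out = buffer.str();
    return true;
}
```

The resulting string's `data()` and `size()` can then be paired up in the same way the slide does with `file.source()`.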

Compile program
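• The host program compiles the kernel program for each
  device.
• Why compile at run time? Like Java, we don't know the
  device until we run. We can also decide at run time,
  e.g. based on load balancing, on which device to run
// Devices that belong to the context created earlier.
std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();
err = program.build(devices);
if (err != CL_SUCCESS)
{
    if(err == CL_BUILD_PROGRAM_FAILURE)
    {   // Handle the error, e.g. inspect the build log
    }
    std::cerr << "Program::build() failed (" << err << ")\n";
    return SDK_FAILURE;
}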


Create Kernel with program
• Create a kernel object from our loaded and compiled
  program, selecting the kernel function by name

cl::Kernel kernel(program, "hello", &err);
if (err != CL_SUCCESS)
{
  std::cerr << "Kernel::Kernel() failed (" << err << ")\n";
  return SDK_FAILURE;
}
// The kernel.setArg() calls are not shown on this slide;
// this check belongs after them.
if (err != CL_SUCCESS) {
  std::cerr << "Kernel::setArg() failed (" << err << ")\n";
  return SDK_FAILURE;
}


Create Queue per device & Run it
• Enqueues the kernel for execution on the device. Execution
  does not have to start immediately
• Attention: enqueue() is an asynchronous call: when the
  function returns, the kernel has not necessarily finished,
  or even started, executing
• queue.finish() blocks until all enqueued commands have completed
cl::CommandQueue queue(context, devices[0], 0, &err);
std::cout<<"Running CL program\n";
err = queue.enqueueNDRangeKernel(…..);
err = queue.finish();
if (err != CL_SUCCESS) {
    std::cerr << "CommandQueue::finish() failed (" << err << ")\n";
}
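The asynchronous behaviour can be illustrated with a toy model in plain C++. This is an analogy only, not the OpenCL API: `ToyQueue`, `Observed` and `run_demo` are hypothetical names, and the toy takes the extreme case in which nothing runs until `finish()` is called, whereas a real device may start executing commands earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a command queue: enqueue() only records work,
// finish() runs everything recorded so far, in submission order.
class ToyQueue
{
public:
    // Returns immediately; the command has not run yet.
    void enqueue(std::function<void()> command)
    {
        pending_.push_back(std::move(command));
    }

    // Returns only after every pending command has completed.
    void finish()
    {
        for (std::size_t k = 0; k < pending_.size(); ++k)
            pending_[k]();
        pending_.clear();
    }

private:
    std::vector<std::function<void()> > pending_;
};

// Values of `result` seen right after enqueue() and right after finish().
struct Observed { int after_enqueue; int after_finish; };

Observed run_demo()
{
    ToyQueue queue;
    int result = 0;
    queue.enqueue([&result] { result = 42; });

    Observed seen;
    seen.after_enqueue = result;  // still 0: the command has not executed
    queue.finish();
    seen.after_finish = result;   // 42: the command has completed
    return seen;
}
```

The point of the sketch is that results should only be read after the blocking call, which is why the host code calls `queue.finish()` before checking for errors.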




And that’s All Folks?
• Naaaa... We still need to learn:
  ▫ Writing kernel functions
  ▫ Synchronizing kernel functions
  ▫ Setting arguments to kernel functions
  ▫ Passing data from/to the host




References
• “OpenCL Hello World” is an ATI OpenCL SDK
  programming exercise
• ATI OpenCL slides




DSP-IP Contact information
Download slides at: www.dsp-ip.com

Course materials & lecture requests:
Yossi Cohen
Mail: info@dsp-ip.com
Phone: +972-9-8850956
Fax: +972-50-8962910
