3. OpenCL™ Execution Model
•Kernel
▫ Basic unit of executable code - similar to a C function
▫ Data-parallel or task-parallel
▫ A whole H.264 encoder is not a kernel
▫ A kernel should be a small, separate function, e.g. SAD (sum of absolute differences)
•Program
▫ Collection of kernels and other functions
▫ Analogous to a dynamic library
•Applications queue kernel execution instances
▫ Queued in-order
▫ Executed in-order or out-of-order
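To illustrate why a small function such as SAD makes a good kernel while a whole encoder does not, here is a host-side C++ sketch. `sad_work_item` plays the role of the kernel body for one work-item (one block), and the nested loop in `run_sad_ndrange` stands in for the OpenCL runtime launching one work-item per block. Both names, the row-major frame layout and the square block size are illustrative assumptions; this is plain C++, not OpenCL C.

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical "kernel body": SAD between the current and reference frame
// for the block at (bx, by). Frames are row-major and `width` pixels wide.
int sad_work_item(const std::vector<int>& cur, const std::vector<int>& ref,
                  int width, int block, int bx, int by) {
    int sum = 0;
    for (int y = 0; y < block; ++y) {
        for (int x = 0; x < block; ++x) {
            int idx = (by * block + y) * width + (bx * block + x);
            sum += std::abs(cur[idx] - ref[idx]);
        }
    }
    return sum;
}

// Stand-in for the runtime: one "work-item" per block over a 2-D domain.
// A device would run these independently and in parallel.
std::vector<int> run_sad_ndrange(const std::vector<int>& cur,
                                 const std::vector<int>& ref,
                                 int width, int height, int block) {
    int blocks_x = width / block;
    int blocks_y = height / block;
    std::vector<int> out(blocks_x * blocks_y);
    for (int by = 0; by < blocks_y; ++by)
        for (int bx = 0; bx < blocks_x; ++bx)
            out[by * blocks_x + bx] =
                sad_work_item(cur, ref, width, block, bx, by);
    return out;
}
```

Each block's SAD depends only on its own pixels, which is what lets the runtime spread the work-items across compute units; an entire encoder has too many sequential dependencies to map onto one kernel this way.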
Fast Forward Your Development
4. Data-Parallelism in OpenCL™
•Define N-dimensional computation domain (N = 1, 2 or 3)
▫ Each independent element of execution in the N-D domain is called a work-item
▫ The N-D domain defines the total number of work-items that execute in parallel
Example: a 1024 x 1024 image has problem dimensions 1024 x 1024; with 1 kernel execution per pixel this gives 1,048,576 total executions.
Scalar:
void scalar_mul(int n, const float *a, const float *b, float *result)
{
    int i;
    for (i = 0; i < n; i++)
        result[i] = a[i] * b[i];
}
Data-Parallel:
kernel void dp_mul(global const float *a, global const float *b, global float *result)
{
    int id = get_global_id(0);
    result[id] = a[id] * b[id];
}
// execute dp_mul over "n" work-items
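The two versions compute the same result; the difference is who owns the loop. The sketch below makes that concrete in plain C++: `dp_mul_item` is the per-work-item body of `dp_mul`, with `id` playing the role of `get_global_id(0)`, and `run_ndrange_1d` is a hypothetical stand-in for enqueueing a 1-D NDRange. It runs the items sequentially, whereas a real device runs them in parallel and in no guaranteed order.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Scalar version from the slide: one call loops over all n elements.
void scalar_mul(int n, const float* a, const float* b, float* result) {
    for (int i = 0; i < n; i++)
        result[i] = a[i] * b[i];
}

// Body of dp_mul for a single work-item; `id` corresponds to
// get_global_id(0) in the OpenCL C kernel.
void dp_mul_item(std::size_t id, const float* a, const float* b,
                 float* result) {
    result[id] = a[id] * b[id];
}

// Host-side stand-in for a 1-D NDRange of `global_size` work-items.
// Items are independent, so any execution order gives the same result.
void run_ndrange_1d(std::size_t global_size,
                    const std::function<void(std::size_t)>& item) {
    for (std::size_t id = 0; id < global_size; ++id)
        item(id);
}
```

For the 1024 x 1024 image, the global size would be 1,048,576 (or a 2-D range of 1024 x 1024), and the kernel body itself contains no loop at all.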
5. Compiling Kernels
• Create a program
▫ Input: String (source code) or precompiled binary
▫ Analogous to a dynamic library: a collection of kernels
• Compile the program
▫ Specify the devices for which kernels should be compiled
▫ Pass in compiler flags
▫ Check for compilation/build errors
• Create the kernels
▫ Returns a kernel object used to hold arguments for a given execution
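Compiler flags are passed as a single options string. A minimal sketch of assembling one, assuming the program wants some preprocessor macros and optionally relaxed floating-point math: `-D name=definition` and `-cl-fast-relaxed-math` are standard OpenCL build options, while `make_build_options` is a hypothetical helper, not part of any SDK.

```cpp
#include <map>
#include <string>

// Hypothetical helper: turns macro definitions into "-D NAME=VALUE" options
// and optionally appends -cl-fast-relaxed-math.
std::string make_build_options(const std::map<std::string, std::string>& defines,
                               bool fast_math) {
    std::string opts;
    for (const auto& kv : defines) {
        if (!opts.empty()) opts += ' ';
        opts += "-D " + kv.first + "=" + kv.second;
    }
    if (fast_math) {
        if (!opts.empty()) opts += ' ';
        opts += "-cl-fast-relaxed-math";
    }
    return opts;
}
```

With the C++ bindings used in these slides, such a string could be handed to the build step as `program.build(devices, opts.c_str())`; macros defined this way let one kernel source be specialised per device at run time.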
8. BASIC Program structure
Include
Get Platform Info
Create Context
Load & compile program
Create Queue
Load and Run Kernel
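Every step in this structure can fail, and the code on the following slides checks each return code and bails out with `SDK_FAILURE`. The skeleton below sketches that control flow in plain C++; the `Step` type and `run_steps` function are hypothetical and only model "run the steps in order, stop at the first failure".

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A named step returning true on success, mirroring the
// `if (err != CL_SUCCESS) return SDK_FAILURE;` checks in the slides.
using Step = std::pair<std::string, std::function<bool()>>;

// Runs the steps in order and returns the name of the first step that
// failed, or an empty string if all of them succeeded.
std::string run_steps(const std::vector<Step>& steps) {
    for (const auto& step : steps) {
        if (!step.second())
            return step.first;
    }
    return "";
}
```

Later steps depend on objects created by earlier ones (the context needs a platform, the queue needs a context and device), which is why the chain stops at the first failure instead of continuing.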
9. Includes
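• Pay attention to include ALL required header files: the standard headers, the AMD SDK helper headers and the OpenCL C++ bindings
#include <cstdio>
#include <cstdlib>
#include <cstring>    // strcmp
#include <iostream>
#include <string>
#include <vector>
#include <SDKFile.hpp>
#include <SDKCommon.hpp>
#include <SDKApplication.hpp>
#include <CL/cl.hpp>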
10. GetPlatformInfo
• Detects the OpenCL platforms (vendor implementations) in the system; each platform exposes “Devices”:
▫ CPUs, GPUs & DSPs
std::vector<cl::Platform> platforms;
cl_int err = cl::Platform::get(&platforms);
if (err != CL_SUCCESS || platforms.empty())
{
    std::cerr << "Platform::get() failed (" << err << ")" << std::endl;
    return SDK_FAILURE;
}
// Look for the AMD platform; fall back to the first platform found
std::vector<cl::Platform>::iterator i;
for (i = platforms.begin(); i != platforms.end(); ++i)
{
    if (!strcmp((*i).getInfo<CL_PLATFORM_VENDOR>(&err).c_str(),
                "Advanced Micro Devices, Inc."))
    { break; }
}
if (i == platforms.end())
    i = platforms.begin();   // never dereference end()
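Choosing a platform by vendor string needs a fallback for machines where the preferred vendor is not installed; otherwise the iterator ends up at `platforms.end()`, which must not be dereferenced when building the context properties. The selection rule can be sketched on plain strings so it runs without an OpenCL runtime: `select_platform` is a hypothetical helper that takes the `CL_PLATFORM_VENDOR` string of each platform, and the non-AMD vendor strings in the usage are only examples.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Return the index of the preferred vendor, fall back to the first
// platform, and return -1 only when there are no platforms at all.
int select_platform(const std::vector<std::string>& vendors,
                    const std::string& preferred) {
    if (vendors.empty())
        return -1;
    for (std::size_t i = 0; i < vendors.size(); ++i) {
        if (vendors[i] == preferred)
            return static_cast<int>(i);
    }
    return 0;  // preferred vendor not installed: use the first platform
}
```

Falling back to the first platform keeps the program usable on other vendors' SDKs; a stricter application could instead treat -1 and "not found" as errors.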
11. Create Context
• The context is the environment in which command queues and memory objects are created; memory objects in a context can be shared between its devices
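cl_context_properties cps[3] =
    { CL_CONTEXT_PLATFORM, (cl_context_properties)(*i)(), 0 };
std::cout << "Creating a context on the AMD platform\n";
cl::Context context(CL_DEVICE_TYPE_CPU, cps, NULL, NULL, &err);
if (err != CL_SUCCESS)
{
    std::cerr << "Context::Context() failed (" << err << ")\n";
    return SDK_FAILURE;
}
// Devices in this context; used later by program.build() and CommandQueue
std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();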
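The `cps` array follows the OpenCL convention for context properties: a flat list of (name, value) pairs terminated by a single 0. The sketch below walks such a list. It uses `std::intptr_t` because `cl_context_properties` is an `intptr_t` typedef in the Khronos headers, and it redefines the value of `CL_CONTEXT_PLATFORM` (0x1084) as `kContextPlatform` only so it compiles without those headers; `find_property` is a hypothetical helper, not an OpenCL API.

```cpp
#include <cstdint>

// Stand-in for CL_CONTEXT_PLATFORM (0x1084 in the Khronos cl.h header).
const std::intptr_t kContextPlatform = 0x1084;

// Look up `name` in a zero-terminated (name, value, name, value, ..., 0)
// property list. Returns true and stores the value if found.
bool find_property(const std::intptr_t* props, std::intptr_t name,
                   std::intptr_t* value) {
    for (; props[0] != 0; props += 2) {
        if (props[0] == name) {
            *value = props[1];
            return true;
        }
    }
    return false;
}
```

This is why `cps` has three entries for a single property: two for the pair and one for the terminating 0. Forgetting the terminator makes the runtime read past the end of the array.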
12. Load Program
• Loads the kernel program (*.cl)
std::cout << "Loading and compiling CL source\n";
streamsdk::SDKFile file;
if (!file.open("HelloCL_Kernels.cl"))
{
    std::cerr << "We couldn't load CL source code\n";
    return SDK_FAILURE;
}
cl::Program::Sources sources(1,
    std::make_pair(file.source().data(), file.source().size()));
cl::Program program = cl::Program(context, sources, &err);
if (err != CL_SUCCESS)
{
    std::cerr << "Program::Program() failed (" << err << ")\n";
    return SDK_FAILURE;
}
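`streamsdk::SDKFile` is a helper from the AMD SDK sample. Outside that SDK the kernel file can be read with the standard library; the sketch below is one way to do it, and `load_kernel_source` is a hypothetical name rather than an SDK function.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Read the whole kernel file into `source`.
// Returns false if the file cannot be opened.
bool load_kernel_source(const std::string& path, std::string& source) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in)
        return false;
    std::ostringstream buffer;
    buffer << in.rdbuf();   // copy the entire file contents
    source = buffer.str();
    return true;
}
```

The resulting string supplies the same (pointer, length) pair as above, e.g. `std::make_pair(src.data(), src.size())`; the string must stay alive until the `cl::Program` has been created from it.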
13. Compile program
• The host program compiles the kernel program for each device at run time.
• Why compile at run time? As with Java, we don't know the target device until the program runs; the host can decide at run time, e.g. based on load balancing, which device to run on
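err = program.build(devices);
if (err != CL_SUCCESS)
{
    if (err == CL_BUILD_PROGRAM_FAILURE)
    {   // Print the compiler output for the first device
        std::cerr << program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(devices[0])
                  << std::endl;
    }
    std::cerr << "Program::build() failed (" << err << ")\n";
    return SDK_FAILURE;
}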
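The load-balancing argument above can be made concrete with a simple policy: pick the device with the least pending work, then build and enqueue for it. In this sketch the load figures are assumed to be something the host tracks itself (for example, work it has already queued per device), and `pick_least_loaded` is a hypothetical helper, not an OpenCL call.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the device with the smallest load estimate,
// or -1 if the list is empty. Ties go to the lowest index.
int pick_least_loaded(const std::vector<double>& loads) {
    if (loads.empty())
        return -1;
    std::size_t best = 0;
    for (std::size_t i = 1; i < loads.size(); ++i) {
        if (loads[i] < loads[best])
            best = i;
    }
    return static_cast<int>(best);
}
```

The returned index would select the entry of `devices` to compile for and to create the command queue on; this only works because the kernel source is compiled after the choice is made.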
14. Create Kernel with program
• Create a Kernel object from our loaded and compiled program; the name ("hello") must match a kernel function in the .cl source
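cl::Kernel kernel(program, "hello", &err);
if (err != CL_SUCCESS)
{
    std::cerr << "Kernel::Kernel() failed (" << err << ")\n";
    return SDK_FAILURE;
}
// Arguments are attached with kernel.setArg(index, value) (covered later);
// check each call's return code the same way:
// err = kernel.setArg(0, /* argument */);
// if (err != CL_SUCCESS) {
//     std::cerr << "Kernel::setArg() failed (" << err << ")\n";
//     return SDK_FAILURE;
// }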
15. Create Queue per device & Run it
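• Creates a command queue for one device and enqueues the kernel on it. Execution does not have to start immediately
• Attention: enqueue() is an asynchronous call, meaning the function's return does not imply that the kernel was executed or even started to execute; finish() blocks until all queued commands have completed
cl::CommandQueue queue(context, devices[0], 0, &err);
if (err != CL_SUCCESS) {
    std::cerr << "CommandQueue::CommandQueue() failed (" << err << ")\n";
    return SDK_FAILURE;
}
std::cout << "Running CL program\n";
err = queue.enqueueNDRangeKernel(…..);
err = queue.finish();   // wait for the kernel to complete
if (err != CL_SUCCESS) {
    std::cerr << "CommandQueue::finish() failed (" << err << ")\n";
}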
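The asynchronous behaviour of enqueue() can be illustrated without a GPU. The sketch below is a host-only analogy built on `std::async`, not OpenCL code: the hypothetical `InOrderQueue` returns from `enqueue()` immediately, runs tasks one after another in submission order like an in-order command queue, and `finish()` blocks the caller the way `queue.finish()` does.

```cpp
#include <functional>
#include <future>

class InOrderQueue {
public:
    // Returns immediately; the task runs later on another thread,
    // only after every previously enqueued task has finished.
    void enqueue(std::function<void()> task) {
        std::shared_future<void> prev = last_;
        last_ = std::async(std::launch::async, [prev, task]() {
            if (prev.valid())
                prev.wait();   // preserve submission order
            task();
        }).share();
    }

    // Blocks until all enqueued tasks have completed.
    void finish() {
        if (last_.valid())
            last_.wait();
    }

private:
    std::shared_future<void> last_;   // completion of the newest task
};
```

Reading results before `finish()` would race with the tasks, which is the host-side equivalent of reading a buffer before the kernel has run.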
16. And that’s All Folks?
• Naaaa…..We still need to learn:
• Writing Kernel functions
• Synchronizing Kernel Functions
• Setting arguments to kernel functions
• Passing data from/to Host
17. References
• “OpenCL Hello World” is an ATI OpenCL SDK
programming exercise
• ATI OpenCL slides
18. DSP-IP Contact information
Download slides at: www.dsp-ip.com
Course materials & lecture request
Yossi Cohen
info@dsp-ip.com
+972-9-8850956
Fax : +972-50- 8962910