OPENCV
OPENCL™ ACCELERATED COMPUTER VISION
• OpenCV Introduction
Andrey Pavlenko, Itseez

• Heterogeneous Compute and OpenCV
Dr. Harris Gasparakis, AMD

• OpenCV 3.0
Vadim Pisarevsky, Itseez
OpenCV introduction
Andrey Pavlenko
1. Features
2. History
3. Development Process
4. Performance
Open-source Computer Vision Library
1. 2,500+ algorithms and functions
2. Cross-platform

3. Liberal BSD license
4. High performance
5. Professionally developed
6. 7M+ downloads
Functionality overview
Image Processing

Filters

Transformations

Edges, contours

Robust features

Segmentation

Video, Stereo, 3D

Calibration

Pose estimation

Optical Flow

Detection and recognition

Depth
Industrial applications
• Street View Panorama, etc. (Google)
• Vision system of the PR2 robot (Willow Garage)
• Robots for Mars exploration (NASA)
• Quality control of the production of coins (China)
OpenCV History
Popularity

Contributors

Core team
2000: first public release
2008
2009: v2.0, C++ API
2012: @github
2013: v2.4.3, opencl
present
Contribution/patch workflow:
see OpenCV wiki

OpenCV infrastructure
build.opencv.org: buildbot with 50+ builders

50+ builds nightly!

github.com/itseez/opencv
pullrequest.opencv.org

Every patch to OpenCV
must pass 7 builders!
OpenCV resources
1. Home: opencv.org
2. Docs and tutorials: docs.opencv.org
3. Q&A forum: answers.opencv.org
4. Wiki and issues: code.opencv.org
5. Develop: https://github.com/Itseez/opencv

6. Packages: sourceforge.net/projects/opencvlibrary/
OpenCL™ in OpenCV 2.4
• ‘ocl’ is a separate module (cv::ocl::resize())
• runs on various OpenCL-compliant devices and OSes
• 2.4.7 release on November 6
– official Windows bin pack with OpenCL enabled
– OpenCV pre-commit check includes OpenCL tests
– 200+ pull requests since 2.4.6 (most actively developed OpenCV part)
– dynamic OpenCL runtime loading
– set default OpenCL device via environment variable
– ~800 optimized kernels, ~30% of most commonly used functionality
– 8000+ accuracy and ~500 performance tests
– can be built without OpenCL SDK installed
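The "default OpenCL device via environment variable" item can be pictured with a small parser. This is only a sketch: the variable name OPENCV_OPENCL_DEVICE and the three-field "platform:type:device" layout are assumptions here and should be checked against the OpenCV documentation for the release in use; DeviceSpec, parseDeviceSpec and deviceSpecFromEnvironment are hypothetical names, not OpenCV API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical holder for the three fields of a device selection string.
struct DeviceSpec {
    std::string platform;  // empty means "any platform"
    std::string type;      // e.g. "GPU" or "CPU"; empty means "any type"
    std::string device;    // device name or index; empty means "first match"
};

// Splits "platform:type:device" on the first two colons.
// Missing fields stay empty, so ":GPU:" selects only by type.
DeviceSpec parseDeviceSpec(const std::string& text) {
    DeviceSpec spec;
    const std::size_t first = text.find(':');
    if (first == std::string::npos) {
        spec.platform = text;
        return spec;
    }
    spec.platform = text.substr(0, first);
    const std::size_t second = text.find(':', first + 1);
    if (second == std::string::npos) {
        spec.type = text.substr(first + 1);
        return spec;
    }
    spec.type = text.substr(first + 1, second - first - 1);
    spec.device = text.substr(second + 1);
    return spec;
}

// Reads the (assumed) variable; returns an all-empty spec when it is unset.
DeviceSpec deviceSpecFromEnvironment(const char* variable = "OPENCV_OPENCL_DEVICE") {
    const char* value = std::getenv(variable);
    return parseDeviceSpec(value != nullptr ? value : "");
}
```

With this sketch, a value such as "AMD:GPU:1" would yield platform "AMD", type "GPU" and device "1"; how the library itself interprets each field is not shown on the slide.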
OpenCL™ performance in OpenCV 2.4
AMD A10-6800k (with HD8670D) + Radeon HD7790
HETEROGENEOUS COMPUTE AND OPENCV
• The OpenCL™ Module in OpenCV
• Heterogeneous compute and Computer Vision
• Compute paths and data representations
• Future roadmap: transparent API

12 OPENCV-CL | NOVEMBER 12,2013 | DR. HARRIS GASPARAKIS | CONFIDENTIAL
OPENCV’S OPENCL™ MODULE
• Enables taking advantage of OpenCL™ acceleration, but currently it is an explicit path a developer can choose to call. All OpenCL memory buffer types are supported, but not automatically optimized.
‒ But stay tuned for OpenCV 3.0’s transparent API.

• Initial release: OpenCV 2.4.3 [11/2012]
• Currently ~800 kernels
‒ Image processing
  ‒ Pixel-wise operations
  ‒ Geometric transforms
  ‒ Pixel transforms: filtering, edges, corners, etc.
‒ Feature detection and matching
  ‒ SURF, HOG, Haar, brute-force matching, kNN, template matching
‒ Object recognition
  ‒ SVM: Support Vector Machine

• Applications, including:
‒ Face Detect
‒ Optical flow
‒ Stereo Matching

COMPILING FROM SOURCE

• OpenCL™ is enabled by default in CMake

COMPILING FROM SOURCE
BROWSE/BUILD CODE IN AN IDE

OpenCL™ module (2.4.x). Rebuild it even if you just change a kernel.

OpenCL kernels. Those are converted to kernels.cpp by a script (hence you need to rebuild if you change a kernel).

OpenCL samples. After you build them, go to [ROOT]\bin\[CONFIG] and observe: ocl-example-*.exe

INCORPORATING OPENCV INTO YOUR OWN CODE
• APP SDK provides 3 examples. Very easy integration!
• With fewer than 15 lines of code you can have a minimal program that reads video frames, passes them to the OpenCL™ device, and runs your own simple kernel! OpenCV-CL:
‒ takes care of all OpenCL plumbing
‒ compiles the kernels, caches them at runtime, and saves the OpenCL binaries on disk [the user can also modify the default behavior]
‒ allows specifying an OpenCL device/platform via an environment variable
‒ allows plugging your own kernels into OpenCV-CL, using the OpenCV-CL data structures
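The "compiles the kernels and caches them at runtime" point can be sketched as a lookup keyed by kernel name and build options. This is an illustration of the caching idea only, not OpenCV-CL's implementation: KernelCache, CompiledKernel and the injected compiler callback are hypothetical, and the on-disk binary cache mentioned on the slide is not modelled.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical compiled-kernel handle; a real one would wrap a cl_kernel.
struct CompiledKernel {
    std::string name;
    std::string buildOptions;
};

class KernelCache {
public:
    using Compiler =
        std::function<CompiledKernel(const std::string&, const std::string&)>;

    explicit KernelCache(Compiler compiler) : compiler_(std::move(compiler)) {
        assert(static_cast<bool>(compiler_));  // a compile callback is required
    }

    // Returns the cached kernel, compiling only on the first request for a
    // given (name, build options) pair.
    const CompiledKernel& get(const std::string& name,
                              const std::string& buildOptions) {
        const auto key = std::make_pair(name, buildOptions);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++compileCount_;
            it = cache_.emplace(key, compiler_(name, buildOptions)).first;
        }
        return it->second;
    }

    // Number of times the compile callback actually ran.
    int compileCount() const { return compileCount_; }

private:
    Compiler compiler_;
    std::map<std::pair<std::string, std::string>, CompiledKernel> cache_;
    int compileCount_ = 0;
};
```

Keying on the build options as well as the name matters because the same kernel source is typically compiled with different defines for different element types.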

INCORPORATING OPENCV INTO YOUR OWN CODE
SOME CODE, FROM APP SDK 2.9, GESTURE SAMPLE, SHOWCASING OPENNI® INTEGRATION

// In one line, populate an image on the GPU!
cv::Mat depthImgClamp = cv::Mat(SIZEY, SIZEX, CV_8UC1, openniBuffer);
cv::ocl::oclMat oclDepthImgClamp(depthImgClamp);

// In one command, add your own kernel launch, acting on OpenCV-CL data structures
vector<pair<size_t, const void *> > args;
args.push_back(make_pair(sizeof(cl_mem), (void *)&src.data));
args.push_back(make_pair(sizeof(cl_mem), (void *)&oclDst.data));
openCLExecuteKernelInterop(oclDst.clCxt, &depthConvertSrcStr, "convertDepthToWorldCoordinates",
                           globalThreads, localThreads, args, -1, -1, "",
                           false, false, true);

HETEROGENEOUS COMPUTE AND COMPUTER VISION
Webcams everywhere
Heterogeneous compute everywhere
Real time computer vision everywhere
• Heterogeneous compute mission: To take optimal advantage of the full capabilities of the underlying platform.
‒ APU / HSA APU
‒ Discrete GPU
‒ CPU
‒ FPGA, DSP, etc.

Many code paths?
- Possibly interleaving execution between different devices

Many data representations?
DATA REPRESENTATIONS
DISCRETE
• Copy data to/from GPU
• Map/unmap using pinned memory

APUS, OPENCL™ 1.2
• Use “device memory” for data that is used between GPU kernels
‒ True for all generations. Special memory that can be read and written fast by GPU.
‒ On APUs: physically part of main memory, possibly with special paths.
‒ But: device memory cannot be read/written very fast from CPU.
• Zero copy (map/unmap): best path for data written (read) by CPU (GPU) or vice versa.
• Cannot mix and match (bounce back and forth between) CPU and GPU well.
• Small kernels are typically a bad idea

H1’14: APUS, OPENCL™ 1.2 + HSA extensions OR OPENCL 2.0

• Can still use “device memory” for data that is used between GPU kernels, and zero copy is still available.
• However: SVM (shared virtual memory) can be written to and read from by both CPU and GPU fast “enough”
‒ Enables ping/pong (producer/consumer) between CPU and GPU
‒ Enables concurrent producer/consumer between CPU and GPU (platform atomics)
‒ Much easier to port a vision pipeline using HSA. You can incrementally pick and choose which part of the pipeline to accelerate, and which part to leave on the CPU.
‒ On HSA APUs, using SVM is a reasonable (and better) alternative to the current defaults, and it significantly simplifies code.

• User mode enqueueing: much faster kernel dispatch means less performance degradation for small kernels. Can feed the GPU smaller computational tasks fast, and (busy) wait for results on the CPU.

COMPUTE PATHS
OpenCV 2.4.x: Face detect on CPU

// initialization
VideoCapture vcap(...);
CascadeClassifier fd("haar_ff.xml");
Mat frame, frameGray;
vector<Rect> faces;

// processing loop
for (;;) {
    vcap >> frame;
    cvtColor(frame, frameGray, COLOR_BGR2GRAY);
    equalizeHist(frameGray, frameGray);
    fd.detectMultiScale(frameGray, faces, ...);
    // draw rectangles …
    // show image …
}

[Image removed: face detect demonstration]
COMPUTE PATHS
OpenCV 2.4.x: Face detect with OpenCL™

// initialization
VideoCapture vcap(...);
ocl::OclCascadeClassifier fd("haar_ff.xml");
ocl::oclMat frame, frameGray;
Mat frameCpu;
vector<Rect> faces;

// processing loop
for (;;) {
    vcap >> frameCpu;
    frame = frameCpu;  // upload the frame to the OpenCL device
    ocl::cvtColor(frame, frameGray, COLOR_BGR2GRAY);
    ocl::equalizeHist(frameGray, frameGray);
    fd.detectMultiScale(frameGray, faces, ...);
    // draw rectangles …
    // show image …
}

[Image removed: face detect demonstration]

FUTURE ROADMAP
‒ Incorporate OpenCL™ 1.2 with HSA extensions, and OpenCL 2.0
  ‒ Shared Virtual Memory (SVM) significantly simplifies the programming model in general. Allows reusing existing memory as SVM.
  ‒ In SVM, a “pointer is a pointer”
  ‒ Pass your tree/linked list/graph data structure to the GPU, have threads explore sub-branches, or explore paths on a graph

‒ Transparent API:
  ‒ One code path; OpenCV will choose the best execution path at runtime, given the platform.
  ‒ Changes of data locality should be implemented by the framework.
  ‒ Includes applying heuristics appropriate for the underlying hardware (dGPU, APU, HSA APU).
  ‒ Eventually it should be self-optimizing:
    ‒ reasonably define the optimal memory type “under the hood.”
    ‒ detect data flow dependencies in the pipeline, and automatically represent them as OpenCL events.

Starting with OpenCV 3.0.
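To make the “pointer is a pointer” remark concrete, the sketch below contrasts the two ways a linked structure can be handed to another processor. It is plain host-side C++ and makes no OpenCL calls: Node, FlatNode, flatten, sumFlat and sumLinked are hypothetical names, and the assumption is simply that without a shared address space, pointers have to be rewritten as indices before a copy, while with SVM the original structure could be walked as-is. Lists are assumed to be acyclic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {        // host-side linked structure
    int value;
    Node* next;
};

struct FlatNode {    // copy-friendly form: the link becomes an array index
    int value;
    int next;        // index of the next node, or -1 at the end
};

// Without a shared address space, host pointers mean nothing on the device,
// so the list is rewritten into an index-linked array before being copied.
std::vector<FlatNode> flatten(const Node* head) {
    std::vector<FlatNode> flat;
    for (const Node* n = head; n != nullptr; n = n->next) {
        const int self = static_cast<int>(flat.size());
        flat.push_back({n->value, n->next != nullptr ? self + 1 : -1});
    }
    return flat;
}

// What a consumer of the flattened copy has to do: follow indices.
int sumFlat(const std::vector<FlatNode>& flat) {
    int total = 0;
    for (int i = flat.empty() ? -1 : 0; i != -1;
         i = flat[static_cast<std::size_t>(i)].next) {
        assert(i >= 0 && static_cast<std::size_t>(i) < flat.size());
        total += flat[static_cast<std::size_t>(i)].value;
    }
    return total;
}

// What a consumer can do when "a pointer is a pointer": walk the original.
int sumLinked(const Node* head) {
    int total = 0;
    for (const Node* n = head; n != nullptr; n = n->next) total += n->value;
    return total;
}
```

Both traversals give the same result; the difference is the flattening pass and the second data layout that the copy-based path has to maintain.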

DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap
changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software
changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD
reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of
such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE
LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION
CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices,
Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names
are for informational purposes only and may be trademarks of their respective owners.
OpenCV 3.0
Vadim Pisarevsky
1. Transparent API
2. UMat
3. Under the hood
OpenCV 3.0
• OpenCV 3.0 is scheduled for Q1 2014
• Based on 2.x, but:
– transparent API and more efficient, platform-specific OpenCL™ codepaths (including better zero-copy and SVM support)
– API cleanup
– a lot of new algorithms
Transparent API
• same code can run on CPU or GPU

– no specialized cv::ocl::Canny vs cv::Canny
– no recompilation is needed

• includes the following key components:
– new data structure UMat (Universal Mat)
– simple and robust mechanism for async processing
– convenient API for custom algorithm implementation

• minimal or no changes in the existing code
– CPU-only processing: no changes required
UMat

• Mat=>UMat is the only change needed
• Sometimes, somewhere (HSA) it’s not needed either!
// initialization
VideoCapture vcap(...);
CascadeClassifier fd("haar_ff.xml");
UMat frame, frameGray;
vector<Rect> faces;

// processing loop
for (;;) {
    vcap >> frame;
    cvtColor(frame, frameGray, COLOR_BGR2GRAY);
    equalizeHist(frameGray, frameGray);
    fd.detectMultiScale(frameGray, faces, ...);
    // draw rectangles …
    // show image …
}
Transparent API: under the hood
bool _ocl_cvtColor(InputArray src, OutputArray dst, int code) {
    static ocl::ProgramSource oclsrc("//cvtcolor.cl source code\n …");
    UMat src_ocl = src.getUMat(), dst_ocl = dst.getUMat();
    if (code == COLOR_BGR2GRAY) {
        // get the kernel; kernel is compiled only once and cached
        ocl::Kernel kernel("bgr2gray", oclsrc, <compile_flags>);
        // pass 2 arrays to the kernel and run it
        return kernel.args(src_ocl, dst_ocl).run(0, 0, false);
    } else if (code == COLOR_BGR2YUV) { … }
    return false;
}

void _cpu_cvtColor(const Mat& src, Mat& dst, int code) { … }

// transparent API dispatcher function
void cvtColor(InputArray src, OutputArray dst, int code) {
    dst.create(src.size(), …);
    if (useOpenCL(src, dst) && _ocl_cvtColor(src, dst, code)) return;
    // getMat() uses zero-copy if available; with SVM it is a no-op
    Mat src_cpu = src.getMat();
    Mat dst_cpu = dst.getMat();
    _cpu_cvtColor(src_cpu, dst_cpu, code);
}
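The dispatch pattern on this slide (allocate the output, try the accelerated path, fall back to the CPU path if it declines) can be tried without OpenCV or OpenCL. The following is a minimal sketch under stated assumptions: Image, PathStats, the onDevice flag and all function names are hypothetical, the "accelerated" path just reuses the CPU loop where a real one would launch a kernel, and the fixed-point luma weights are the usual 0.114/0.587/0.299 coefficients scaled by 16384, not a claim about any library's exact constants.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an image: interleaved 8-bit BGR or 1-channel gray.
struct Image {
    int width = 0, height = 0, channels = 0;
    std::vector<std::uint8_t> data;
    bool onDevice = false;  // hypothetical: is an accelerated path eligible?
};

// Counts which path ran, so the dispatch decision is observable.
struct PathStats { int accelerated = 0; int cpu = 0; };

// Fixed-point luma; the weights sum to 1 << 14 and the result is rounded.
static std::uint8_t bgrToGray(std::uint8_t b, std::uint8_t g, std::uint8_t r) {
    const int kB = 1868, kG = 9617, kR = 4899;
    return static_cast<std::uint8_t>((b * kB + g * kG + r * kR + (1 << 13)) >> 14);
}

static void convertLoop(const Image& src, Image& dst) {
    for (int i = 0; i < src.width * src.height; ++i) {
        const std::uint8_t* px = &src.data[static_cast<std::size_t>(3 * i)];
        dst.data[static_cast<std::size_t>(i)] = bgrToGray(px[0], px[1], px[2]);
    }
}

// Returns false when it cannot handle the request, like the bool-returning
// _ocl_cvtColor on the slide.
static bool acceleratedBgrToGray(const Image& src, Image& dst, PathStats& stats) {
    if (!src.onDevice) return false;  // pretend no usable device
    convertLoop(src, dst);            // a real version would run a kernel here
    ++stats.accelerated;
    return true;
}

static void cpuBgrToGray(const Image& src, Image& dst, PathStats& stats) {
    convertLoop(src, dst);
    ++stats.cpu;
}

// Dispatcher: allocate the output, try the accelerated path, fall back to CPU.
void bgrToGrayDispatch(const Image& src, Image& dst, PathStats& stats) {
    const std::size_t pixels = static_cast<std::size_t>(src.width * src.height);
    assert(src.channels == 3 && src.data.size() == 3 * pixels);
    dst.width = src.width;
    dst.height = src.height;
    dst.channels = 1;
    dst.data.assign(pixels, 0);
    if (acceleratedBgrToGray(src, dst, stats)) return;
    cpuBgrToGray(src, dst, stats);
}
```

The caller sees one function and one result either way, which is the property the transparent API is after.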
OpenCV+OpenCL™ execution model
[Diagram: CPU threads, each with its own cv::ocl::Queue; the queues feed cv::ocl::Device instances]

• One OpenCL queue and one OpenCL device per CPU thread
• OpenCL kernels are executed asynchronously
• cv::ocl::finish() puts a barrier in the current CPU thread; .getMat() calls it automatically.
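The execution model described here (one queue per CPU thread, asynchronous work, a barrier on finish, and a host view that synchronizes first) can be modelled in a few lines. This is a single-threaded illustration only: CommandQueue and DeviceValue are hypothetical stand-ins rather than cv::ocl classes, and "asynchronous" is modelled as deferred execution until finish() is called.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for a per-thread command queue.
class CommandQueue {
public:
    // Enqueue returns immediately; the work runs later.
    void enqueue(std::function<void()> task) {
        pending_.push_back(std::move(task));
    }

    // Barrier: run everything queued so far, in submission order.
    void finish() {
        while (!pending_.empty()) {
            std::function<void()> task = std::move(pending_.front());
            pending_.pop_front();
            task();
        }
    }

    std::size_t pendingCount() const { return pending_.size(); }

    // One queue per CPU thread, created on first use in that thread.
    static CommandQueue& current() {
        thread_local CommandQueue queue;
        return queue;
    }

private:
    std::deque<std::function<void()>> pending_;
};

// Hypothetical device-side value whose host view synchronizes first,
// the way the slide says .getMat() calls finish().
class DeviceValue {
public:
    void addAsync(int delta) {
        CommandQueue::current().enqueue([this, delta] { value_ += delta; });
    }
    int hostView() {
        CommandQueue::current().finish();
        return value_;
    }
    int rawValue() const { return value_; }  // no synchronization, for contrast

private:
    int value_ = 0;
};
```

Reading rawValue() before hostView() shows the stale state; that gap is what the implicit barrier in the host accessor closes.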
Summary & Future directions
• OpenCL™ is a great tool to boost the performance of vision algorithms; OpenCV unleashes its potential for the CV community
• The OpenCV 3.0 transparent API makes it even easier and … more transparent
• Possible directions: pipelines, memory allocation optimization, more algorithms ported to OpenCL
The first results

 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko

  • 2. OpenCV Introduction (Andrey Pavlenko, Itseez); Heterogeneous Compute and OpenCV (Dr. Harris Gasparakis, AMD); OpenCV 3.0 (Vadim Pisarevsky, Itseez)
  • 4. Open-source Computer Vision Library
      1. 2,500+ algorithms and functions
      2. Cross-platform
      3. Liberal BSD license
      4. High performance
      5. Professionally developed
      6. 7M+ downloads
  • 5. Functionality overview: Image Processing; Filters; Transformations; Edges, contours; Robust features; Segmentation; Video, Stereo, 3D; Calibration; Pose estimation; Optical Flow; Detection and recognition; Depth
  • 6. Industrial applications
      • Street View Panorama, etc. (Google)
      • Vision system of the PR2 robot (Willow Garage)
      • Robots for Mars exploration (NASA)
      • Quality control of the production of coins (China)
  • 8. OpenCV infrastructure
      • Contribution/patch workflow: see OpenCV wiki
      • build.opencv.org: buildbot with 50+ builders; 50+ builds nightly!
      • github.com/itseez/opencv
      • pullrequest.opencv.org
      • Every patch to OpenCV must pass 7 builders!
  • 9. OpenCV resources
      1. Home: opencv.org
      2. Docs and tutorials: docs.opencv.org
      3. Q&A forum: answers.opencv.org
      4. Wiki and issues: code.opencv.org
      5. Develop: https://github.com/Itseez/opencv
      6. Packages: sourceforge.net/projects/opencvlibrary/
  • 10. OpenCL™ in OpenCV 2.4
      • ‘ocl’ is a separate module (cv::ocl::resize())
      • runs on various OpenCL-compliant devices and OSes
      • 2.4.7 release on November 6
          – official Windows bin pack with OpenCL enabled
          – OpenCV pre-commit check includes OpenCL tests
          – 200+ pull requests since 2.4.6 (most actively developed OpenCV part)
          – dynamic OpenCL runtime loading
          – set default OpenCL device via environment variable
          – ~800 optimized kernels, ~30% of most commonly used functionality
          – 8000+ accuracy and ~500 performance tests
          – can be built without OpenCL SDK installed
  • 11. OpenCL™ performance in OpenCV 2.4 AMD A10-6800k (with HD8670D) + Radeon HD7790
  • 12. HETEROGENEOUS COMPUTE AND OPENCV
      • The OpenCL™ Module in OpenCV
      • Heterogeneous compute and Computer Vision
      • Compute paths and data representations
      • Future roadmap: transparent API
      OPENCV-CL | NOVEMBER 12, 2013 | DR. HARRIS GASPARAKIS | CONFIDENTIAL
  • 13. OPENCV’S OPENCL™ MODULE
      • Enables taking advantage of OpenCL™ acceleration, but currently it is an explicit path a developer can choose to call. All OpenCL memory buffer types are supported, but not automatically optimized.
          ‒ But stay tuned for OpenCV 3.0’s transparent API.
      • Initial release: OpenCV 2.4.3 [11/2012]
      • Currently ~800 kernels
          ‒ Image processing
              ‒ Pixel-wise operations
              ‒ Geometric transforms
              ‒ Pixel transforms: filtering, edges, corners, etc.
          ‒ Feature detection and matching
              ‒ SURF, HOG, Haar, brute-force matching, kNN, template matching
          ‒ Object recognition
              ‒ SVM: Support Vector Machine
      • Applications, including:
          ‒ Face Detect
          ‒ Optical flow
          ‒ Stereo Matching
  • 14. COMPILING FROM SOURCE
      • OpenCL™ is enabled by default in CMake
  • 15. COMPILING FROM SOURCE: BROWSE/BUILD CODE IN AN IDE
      • OpenCL™ module (2.4.x): rebuild it even if you only change a kernel.
      • OpenCL kernels: these are converted to kernels.cpp by a script (hence you need to rebuild if you change a kernel).
      • OpenCL samples: after you build them, go to [ROOT]\bin\[CONFIG] and look for ocl-example-*.exe.
  • 16. INCORPORATING OPENCV INTO YOUR OWN CODE
      • APP SDK provides 3 examples. Very easy integration!
      • With fewer than 15 lines of code you can have a minimal program that reads video frames, passes them to the OpenCL™ device, and runs your own simple kernel!
      • OpenCV-CL:
          ‒ Takes care of all OpenCL plumbing.
          ‒ Compiles the kernels, caches them at runtime, and saves the OpenCL binaries on disk [the user can also modify the default behavior].
          ‒ Allows specifying an OpenCL device/platform via an environment variable.
          ‒ Allows plugging your own kernels into OpenCV-CL, using the OpenCV-CL data structures.
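The compile-once-and-cache behaviour mentioned on this slide can be pictured with a small standard-library sketch. It does not call OpenCV or OpenCL: `KernelCache`, `CompiledKernel`, and the fake `compile` step are hypothetical names standing in for the real build machinery, and the on-disk binary cache is left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a built OpenCL program; not an OpenCV type.
struct CompiledKernel {
    std::string name;
    std::string buildOptions;
};

// Minimal in-memory cache keyed by (kernel name, build options).
class KernelCache {
public:
    // Returns the cached kernel, "compiling" it only on the first request.
    const CompiledKernel& get(const std::string& name,
                              const std::string& buildOptions) {
        const Key key(name, buildOptions);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            it = cache_.insert(std::make_pair(key, compile(name, buildOptions))).first;
        }
        return it->second;
    }

    // Number of times the expensive step actually ran.
    int compileCount() const { return compileCount_; }

private:
    typedef std::pair<std::string, std::string> Key;

    // Placeholder for the expensive step; a real runtime would build source here.
    CompiledKernel compile(const std::string& name, const std::string& buildOptions) {
        assert(!name.empty());
        ++compileCount_;
        CompiledKernel kernel;
        kernel.name = name;
        kernel.buildOptions = buildOptions;
        return kernel;
    }

    std::map<Key, CompiledKernel> cache_;
    int compileCount_ = 0;
};
```

Requesting the same name and options twice returns the same object and leaves `compileCount()` at 1; changing the build options is treated as a different kernel and triggers a second build.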
  • 17. INCORPORATING OPENCV INTO YOUR OWN CODE: SOME CODE, FROM APP SDK 2.9, GESTURE SAMPLE, SHOWCASING OPENNI® INTEGRATION
      In one line, populate an image in GPU!

          cv::Mat depthImgClamp = cv::Mat(SIZEY, SIZEX, CV_8UC1, openniBuffer);
          cv::ocl::oclMat oclDepthImgClamp(depthImgClamp);

      In one command, add your own kernel launch, acting on OpenCV-CL data structures:

          vector<pair<size_t, const void *> > args;
          args.push_back(make_pair(sizeof(cl_mem), (void *)&src.data));
          args.push_back(make_pair(sizeof(cl_mem), (void *)&oclDst.data));
          openCLExecuteKernelInterop(oclDst.clCxt, &depthConvertSrcStr,
                                     "convertDepthToWorldCoordinates",
                                     globalThreads, localThreads, args,
                                     -1, -1, "", false, false, true);
  • 18. HETEROGENEOUS COMPUTE AND COMPUTER VISION
      Webcams everywhere. Heterogeneous compute everywhere. Real-time computer vision everywhere.
      • Heterogeneous compute mission: to take optimal advantage of the full capabilities of the underlying platform.
          ‒ APU / HSA APU
          ‒ Discrete GPU
          ‒ CPU
          ‒ FPGA, DSP, etc.
      • Many code paths? Possibly interleaving execution between different devices.
      • Many data representations?
  • 19. DATA REPRESENTATIONS: DISCRETE APUS, OPENCL™ 1.2
      • Copy data to/from GPU
      • Use “device memory” for data that is used between GPU kernels
      • Map/unmap using pinned memory
          ‒ True for all generations. Special memory that can be read and written fast by the GPU.
          ‒ On APUs: physically part of main memory, possibly with special paths.
          ‒ But: device memory cannot be read/written very fast from the CPU.
      • Zero copy (map/unmap): best path for data written by the CPU and read by the GPU, or vice versa.
      • Cannot mix and match (bounce back and forth between) CPU and GPU well.
      • Small kernels are typically a bad idea
  • 20. H1’14: APUS, OPENCL™ 1.2 + HSA extensions OR OPENCL 2.0
      • Can still use “device memory” for data that is used between GPU kernels, and zero copy is still available.
      • However: SVM (shared virtual memory) can be written to/read from both CPU and GPU fast “enough”
          ‒ Enables ping/pong (producer/consumer) between CPU and GPU
          ‒ Enables concurrent producer/consumer between CPU/GPU (platform atomics)
          ‒ Much easier to port a vision pipeline using HSA. You can incrementally pick and choose which part of the pipeline to accelerate, and which part to leave to the CPU.
          ‒ On HSA APUs, using SVM is reasonable (and better than current defaults), significantly simplifying code.
      • User mode enqueueing: much faster kernel dispatching leads to less performance degradation of small kernels. Can feed the GPU smaller computational tasks fast, and (busy) wait for results on the CPU.
  • 21. COMPUTE PATHS (OpenCV 2.4.x: Face detect on CPU)
      [Image demonstrating face detect removed]

          // initialization
          VideoCapture vcap(...);
          CascadeClassifier fd("haar_ff.xml");
          Mat frame, frameGray;
          vector<Rect> faces;
          for(;;){
              // processing loop
              vcap >> frame;
              cvtColor(frame, frameGray, BGR2GRAY);
              equalizeHist(frameGray, frameGray);
              fd.detectMultiScale(frameGray, faces, ...);
              // draw rectangles …
              // show image …
          }
  • 22. COMPUTE PATHS (OpenCV 2.4.x: Face detect with OpenCL™)
      [Image demonstrating face detect removed]

          // initialization
          VideoCapture vcap(...);
          ocl::OclCascadeClassifier fd("haar_ff.xml");
          ocl::oclMat frame, frameGray;
          Mat frameCpu;
          vector<Rect> faces;
          for(;;){
              // processing loop
              vcap >> frameCpu;
              frame = frameCpu;   // upload the CPU frame to the OpenCL device
              ocl::cvtColor(frame, frameGray, BGR2GRAY);
              ocl::equalizeHist(frameGray, frameGray);
              fd.detectMultiScale(frameGray, faces, ...);
              // draw rectangles …
              // show image …
          }
  • 23. FUTURE ROADMAP
      ‒ Incorporate OpenCL™ 1.2 with HSA extensions, and OpenCL 2.0
          ‒ Shared Virtual Memory (SVM) significantly simplifies the programming model in general. Allows reusing existing memory as SVM.
          ‒ In SVM, a “pointer is a pointer”
          ‒ Pass your tree/linked list/graph data structure to the GPU, have threads explore sub-branches, or explore paths on a graph
      ‒ Transparent API:
          ‒ One code path; OpenCV will choose the best execution path at runtime, given the platform.
          ‒ Changes of data locality should be implemented by the framework.
          ‒ Includes applying heuristics appropriate for the underlying hardware (dGPU, APU, HSA APU).
          ‒ Eventually it should be self-optimizing
              ‒ reasonably define optimal memory type “under the hood.”
              ‒ Detect data flow dependencies in the pipeline, and automatically represent them as OpenCL events.
          ‒ Starting with OpenCV 3.0.
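To see why “a pointer is a pointer” matters: without shared virtual memory, host pointers stored inside a tree or list mean nothing on the device, so such structures are typically flattened into index-linked arrays before being copied into a buffer. The sketch below shows that flattening step in plain C++ with no OpenCL calls; `Node`, `FlatNode`, `flatten`, and `sumFlat` are hypothetical names chosen for illustration. With SVM, this conversion could in principle be skipped and the original nodes traversed directly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side tree linked by raw pointers.
struct Node {
    int value;
    Node* left;
    Node* right;
};

// Buffer-friendly form: children referenced by array index, -1 meaning "none".
struct FlatNode {
    int value;
    int left;
    int right;
};

// Pre-order flattening; returns the index of the node just written, or -1 for null.
int flatten(const Node* node, std::vector<FlatNode>& out) {
    if (node == nullptr) return -1;
    const int index = static_cast<int>(out.size());
    FlatNode flat = {node->value, -1, -1};
    out.push_back(flat);
    const int left = flatten(node->left, out);
    const int right = flatten(node->right, out);
    out[index].left = left;    // assigned after recursion: push_back may reallocate
    out[index].right = right;
    return index;
}

// Sum all values by following indices instead of pointers.
int sumFlat(const std::vector<FlatNode>& nodes, int index) {
    if (index < 0) return 0;
    assert(static_cast<std::size_t>(index) < nodes.size());
    const FlatNode& n = nodes[static_cast<std::size_t>(index)];
    return n.value + sumFlat(nodes, n.left) + sumFlat(nodes, n.right);
}
```

The flattened vector is one contiguous block with no host addresses in it, which is what makes it safe to copy into a device buffer; the cost is an extra conversion pass each time the structure changes.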
  • 24. DISCLAIMER & ATTRIBUTION
      DISCLAIMER: The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
      ATTRIBUTION: © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.
  • 25. OpenCV 3.0 (Vadim Pisarevsky)
      1. Transparent API
      2. UMat
      3. Under the hood
  • 26. OpenCV 3.0
      • OpenCV 3.0 is scheduled for Q1 2014
      • Based on 2.x, but:
          – transparent API and more efficient, platform-specific OpenCL™ code paths (including better zero-copy and SVM support)
          – API cleanup
          – a lot of new algorithms
  • 27. Transparent API
      • same code can run on CPU or GPU
          – no specialized cv::ocl::Canny vs cv::Canny
          – no recompilation is needed
      • includes the following key components:
          – new data structure UMat (Universal Mat)
          – simple and robust mechanism for async processing
          – convenient API for custom algorithm implementation
      • minimal or no changes in the existing code
          – CPU-only processing: no changes required
  • 28. UMat
      • Mat=>UMat is the only change needed
      • Sometimes, somewhere (HSA) it’s not needed either!

          // initialization
          VideoCapture vcap(...);
          CascadeClassifier fd("haar_ff.xml");
          UMat frame, frameGray;
          vector<Rect> faces;
          for(;;){
              // processing loop
              vcap >> frame;
              cvtColor(frame, frameGray, BGR2GRAY);
              equalizeHist(frameGray, frameGray);
              fd.detectMultiScale(frameGray, faces, ...);
              // draw rectangles …
              // show image …
          }
  • 29. Transparent API: under the hood

          bool _ocl_cvtColor(InputArray src, OutputArray dst, int code) {
              static ocl::ProgramSource oclsrc("//cvtcolor.cl source code\n …");
              UMat src_ocl = src.getUMat(), dst_ocl = dst.getUMat();
              if (code == COLOR_BGR2GRAY) {
                  // get the kernel; kernel is compiled only once and cached
                  ocl::Kernel kernel("bgr2gray", oclsrc, <compile_flags>);
                  // pass 2 arrays to the kernel and run it
                  return kernel.args(src, dst).run(0, 0, false);
              } else if (code == COLOR_BGR2YUV) {
                  …
              }
              return false;
          }

          void _cpu_cvtColor(const Mat& src, Mat& dst, int code) { … }

          // transparent API dispatcher function
          void cvtColor(InputArray src, OutputArray dst, int code) {
              dst.create(src.size(), …);
              if (useOpenCL(src, dst) && _ocl_cvtColor(src, dst, code))
                  return;
              // getMat() uses zero-copy if available; with SVM it is a no-op
              Mat src_cpu = src.getMat();
              Mat dst_cpu = dst.getMat();
              _cpu_cvtColor(src_cpu, dst_cpu, code);
          }
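The dispatcher on this slide follows a "try the accelerated path, fall back to the CPU" pattern. The following is a minimal standard-library sketch of that control flow only; it uses no OpenCV types. `Bytes`, `g_acceleratorAvailable`, `acceleratedCvtColor`, `cpuCvtColor`, and the returned `Path` value are hypothetical, and the integer luma weights (29, 150, 77 out of 256) are an assumed approximation of 0.114 B + 0.587 G + 0.299 R, not necessarily the constants OpenCV uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packed 8-bit image: 3 bytes per pixel for BGR, 1 for gray.
typedef std::vector<std::uint8_t> Bytes;

enum Path { PATH_ACCELERATED, PATH_CPU };

// Stand-in for a runtime check such as "is a usable OpenCL device present?".
static bool g_acceleratorAvailable = false;

// Assumed fixed-point luma weights (sum to 256), rounded to nearest.
static std::uint8_t toGray(std::uint8_t b, std::uint8_t g, std::uint8_t r) {
    return static_cast<std::uint8_t>((29 * b + 150 * g + 77 * r + 128) >> 8);
}

static void convertAll(const Bytes& src, Bytes& dst) {
    for (std::size_t i = 0; i < dst.size(); ++i)
        dst[i] = toGray(src[3 * i], src[3 * i + 1], src[3 * i + 2]);
}

// Accelerated path: returns false when it declines, so the caller falls back.
bool acceleratedCvtColor(const Bytes& src, Bytes& dst) {
    if (!g_acceleratorAvailable) return false;
    convertAll(src, dst);  // placeholder for launching a device kernel
    return true;
}

// Plain CPU implementation, always able to handle the request.
void cpuCvtColor(const Bytes& src, Bytes& dst) {
    convertAll(src, dst);
}

// Dispatcher: allocate the output once, try the fast path, then fall back.
Path cvtColor(const Bytes& src, Bytes& dst) {
    assert(src.size() % 3 == 0);
    dst.assign(src.size() / 3, 0);
    if (acceleratedCvtColor(src, dst)) return PATH_ACCELERATED;
    cpuCvtColor(src, dst);
    return PATH_CPU;
}
```

Because the accelerated function reports success with a `bool`, the caller never needs to know why it declined (no device, unsupported conversion, and so on), and both paths write into the same pre-allocated output.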
  • 30. OpenCV+OpenCL™ execution model
      [Diagram: CPU threads, each with its own cv::ocl::Queue attached to a cv::ocl::Device]
      • One OpenCL queue and one OpenCL device per CPU thread
      • OpenCL kernels are executed asynchronously
      • cv::ocl::finish() puts the barrier in the current CPU thread; .getMat() automatically calls it.
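The ordering guarantee on this slide (work is only certain to be complete after a barrier, and reading data back implies that barrier) can be modelled without any OpenCL at all. In the sketch below, `DeferredQueue`, `currentQueue`, and `readBack` are hypothetical names, not OpenCV classes, and deferred execution on the calling thread stands in for kernels that would really run concurrently on a device.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-thread command queue; not cv::ocl::Queue.
class DeferredQueue {
public:
    // Enqueue returns immediately; the work has not run yet.
    void enqueue(std::function<void()> task) {
        pending_.push_back(std::move(task));
    }

    // Barrier: run everything submitted so far, in order, before returning.
    void finish() {
        for (std::size_t i = 0; i < pending_.size(); ++i) pending_[i]();
        pending_.clear();
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::vector<std::function<void()> > pending_;
};

// One queue per CPU thread, mirroring the "one queue per thread" rule.
DeferredQueue& currentQueue() {
    thread_local DeferredQueue queue;
    return queue;
}

// Reading a result on the host drains the queue first, in the same spirit as
// .getMat() calling finish() automatically.
int readBack(const int& deviceValue) {
    currentQueue().finish();
    return deviceValue;
}
```

Until `finish()` or `readBack()` is called, values touched by enqueued tasks still hold their old contents, which is the practical consequence of asynchronous execution that host code has to respect.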
  • 31. Summary & Future directions
      • OpenCL™ is a great tool to boost the performance of vision algorithms; OpenCV unleashes its potential for the CV community
      • The OpenCV 3.0 transparent API makes it even easier and … more transparent
      • Possible directions: pipelines, memory allocation optimization, more algorithms ported to OpenCL