Copyright © 2014 AMD
Computer Vision Powered by Heterogeneous System Architecture (HSA)
Dr. Harris Gasparakis
5/29/2014
• DEVELOPING EMBEDDED VISION APPLICATIONS: THE PROPRIETARY API LEGACY.
• THE RISE OF GPUS: DENSE DATA PARALLELISM AND CACHE-COHERENT SIMD
• DO WE STILL NEED CPUs?
• THE HETEROGENEOUS FUTURE OF VISION: OPENCL™, HSA
• OPENCL EVOLUTION
• OPENCL 1.X
• OPENCL 2.X AND HSA
• THE OPENCL EXECUTION MODEL
• MONSTERS IN THE ORCHESTRA, CES 2014
• PUTTING OPENCL™ 2.0 TO WORK
• OPENCL™ IN OPENCV
• OPENCV 3.0: THE TRANSPARENT API
• HOW DOES IT WORK?
• CONCLUDING THOUGHTS
AGENDA
Is Image Processing/Vision your core IP?
No: Use somebody’s SDK or end product. This talk is not for you.
Yes:
• Choose HW and SW platform
– A multitude of devices of different capabilities and strengths!
– A multitude of algorithms of different requirements! Data Parallel (Y/N/M?)
• Highly specialized programmers
• Non-portable programs, with high platform risk
Developing Embedded Vision Applications
Sobel
• “GPU is great for SIMD: Single Instruction Multiple Data”
• Image Processing = Dense Data Parallelism
• Same calculation (e.g. calculate edge strength) for all pixels.
• Adjacent threads load adjacent “enough” data
Contrary to popular wisdom, SIMD alone is NOT enough for good GPU acceleration:
1. Algorithms that are too simple (not enough math per memory transfer): merge kernels
2. Too much complexity per kernel (high register pressure): split kernels
The rise of GPUs: Dense Data Parallelism
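To make the "same calculation for all pixels" point concrete, here is a minimal CPU-side sketch in plain C++ of a per-pixel Sobel edge-strength computation. The function name `sobelMagnitude`, the row-major `std::vector<float>` image layout, and the choice to leave border pixels at zero are assumptions made for illustration; none of them come from the talk. Each output pixel depends only on its 3x3 neighbourhood, so the body of the inner loop is roughly what a single GPU work-item would execute.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: Sobel gradient magnitude of a row-major grayscale image.
// Border pixels are left at 0 to keep the sketch short.
std::vector<float> sobelMagnitude(const std::vector<float>& src,
                                  std::size_t width, std::size_t height) {
    assert(src.size() == width * height);
    std::vector<float> dst(src.size(), 0.0f);
    if (width < 3 || height < 3) return dst;

    auto at = [&](std::size_t x, std::size_t y) { return src[y * width + x]; };

    // Every iteration is independent of the others: dense data parallelism.
    for (std::size_t y = 1; y + 1 < height; ++y) {
        for (std::size_t x = 1; x + 1 < width; ++x) {
            // Horizontal gradient: right column minus left column.
            float gx = (at(x + 1, y - 1) + 2.0f * at(x + 1, y) + at(x + 1, y + 1))
                     - (at(x - 1, y - 1) + 2.0f * at(x - 1, y) + at(x - 1, y + 1));
            // Vertical gradient: bottom row minus top row.
            float gy = (at(x - 1, y + 1) + 2.0f * at(x, y + 1) + at(x + 1, y + 1))
                     - (at(x - 1, y - 1) + 2.0f * at(x, y - 1) + at(x + 1, y - 1));
            dst[y * width + x] = std::sqrt(gx * gx + gy * gy);
        }
    }
    return dst;
}
```

Note how little arithmetic is done per pixel loaded. That is the "not enough math per memory transfer" situation in item 1, which is why fusing such a step with its neighbours in the pipeline can pay off.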
Features
• Extensive set of CPU libraries
• Several approaches to vision (image understanding) can be thought of as a “dense to sparse transition”
• Sparsity is not a GPU’s friend.
• OpenCL 2.0 handles this problem more efficiently (and with less code)
Do we still need CPUs?
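The "dense to sparse transition" can be sketched as follows: a dense per-pixel response map is reduced to a short, variable-length list of keypoints. This is a plain C++ illustration only; the `Keypoint` struct, the `extractKeypoints` function, and the simple thresholding rule are hypothetical and are not taken from the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical keypoint record: pixel position plus the detector response.
struct Keypoint {
    std::size_t x;
    std::size_t y;
    float response;
};

// Dense input (one value per pixel), sparse output (only the strong responses).
std::vector<Keypoint> extractKeypoints(const std::vector<float>& response,
                                       std::size_t width, std::size_t height,
                                       float threshold) {
    assert(response.size() == width * height);
    std::vector<Keypoint> keypoints;
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            float r = response[y * width + x];
            // Data-dependent branch and variable-length output: the part
            // that maps poorly onto lock-step SIMD execution.
            if (r > threshold) keypoints.push_back({x, y, r});
        }
    }
    return keypoints;
}
```

Everything after this step works on an irregular list rather than a regular grid, which is the kind of workload the slide suggests is better suited to a CPU.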
[Diagram blocks: CPU, GPU, Audio Processor, ISP (Image Signal Processing), Fixed Function Accelerator, Encode, Decode, DSP, Other!, Shared Memory]
The Heterogeneous Future of Vision
THE RIGHT IP FOR THE RIGHT TASK!
• CPU is great for serial tasks
• Lower latency
• Good branching performance
• Lower throughput
• Good at Task parallelism
• Better for Sparse Data Parallelism
• GPU excels at data parallel problems
• High throughput
• Possibly High latency
• Good at “Dense Data Parallelism”
• Increasingly better at task parallelism (concurrent kernel execution, and OpenCL 2.0 dynamic parallelism)
An efficient Heterogeneous System Architecture would be optimal (e.g. GFLOPS/$/W)
© Copyright 201 HSA Foundation. All Rights Reserved.
[Member logos grouped as: Founders, Promoters, Supporters, Contributors, Academic]
HSA FOUNDATION
• OpenCL™: Khronos Software API
• Cross-platform (Windows, Linux, Mac OS, etc.)
• Multi-vendor (AMD, Apple, IBM, Intel, NVIDIA, etc.), with maturing support
• Multicore CPU, discrete GPU, integrated GPU (aka APU), DSP, FPGA, etc.
• HSA: Heterogeneous System Architecture, an industry standard specification
• OpenCL 2.0 introduces HSA features
• Open Source also helps!
• OpenCV, featuring OpenCL acceleration
Open Standards
OpenCL™ Evolution: Discrete GPU
OpenCL was invented as an open-standard, high-level API for GPU compute, first on discrete graphics cards.
OpenCL abstracts:
• Data management across multiple memory spaces
• Memory buffers / Images
• Compute Instructions
• “Kernels”
• Execution on “compute units” (CUs)
[Diagram: CPU with main memory (host memory and pinned PCIe memory), connected over PCIe™ to the GPU, which has its own device memory and an array of CUs]
OpenCL™ Evolution: Legacy APU
• APU: Physical Integration: CPU and GPU on same die
• OpenCL (1.x) also works on APUs
• Device memory is (part of) main memory, but you still must use memory buffers!
[Diagram: CPU and GPU (CUs) above main memory, which is divided into host memory, device-visible host memory, device memory, and host-visible device memory]
OpenCL™ Evolution: HSA Enabled APU
• Unified Coherent Memory enables data sharing across all processors and GPU compute units
• OpenCL™ 2.x: No need to use memory buffers, just use data pointers, just like you would do on the CPU.
[Diagram: GPU and CPU compute units (CUs), each side with a cache, on top of Unified (Bidirectionally Coherent, pageable) Virtual Memory, backed by Physical Memory]
A 360° x 90° immersive gesture-enabled experience, enabled by OpenCL (and OpenCV)
Monsters in the Orchestra, AMD CES 2014
VISION: Dense to sparse transition
Putting OPENCL™ 2.0 to Work
GPU: keypoints
Data changed by a kernel can be visible to the CPU before the kernel returns (requires fine-grained SVM).
CPU: consumes keypoints “as they come”, and updates a “shape model”
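The producer/consumer pattern on this slide can be illustrated with a host-only analogy in plain C++. This is not OpenCL code and does not use SVM: a release/acquire pair on a `std::atomic` counter stands in for the visibility guarantee that fine-grained SVM would provide between a running kernel and the CPU. All names (`Point`, `ShapeModel`, `SharedKeypoints`, `produce`, `consumeAvailable`) are hypothetical, and the "shape model" is reduced to a running sum. The sketch assumes a single producer and a single consumer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { float x, y; };

// Toy "shape model": accumulates the keypoints seen so far.
struct ShapeModel {
    float sumX = 0.0f;
    float sumY = 0.0f;
    std::size_t count = 0;
    void update(const Point& p) { sumX += p.x; sumY += p.y; ++count; }
};

// Memory visible to both sides, plus a counter of fully written slots.
struct SharedKeypoints {
    std::vector<Point> slots;
    std::atomic<std::size_t> published{0};
    explicit SharedKeypoints(std::size_t capacity) : slots(capacity) {}
};

// Producer side (stand-in for the GPU kernel): write the slot, then publish it.
bool produce(SharedKeypoints& s, Point p) {
    std::size_t i = s.published.load(std::memory_order_relaxed);
    if (i == s.slots.size()) return false;                // no room left
    s.slots[i] = p;
    s.published.store(i + 1, std::memory_order_release);  // make the slot visible
    return true;
}

// Consumer side (CPU): fold every newly published keypoint into the model.
// Returns the updated number of consumed keypoints.
std::size_t consumeAvailable(const SharedKeypoints& s, std::size_t consumed,
                             ShapeModel& model) {
    std::size_t ready = s.published.load(std::memory_order_acquire);
    assert(consumed <= ready && ready <= s.slots.size());
    for (; consumed < ready; ++consumed) model.update(s.slots[consumed]);
    return consumed;
}
```

In a real pipeline `produce` would run concurrently with `consumeAvailable` (on another thread here, or inside a kernel under fine-grained SVM), and the consumer would poll until the producer signals completion.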
Host: Device Setup, Compile Kernels, Allocate Memory, Further Processing, Clean up, Other Tasks…
Between host and compute device:
• Memory Transfer (discrete GPU)
• Or Zero Copy (APU, OpenCL 1.2)
• Or Shared Virtual Memory (APU, OpenCL 2.0)
Compute Device: Kernel 1, Kernel 2, Kernel 2_1, Kernel 2_2, Kernel
New in OpenCL 2.0: Dynamic parallelism! A kernel can enqueue another kernel
The OpenCL Execution Model
Open Computer Vision Library
• 2,500+ algorithms and functions
• Cross-platform
• BSD license
• High performance
• Professionally developed
• 7M+ downloads
• OpenCL is fully integrated in OpenCV
• ~100 most commonly used algorithms optimized with OpenCL
• Can be built without OpenCL SDK installed. Dynamic OpenCL runtime loading
• OpenCL enabled on the official Windows bin pack
• OpenCV pre-commit check includes OpenCL tests
• Very easy to plug in your own kernels using OpenCV plumbing
• In 2.4.x, OpenCL acceleration is a distinct code path
Timeline:
2000: First public release
2009: v2.0, C++ API
2013: v2.4.3, OpenCL™
~10/2014: v3.0 alpha, Transparent API
// initialization (assumes: using namespace cv; using namespace std;)
VideoCapture vcap(...);
CascadeClassifier fd("haar_ff.xml");
Mat frame, frameGray;
vector<Rect> faces;
for(;;){
    vcap >> frame;
    cvtColor(frame, frameGray, CV_BGR2GRAY);   // convert to grayscale
    equalizeHist(frameGray, frameGray);        // normalize contrast
    fd.detectMultiScale(frameGray, faces);     // detect faces on the CPU
}
OCV 2.4: Face detect on CPU
// initialization (assumes: using namespace cv; using namespace std;)
VideoCapture vcap(...);
ocl::OclCascadeClassifier fd;
fd.load("haar_ff.xml");
ocl::oclMat frame, frameGray;
Mat frameCpu;
vector<Rect> faces;
for(;;){
    vcap >> frameCpu;
    frame = frameCpu;                               // upload the frame to the OpenCL device
    ocl::cvtColor(frame, frameGray, CV_BGR2GRAY);
    ocl::equalizeHist(frameGray, frameGray);
    fd.detectMultiScale(frameGray, faces);          // member call, so no ocl:: prefix
}
OCV 2.4: Face detect using OpenCL™
OpenCV 2.4: Similar, but not identical code paths. You will need to write code explicitly for both CPU and OpenCL
// initialization (assumes: using namespace cv; using namespace std;)
VideoCapture vcap(...);
CascadeClassifier fd("haar_ff.xml");
UMat frame, frameGray;
vector<Rect> faces;
for(;;){
    vcap >> frame;
    cvtColor(frame, frameGray, COLOR_BGR2GRAY);
    equalizeHist(frameGray, frameGray);
    fd.detectMultiScale(frameGray, faces);
}
OCV 3.0: Face detect Anywhere!
This code will run, and configure itself differently on different platforms!
The Need for a Transparent API
UMatData:
• Reference counts
• Dirty bits
• Opaque handles (e.g. clBuffer)
• CPU data
• GPU data
• Handles data synchronization efficiently
UMat: getMat(…) returns a Mat
Mat: getUMat(…) returns a UMat
• Easy transition path from 2.x to 3.x. Code that used to work in 2.x should still work. Therefore, cv::Mat is still around.
Both Mat and UMat are views into UMatData, which does the heavy lifting
How does OpenCV 3.0 T-API work?
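The idea of two views sharing one reference-counted data block with dirty bits can be sketched with a toy model in plain C++. This is not OpenCV's implementation: `ToyData`, `HostView` and `DeviceView` are hypothetical names, the "GPU data" is simulated with a second `std::vector`, and `std::shared_ptr` stands in for the reference count. The sketch only shows how dirty bits let a copy happen lazily, when the other side actually reads.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a shared data block; NOT OpenCV's UMatData.
struct ToyData {
    std::vector<float> host;    // "CPU data"
    std::vector<float> device;  // "GPU data" (simulated)
    bool hostDirty = false;     // host copy is newer than the device copy
    bool deviceDirty = false;   // device copy is newer than the host copy
    int copies = 0;             // number of synchronizing copies performed

    explicit ToyData(std::size_t n) : host(n, 0.0f), device(n, 0.0f) {}

    void syncToHost() {
        if (deviceDirty) { host = device; deviceDirty = false; ++copies; }
    }
    void syncToDevice() {
        if (hostDirty) { device = host; hostDirty = false; ++copies; }
    }
};

// Host-side view (plays the role of Mat in this analogy).
class HostView {
public:
    explicit HostView(std::shared_ptr<ToyData> d) : data_(std::move(d)) {
        assert(data_ != nullptr);
    }
    float read(std::size_t i) const { data_->syncToHost(); return data_->host.at(i); }
    void write(std::size_t i, float v) {
        data_->syncToHost();
        data_->host.at(i) = v;
        data_->hostDirty = true;
    }
private:
    std::shared_ptr<ToyData> data_;
};

// Device-side view (plays the role of UMat in this analogy).
class DeviceView {
public:
    explicit DeviceView(std::shared_ptr<ToyData> d) : data_(std::move(d)) {
        assert(data_ != nullptr);
    }
    float read(std::size_t i) const { data_->syncToDevice(); return data_->device.at(i); }
    void write(std::size_t i, float v) {
        data_->syncToDevice();
        data_->device.at(i) = v;
        data_->deviceDirty = true;
    }
private:
    std::shared_ptr<ToyData> data_;
};
```

A write through one view only sets a flag; the copy is deferred until the other view touches the data, and repeated reads of clean data cost nothing extra.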
• OpenCL™ provides a non-proprietary API suitable for image processing
and vision applications, that works well on multiple platforms
• OpenCL 2.0 and HSA enable efficient collaboration between CPU and
GPU cores, on equal footing. An evolution that can only be compared to
the one from single core to multi-core CPUs!
• OpenCV contains lots of OpenCL examples that can be a great starting
point for your own projects.
Join the Open Standards evolution!
Concluding Thoughts
The information presented in this document is for informational purposes only and may contain technical inaccuracies,
omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not
limited to product and roadmap changes, component and motherboard version changes, new model and/or product
releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the
like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right
to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify
any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO
RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN
NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES
ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY
OF SUCH DAMAGES.
ATTRIBUTION
© 2014 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks
of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. OpenCL is a trademark of Apple Inc. used by
permission by Khronos. Other names are for informational purposes only and may be trademarks of their respective owners.
Disclaimer & Attribution

More Related Content

What's hot

Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...AMD Developer Central
 
HSA-4122, "HSA Queuing Mode," by Ian Bratt
HSA-4122, "HSA Queuing Mode," by Ian BrattHSA-4122, "HSA Queuing Mode," by Ian Bratt
HSA-4122, "HSA Queuing Mode," by Ian BrattAMD Developer Central
 
CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley...
CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley...CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley...
CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley...AMD Developer Central
 
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...AMD Developer Central
 
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...AMD Developer Central
 
MM-4085, Designing a game audio engine for HSA, by Laurent Betbeder
MM-4085, Designing a game audio engine for HSA, by Laurent BetbederMM-4085, Designing a game audio engine for HSA, by Laurent Betbeder
MM-4085, Designing a game audio engine for HSA, by Laurent BetbederAMD Developer Central
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornAMD Developer Central
 
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAMD Developer Central
 
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...AMD Developer Central
 
HC-4018, How to make the most of GPU accessible memory, by Paul Blinzer
HC-4018, How to make the most of GPU accessible memory, by Paul BlinzerHC-4018, How to make the most of GPU accessible memory, by Paul Blinzer
HC-4018, How to make the most of GPU accessible memory, by Paul BlinzerAMD Developer Central
 
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahGS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahAMD Developer Central
 
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael MantorGS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael MantorAMD Developer Central
 
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...AMD Developer Central
 
PG-4039, RapidFire API, by Dmitry Kozlov
PG-4039, RapidFire API, by Dmitry KozlovPG-4039, RapidFire API, by Dmitry Kozlov
PG-4039, RapidFire API, by Dmitry KozlovAMD Developer Central
 
CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...
CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...
CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...AMD Developer Central
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14AMD Developer Central
 
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...AMD Developer Central
 
Final lisa opening_keynote_draft_-_v12.1tb
Final lisa opening_keynote_draft_-_v12.1tbFinal lisa opening_keynote_draft_-_v12.1tb
Final lisa opening_keynote_draft_-_v12.1tbr Skip
 
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...AMD Developer Central
 
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...AMD Developer Central
 

What's hot (20)

Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
 
HSA-4122, "HSA Queuing Mode," by Ian Bratt
HSA-4122, "HSA Queuing Mode," by Ian BrattHSA-4122, "HSA Queuing Mode," by Ian Bratt
HSA-4122, "HSA Queuing Mode," by Ian Bratt
 
CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley...
CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley...CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley...
CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley...
 
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...
 
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
 
MM-4085, Designing a game audio engine for HSA, by Laurent Betbeder
MM-4085, Designing a game audio engine for HSA, by Laurent BetbederMM-4085, Designing a game audio engine for HSA, by Laurent Betbeder
MM-4085, Designing a game audio engine for HSA, by Laurent Betbeder
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
 
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
 
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
 
HC-4018, How to make the most of GPU accessible memory, by Paul Blinzer
HC-4018, How to make the most of GPU accessible memory, by Paul BlinzerHC-4018, How to make the most of GPU accessible memory, by Paul Blinzer
HC-4018, How to make the most of GPU accessible memory, by Paul Blinzer
 
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahGS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
 
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael MantorGS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
 
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
 
PG-4039, RapidFire API, by Dmitry Kozlov
PG-4039, RapidFire API, by Dmitry KozlovPG-4039, RapidFire API, by Dmitry Kozlov
PG-4039, RapidFire API, by Dmitry Kozlov
 
CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...
CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...
CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
 
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
HC-4019, "Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU...
 
Final lisa opening_keynote_draft_-_v12.1tb
Final lisa opening_keynote_draft_-_v12.1tbFinal lisa opening_keynote_draft_-_v12.1tb
Final lisa opening_keynote_draft_-_v12.1tb
 
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
 
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
 

Viewers also liked

MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey PavlenkoMM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey PavlenkoAMD Developer Central
 
Audio processing algorithms on the gpu
Audio processing algorithms on the gpu Audio processing algorithms on the gpu
Audio processing algorithms on the gpu Luca Pintavalle
 
Webinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceWebinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceAMD Developer Central
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozAMD Developer Central
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonAMD Developer Central
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...AMD Developer Central
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasAMD Developer Central
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevAMD Developer Central
 
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsAMD Developer Central
 
Notice writing x beta_114_27.10.09
Notice writing x beta_114_27.10.09Notice writing x beta_114_27.10.09
Notice writing x beta_114_27.10.09avtardhillon
 
Computer architecture and organization
Computer architecture and organizationComputer architecture and organization
Computer architecture and organizationTushar B Kute
 
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28AMD
 

Viewers also liked (18)

MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey PavlenkoMM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
 
Audio processing algorithms on the gpu
Audio processing algorithms on the gpu Audio processing algorithms on the gpu
Audio processing algorithms on the gpu
 
Webinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceWebinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop Intelligence
 
Inside XBox- One, by Martin Fuller
Inside XBox- One, by Martin FullerInside XBox- One, by Martin Fuller
Inside XBox- One, by Martin Fuller
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas Thibieroz
 
Gcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodesGcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodes
 
DirectGMA on AMD’S FirePro™ GPUS
DirectGMA on AMD’S  FirePro™ GPUSDirectGMA on AMD’S  FirePro™ GPUS
DirectGMA on AMD’S FirePro™ GPUS
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
 
Introduction to Node.js
Introduction to Node.jsIntroduction to Node.js
Introduction to Node.js
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
 
Media SDK Webinar 2014
Media SDK Webinar 2014Media SDK Webinar 2014
Media SDK Webinar 2014
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan Nevraev
 
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
 
Notice writing x beta_114_27.10.09
Notice writing x beta_114_27.10.09Notice writing x beta_114_27.10.09
Notice writing x beta_114_27.10.09
 
Computer architecture and organization
Computer architecture and organizationComputer architecture and organization
Computer architecture and organization
 
Inside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin FullerInside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin Fuller
 
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
 

Similar to Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Harris Gasparakis, AMD

ROCm and Distributed Deep Learning on Spark and TensorFlow
ROCm and Distributed Deep Learning on Spark and TensorFlowROCm and Distributed Deep Learning on Spark and TensorFlow
ROCm and Distributed Deep Learning on Spark and TensorFlowDatabricks
 
Spark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg SchadSpark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg SchadSpark Summit
 
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDSAccelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDSCeph Community
 
Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...
Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...
Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...Stefano Di Carlo
 
2012-03-15 What's New at Red Hat
2012-03-15 What's New at Red Hat2012-03-15 What's New at Red Hat
2012-03-15 What's New at Red HatShawn Wells
 
Add sale davinci
Add sale davinciAdd sale davinci
Add sale davinciAkash Sahoo
 
XPDDS17: Keynote: Shared Coprocessor Framework on ARM - Oleksandr Andrushchen...
XPDDS17: Keynote: Shared Coprocessor Framework on ARM - Oleksandr Andrushchen...XPDDS17: Keynote: Shared Coprocessor Framework on ARM - Oleksandr Andrushchen...
XPDDS17: Keynote: Shared Coprocessor Framework on ARM - Oleksandr Andrushchen...The Linux Foundation
 
OffensiveCon2022: Case Studies of Fuzzing with Xen
OffensiveCon2022: Case Studies of Fuzzing with XenOffensiveCon2022: Case Studies of Fuzzing with Xen
OffensiveCon2022: Case Studies of Fuzzing with XenTamas K Lengyel
 
Ceph on 64-bit ARM with X-Gene
Ceph on 64-bit ARM with X-GeneCeph on 64-bit ARM with X-Gene
Ceph on 64-bit ARM with X-GeneCeph Community
 
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Harris Gasparakis, AMD

  • 1. Computer Vision Powered by Heterogeneous System Architecture (HSA). Dr. Harris Gasparakis, 5/29/2014. Copyright © 2014 AMD.
  • 2. AGENDA
      • DEVELOPING EMBEDDED VISION APPLICATIONS: THE PROPRIETARY API LEGACY
      • THE RISE OF GPUS: DENSE DATA PARALLELISM AND CACHE-COHERENT SIMD
      • DO WE STILL NEED CPUs?
      • THE HETEROGENEOUS FUTURE OF VISION: OPENCL™, HSA
      • OPENCL EVOLUTION
      • OPENCL 1.X
      • OPENCL 2.X AND HSA
      • THE OPENCL EXECUTION MODEL
      • MONSTERS IN THE ORCHESTRA, CES 2014
      • PUTTING OPENCL™ 2.0 TO WORK
      • OPENCL™ IN OPENCV
      • OPENCV 3.0: THE TRANSPARENT API
      • HOW DOES IT WORK?
      • CONCLUDING THOUGHTS
  • 3. Developing Embedded Vision Applications
      • Choose HW and SW platform
          – A multitude of devices of different capabilities and strengths!
          – A multitude of algorithms of different requirements! Data Parallel (Y/N/M?)
      • Highly specialized programmers
      • Non-portable programs, with high platform risk
      [Flowchart: "Is Image Processing/Vision your core IP?" with Yes and No branches; boxes "Use somebody’s SDK or end product" and "This talk is not for you"]
  • 4. The rise of GPUs: Dense Data Parallelism
      • “GPU is great for SIMD: Single Instruction Multiple Data”
      • Image Processing = Dense Data Parallelism
      • Same calculation (e.g. calculate edge strength) for all pixels.
      • Adjacent threads load adjacent “enough” data
      Just SIMD is NOT good enough for good GPU acceleration, contrary to popular wisdom:
      1. Algorithms that are too simple (not enough math per memory transfer). Remedy: merge kernels.
      2. Too much complexity per kernel (high register pressure). Remedy: split kernels.
      [Figure: Sobel]
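As a minimal sketch of the "same calculation for all pixels" pattern on slide 4, the C++ function below computes Sobel edge strength on the CPU. The name `sobelMagnitude`, the row-major float layout, and the zeroed border are assumptions made for illustration; none of them come from the talk. Each output pixel depends only on its own 3x3 neighbourhood, which is what makes the problem dense and data parallel. It also performs only a handful of arithmetic operations per neighbourhood read, which is the kind of kernel the slide calls "too simple" when run on its own.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sobel edge strength for a row-major grayscale image.
// Border pixels are left at 0 to keep the sketch short.
std::vector<float> sobelMagnitude(const std::vector<float>& src,
                                  std::size_t width, std::size_t height) {
    assert(src.size() == width * height);
    std::vector<float> dst(src.size(), 0.0f);
    if (width < 3 || height < 3) return dst;

    for (std::size_t y = 1; y + 1 < height; ++y) {
        for (std::size_t x = 1; x + 1 < width; ++x) {
            // p(dx, dy) reads the 3x3 neighbourhood; (1, 1) is the centre pixel.
            auto p = [&](std::size_t dx, std::size_t dy) {
                return src[(y + dy - 1) * width + (x + dx - 1)];
            };
            // Horizontal gradient: right column minus left column.
            const float gx = (p(2, 0) + 2.0f * p(2, 1) + p(2, 2))
                           - (p(0, 0) + 2.0f * p(0, 1) + p(0, 2));
            // Vertical gradient: bottom row minus top row.
            const float gy = (p(0, 2) + 2.0f * p(1, 2) + p(2, 2))
                           - (p(0, 0) + 2.0f * p(1, 0) + p(2, 0));
            dst[y * width + x] = std::sqrt(gx * gx + gy * gy);
        }
    }
    return dst;
}
```

On a 4x4 image with a vertical step edge (two columns of 0 followed by two columns of 1), the four interior pixels come out at 4 and the border stays at 0.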
  • 5. Do we still need CPUs?
      • Extensive set of CPU libraries
      • Several approaches to vision (image understanding) can be thought of as a “dense to sparse transition”
      • Sparsity is not a GPU’s friend.
      • OpenCL 2.0 solves this problem much more optimally (and with less code)
      [Figure label: Features]
  • 6. The Heterogeneous Future of Vision
      THE RIGHT IP FOR THE RIGHT TASK!
      • CPU is great for serial tasks
          • Lower latency
          • Good branching performance
          • Lower throughput
          • Good at Task parallelism
          • Better for Sparse Data Parallelism
      • GPU excels at data parallel problems
          • High throughput
          • Possibly High latency
          • Good at “Dense Data Parallelism”
          • Increasingly better at task parallelism (concurrent kernel execution, and OpenCL 2.0 Dynamic parallelism)
      An efficient Heterogeneous System Architecture would be optimal (e.g. GFLOPS/$/W)
      [Diagram: CPU, GPU, Audio Processor, ISP (Image Signal Processing), Fixed Function Accelerator, Encode, Decode, DSP, and Other! around Shared Memory]
  • 7. HSA FOUNDATION
      [Figure: member logos grouped as Founders, Promoters, Supporters, Contributors, Academic]
      © Copyright 201 HSA Foundation. All Rights Reserved.
  • 8. Open Standards
      • OpenCL™: Khronos Software API
          • Cross-platform (Windows, Linux, Mac OS, etc.)
          • Multi-vendor (AMD, Apple, IBM, Intel, NVIDIA, etc.), with maturing support
          • Multicore CPU, discrete GPU, integrated GPU (aka APU), DSP, FPGA, etc.
      • HSA: Heterogeneous System Architecture, an industry standard specification
          • OpenCL 2.0 introduces HSA features
      • Open Source also helps!
          • OpenCV, featuring OpenCL acceleration
  • 9. OpenCL™ Evolution: Discrete GPU
      OpenCL was created as an open-standard, high-level API for GPU compute, first on discrete graphics cards.
      OpenCL abstracts:
      • Data management across multiple memory spaces
          • Memory buffers / Images
      • Compute instructions
          • “Kernels”
          • Execution on “compute units” (CUs)
      [Diagram: CPU with main memory (host memory and pinned PCIe memory) connected over PCIe™ to a GPU with compute units (CU) and GPU device memory]
  • 10. OpenCL™ Evolution: Legacy APU
      • APU: Physical Integration: CPU and GPU on same die
      • OpenCL (1.x) also works on APUs
          • Device memory is (part of) main memory, but still must use memory buffers!
      [Diagram: CPU and GPU compute units (CU) sharing main memory, divided into host memory, device-visible host memory, host-visible device memory, and device memory]
  • 11. OpenCL™ Evolution: HSA Enabled APU
      • Unified Coherent Memory enables data sharing across all processors and GPU compute units
      • OpenCL™ 2.x: no need to use memory buffers; use data pointers, just as you would on the CPU.
      [Diagram: CPU and GPU compute units (CU), each with a cache, sharing Unified (Bidirectionally Coherent, pageable) Virtual Memory backed by Physical Memory]
  • 12. Monsters in the Orchestra, AMD CES 2014
      A 360° x 90° immersive gesture-enabled experience, enabled by OpenCL (and OpenCV)
  • 13. Putting OPENCL™ 2.0 to Work
      VISION: Dense to sparse transition
      • The GPU produces keypoints.
      • Data changed by a kernel can be visible to the CPU before the kernel returns (requires fine-grain SVM).
      • The CPU consumes keypoints “as they come”, and updates a “shape model”.
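The dense to sparse transition on slide 13 can be sketched in plain C++ as two stages: a pass over every pixel of a response map, and a short, branchy pass over the few keypoints that survive. This sketch runs both stages sequentially on the CPU and does not model fine-grain SVM or concurrent GPU execution. `Keypoint`, `extractKeypoints`, and `ShapeModel` are hypothetical names, and the running centroid is only a stand-in for whatever "shape model" the talk had in mind.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One sparse detection: pixel position plus its detector response.
struct Keypoint {
    std::size_t x;
    std::size_t y;
    float response;
};

// Dense stage: visit every pixel, keep only those above the threshold.
std::vector<Keypoint> extractKeypoints(const std::vector<float>& response,
                                       std::size_t width, std::size_t height,
                                       float threshold) {
    assert(response.size() == width * height);
    std::vector<Keypoint> keypoints;
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const float r = response[y * width + x];
            if (r > threshold) keypoints.push_back({x, y, r});
        }
    }
    return keypoints;
}

// Sparse stage (hypothetical "shape model"): a running centroid that is
// updated one keypoint at a time, as the keypoints arrive.
struct ShapeModel {
    double sumX = 0.0;
    double sumY = 0.0;
    std::size_t count = 0;

    void consume(const Keypoint& kp) {
        sumX += static_cast<double>(kp.x);
        sumY += static_cast<double>(kp.y);
        ++count;
    }
    double centroidX() const { return count ? sumX / count : 0.0; }
    double centroidY() const { return count ? sumY / count : 0.0; }
};
```

The first stage touches every pixel and suits a GPU; the second touches only a few entries with data-dependent control flow, which is the sparse work the slides assign to the CPU.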
  • 14. The OpenCL Execution Model
      • Host: Device Setup, Compile Kernels, Allocate Memory, Further Processing, Clean up, Other Tasks…
      • Data path between host and compute device: Memory Transfer (discrete GPU), or Zero Copy (APU, OpenCL 1.2), or Shared Virtual Memory (APU, OpenCL 2.0)
      • Compute Device: Kernel 1, Kernel 2, Kernel 2_1, Kernel 2_2
      • New in OpenCL 2.0: Dynamic parallelism! A kernel can enqueue another kernel.
  • 15. Open Computer Vision Library
      • 2,500+ algorithms and functions
      • Cross-platform
      • BSD license
      • High performance
      • Professionally developed
      • 7M+ downloads
      OpenCL in OpenCV:
      • OpenCL is fully integrated in OpenCV
      • ~100 most commonly used algorithms optimized with OpenCL
      • Can be built without the OpenCL SDK installed (dynamic OpenCL runtime loading)
      • OpenCL enabled on the official Windows bin pack
      • OpenCV pre-commit check includes OpenCL tests
      • Very easy to plug in your own kernels using OpenCV plumbing
      • In 2.4.x, OpenCL acceleration is a distinct code path
      Timeline:
      • 2000: First public release
      • 2009: v2.0, C++ API
      • 2013: v2.4.3, OpenCL™
      • ~10/2014: v3.0 alpha, Transparent API
  • 16. The Need for a Transparent API
      All three snippets assume the cv and std namespaces are in scope.

      OCV 2.4: Face detect on CPU
          // initialization
          VideoCapture vcap(...);
          CascadeClassifier fd("haar_ff.xml");
          Mat frame, frameGray;
          vector<Rect> faces;
          for(;;){
              vcap >> frame;                                // grab the next frame
              cvtColor(frame, frameGray, COLOR_BGR2GRAY);   // convert to grayscale
              equalizeHist(frameGray, frameGray);           // normalize contrast
              fd.detectMultiScale(frameGray, faces);        // run the cascade
          }

      OCV 2.4: Face detect using OpenCL™
          // initialization
          VideoCapture vcap(...);
          ocl::OclCascadeClassifier fd("haar_ff.xml");
          ocl::oclMat frame, frameGray;
          Mat frameCpu;
          vector<Rect> faces;
          for(;;){
              vcap >> frameCpu;
              frame = frameCpu;                             // upload the frame to the device
              ocl::cvtColor(frame, frameGray, COLOR_BGR2GRAY);
              ocl::equalizeHist(frameGray, frameGray);
              fd.detectMultiScale(frameGray, faces);        // fd is the ocl:: classifier
          }

      OpenCV 2.4: Similar, but not identical code paths. You will need to write code explicitly for both CPU and OpenCL.

      OCV 3.0: Face detect Anywhere!
          // initialization
          VideoCapture vcap(...);
          CascadeClassifier fd("haar_ff.xml");
          UMat frame, frameGray;                            // UMat instead of Mat
          vector<Rect> faces;
          for(;;){
              vcap >> frame;
              cvtColor(frame, frameGray, COLOR_BGR2GRAY);
              equalizeHist(frameGray, frameGray);
              fd.detectMultiScale(frameGray, faces);
          }

      This code will run, and configure itself differently on different platforms!
  • 17. How does OpenCV 3.0 T-API work?
      • UMatData holds reference counts, dirty bits, opaque handles (e.g. clBuffer), CPU data, and GPU data, and handles data synchronization efficiently.
      • Both Mat and UMat are views into UMatData, which does the heavy lifting. UMat::getMat(…) yields a Mat; Mat::getUMat(…) yields a UMat.
      • Easy transition path from 2.x to 3.x: code that used to work in 2.x should still work. Therefore, cv::Mat is still around.
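To make the "two views into one data block" idea on slide 17 concrete, here is a toy model in standard C++. It is not OpenCV's implementation: `SharedData`, `HostView`, and `DeviceView` are hypothetical names, the "device" copy is an ordinary vector standing in for an opaque handle, and `std::shared_ptr` stands in for the reference counts. The point is only how dirty bits let the shared block copy data lazily, when a view on the other side actually needs it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the shared block; not OpenCV's UMatData.
struct SharedData {
    std::vector<float> host;    // "CPU data"
    std::vector<float> device;  // stands in for an opaque device buffer
    bool hostDirty = false;     // host copy is newer than the device copy
    bool deviceDirty = false;   // device copy is newer than the host copy
    int copies = 0;             // number of synchronizing copies performed

    explicit SharedData(std::size_t n) : host(n, 0.0f), device(n, 0.0f) {}

    // Bring the host copy up to date; copies only if the device is newer.
    void syncToHost() {
        if (deviceDirty) { host = device; deviceDirty = false; ++copies; }
    }
    // Bring the device copy up to date; copies only if the host is newer.
    void syncToDevice() {
        if (hostDirty) { device = host; hostDirty = false; ++copies; }
    }
};

// Host-side view (plays the role of Mat in this toy model).
class HostView {
public:
    explicit HostView(std::shared_ptr<SharedData> d) : data_(std::move(d)) {}
    float read(std::size_t i) const {
        data_->syncToHost();
        return data_->host.at(i);
    }
    void write(std::size_t i, float v) {
        data_->syncToHost();
        data_->host.at(i) = v;
        data_->hostDirty = true;
    }
private:
    std::shared_ptr<SharedData> data_;
};

// Device-side view (plays the role of UMat in this toy model).
class DeviceView {
public:
    explicit DeviceView(std::shared_ptr<SharedData> d) : data_(std::move(d)) {}
    // Simulated kernel: scale every element of the device copy.
    void scale(float k) {
        data_->syncToDevice();
        for (float& v : data_->device) v *= k;
        data_->deviceDirty = true;
    }
private:
    std::shared_ptr<SharedData> data_;
};
```

In this model, two consecutive `scale` calls trigger at most one host-to-device copy, and a later `read` triggers one device-to-host copy; repeated reads after that are free until the device copy changes again.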
  • 18. Concluding Thoughts
      • OpenCL™ provides a non-proprietary API, suitable for image processing and vision applications, that works well on multiple platforms.
      • OpenCL 2.0 and HSA enable efficient collaboration between CPU and GPU cores, on equal footing. An evolution that can only be compared to the one from single-core to multi-core CPUs!
      • OpenCV contains lots of OpenCL examples that can be a great starting point for your own projects.
      Join the Open Standards evolution!
  • 19. Disclaimer & Attribution
      The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
      AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
      ATTRIBUTION: © 2014 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. OpenCL is a trademark of Apple Inc. used by permission by Khronos. Other names are for informational purposes only and may be trademarks of their respective owners.