The open standards enabling vision processing in ADAS
Andrew Richards, CEO, Codeplay
AutoSens September 2016
© 2016 Codeplay Software Ltd.2
How do we get from here… to here?
We have SAE Levels to climb
Level 0
• Warnings
Level 1
• Adaptive
• Assist
Level 2
• Execute
• Automated manoeuvres
Level 3
• Limited overall control
Level 4
• Deep self control
• All conditions
Level 5
• Autonomous
• Stages from very local to extensive journeys
We have a mountain to climb
How do we get to the top?
When we don’t know what the top looks like…
… and we want to get there in safe, manageable, affordable steps…
… without getting lost on our own…
… or climbing the wrong mountain
This presentation will focus on:
• The hardware and software platforms that will be able to
deliver the results
• The software tools to build up the solutions for those
platforms
• The open standards that will enable solutions to interoperate
• How we build the platforms and tools
Where do we need to go?
“On a 100 millimetre-squared chip, Google needs something
like 50 teraflops of performance”
- Daniel Rosenband (Google’s self-driving car
project) at HotChips 2016
Performance trends
[Chart: GFLOPS (logarithmic axis, 1 to 65,536) against year of introduction, with trend lines for desktop GPU, integrated GPU, smartphone GPU, smartphone CPU and desktop CPU, and the Google target marked.]
These trend lines seem to violate the rules of physics…
What will our platform be?
• Will we build self-driving cars on graphics processors designed for videogames?
• Or will we take the successful design decisions from GPUs and apply them to processors designed specially for autonomous driving?
[Diagram: two routes, "Build on GPUs" and "Special-purpose vision processors"]
How do we get there from here?
1. We need to write software today for platforms that cannot be built yet
2. We need to start with simpler systems that are not fully autonomous
3. We need to validate the systems as safe
Two models of software development
1. Design a model -> Validate model -> Select platform -> Implement on platform
2. Write software -> Select platform -> Optimize for platform -> Validate whole platform -> Design next version
Which method can get us all the way to autonomous vehicles?
The different levels of programming model
Device-specific programming
• Assembly language
• VHDL
• Device-specific C-like programming models
Higher-level language enabler
• NVIDIA PTX
• HSA
• OpenCL SPIR
• SPIR-V
C level programming
• OpenCL C
• DSP C
• MCAPI/MTAPI
C++ level programming
• SYCL
• CUDA
• HCC
• C++ AMP
Graph programming
• OpenCV
• OpenVX
• Halide
• VisionCpp
• TensorFlow
• Caffe
Device-specific programming
• Not a route to full autonomy
• Can deliver quick results today
• Can… hand-optimize directly for the device
• Cannot… develop software today for future platforms
The route to full autonomy
• Graph programming
• This is the most widely-adopted approach to machine vision and machine
learning
• Open standards
• This lets you develop today for future architectures
Why graph programming?
When you scale the number of cores:
• You don’t scale the number of memory ports
• Your compute performance increases
• But your off-chip memory bandwidth does not
Therefore:
• You need to reduce off-chip memory bandwidth by processing everything on-chip
• This is achieved by tiling
However, writing tiled image pipelines is hard.
If we build up a graph of operations (e.g. convolutions) and then have a runtime system split it into fused, tiled operations across an entire system-on-chip, we get great performance.
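To make the argument above concrete, here is a minimal sketch (not taken from the talk) of what fusing and tiling a three-node graph means. The operations `scale`, `offset` and `square`, the function names and the default tile size of 64 are all assumptions chosen for illustration; a real runtime would choose tiles to fit its on-chip memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Three hypothetical per-pixel operations standing in for graph nodes.
inline float scale(float x)  { return x * 0.5f; }
inline float offset(float x) { return x + 1.0f; }
inline float square(float x) { return x * x; }

// Unfused: every node is a separate pass that writes a full intermediate
// image, so each pixel travels to and from memory three times.
std::vector<float> run_unfused(const std::vector<float>& in) {
  std::vector<float> a(in.size()), b(in.size()), out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) a[i] = scale(in[i]);
  for (std::size_t i = 0; i < in.size(); ++i) b[i] = offset(a[i]);
  for (std::size_t i = 0; i < in.size(); ++i) out[i] = square(b[i]);
  return out;
}

// Fused and tiled: the three nodes become one kernel applied tile by tile.
// Intermediate values live only in registers, so each pixel is read once
// and written once.
std::vector<float> run_fused_tiled(const std::vector<float>& in,
                                   std::size_t tile = 64) {
  assert(tile > 0);
  std::vector<float> out(in.size());
  for (std::size_t start = 0; start < in.size(); start += tile) {
    const std::size_t end = std::min(start + tile, in.size());
    for (std::size_t i = start; i < end; ++i)
      out[i] = square(offset(scale(in[i])));
  }
  return out;
}
```

Both functions compute the same result; only the memory traffic differs. Writing the second form by hand for every pipeline is the part the slide calls hard, which is why it is left to a graph runtime.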
Graph programming: some numbers
[Charts: "Graph execution of individual nodes" and "Optimization across the whole graph", comparing OpenCV, Halide and SYCL. Bars show kernel time and overhead in ms; the first chart splits both per operation (RGB to HSV, channel masking, HSV to RGB).]
Without fusion, each operation takes roughly the same amount of time on the accelerator (an AMD APU in this case), but the overhead varies a little.
OpenCV does not fuse in this case, but Halide and SYCL do. The fused kernels are significantly faster than the non-fused ones when C++ programming is used to achieve fusion.
Graph programming: some numbers
[Chart: "Effect of combining graph nodes on performance". Kernel time and overhead time in ms (axis 0 to 100) for OpenCV, Halide and SYCL, each run as individual nodes and as a whole graph.]
In this example, we perform 3 image processing operations on an accelerator and compare 3 systems when executing individual nodes or a whole graph.
Halide and SYCL use kernel fusion, whereas OpenCV does not. For all 3 systems, the performance of the whole graph is significantly better than that of individual nodes executed on their own.
The system is an AMD APU and the operations are: RGB->HSV, channel masking, HSV->RGB.
Graph programming
• For both machine vision algorithms and machine learning, graph programming is the most widely-adopted approach
• Two styles of graph programming that we commonly see:
C-style graph programming
• OpenVX
• OpenCV
C++ style graph programming
• Halide
• RapidMind
• Eigen (also in TensorFlow)
• VisionCpp
C style graph programming
OpenVX: open standard
• Can be implemented by vendors
• Create a graph with C API, then map to an
entire SoC
OpenCV: open source
• Implemented on OpenCL
• Implemented on device-specific accelerators
• Create a graph with C API, then execute
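A minimal sketch of the C-style pattern described above: build a graph through a C API first, then execute it. The names `graph`, `graph_add_node` and `graph_process` are hypothetical and are not the OpenVX or OpenCV API; they only show the build-then-run shape.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C-style graph API, for illustration only.
typedef float (*node_fn)(float);

struct graph {
  std::vector<node_fn> nodes;  // nodes in execution order
};

// Building the graph records the node but does no image work yet.
void graph_add_node(graph* g, node_fn fn) {
  assert(g != nullptr && fn != nullptr);
  g->nodes.push_back(fn);
}

// Execution happens later, once the runtime can see the whole chain.
// This toy runtime pushes each pixel through every node in turn.
void graph_process(const graph* g, const float* in, float* out,
                   std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    float v = in[i];
    for (node_fn fn : g->nodes) v = fn(v);
    out[i] = v;
  }
}

// Two example nodes.
float halve(float x)   { return x * 0.5f; }
float add_one(float x) { return x + 1.0f; }
```

Because the nodes here are opaque function pointers, a runtime built this way can order and schedule them but cannot look inside a node it did not supply itself.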
& Device-Specific Programming
• Runtime systems can automatically optimize the graphs
• Can… develop software today for future platforms
• How do we adapt it for all the graph nodes we need?
• What happens if we invent our own graph nodes?
C++ style graph programming
Examples in machine vision/machine learning
• Halide
• RapidMind
• Eigen (also in TensorFlow)
• VisionCpp
C++ compilers that support this style
• CUDA
• C++ OpenMP
• C++17 Parallel STL
• SYCL
C++ single-source programming
• C++ lets us build up graphs at compile-time
• This means we can map a graph to the processors offline
• C++ lets us write custom nodes ourselves
• This approach is called a C++ Embedded Domain-Specific
Language
• Very widely used, e.g. Eigen, Boost, TensorFlow, RapidMind, Halide
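The points above can be sketched with expression templates, one common way to build a C++ embedded DSL. This is a minimal illustration under assumptions, not VisionCpp's or Eigen's actual code: the node types `Input`, `Constant`, `Mul`, `Add`, the custom node `Clamp01` and the `execute` function are all hypothetical names.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Leaf nodes.
struct Input {
  const std::vector<float>* data;
  float eval(std::size_t i) const { return (*data)[i]; }
};
struct Constant {
  float value;
  float eval(std::size_t) const { return value; }
};

// Interior nodes: the type records both children, so the shape of the
// whole graph is part of the C++ type and known at compile time.
template <typename L, typename R>
struct Mul {
  L lhs;
  R rhs;
  float eval(std::size_t i) const { return lhs.eval(i) * rhs.eval(i); }
};
template <typename L, typename R>
struct Add {
  L lhs;
  R rhs;
  float eval(std::size_t i) const { return lhs.eval(i) + rhs.eval(i); }
};

// A custom node written by the user: any type with eval() fits in.
template <typename E>
struct Clamp01 {
  E expr;
  float eval(std::size_t i) const {
    const float v = expr.eval(i);
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
  }
};

template <typename L, typename R>
Mul<L, R> mul(L l, R r) { return Mul<L, R>{l, r}; }
template <typename L, typename R>
Add<L, R> add(L l, R r) { return Add<L, R>{l, r}; }
template <typename E>
Clamp01<E> clamp01(E e) { return Clamp01<E>{e}; }

// The graph structure is encoded in the type of the expression.
using Pipeline = decltype(
    clamp01(add(mul(Input{nullptr}, Constant{0.0f}), Constant{0.0f})));
static_assert(
    std::is_same<Pipeline,
                 Clamp01<Add<Mul<Input, Constant>, Constant>>>::value,
    "graph structure is part of the type");

// One loop evaluates the whole graph per element; the compiler can
// inline every eval() call, which gives a fused kernel.
template <typename Expr>
std::vector<float> execute(const Expr& expr, std::size_t n) {
  std::vector<float> out(n);
  for (std::size_t i = 0; i < n; ++i) out[i] = expr.eval(i);
  return out;
}
```

A call such as `execute(clamp01(add(mul(Input{&img}, Constant{0.5f}), Constant{0.25f})), img.size())` builds the graph as a type and runs it in a single pass over `img`.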
Combining: open standards, C++ and graph programming
SYCL combines C++ single-source with OpenCL acceleration
• OpenCL lets us run on a very wide range of accelerators now and in the future
• Single-source is the most widely-adopted machine learning programming model
• C++ single-source lets us create customizable graph models
Putting it all together: Building it
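C++
#include <opencv2/opencv.hpp>
#include <visioncpp.hpp>
int main() {
  // Load the input image on the host
  auto in = cv::imread("input.jpg");

  // Leaf node: 512x512 sRGB image held in host memory
  auto a = Node<sRGB, 512, 512, Host>(in.data);
  auto b = Node<sRGB2lRGB>(a);      // sRGB to linear RGB
  auto c = Node<lRGB2lHSV>(b);      // linear RGB to linear HSV
  auto d = Node<Constant>(0.1);     // scaling coefficient
  auto e = Node<lHSV2Scale>(c, d);  // scale the HSV image by the coefficient
  auto f = Node<lHSV2lRGB>(e);      // linear HSV back to linear RGB
  auto g = Node<lRGB2sRGB>(f);      // linear RGB back to sRGB
  // Fuse the whole graph and execute it on the host
  auto h = execute<fuse>(g);
  auto ptr = h.get_data();
  auto output = cv::Mat(512, 512, CV_8UC3, ptr.get());
  cv::imshow("Display Image", output);
  return 0;
}
SYCL
#include <opencv2/opencv.hpp>
#include <visioncpp.hpp>
int main() {
  // Load the input image on the host
  auto in = cv::imread("input.jpg");
  // Select a GPU queue to execute on
  auto q = get_queue<gpu_selector>();
  // Leaf node: 512x512 sRGB image stored as a device image
  auto a = Node<sRGB, 512, 512, Image>(in.data);
  auto b = Node<sRGB2lRGB>(a);      // sRGB to linear RGB
  auto c = Node<lRGB2lHSV>(b);      // linear RGB to linear HSV
  auto d = Node<Constant>(0.1);     // scaling coefficient
  auto e = Node<lHSV2Scale>(c, d);  // scale the HSV image by the coefficient
  auto f = Node<lHSV2lRGB>(e);      // linear HSV back to linear RGB
  auto g = Node<lRGB2sRGB>(f);      // linear RGB back to sRGB
  // Fuse the whole graph and execute it on the queue
  auto h = execute<fuse>(g, q);
  auto ptr = h.get_data();
  auto output = cv::Mat(512, 512, CV_8UC3, ptr.get());
  cv::imshow("Display Image", output);
  return 0;
}
[Diagram: the expression tree built by both listings: in -> a (leaf) -> b (sRGB2lRGB) -> c (lRGB2lHSV) -> e (lHSV2Scale, with coefficient d) -> f (lHSV2lRGB) -> g (lRGB2sRGB) -> h -> out. Callouts mark the differences between the two listings: the leaf type, and queue versus no queue.]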
Backend structure
SYCL
template <typename Expr, typename... Acc>
void sycl(handler& cgh, Expr expr, Acc... acc) {
  // SYCL accessor for writing the output data on the device
  auto outPtr = expr.out->template get_accessor<write>(cgh);
  // SYCL range covering the valid extent of the data
  auto rng = range<2>(Expr::Rows, Expr::Cols);
  // SYCL parallel_for parallelises execution across the range
  cgh.parallel_for<Type>(rng, [=](item<2> itemID) {
    // rebuild the accessor tuple on the device
    auto tuple = make_tuple(acc...);
    // call the eval function for each pixel
    outPtr[itemID] = expr.eval(itemID, tuple);
  });
}
OpenMP
template <typename Expr, typename... Acc>
void cpp(Expr expr, Acc... acc) {
  // output pointer for accessing data on the host
  auto outPtr = expr.out->get();
  // valid range for accessing data on the host
  auto rng = range(Expr::Rows, Expr::Cols);
  // rebuild the tuple of input pointers on the host
  auto tuple = make_tuple(acc...);
  // OpenMP directive parallelising the outer loop
  #pragma omp parallel for
  for (size_t i = 0; i < rng.rows; i++)
    for (size_t j = 0; j < rng.cols; j++)
      // call the eval function for each pixel
      outPtr[index(i, j)] = expr.eval(index(i, j), tuple);
}
[Callouts mark the corresponding parts of the two backends: accessor versus pointer, and SYCL parallel_for versus the C++/OpenMP parallel for.]
Higher level programming enablers
NVIDIA PTX
• NVIDIA CUDA-only
HSA
• Royalty-free open standard
• HSAIL is the IR
• Provides a single address space, with virtual memory
• Low-latency communication
OpenCL SPIR
• Defined for OpenCL v1.2
• Based on Clang/LLVM (the open-source compiler)
SPIR-V
• Open standard
• Defined by Khronos
• Supports compute and graphics (OpenCL, Vulkan and OpenGL)
• Not tied to any compiler
Open standard intermediate representations enable tools to be built on top and support a wide range of platforms
HSA
• One of the big problems of offloading to accelerators is
the high cost of offload:
• Moving data to the accelerator
• May need to translate addresses and pointer sizes between CPU and
accelerator
• Going into OS kernel-level driver to start work or synchronize
• High cost of compilation, especially for JIT languages
• In a multi-accelerator system, this may get even more costly
• HSA solves this with user-mode-queueing, user-mode
synchronization, and shared virtual memory
• It also provides open-source software to help implement HSA
• And, we’re even going further and working towards
advanced tools standardization, such as profiling and
debugging
[Diagram: CPU, GPU and DSP, each shown with its own optimized code (CPU-optimized, GPU-optimized, DSP-optimized) and HSAIL.]
The HSA Runtime gives very low-latency communication
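To illustrate why user-mode queueing lowers the cost of offload, here is a minimal sketch of a single-producer, single-consumer ring buffer in shared memory. It is not the HSA runtime API: `Packet`, `UserModeQueue`, `try_push` and `try_pop` are hypothetical names, and a real queue would hold dispatch packets consumed by the accelerator rather than by another CPU thread.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical work packet: which kernel to run and one argument.
struct Packet {
  std::uint32_t kernel_id;
  std::uint32_t arg;
};

// Single-producer, single-consumer ring buffer. The producer submits work
// by writing a slot and publishing a new write index; no call into an OS
// kernel-level driver is needed.
template <std::size_t N>
class UserModeQueue {
  static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

 public:
  // Producer side: returns false if the queue is full.
  bool try_push(const Packet& p) {
    const std::uint64_t w = write_.load(std::memory_order_relaxed);
    if (w - read_.load(std::memory_order_acquire) == N) return false;
    slots_[w & (N - 1)] = p;
    // Publishing the index makes the packet visible to the consumer.
    write_.store(w + 1, std::memory_order_release);
    return true;
  }

  // Consumer side: returns false if the queue is empty.
  bool try_pop(Packet& p) {
    const std::uint64_t r = read_.load(std::memory_order_relaxed);
    if (r == write_.load(std::memory_order_acquire)) return false;
    p = slots_[r & (N - 1)];
    read_.store(r + 1, std::memory_order_release);
    return true;
  }

 private:
  std::array<Packet, N> slots_{};
  std::atomic<std::uint64_t> write_{0};
  std::atomic<std::uint64_t> read_{0};
};
```

Submitting work costs a memory write and an atomic store, which is the kind of saving the slide attributes to user-mode queueing and user-mode synchronization.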
Which model should we choose?
Device-specific programming
• Assembly language
• VHDL
• Device-specific C-like programming models
Higher-level language enabler
• NVIDIA PTX
• HSA
• OpenCL SPIR
• SPIR-V
C level programming
• OpenCL C
• DSP C
• MCAPI/MTAPI
C++ level programming
• SYCL
• CUDA
• HCC
• C++ AMP
Graph programming
• OpenCV
• OpenVX
• Halide
• VisionCpp
• TensorFlow
• Caffe
They are not alternatives, they are layers
Device-specific programming: assembly language, VHDL, device-specific C-like programming models
Higher-level language enabler: NVIDIA PTX, HSA, OpenCL SPIR, SPIR-V
C/C++ level programming: SYCL, CUDA, HCC, C++ AMP, OpenCL
Graph programming: OpenCV, OpenVX, Halide, VisionCpp, TensorFlow, Caffe
Can specify, test and validate each layer
Device-specific programming: device-specific specification; device-specific testing and validation
Higher-level language enabler: SPIR/SPIR-V/HSAIL specs; conformance testsuites
C/C++ level programming: OpenCL/SYCL specs; CLsmith testsuite; conformance testsuites; wide range of other testsuites
Graph programming: validate graph models; validate the code using standard tools
For Codeplay, these are our layer choices
Device-specific programming
• LLVM
Higher-level language enabler
• OpenCL SPIR
C/C++ level programming
• SYCL
Graph programming
• TensorFlow
• OpenCV
We have chosen a layer of standards, based on current market adoption
• TensorFlow and OpenCV
• SYCL
• OpenCL (with SPIR)
• LLVM as the standard compiler back-end
The actual choice of standards may change based on market dynamics, but by choosing widely adopted standards and a layering approach, it is easy to adapt
For Codeplay, these are our products
Device-specific programming
• LLVM
Higher-level language enabler
• OpenCL SPIR
C/C++ level programming
• SYCL
Graph programming
• TensorFlow
• OpenCV
Codeplay
Standards bodies
• HSA Foundation: Chair of software group, spec editor of runtime and debugging
• Khronos: chair & spec editor of SYCL. Contributors to OpenCL, Safety Critical, Vulkan
• ISO C++: Chair of Low Latency, Embedded WG; Editor of SG1 Concurrency TS
• EEMBC: members
Research
• Members of EU research consortiums: PEPPHER, LPGPU, LPGPU2, CARP
• Sponsorship of PhDs and EngDs for heterogeneous programming: HSA, FPGAs, ray-tracing
• Collaborations with academics
• Members of HiPEAC
Open source
• HSA LLDB Debugger
• SPIR-V tools
• RenderScript debugger in AOSP
• LLDB for Qualcomm Hexagon
• TensorFlow for OpenCL
• C++17 Parallel STL for SYCL
• VisionCpp: C++ performance-portable programming model for vision
Presentations
• Building an LLVM back-end
• Creating an SPMD Vectorizer for OpenCL with LLVM
• Challenges of Mixed-Width Vector Code Gen & Scheduling in LLVM
• C++ on Accelerators: Supporting Single-Source SYCL and HSA
• LLDB Tutorial: Adding debugger support for your target
Company
• Based in Edinburgh, Scotland
• 57 staff, mostly engineering
• License and customize technologies for semiconductor companies
• ComputeAorta and ComputeCpp: implementations of OpenCL, Vulkan and SYCL
• 15+ years of experience in heterogeneous systems tools
Codeplay build the software platforms that deliver massive performance
Further information
• OpenCL https://www.khronos.org/opencl/
• OpenVX https://www.khronos.org/openvx/
• HSA http://www.hsafoundation.com/
• SYCL http://sycl.tech
• OpenCV http://opencv.org/
• Halide http://halide-lang.org/
• VisionCpp https://github.com/codeplaysoftware/visioncpp
/codeplaysoft  @codeplaysoft  codeplay.com
Questions?
My contact details

Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Open Standards for ADAS: Andrew Richards, Codeplay, at AutoSens 2016

  • 1. The open standards enabling vision processing in ADAS. Andrew Richards, CEO, Codeplay. AutoSens, September 2016.
  • 2. © 2016 Codeplay Software Ltd. We have SAE Levels to climb. How do we get from here… to here?
      • Level 0: warnings
      • Level 1: adaptive; assist
      • Level 2: execute; automated manoeuvres
      • Level 3: limited overall control
      • Level 4: deep self control; all conditions
      • Level 5: autonomous; stages from very local to extensive journeys
  • 3. We have a mountain to climb. How do we get to the top, when we don’t know what the top looks like… and we want to get there in safe, manageable, affordable steps… without getting lost on our own… or climbing the wrong mountain?
  • 4. This presentation will focus on:
      • The hardware and software platforms that will be able to deliver the results
      • The software tools to build up the solutions for those platforms
      • The open standards that will enable solutions to interoperate
      • How we build the platforms and tools
  • 5. Where do we need to go? “On a 100 millimetre-squared chip, Google needs something like 50 teraflops of performance” (Daniel Rosenband, Google’s self-driving car project, at HotChips 2016)
  • 6. Performance trends. [Chart: GFLOPS (axis from 1 to 65,536, doubling at each tick) against year of introduction, for desktop CPU, smartphone CPU, smartphone GPU, integrated GPU and desktop GPU, with the Google target marked.] These trend lines seem to violate the rules of physics…
  • 7. What will our platform be?
      • Build on GPUs: will we build self-driving cars on graphics processors designed for videogames?
      • Special-purpose vision processors: or will we take the successful design decisions from GPUs and apply them to processors designed specially for autonomous driving?
  • 8. How do we get there from here?
      1. We need to write software today for platforms that cannot be built yet
      2. We need to start with simpler systems that are not fully autonomous
      3. We need to validate the systems as safe
  • 9. Two models of software development
      • Design a model → validate model → select platform → implement on platform
      • Write software → select platform → optimize for platform → validate whole platform → design next version
      Which method can get us all the way to autonomous vehicles?
  • 10. The different levels of programming model
      • Device-specific programming: assembly language, VHDL, device-specific C-like programming models
      • Higher-level language enabler: NVIDIA PTX, HSA, OpenCL SPIR, SPIR-V
      • C level programming: OpenCL C, DSP C, MCAPI/MTAPI
      • C++ level programming: SYCL, CUDA, HCC, C++ AMP
      • Graph programming: OpenCV, OpenVX, Halide, VisionCpp, TensorFlow, Caffe
  • 11. Device-specific programming: not a route to full autonomy
      • Can deliver quick results today
      • Can hand-optimize directly for the device
      • Cannot develop software today for future platforms
  • 12. The route to full autonomy
      • Graph programming: this is the most widely-adopted approach to machine vision and machine learning
      • Open standards: this lets you develop today for future architectures
  • 13. Why graph programming?
      When you scale the number of cores:
      • You don’t scale the number of memory ports
      • Your compute performance increases
      • But your off-chip memory bandwidth does not
      Therefore:
      • You need to reduce off-chip memory bandwidth by processing everything on-chip
      • This is achieved by tiling
      However, writing tiled image pipelines is hard. If we build up a graph of operations (e.g. convolutions) and then have a runtime system split it into fused, tiled operations across an entire system-on-chip, we get great performance.
  • 14. Graph programming: some numbers
      • [Chart: “Graph execution of individual nodes”: kernel and overhead time (ms) for RGB to HSV, channel masking and HSV to RGB, on OpenCV, Halide and SYCL.]
      • [Chart: “Optimization across the whole graph”: kernel and overhead time (ms) on OpenCV, Halide and SYCL.]
      Without fusion, each operation takes roughly the same amount of time on the accelerator (an AMD APU in this case), but the overhead varies a little. OpenCV does not fuse in this case, but Halide and SYCL do. The fused kernels are significantly faster than the non-fused ones when C++ programming is used to achieve fusion.
  • 15. Graph programming: some numbers
      • [Chart: “Effect of combining graph nodes on performance”: kernel time (ms) and overhead time (ms) for OpenCV, Halide and SYCL, each executed as individual nodes and as a whole graph.]
      In this example, we perform 3 image processing operations on an accelerator and compare 3 systems when executing individual nodes or a whole graph. Halide and SYCL use kernel fusion, whereas OpenCV does not. For all 3 systems, the performance of the whole graph is significantly better than that of individual nodes executed on their own. The system is an AMD APU and the operations are: RGB->HSV, channel masking, HSV->RGB.
  • 16. Graph programming
      • For both machine vision algorithms and machine learning, graph programming is the most widely-adopted approach
      • Two styles of graph programming that we commonly see:
          • C-style graph programming: OpenVX, OpenCV
          • C++ style graph programming: Halide, RapidMind, Eigen (also in TensorFlow), VisionCpp
  • 17. C style graph programming
      • OpenVX (open standard): can be implemented by vendors; create a graph with the C API, then map it to an entire SoC
      • OpenCV (open source): implemented on OpenCL; implemented on device-specific accelerators; create a graph with the C API, then execute
  • 18. & Device-Specific Programming
      • Runtime systems can automatically optimize the graphs
      • Can develop software today for future platforms
      • How do we adapt it for all the graph nodes we need?
      • What happens if we invent our own graph nodes?
  • 19. C++ style graph programming
      • Examples in machine vision/machine learning: Halide, RapidMind, Eigen (also in TensorFlow), VisionCpp
      • C++ compilers that support this style: CUDA, C++ OpenMP, C++ 17 Parallel STL, SYCL
  • 20. C++ single-source programming
      • C++ lets us build up graphs at compile-time
      • This means we can map a graph to the processors offline
      • C++ lets us write custom nodes ourselves
      • This approach is called a C++ Embedded Domain-Specific Language
      • Very widely used, e.g. Eigen, Boost, TensorFlow, RapidMind, Halide
  • 21. Combining: open standards, C++ and graph programming
      • SYCL combines C++ single-source with OpenCL acceleration
      • OpenCL lets us run on a very wide range of accelerators now and in the future
      • Single-source is the most widely-adopted machine learning programming model
      • C++ single source lets us create customizable graph models
  • 22. Putting it all together: Building it
  • 23. The same VisionCpp graph written for plain C++ (host leaf type, no queue) and for SYCL (image leaf type, with a queue).
      Graph: in (a) → sRGB2lRGB (b) → lRGB2lHSV (c) → lHSV2Scale (e, with coefficient d) → lHSV2lRGB (f) → lRGB2sRGB (g) → out (h)

      C++:

          #include <opencv2/opencv.hpp>
          #include <visioncpp.hpp>
          int main() {
            auto in = cv::imread("input.jpg");
            // leaf node on the host; no queue is needed
            auto a = Node<sRGB, 512, 512, Host>(in.data);
            auto b = Node<sRGB2lRGB>(a);
            auto c = Node<lRGB2lHSV>(b);
            auto d = Node<Constant>(0.1);
            auto e = Node<lHSV2Scale>(c, d);
            auto f = Node<lHSV2lRGB>(e);
            auto g = Node<lRGB2sRGB>(f);
            // execute the whole graph as one fused kernel
            auto h = execute<fuse>(g);
            auto ptr = h.get_data();
            auto output = cv::Mat(512, 512, CV_8UC3, ptr.get());
            cv::imshow("Display Image", output);
            return 0;
          }

      SYCL:

          #include <opencv2/opencv.hpp>
          #include <visioncpp.hpp>
          int main() {
            auto in = cv::imread("input.jpg");
            // queue that selects the device to run on
            auto q = get_queue<gpu_selector>();
            // leaf node backed by a device image
            auto a = Node<sRGB, 512, 512, Image>(in.data);
            auto b = Node<sRGB2lRGB>(a);
            auto c = Node<lRGB2lHSV>(b);
            auto d = Node<Constant>(0.1);
            auto e = Node<lHSV2Scale>(c, d);
            auto f = Node<lHSV2lRGB>(e);
            auto g = Node<lRGB2sRGB>(f);
            // execute the whole graph as one fused kernel on the queue
            auto h = execute<fuse>(g, q);
            auto ptr = h.get_data();
            auto output = cv::Mat(512, 512, CV_8UC3, ptr.get());
            cv::imshow("Display Image", output);
            return 0;
          }
  • 24. Backend structure: SYCL and C++/OpenMP. The SYCL backend works through an accessor and parallel_for; the C++/OpenMP backend works through a pointer and a parallel for loop.

      SYCL:

          template <typename Expr, typename... Acc>
          void sycl(handler& cgh, Expr expr, Acc... acc) {
            // SYCL accessor for accessing data on the device
            auto outPtr = expr.out->template get_accessor<write>(cgh);
            // SYCL range representing the valid range for accessing data
            auto rng = range<2>(Expr::Rows, Expr::Cols);
            // SYCL parallel_for for parallelising execution across the range
            cgh.parallel_for<Type>(rng, [=](item<2> itemID) {
              // rebuild the accessor tuple on the device
              auto tuple = make_tuple(acc...);
              // call the eval function for each pixel
              outPtr[itemID] = expr.eval(itemID, tuple);
            });
          }

      OpenMP:

          template <typename Expr, typename... Acc>
          void cpp(Expr expr, Acc... acc) {
            // output pointer for accessing data on the host
            auto outPtr = expr.out->get();
            // valid range for accessing data on the host
            auto rng = range(Expr::Rows, Expr::Cols);
            // rebuild the tuple of input pointers on the host
            auto tuple = make_tuple(acc...);
            // OpenMP directive for parallelising the for loop
            #pragma omp parallel for
            for (size_t i = 0; i < rng.rows; i++)
              for (size_t j = 0; j < rng.cols; j++)
                // call the eval function for each pixel
                outPtr[index(i, j)] = expr.eval(index(i, j), tuple);
          }
  • 25. Higher level programming enablers
      • NVIDIA PTX: NVIDIA CUDA-only
      • HSA: royalty-free open standard; HSAIL is the IR; provides a single address space, with virtual memory; low-latency communication
      • OpenCL SPIR: defined for OpenCL v1.2; based on Clang/LLVM (the open-source compiler)
      • SPIR-V: open standard; defined by Khronos; supports compute and graphics (OpenCL, Vulkan and OpenGL); not tied to any compiler
      Open standard intermediate representations enable tools to be built on top and support a wide range of platforms.
  • 26. HSA
      • One of the big problems of offloading to accelerators is the high cost of offload:
          • Moving data to the accelerator
          • May need to translate addresses and pointer sizes between CPU and accelerator
          • Going into the OS kernel-level driver to start work or synchronize
          • High cost of compilation, especially for JIT languages
          • In a multi-accelerator system, this may get even more costly
      • HSA solves this with user-mode queueing, user-mode synchronization, and shared virtual memory
      • It also provides open-source software to help implement HSA
      • And we’re going even further and working towards advanced tools standardization, such as profiling and debugging
      [Diagram: CPU, GPU and DSP, each running its own device-optimized code generated from HSAIL.] The HSA Runtime gives very low-latency communication.
  • 27. Which model should we choose?
      • Device-specific programming: assembly language, VHDL, device-specific C-like programming models
      • Higher-level language enabler: NVIDIA PTX, HSA, OpenCL SPIR, SPIR-V
      • C level programming: OpenCL C, DSP C, MCAPI/MTAPI
      • C++ level programming: SYCL, CUDA, HCC, C++ AMP
      • Graph programming: OpenCV, OpenVX, Halide, VisionCpp, TensorFlow, Caffe
  • 28. They are not alternatives, they are layers
      • Device-specific programming: assembly language, VHDL, device-specific C-like programming models
      • Higher-level language enabler: NVIDIA PTX, HSA, OpenCL SPIR, SPIR-V
      • C/C++ level programming: SYCL, CUDA, HCC, C++ AMP, OpenCL
      • Graph programming: OpenCV, OpenVX, Halide, VisionCpp, TensorFlow, Caffe
  • 29. Can specify, test and validate each layer
      • Device-specific programming: device-specific specification; device-specific testing and validation
      • Higher-level language enabler: SPIR/SPIR-V/HSAIL specs; conformance testsuites
      • C/C++ level programming: OpenCL/SYCL specs; Clsmith testsuite; conformance testsuites; wide range of other testsuites
      • Graph programming: validate graph models; validate the code using standard tools
  • 30. For Codeplay, these are our layer choices
      • Device-specific programming: LLVM
      • Higher-level language enabler: OpenCL SPIR
      • C/C++ level programming: SYCL
      • Graph programming: TensorFlow, OpenCV
      We have chosen a layer of standards, based on current market adoption: TensorFlow and OpenCV; SYCL; OpenCL (with SPIR); LLVM as the standard compiler back-end. The actual choice of standards may change based on market dynamics, but by choosing widely adopted standards and a layering approach, it is easy to adapt.
  • 31. For Codeplay, these are our products
      • Device-specific programming: LLVM
      • Higher-level language enabler: OpenCL SPIR
      • C/C++ level programming: SYCL
      • Graph programming: TensorFlow, OpenCV
  • 32. Codeplay: Codeplay build the software platforms that deliver massive performance
      • Standards bodies
          • HSA Foundation: chair of software group, spec editor of runtime and debugging
          • Khronos: chair and spec editor of SYCL; contributors to OpenCL, Safety Critical, Vulkan
          • ISO C++: chair of Low Latency, Embedded WG; editor of SG1 Concurrency TS
          • EEMBC: members
      • Research
          • Members of EU research consortiums: PEPPHER, LPGPU, LPGPU2, CARP
          • Sponsorship of PhDs and EngDs for heterogeneous programming: HSA, FPGAs, ray-tracing
          • Collaborations with academics
          • Members of HiPEAC
      • Open source
          • HSA LLDB Debugger
          • SPIR-V tools
          • RenderScript debugger in AOSP
          • LLDB for Qualcomm Hexagon
          • TensorFlow for OpenCL
          • C++ 17 Parallel STL for SYCL
          • VisionCpp: C++ performance-portable programming model for vision
      • Presentations
          • Building an LLVM back-end
          • Creating an SPMD Vectorizer for OpenCL with LLVM
          • Challenges of Mixed-Width Vector Code Gen & Scheduling in LLVM
          • C++ on Accelerators: Supporting Single-Source SYCL and HSA
          • LLDB Tutorial: Adding debugger support for your target
      • Company
          • Based in Edinburgh, Scotland
          • 57 staff, mostly engineering
          • License and customize technologies for semiconductor companies
          • ComputeAorta and ComputeCpp: implementations of OpenCL, Vulkan and SYCL
          • 15+ years of experience in heterogeneous systems tools
  • 33. Further information
      • OpenCL: https://www.khronos.org/opencl/
      • OpenVX: https://www.khronos.org/openvx/
      • HSA: http://www.hsafoundation.com/
      • SYCL: http://sycl.tech
      • OpenCV: http://opencv.org/
      • Halide: http://halide-lang.org/
      • VisionCpp: https://github.com/codeplaysoftware/visioncpp
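
Slides 13 and 20 describe building a graph of operations in C++ at compile time and then executing it as fused, tiled work so that intermediate images never leave the chip. The following is a minimal sketch of that idea, assuming point-wise operations only. The names Leaf, Scale, Offset, scale, offset, execute_fused and execute_unfused are hypothetical and are not the VisionCpp, Halide or SYCL APIs; the unfused version is included only to show the intermediate buffer that fusion avoids.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mini-EDSL for illustration only; not the VisionCpp or SYCL API.

// Leaf node: reads one pixel from an input image held as a flat float array.
struct Leaf {
    const std::vector<float>* data;
    float eval(std::size_t i) const { return (*data)[i]; }
};

// Point-wise node: multiplies its input by a constant.
template <typename In>
struct Scale {
    In in;
    float k;
    float eval(std::size_t i) const { return in.eval(i) * k; }
};

// Point-wise node: adds a constant to its input.
template <typename In>
struct Offset {
    In in;
    float b;
    float eval(std::size_t i) const { return in.eval(i) + b; }
};

// Helper functions so the graph's type is deduced at compile time.
template <typename In>
Scale<In> scale(In in, float k) { return Scale<In>{in, k}; }

template <typename In>
Offset<In> offset(In in, float b) { return Offset<In>{in, b}; }

// Fused execution: the image is walked tile by tile and the whole graph is
// evaluated per pixel, so no intermediate image is ever written.
template <typename Expr>
std::vector<float> execute_fused(const Expr& graph, std::size_t n,
                                 std::size_t tile) {
    assert(tile > 0);
    std::vector<float> out(n);
    for (std::size_t start = 0; start < n; start += tile) {
        const std::size_t end = std::min(start + tile, n);
        for (std::size_t i = start; i < end; ++i) out[i] = graph.eval(i);
    }
    return out;
}

// Unfused execution of the same two operations: the first node writes a full
// intermediate image that the second node then reads back.
std::vector<float> execute_unfused(const std::vector<float>& in, float k,
                                   float b, std::size_t& intermediate_writes) {
    std::vector<float> tmp(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) tmp[i] = in[i] * k;
    intermediate_writes += tmp.size();  // traffic that fusion removes
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = tmp[i] + b;
    return out;
}
```

With an input vector img, the expression offset(scale(Leaf{&img}, 2.0f), 0.5f) has the type Offset<Scale<Leaf>>, so the shape of the whole graph is visible to the compiler, which is free to inline it into the single loop in execute_fused. A real system would also have to handle operations with neighbourhoods (such as convolutions), where tile borders need extra care.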