"Update on Khronos Standards for Vision and Machine Learning," a Presentation from the Khronos Group

Copyright © Khronos® Group Inc. 2018
Vision and Inferencing Update
September 2018
Neil Trevett | Khronos President
NVIDIA | VP Developer Ecosystem
ntrevett@nvidia.com | @neilt3d
www.khronos.org

Khronos Mission
Software
Silicon
Khronos is an International Industry Consortium creating royalty-free, open standards
to enable software to access hardware acceleration for
3D Graphics, Virtual and Augmented Reality, Parallel Computing,
Neural Networks and Vision Processing

Khronos Primary Standards
3D Graphics
VR and AR
Heterogeneous Compute
(Parallel Processing)
Vision and Inferencing
APIs File Formats

Topics for Today
1. Landscape of vision and inferencing acceleration
2. Where Khronos open standards provide non-proprietary ecosystem choices
3. New updates on Khronos vision and inferencing standards

Machine Learning Acceleration
[Diagram labels]
Training on Desktop and Cloud: Neural Net Training Frameworks; Desktop and Cloud GPU/TPU Acceleration
Deployment on Embedded Devices: Optimization; Runtimes; Diverse Inferencing Acceleration Hardware
Other labels: Training Data Sets OR Live Data; Trained Networks; Applications

Machine Learning Training
Desktop and
Cloud Hardware
cuDNN MIOpen clDNN
Neural Net Training Frameworks
TPU
Authoring
Interchange
GPUs have well established
APIs and libraries for
compute acceleration

NNEF - Neural Network Exchange Format
Before NNEF: Every Tool Needs an Exporter to Every Accelerator
[Diagram: NN Authoring Framework 1 exports separately to Inference Engine 1, Inference Engine 2, Inference Engine 3]
With NNEF:
[Diagram: NN Authoring Framework 1 exports NNEF, consumed by Inference Engine 1, Inference Engine 2, Inference Engine 3 and by Optimization and processing tools]
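
The "Before NNEF" and "With NNEF" pictures can be read as a counting argument. The sketch below is only an illustration of that argument: the function name and the example counts are made up, not taken from the slides.

```python
def converters_needed(num_frameworks: int, num_engines: int, use_exchange_format: bool) -> int:
    """Count the converters needed so every framework can reach every engine."""
    if use_exchange_format:
        # One exporter per framework plus one importer per inference engine.
        return num_frameworks + num_engines
    # One dedicated exporter for every (framework, engine) pair.
    return num_frameworks * num_engines


if __name__ == "__main__":
    # Hypothetical ecosystem sizes, chosen only to show the trend.
    for frameworks, engines in [(2, 3), (4, 6)]:
        direct = converters_needed(frameworks, engines, use_exchange_format=False)
        shared = converters_needed(frameworks, engines, use_exchange_format=True)
        print(f"{frameworks} frameworks, {engines} engines: direct={direct}, shared format={shared}")
```

The gap widens as either side of the ecosystem grows, which is the motivation the slide gives for a common exchange format.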

Network Data File
Binary format contains parameter tensors
Supports float and quantized (integer) data
Flexible bit widths and quantization algorithms
Quantization algorithms expressed as extensible
compound operations
Quantization info provided as hints for execution
NNEF Captures a Neural Network Description
Network Structure File
Distilled, platform independent network description
Human readable, syntactical elements from Python
Standardized Operations
Rigorously defined semantics
Linear, convolution, pooling, normalization, activation, unary/binary
Supports fully connected, convolutional, recurrent architectures
Two Levels of Expressiveness
Flat
Basic transfer of computation graphs with standardized operations
Simple to parse and translate to vendor specific formats
Compositional
Define custom compound operations
Higher-level graph descriptions
More complex to parse but offers more optimization hints
Split Structure and Data files
Easy independent access to network structure or individual parameter data
Set of files can use a container such as tar or zip with optional compression and encryption
Can associate multiple Data Files with one Network
Structure File e.g. the same data in multiple formats
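
To make "float and quantized (integer) data" with "flexible bit widths" concrete, here is a minimal sketch of one common scheme, linear quantization over a fixed range. It is an assumption chosen for illustration and is not the normative NNEF definition; the function names and parameters are hypothetical.

```python
def linear_quantize(values, lo, hi, bits):
    """Map floats in [lo, hi] to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    codes = []
    for v in values:
        clamped = min(max(v, lo), hi)  # values outside the range saturate
        codes.append(round((clamped - lo) / (hi - lo) * levels))
    return codes


def linear_dequantize(codes, lo, hi, bits):
    """Map integer codes back to approximate float values."""
    levels = (1 << bits) - 1
    return [lo + c / levels * (hi - lo) for c in codes]


if __name__ == "__main__":
    weights = [-0.3, 0.7, 1.5]  # made-up parameter values
    for bits in (8, 4):         # the bit width is a free parameter
        codes = linear_quantize(weights, -1.0, 1.0, bits)
        print(bits, codes, linear_dequantize(codes, -1.0, 1.0, bits))
```

Storing the range and bit width next to the integer codes is one way the quantization information could travel with the data as a hint for the execution engine.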

NNEF 1.0
Files
Syntax
Parser/
Validator
TensorFlow
and Caffe
Exporters
NNEF open source projects hosted on
Khronos NNEF GitHub repository
Apache 2.0 license
https://github.com/KhronosGroup/NNEF-Tools
TensorFlow
and Caffe2
Importer /
Exporters
Google
NNAPI
Converter
OpenVX
Ingestion &
Execution
Live
Imminent
NNEF V1.0 released in August 2018
After positive industry feedback on Provisional
specification released in December 2017
NNEF Working Group Participants

NNEF and ONNX
NNEF                               | ONNX
Embedded Inferencing Import        | Training Interchange
Defined Specification              | Open Source Project
Stability for hardware deployment  | Software stack flexibility
Multi-company Governance           | Initiated by Facebook
Flexible Precision / Quantization  | 32-bit Floating Point only
Comparing Neural Network
Exchange Industry Initiatives
ONNX and NNEF are Complementary
- ONNX will HAVE to move fast to track authoring framework interchange
- NNEF provides a stable bridge from training into edge inferencing engines
Bidirectional translator
in open source
Initiating open source
bidirectional translator
Khronos tried to use LLVM as a hardware IR
BUT
LLVM evolves without needing to preserve
backwards compatibility.
Fine for software compilers – very difficult to
manage for hardware toolchains and roadmaps
SO
Khronos created hardware oriented SPIR-V
with bidirectional translation to LLVM
Same Industry Dynamics as
LLVM and SPIR-V

Three Broad Inferencing Choices
1. Run in training framework
Acceleration needs high-level programming tools – typically large C++ applications
E.g. TensorFlow or TensorFlow Lite
2. Export to inference runtime
The most popular industry choice today.
Runtime often uses underlying acceleration APIs
E.g. TensorRT, CoreML
3. Compile to optimized code
Often used to merge custom or vision code alongside inferencing runtime – generates LLVM for CPU + accelerated API code

SYCL Single Source C++ Parallel Programming
• SYCL 1.2.1 Adopters Program released in July 2018 with open source conformance tests
- https://www.khronos.org/news/press/khronos-releases-conformance-test-suite-for-sycl-1.2.1
• Multiple implementations shipping: triSYCL, ComputeCpp
- http://sycl.tech
• Multiple SYCL libraries for vision and inferencing
- SYCL-BLAS, SYCL-DNN, SYCL-Eigen
C++ Kernel Fusion can give better performance
on complex apps and libs than hand-coding
Single application source file using STANDARD C++
C++ templates and lambda functions separate host & device code
Accelerated code passed into
device OpenCL compilers

Python Client C++ Client
Optional C API
TensorFlow tensor Kernels
(> 800 kernels)
Convolutions
Matrix multiply
Eigen Tensors SYCL-BLAS Library SYCL-DNN Library
TensorFlow on SYCL / OpenCL
State-of-the-art C++
compilers can fuse nodes
in vision and neural
network graphs to provide
optimized performance
often faster than hand-
coded applications
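
The idea of fusing graph nodes can be shown with a small sketch. Python is used here only to show the transformation; the slides attribute the real fusion to C++ compilers, and the function names below are hypothetical.

```python
def scale_then_relu_unfused(xs, scale):
    """Two separate 'kernels': each makes a full pass over the data."""
    scaled = [x * scale for x in xs]      # pass 1 writes an intermediate buffer
    return [max(0.0, s) for s in scaled]  # pass 2 reads it back


def scale_then_relu_fused(xs, scale):
    """One fused 'kernel': same result, single pass, no intermediate buffer."""
    return [max(0.0, x * scale) for x in xs]


if __name__ == "__main__":
    data = [-2.0, 0.5, 3.0]
    print(scale_then_relu_unfused(data, 2.0))
    print(scale_then_relu_fused(data, 2.0))
```

Both functions compute the same values; the fused form avoids materializing the intermediate result, which is where the saving in memory traffic would come from on an accelerator.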

Platform Inferencing Stacks
Microsoft Windows
Windows Machine Learning (WinML)
Google Android
Neural Network API (NNAPI)
Apple macOS and iOS
CoreML
https://docs.microsoft.com/en-us/windows/uwp/machine-learning/
https://developer.android.com/ndk/guides/neuralnetworks/
https://developer.apple.com/documentation/coreml
Core ML Model
Consistent Three Steps
1. Import trained NN model file
2. Build optimized version of graph
3. Run graph on accelerated runtime using
underlying low-level API
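
The three steps above can be sketched with a toy model. Everything here is an assumption made for illustration: the JSON model format, the operation names and the single "merge adjacent scales" optimization do not correspond to WinML, NNAPI or CoreML.

```python
import json

# Hypothetical serialized model: a linear chain of simple operations.
MODEL_TEXT = (
    '[{"op": "scale", "factor": 2.0}, {"op": "scale", "factor": 0.5},'
    ' {"op": "add", "value": 1.0}, {"op": "relu"}]'
)


def import_model(text):
    """Step 1: import the trained model description."""
    return json.loads(text)


def build_optimized(ops):
    """Step 2: build an optimized graph (here: merge adjacent scale ops)."""
    optimized = []
    for op in ops:
        if op["op"] == "scale" and optimized and optimized[-1]["op"] == "scale":
            merged = optimized[-1]["factor"] * op["factor"]
            optimized[-1] = {"op": "scale", "factor": merged}
        else:
            optimized.append(dict(op))
    return optimized


def run(ops, x):
    """Step 3: run the graph (a real stack would dispatch to an accelerated API)."""
    for op in ops:
        if op["op"] == "scale":
            x = x * op["factor"]
        elif op["op"] == "add":
            x = x + op["value"]
        elif op["op"] == "relu":
            x = max(0.0, x)
        else:
            raise ValueError(f"unknown op: {op['op']}")
    return x


if __name__ == "__main__":
    graph = build_optimized(import_model(MODEL_TEXT))
    print(len(graph), run(graph, 3.0))
```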

NNVM - Open Compiler for AI Inferencing
http://www.tvmlang.org/2017/08/17/tvm-release-announcement.html
SPIR-V IR for parallel accelerators
Backend in development
LLVM IR for CPUs
1. Import Trained
Network Description
2. Graph-level
Optimizations
3. Decompose to primitive
instructions and emit
programs for accelerated
run-times
Paul G. Allen School of Computer Science & Engineering, University of Washington
Facebook Glow Compiler
(Graph Lowering Optimizations)
https://facebook.ai/developers/tools/glow

OpenVX
Power Efficiency
Computation Flexibility
Dedicated
Hardware
GPU
Compute
Multi-core
CPU
X1
X10
X100
Vision
DSPs
Wide range of vision hardware architectures
OpenVX provides a high-level Graph-based abstraction
->
Enables Graph-level optimizations!
Can be implemented on almost any hardware or processor!
->
Portable, Efficient Vision Processing!
[Diagram: Vision Processing Graph of four connected Vision Nodes]
Shipping Implementations
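
A graph-based abstraction of this kind can be sketched as follows. This is not the OpenVX API; the `Graph` class, node names and toy "vision" functions are hypothetical, and the sketch only shows why declaring the whole graph up front lets a runtime decide how and in what order to execute it.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Graph:
    """Toy processing graph: nodes are declared first, executed later."""

    def __init__(self):
        self.nodes = {}  # node name -> (function, names of its inputs)

    def add_node(self, name, fn, inputs):
        self.nodes[name] = (fn, tuple(inputs))

    def process(self, **sources):
        # The runtime sees the full dependency structure before running anything.
        deps = {name: set(ins) for name, (_, ins) in self.nodes.items()}
        results = dict(sources)
        for name in TopologicalSorter(deps).static_order():
            if name in results:  # externally supplied input, nothing to compute
                continue
            fn, ins = self.nodes[name]
            results[name] = fn(*(results[i] for i in ins))
        return results


def to_gray(pixels):
    return [sum(p) // 3 for p in pixels]


def edges(gray):
    return [abs(b - a) for a, b in zip(gray, gray[1:])]


def strong(edge_values):
    return [v > 50 for v in edge_values]


def build_demo_graph():
    g = Graph()
    g.add_node("strong", strong, ["edges"])  # declaration order does not matter
    g.add_node("gray", to_gray, ["camera"])
    g.add_node("edges", edges, ["gray"])
    return g


if __name__ == "__main__":
    out = build_demo_graph().process(camera=[(0, 0, 0), (30, 30, 30), (200, 200, 200)])
    print(out["strong"])
```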

Extending OpenVX for Inferencing #1
Neural Network Extension
• OpenVX Nodes to represent common NN Layers
• 1D-4D Tensors to connect layers and common
• INT16, INT7.8, INT8, and U8 Tensor Ops
vxActivationLayer vxConvolutionLayer vxDeconvolutionLayer
vxFullyConnectedLayer vxNormalizationLayer vxPoolingLayer
vxSoftmaxLayer vxROIPoolingLayer …
Vision
Node
Vision
Node
Vision
Node
Downstream
Application
Processing
Native
Camera
Control CNN Nodes
An OpenVX graph mixing CNN
nodes with traditional vision nodes
NNEF Translator converts NNEF
representation into OpenVX Node Graphs
NNEF Translator
• Ingests NNEF File and builds OpenVX node graph
• Open source project in progress
Importing NNEF Neural Network Descriptions

Extending OpenVX for Inferencing #2
OpenVX/OpenCL Interop
• Provisional Extension
• Enables custom OpenCL acceleration to be
invoked from OpenVX User Kernels
• Memory objects can be mapped or copied
Kernel/Graph Import
• Provisional Extension
• Defines container for executable or IR code
• Enables arbitrary code to be inserted as an
OpenVX Node in a graph
OpenCL Command Queue
Application
cl_mem buffers
Fully asynchronous host-device
operations during data exchange
OpenVX data objects
Runtime
Runtime Map or copy OpenVX data
objects into cl_mem buffers
Copy or export
cl_mem buffers into OpenVX
data objects
OpenVX user-kernels can access command
queue and cl_mem objects to asynchronously
schedule OpenCL kernel execution
OpenVX/OpenCL Interop
Creating Custom User Nodes

NNEF and OpenVX for Inferencing
Compilation
Kernel
Import
Ingestion
Proprietary
Runtimes
Vision Nodes
User Nodes
NN
Extension
Translator
Executable
Code
To mix inferencing
with vision and other
custom processing
Acceleration
APIs
Many inferencing stacks end up
using OpenCL for hardware
acceleration
Compile to
executable
code
Execute
accelerated
OpenVX
Runtime
Compile to
IR/Binary

GPU
OpenCL – Unique Heterogeneous Runtime
FPGA DSP
Custom Hardware
GPU
CPU CPU CPU GPU
Growing number of optimized OpenCL
vision and inferencing libraries
Vision: OpenCV, Halide, VisionCpp
Machine Learning: Xiaomi MACE, Arm Compute Library
Linear Algebra: clDNN, CLBlast, ViennaCL
Application or
Inferencing Run-time
Fragmented GPU
API Landscape
OpenCL is the only industry standard for low-level heterogeneous compute
Portable control over memory and parallel task execution
“The closest you can be to your embedded accelerator and still be portable”
Application or
Inferencing Run-time

OpenCL Ecosystem Roadmap
2011: OpenCL 1.2 (OpenCL C Kernel Language)
2015: OpenCL 2.1 (SPIR-V in Core); SYCL 1.2 (C++11 Single source programming)
2017: OpenCL 2.2 (C++ Kernel Language); SYCL 1.2.1 (C++11 Single source programming)
Bringing Heterogeneous
compute to standard ISO C++
Khronos hosting C++17 Parallel STL
C++20 Parallel STL with Ranges Proposal
Processor Deployment
Flexibility
Parallel computation across diverse
processor architectures
Kernel Deployment
Flexibility
Execute OpenCL C kernels on
Vulkan GPU runtimes
OpenCL has an active
three track roadmap

OpenCL Next - Feature Set Flexibility
• Defining OpenCL features that become optional for enhanced deployment flexibility
- API and language features e.g. floating point precisions
• Feature Sets avoid fragmentation
- Defined to suit specific markets – e.g. desktop, embedded vision and inferencing
• Implementations are conformant if they fully support feature set functionality
OpenCL 2.2 Functionality = queryable, optional feature
Khronos-defined
OpenCL 2.2 Full Profile
Feature Set
Khronos-defined
OpenCL 1.2 Full Profile
Feature Set
Industry-defined
Feature Set E.g.
Embedded Vision
and Inferencing
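
The conformance rule on this slide amounts to a subset check. The sketch below assumes made-up feature names and feature-set contents; the slides do not list what any feature set contains.

```python
# Hypothetical feature sets; the real contents are not given in the slides.
FEATURE_SETS = {
    "full_profile": {"fp32", "fp64", "images", "pipes", "device_enqueue"},
    "embedded_vision_inferencing": {"fp16", "fp32", "images"},
}


def missing_features(supported, feature_set_name):
    """Features the named set requires that the implementation lacks."""
    return FEATURE_SETS[feature_set_name] - set(supported)


def is_conformant(supported, feature_set_name):
    """Conformant only if every feature in the set is supported."""
    return not missing_features(supported, feature_set_name)


if __name__ == "__main__":
    device = {"fp16", "fp32", "images"}  # what a hypothetical device reports
    for name in FEATURE_SETS:
        print(name, is_conformant(device, name), sorted(missing_features(device, name)))
```

Because each feature is queryable, an application could perform the same check at run time before choosing a code path.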

Universal Deployment Flexibility
Open source SPIRV-Cross converts
SPIR-V to MSL or HLSL
Clspv and clvk
open source tools
OpenCL
Programs
Native
Vulkan Drivers
UWP and
D3D based
Consoles
Open source shims
convert Vulkan to
Metal or D3D API calls
Open source tools enable
OpenCL and Vulkan apps
to be increasingly
deployed on any platform
