SlideShare a Scribd company logo
1 of 28
Download to read offline
Deep Learning on Arm Cortex-M
Microcontrollers
Rod Crawford
Director Software Technologies, Arm
© 2018 Arm Limited2
What is Machine Learning (ML)?
Additional terms
• Location
• Cloud – processing done in data farms
• Edge – processing done in local
devices (growing much faster than
Cloud ML)
• Key components of machine learning
• Model – a mathematical
approximation of a collection of input
data
• Training – in deep learning, data-sets
are used to create a ‘model’
• Inference – in deep learning, a
‘model’ is used to check against new
data
Artificial
Intelligence
Machine
Learning
Deep
Learning
Neural
Networks
© 2018 Arm Limited3
Bandwidth ReliabilityPower SecurityCost Latency
Why is ML Moving to the Edge
© 2018 Arm Limited4
ML Edge Use cases
Keyword
detection
Pattern
training
Voice & image
recognition
Object
detection
Image
enhancement
Autonomous
drive
Increasing power and cost (silicon)
Increasingperformance(ops/second)
Arm Cortex-M
systems
~100W
~1W
~1-10mW
~10W
© 2018 Arm Limited5
Expanding opportunity
for the embedded
intelligence market
Arm
Cortex-M
Use cases demand more embedded intelligence
More data More processing More algorithms More compute
Contextual awarenessSensor fusion
Voice activation Biometrics
© 2018 Arm Limited6
Developing NN Solution on Cortex-M MCUs
Hardware-constrained
NN model search
Problem
Trained NN
model
Deployable
Solution
Optimized
code on h/w
NN model translation
Optimized NN functions:
CMSIS-NN
© 2018 Arm Limited7
Keyword Spotting
Listen for certain words / phrases
• Voice activation: “Ok Google”, “Hey Siri”, “Alexa”
• Simple commands: “Play music”, “Pause”, “Set Alarm for 10 am”
Feature extraction
FFT-based mel frequency cepstral coefficients
(MFCC) or log filter bank energies (LFBE).
Classification
Neural network based – DNN, CNN, RNN
(LSTM/GRU) or a mix of them
© 2018 Arm Limited8
ST Nucleo-F103RB
Cortex-M3, 72 MHz
20KB RAM, 128KB Flash
Arm Cortex-M based MCU Platforms
More boards at: https://os.mbed.com/platforms/
Memory
Performance
NXP LPC11U24
Cortex-M0, 48 MHz
8KB RAM, 32KB Flash
Nordic nRF52-DK
Cortex-M4, 64 MHz
64KB RAM, 512KB Flash
ST Nucleo-F746ZG
Cortex-M7, 216 MHz
320KB RAM, 1MB Flash
ST Nucleo-F411RE
Cortex-M4, 100 MHz
128KB RAM, 512KB Flash
NXP i.MX RT 1050
Cortex-M7, 600 MHz
512KB RAM
© 2018 Arm Limited9
Keyword Spotting NN Models: Memory vs. Ops
Need compact models: that fit within the
Cortex-M system memory.
Need models with less operations: to achieve
real time performance.
Neural network model search parameters
• NN architecture
• Number of input features
• Number of layers (3-layers, 4-layers, etc.)
• Types of layers (conv, ds-conv, fc, pool, etc.)
• Number of features per layer
DNN: Google Research (2014): G. Chen et. al. “Small-footprint keyword spotting using deep neural networks”
CNN: Google Research (2015): T. N. Sainath, et. al. “Convolutional neural networks for small-footprint keyword spotting”
LSTM: Amazon (2017): M. Sun, et. al. “Max-pooling loss training of long short-term memory networks for small-footprint keyword spotting”
CNN-GRU: Baidu Research (2017): S. O. Arik, et. al. “Convolutional recurrent neural networks for small-footprint keyword spotting”
NN Models from literature trained on
Google speech commands dataset.
© 2018 Arm Limited10
H/W-Constrained NN Model Search
© 2018 Arm Limited11
Summary of Best NN models
Depthwise Separable CNN (DS-CNN)
achieves highest accuracy
Accuracy asymptotically reaches to 95%.
Training scripts and models are available on github:
https://github.com/ARM-software/ML-KWS-for-MCU
Y. Zhang, et. al. “Hello Edge: Keyword spotting on Microcontrollers”,
arXiv: 1711.07128.
© 2018 Arm Limited12
Developing NN Solution on Cortex-M MCUs
Hardware-constrained
NN model search
Problem
Trained NN
model
Deployable
Solution
Optimized
code on h/w
NN model translation
Optimized NN functions:
CMSIS-NN
© 2018 Arm Limited13
Quantization type impacts model size and performance.
• Bit-width: 8-bit or 16-bit
• Symmetric around zero or not
• Quantization range as [-2n,2n) a.k.a. fixed-point quantization.
Steps in model quantization
• Run quantization sweeps to identify optimal quantization strategy.
• Quantize weights: does not need a dataset
• Quantize activations: run the model with dataset to extract ranges.
NN Model Quantization
Minimal loss in accuracy (~0.1%).
May increase accuracy in some cases, because of
regularization (or reduce over-fitting).
Neural networks can tolerate quantization.
© 2018 Arm Limited14
Model Deployment on Cortex-M MCUs
Running ML framework on Cortex-M systems is impractical.
Need to run bare-metal code to efficiently use the limited resources.
Arm NN translates trained model to the code that runs on Cortex-M
cores using CMSIS-NN functions.
CMSIS-NN: optimized low-level NN functions for Cortex-M CPUs.
CMSIS-NN APIs may also be directly used in the application code.
© 2018 Arm Limited15
Developing NN Solution on Cortex-M MCUs
Hardware-constrained
NN model search
Problem
Trained NN
model
Deployable
Solution
Optimized
code on h/w
NN model translation
Optimized NN functions:
CMSIS-NN
© 2018 Arm Limited16
Cortex Microcontroller Software Interface Standard (CMSIS)
CMSIS: vendor-independent hardware abstraction layer for Cortex-M processor cores.
Enable consistent device support, reduce learning curve and time-to-market.
Open-source: https://github.com/ARM-software/CMSIS_5/
© 2018 Arm Limited17
CMSIS-NN
CMSIS-NN: collection of optimized neural network functions for Cortex-M cores
Key considerations
• Improve performance using SIMD instructions.
• Minimize memory footprint.
• NN-specific optimizations: data-layout and offline weight reordering.
© 2018 Arm Limited18
CMSIS-NN – Efficient NN kernels for Cortex-M CPUs
Convolution
• Boost compute density with GEMM based implementation
• Reduce data movement overhead with depth-first data layout
• Interleave data movement and compute to minimize memory
footprint
Pooling
• Improve performance by splitting pooling into x-y directions
• Improve memory access and footprint with in-situ updates
Activation
• ReLU: Improve parallelism by branch-free implementation
• Sigmoid/Tanh: fast table-lookup instead of exponent computation
*Baseline uses CMSIS 1D Conv and Caffe-like Pooling/ReLU
0
1
2
3
4
5
6
Conv Pooling Activation
(ReLU)
Total
Relativethroughtput
CNN Runtime improvement
Baseline New kernels
0
2
4
6
Conv Pooling Activation
(ReLU)
Total
RelativeOpsperJoule
Energy efficiency improvement
4.9x
higher
eff.
4.6x
higher
perf.
© 2018 Arm Limited21
CNN on Cortex-M7
• CNN with 8-bit weights and 8-bit activations
• Total memory footprint: 87 kB weights + 40 kB
activations + 10 kB buffers (I/O etc.)
• Total operations: 24.7 MOps, run time: 99.1ms
• Example code available in CMSIS-NN github.
NUCLEO-F746ZG: 216 MHz, 320 KB SRAM
Video : https://www.youtube.com/watch?v=PdWi_fvY9Og
OpenMV platform
with a Cortex-M7
© 2018 Arm Limited22
https://youtu.be/PdWi_fvY9Og?t=1m
Demo
© 2018 Arm Limited23
Summary
• ML is moving to the IoT Edge and Cortex-M plays a key role in enabling embedded
intelligence.
• Case study: keyword spotting on Cortex-M CPU
• Hardware-constrained neural network architecture exploration
• Deployment flow of model on Cortex-M CPUs
• CMSIS-NN: optimized neural network functions
2424
Thank You!
Danke!
Merci!
!
!
Gracias!
Kiitos!
감사합니다
ध"यवाद
© 2018 Arm Limited
© 2018 Arm Limited25
Project Trillium: Arm ML and OD Processors
First-generation ML processor targets Mobile market
Massive uplift from CPUs, GPUs, DSPs and accelerators
Ground-up design for high performance and efficiency
Enabled by open-source software
© 2018 Arm Limited26
Flexible, Scalable ML Solutions
Keyword
detection
Pattern
training
Voice and image
recognition
Object
detection
Image
enhancement
Autonomous
driving
Data
center
Increasingperformance(ops/second)
Mali GPUs
ML and OD processors
Cortex-M/A CPUs
Only Arm can enable ML everywhere
90% of the AI-enabled units shipped
today are based on Arm
(source: IDC WW Embedded and Intelligent Systems
Forecast, 2017-2022 and Arm forecast)
Increasing power and cost (Silicon)
© 2018 Arm Limited27
Project Trillium: Arm’s ML Computing Platform
Arm Hardware IP for AI/ML
AI/ML Applications, Algorithms and Frameworks
CPU GPUHardware
Products
Software
Products
Ecosystem
Software Libraries Optimized for Arm Hardware
ML and OD
Processors
Machine Learning
Object Detection (OD)
Arm NN
Compute Library Object Detection LibrariesCMSIS-NN
DSPs, FPGAs,
Accelerators
Partner IP
Android NNAPI
Armv8 SVE
© 2018 Arm Limited28
Resources
CMSIS-NN paper: https://arxiv.org/abs/1801.06601
CMSIS-NN blog: https://community.arm.com/processors/b/blog/posts/new-neural-
network-kernels-boost-efficiency-in-microcontrollers-by-5x
CMSIS-NN Github link: https://github.com/ARM-software/CMSIS_5/
KWS (Keyword Spotting) paper: https://arxiv.org/abs/1711.07128
KWS blog: https://community.arm.com/processors/b/blog/posts/high-accuracy-keyword-
spotting-on-cortex-m-processors
KWS Github link: https://github.com/ARM-software/ML-KWS-for-MCU/
ArmNN: https://developer.arm.com/products/processors/machine-learning/arm-nn
ArmNN SDK blog: https://community.arm.com/tools/b/blog/posts/arm-nn-sdk
2929
Thank You!
Danke!
Merci!
!
!
Gracias!
Kiitos!
감사합니다
ध"यवाद
© 2018 Arm Limited
3030 © 2018 Arm Limited
The Arm trademarks featured in this presentation are registered trademarks or
trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights
reserved. All other marks featured may be trademarks of their respective owners.
www.arm.com/company/policies/trademarks

More Related Content

What's hot

"The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li...
"The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li..."The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li...
"The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li...Edge AI and Vision Alliance
 
"Edge/Cloud Tradeoffs and Scaling a Consumer Computer Vision Product," a Pres...
"Edge/Cloud Tradeoffs and Scaling a Consumer Computer Vision Product," a Pres..."Edge/Cloud Tradeoffs and Scaling a Consumer Computer Vision Product," a Pres...
"Edge/Cloud Tradeoffs and Scaling a Consumer Computer Vision Product," a Pres...Edge AI and Vision Alliance
 
Виктор Ерухимов Open VX mixar moscow sept'15
Виктор Ерухимов Open VX  mixar moscow sept'15 Виктор Ерухимов Open VX  mixar moscow sept'15
Виктор Ерухимов Open VX mixar moscow sept'15 mixARConference
 
End-to-End Big Data AI with Analytics Zoo
End-to-End Big Data AI with Analytics ZooEnd-to-End Big Data AI with Analytics Zoo
End-to-End Big Data AI with Analytics ZooJason Dai
 
FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)
FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote) FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)
FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote) Wim Vanderbauwhede
 
AI OpenPOWER Academia Discussion Group
AI OpenPOWER Academia Discussion Group AI OpenPOWER Academia Discussion Group
AI OpenPOWER Academia Discussion Group Ganesan Narayanasamy
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018Linaro
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
"Building Complete Embedded Vision Systems on Linux—From Camera to Display," ...
"Building Complete Embedded Vision Systems on Linux—From Camera to Display," ..."Building Complete Embedded Vision Systems on Linux—From Camera to Display," ...
"Building Complete Embedded Vision Systems on Linux—From Camera to Display," ...Edge AI and Vision Alliance
 
OpenCL Overview Japan Virtual Open House Feb 2021
OpenCL Overview Japan Virtual Open House Feb 2021OpenCL Overview Japan Virtual Open House Feb 2021
OpenCL Overview Japan Virtual Open House Feb 2021The Khronos Group Inc.
 
GTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWERGTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWERAchronix
 
“OpenCV: Past, Present and Future,” a Presentation from OpenCV.org
“OpenCV: Past, Present and Future,” a Presentation from OpenCV.org“OpenCV: Past, Present and Future,” a Presentation from OpenCV.org
“OpenCV: Past, Present and Future,” a Presentation from OpenCV.orgEdge AI and Vision Alliance
 
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...Edge AI and Vision Alliance
 
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...Edge AI and Vision Alliance
 
“Deploying Deep Learning Applications on FPGAs with MATLAB,” a Presentation f...
“Deploying Deep Learning Applications on FPGAs with MATLAB,” a Presentation f...“Deploying Deep Learning Applications on FPGAs with MATLAB,” a Presentation f...
“Deploying Deep Learning Applications on FPGAs with MATLAB,” a Presentation f...Edge AI and Vision Alliance
 
“Deploying PyTorch Models for Real-time Inference On the Edge,” a Presentatio...
“Deploying PyTorch Models for Real-time Inference On the Edge,” a Presentatio...“Deploying PyTorch Models for Real-time Inference On the Edge,” a Presentatio...
“Deploying PyTorch Models for Real-time Inference On the Edge,” a Presentatio...Edge AI and Vision Alliance
 
Vulkan ML Japan Virtual Open House Feb 2021
Vulkan ML Japan Virtual Open House Feb 2021Vulkan ML Japan Virtual Open House Feb 2021
Vulkan ML Japan Virtual Open House Feb 2021The Khronos Group Inc.
 

What's hot (20)

"The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li...
"The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li..."The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li...
"The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li...
 
"Edge/Cloud Tradeoffs and Scaling a Consumer Computer Vision Product," a Pres...
"Edge/Cloud Tradeoffs and Scaling a Consumer Computer Vision Product," a Pres..."Edge/Cloud Tradeoffs and Scaling a Consumer Computer Vision Product," a Pres...
"Edge/Cloud Tradeoffs and Scaling a Consumer Computer Vision Product," a Pres...
 
Виктор Ерухимов Open VX mixar moscow sept'15
Виктор Ерухимов Open VX  mixar moscow sept'15 Виктор Ерухимов Open VX  mixar moscow sept'15
Виктор Ерухимов Open VX mixar moscow sept'15
 
End-to-End Big Data AI with Analytics Zoo
End-to-End Big Data AI with Analytics ZooEnd-to-End Big Data AI with Analytics Zoo
End-to-End Big Data AI with Analytics Zoo
 
FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)
FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote) FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)
FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)
 
OpenPOWER Latest Updates
OpenPOWER Latest UpdatesOpenPOWER Latest Updates
OpenPOWER Latest Updates
 
AI OpenPOWER Academia Discussion Group
AI OpenPOWER Academia Discussion Group AI OpenPOWER Academia Discussion Group
AI OpenPOWER Academia Discussion Group
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
"Building Complete Embedded Vision Systems on Linux—From Camera to Display," ...
"Building Complete Embedded Vision Systems on Linux—From Camera to Display," ..."Building Complete Embedded Vision Systems on Linux—From Camera to Display," ...
"Building Complete Embedded Vision Systems on Linux—From Camera to Display," ...
 
OpenCL Overview Japan Virtual Open House Feb 2021
OpenCL Overview Japan Virtual Open House Feb 2021OpenCL Overview Japan Virtual Open House Feb 2021
OpenCL Overview Japan Virtual Open House Feb 2021
 
GTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWERGTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWER
 
“OpenCV: Past, Present and Future,” a Presentation from OpenCV.org
“OpenCV: Past, Present and Future,” a Presentation from OpenCV.org“OpenCV: Past, Present and Future,” a Presentation from OpenCV.org
“OpenCV: Past, Present and Future,” a Presentation from OpenCV.org
 
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
 
Intel Developer Program
Intel Developer ProgramIntel Developer Program
Intel Developer Program
 
Interconnect your future
Interconnect your futureInterconnect your future
Interconnect your future
 
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
 
“Deploying Deep Learning Applications on FPGAs with MATLAB,” a Presentation f...
“Deploying Deep Learning Applications on FPGAs with MATLAB,” a Presentation f...“Deploying Deep Learning Applications on FPGAs with MATLAB,” a Presentation f...
“Deploying Deep Learning Applications on FPGAs with MATLAB,” a Presentation f...
 
“Deploying PyTorch Models for Real-time Inference On the Edge,” a Presentatio...
“Deploying PyTorch Models for Real-time Inference On the Edge,” a Presentatio...“Deploying PyTorch Models for Real-time Inference On the Edge,” a Presentatio...
“Deploying PyTorch Models for Real-time Inference On the Edge,” a Presentatio...
 
Vulkan ML Japan Virtual Open House Feb 2021
Vulkan ML Japan Virtual Open House Feb 2021Vulkan ML Japan Virtual Open House Feb 2021
Vulkan ML Japan Virtual Open House Feb 2021
 

Similar to HKG18-312 - CMSIS-NN

Performance of State-of-the-Art Cryptography on ARM-based Microprocessors
Performance of State-of-the-Art Cryptography on ARM-based MicroprocessorsPerformance of State-of-the-Art Cryptography on ARM-based Microprocessors
Performance of State-of-the-Art Cryptography on ARM-based MicroprocessorsHannes Tschofenig
 
ML for embedded systems at the edge - NXP and Arm - FINAL.pdf
ML for embedded systems at the edge - NXP and Arm - FINAL.pdfML for embedded systems at the edge - NXP and Arm - FINAL.pdf
ML for embedded systems at the edge - NXP and Arm - FINAL.pdfsetagllib
 
Deep Learning Frameworks Using Spark on YARN by Vartika Singh
Deep Learning Frameworks Using Spark on YARN by Vartika SinghDeep Learning Frameworks Using Spark on YARN by Vartika Singh
Deep Learning Frameworks Using Spark on YARN by Vartika SinghData Con LA
 
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...Edge AI and Vision Alliance
 
“Introducing the i.MX 93: Your “Go-to” Processor for Embedded Vision,” a Pres...
“Introducing the i.MX 93: Your “Go-to” Processor for Embedded Vision,” a Pres...“Introducing the i.MX 93: Your “Go-to” Processor for Embedded Vision,” a Pres...
“Introducing the i.MX 93: Your “Go-to” Processor for Embedded Vision,” a Pres...Edge AI and Vision Alliance
 
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...Edge AI and Vision Alliance
 
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Eric Van Hensbergen
 
“Arm Cortex-M Series Processors Spark a New Era of Use Cases, Enabling Low-co...
“Arm Cortex-M Series Processors Spark a New Era of Use Cases, Enabling Low-co...“Arm Cortex-M Series Processors Spark a New Era of Use Cases, Enabling Low-co...
“Arm Cortex-M Series Processors Spark a New Era of Use Cases, Enabling Low-co...Edge AI and Vision Alliance
 
Machine Learning Inference at the Edge
Machine Learning Inference at the EdgeMachine Learning Inference at the Edge
Machine Learning Inference at the EdgeAmazon Web Services
 
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super Affordable
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super AffordableSupermicro AI Pod that’s Super Simple, Super Scalable, and Super Affordable
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super AffordableRebekah Rodriguez
 
So you think developing an SoC needs to be complex or expensive?
So you think developing an SoC needs to be complex or expensive?So you think developing an SoC needs to be complex or expensive?
So you think developing an SoC needs to be complex or expensive?Arm
 
Efficient software development with heterogeneous devices
Efficient software development with heterogeneous devicesEfficient software development with heterogeneous devices
Efficient software development with heterogeneous devicesArm
 
Machine Learning Inference at the Edge
Machine Learning Inference at the EdgeMachine Learning Inference at the Edge
Machine Learning Inference at the EdgeJulien SIMON
 
Accelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks RuntimeAccelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks RuntimeDatabricks
 
“Accelerate Tomorrow’s Models with Lattice FPGAs,” a Presentation from Lattic...
“Accelerate Tomorrow’s Models with Lattice FPGAs,” a Presentation from Lattic...“Accelerate Tomorrow’s Models with Lattice FPGAs,” a Presentation from Lattic...
“Accelerate Tomorrow’s Models with Lattice FPGAs,” a Presentation from Lattic...Edge AI and Vision Alliance
 
Software development in ar mv8 m architecture - yiu
Software development in ar mv8 m architecture - yiuSoftware development in ar mv8 m architecture - yiu
Software development in ar mv8 m architecture - yiuArm
 

Similar to HKG18-312 - CMSIS-NN (20)

Performance of State-of-the-Art Cryptography on ARM-based Microprocessors
Performance of State-of-the-Art Cryptography on ARM-based MicroprocessorsPerformance of State-of-the-Art Cryptography on ARM-based Microprocessors
Performance of State-of-the-Art Cryptography on ARM-based Microprocessors
 
ML for embedded systems at the edge - NXP and Arm - FINAL.pdf
ML for embedded systems at the edge - NXP and Arm - FINAL.pdfML for embedded systems at the edge - NXP and Arm - FINAL.pdf
ML for embedded systems at the edge - NXP and Arm - FINAL.pdf
 
Deep Learning Frameworks Using Spark on YARN by Vartika Singh
Deep Learning Frameworks Using Spark on YARN by Vartika SinghDeep Learning Frameworks Using Spark on YARN by Vartika Singh
Deep Learning Frameworks Using Spark on YARN by Vartika Singh
 
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
 
“Introducing the i.MX 93: Your “Go-to” Processor for Embedded Vision,” a Pres...
“Introducing the i.MX 93: Your “Go-to” Processor for Embedded Vision,” a Pres...“Introducing the i.MX 93: Your “Go-to” Processor for Embedded Vision,” a Pres...
“Introducing the i.MX 93: Your “Go-to” Processor for Embedded Vision,” a Pres...
 
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
 
Trends in DNN compression
Trends in DNN compressionTrends in DNN compression
Trends in DNN compression
 
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
 
“Arm Cortex-M Series Processors Spark a New Era of Use Cases, Enabling Low-co...
“Arm Cortex-M Series Processors Spark a New Era of Use Cases, Enabling Low-co...“Arm Cortex-M Series Processors Spark a New Era of Use Cases, Enabling Low-co...
“Arm Cortex-M Series Processors Spark a New Era of Use Cases, Enabling Low-co...
 
Machine Learning Inference at the Edge
Machine Learning Inference at the EdgeMachine Learning Inference at the Edge
Machine Learning Inference at the Edge
 
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super Affordable
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super AffordableSupermicro AI Pod that’s Super Simple, Super Scalable, and Super Affordable
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super Affordable
 
So you think developing an SoC needs to be complex or expensive?
So you think developing an SoC needs to be complex or expensive?So you think developing an SoC needs to be complex or expensive?
So you think developing an SoC needs to be complex or expensive?
 
Ameya_Kasbekar_Resume
Ameya_Kasbekar_ResumeAmeya_Kasbekar_Resume
Ameya_Kasbekar_Resume
 
Efficient software development with heterogeneous devices
Efficient software development with heterogeneous devicesEfficient software development with heterogeneous devices
Efficient software development with heterogeneous devices
 
Machine Learning Inference at the Edge
Machine Learning Inference at the EdgeMachine Learning Inference at the Edge
Machine Learning Inference at the Edge
 
Ijetr042175
Ijetr042175Ijetr042175
Ijetr042175
 
Accelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks RuntimeAccelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks Runtime
 
Making AI Ubiquitous
Making AI UbiquitousMaking AI Ubiquitous
Making AI Ubiquitous
 
“Accelerate Tomorrow’s Models with Lattice FPGAs,” a Presentation from Lattic...
“Accelerate Tomorrow’s Models with Lattice FPGAs,” a Presentation from Lattic...“Accelerate Tomorrow’s Models with Lattice FPGAs,” a Presentation from Lattic...
“Accelerate Tomorrow’s Models with Lattice FPGAs,” a Presentation from Lattic...
 
Software development in ar mv8 m architecture - yiu
Software development in ar mv8 m architecture - yiuSoftware development in ar mv8 m architecture - yiu
Software development in ar mv8 m architecture - yiu
 

More from Linaro

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloLinaro
 
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaLinaro
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaLinaro
 
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018Linaro
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...Linaro
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Linaro
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Linaro
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineLinaro
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteLinaro
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopLinaro
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allLinaro
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorLinaro
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMULinaro
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MLinaro
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation Linaro
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootLinaro
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...Linaro
 
HKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramHKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramLinaro
 
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...Linaro
 

More from Linaro (20)

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
 
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qa
 
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
 
HKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramHKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready Program
 
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
 

Recently uploaded

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 

Recently uploaded (20)

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 

HKG18-312 - CMSIS-NN

  • 1. Deep Learning on Arm Cortex-M Microcontrollers Rod Crawford Director Software Technologies, Arm
  • 2. © 2018 Arm Limited2 What is Machine Learning (ML)? Additional terms • Location • Cloud – processing done in data farms • Edge – processing done in local devices (growing much faster than Cloud ML) • Key components of machine learning • Model – a mathematical approximation of a collection of input data • Training – in deep learning, data-sets are used to create a ‘model’ • Inference – in deep learning, a ‘model’ is used to check against new data Artificial Intelligence Machine Learning Deep Learning Neural Networks
  • 3. © 2018 Arm Limited3 Bandwidth ReliabilityPower SecurityCost Latency Why is ML Moving to the Edge
  • 4. © 2018 Arm Limited4 ML Edge Use cases Keyword detection Pattern training Voice & image recognition Object detection Image enhancement Autonomous drive Increasing power and cost (silicon) Increasingperformance(ops/second) Arm Cortex-M systems ~100W ~1W ~1-10mW ~10W
  • 5. © 2018 Arm Limited5 Expanding opportunity for the embedded intelligence market Arm Cortex-M Use cases demand more embedded intelligence More data More processing More algorithms More compute Contextual awarenessSensor fusion Voice activation Biometrics
  • 6. © 2018 Arm Limited6 Developing NN Solution on Cortex-M MCUs Hardware-constrained NN model search Problem Trained NN model Deployable Solution Optimized code on h/w NN model translation Optimized NN functions: CMSIS-NN
  • 7. © 2018 Arm Limited7 Keyword Spotting Listen for certain words / phrases • Voice activation: “Ok Google”, “Hey Siri”, “Alexa” • Simple commands: “Play music”, “Pause”, “Set Alarm for 10 am” Feature extraction FFT-based mel frequency cepstral coefficients (MFCC) or log filter bank energies (LFBE). Classification Neural network based – DNN, CNN, RNN (LSTM/GRU) or a mix of them
  • 8. © 2018 Arm Limited8 ST Nucleo-F103RB Cortex-M3, 72 MHz 20KB RAM, 128KB Flash Arm Cortex-M based MCU Platforms More boards at: https://os.mbed.com/platforms/ Memory Performance NXP LPC11U24 Cortex-M0, 48 MHz 8KB RAM, 32KB Flash Nordic nRF52-DK Cortex-M4, 64 MHz 64KB RAM, 512KB Flash ST Nucleo-F746ZG Cortex-M7, 216 MHz 320KB RAM, 1MB Flash ST Nucleo-F411RE Cortex-M4, 100 MHz 128KB RAM, 512KB Flash NXP i.MX RT 1050 Cortex-M7, 600 MHz 512KB RAM
  • 9. © 2018 Arm Limited9 Keyword Spotting NN Models: Memory vs. Ops Need compact models: that fit within the Cortex-M system memory. Need models with less operations: to achieve real time performance. Neural network model search parameters • NN architecture • Number of input features • Number of layers (3-layers, 4-layers, etc.) • Types of layers (conv, ds-conv, fc, pool, etc.) • Number of features per layer DNN: Google Research (2014): G. Chen et. al. “Small-footprint keyword spotting using deep neural networks” CNN: Google Research (2015): T. N. Sainath, et. al. “Convolutional neural networks for small-footprint keyword spotting” LSTM: Amazon (2017): M. Sun, et. al. “Max-pooling loss training of long short-term memory networks for small-footprint keyword spotting” CNN-GRU: Baidu Research (2017): S. O. Arik, et. al. “Convolutional recurrent neural networks for small-footprint keyword spotting” NN Models from literature trained on Google speech commands dataset.
  • 10. © 2018 Arm Limited10 H/W-Constrained NN Model Search
  • 11. © 2018 Arm Limited11 Summary of Best NN models Depthwise Separable CNN (DS-CNN) achieves highest accuracy Accuracy asymptotically reaches to 95%. Training scripts and models are available on github: https://github.com/ARM-software/ML-KWS-for-MCU Y. Zhang, et. al. “Hello Edge: Keyword spotting on Microcontrollers”, arXiv: 1711.07128.
  • 12. © 2018 Arm Limited12 Developing NN Solution on Cortex-M MCUs Hardware-constrained NN model search Problem Trained NN model Deployable Solution Optimized code on h/w NN model translation Optimized NN functions: CMSIS-NN
  • 13. © 2018 Arm Limited13 Quantization type impacts model size and performance. • Bit-width: 8-bit or 16-bit • Symmetric around zero or not • Quantization range as [-2n,2n) a.k.a. fixed-point quantization. Steps in model quantization • Run quantization sweeps to identify optimal quantization strategy. • Quantize weights: does not need a dataset • Quantize activations: run the model with dataset to extract ranges. NN Model Quantization Minimal loss in accuracy (~0.1%). May increase accuracy in some cases, because of regularization (or reduce over-fitting). Neural networks can tolerate quantization.
  • 14. © 2018 Arm Limited14 Model Deployment on Cortex-M MCUs Running ML framework on Cortex-M systems is impractical. Need to run bare-metal code to efficiently use the limited resources. Arm NN translates trained model to the code that runs on Cortex-M cores using CMSIS-NN functions. CMSIS-NN: optimized low-level NN functions for Cortex-M CPUs. CMSIS-NN APIs may also be directly used in the application code.
  • 15. © 2018 Arm Limited15 Developing NN Solution on Cortex-M MCUs Hardware-constrained NN model search Problem Trained NN model Deployable Solution Optimized code on h/w NN model translation Optimized NN functions: CMSIS-NN
  • 16. © 2018 Arm Limited16 Cortex Microcontroller Software Interface Standard (CMSIS) CMSIS: vendor-independent hardware abstraction layer for Cortex-M processor cores. Enable consistent device support, reduce learning curve and time-to-market. Open-source: https://github.com/ARM-software/CMSIS_5/
  • 17. © 2018 Arm Limited17 CMSIS-NN CMSIS-NN: collection of optimized neural network functions for Cortex-M cores Key considerations • Improve performance using SIMD instructions. • Minimize memory footprint. • NN-specific optimizations: data-layout and offline weight reordering.
  • 18. © 2018 Arm Limited18 CMSIS-NN – Efficient NN kernels for Cortex-M CPUs Convolution • Boost compute density with GEMM based implementation • Reduce data movement overhead with depth-first data layout • Interleave data movement and compute to minimize memory footprint Pooling • Improve performance by splitting pooling into x-y directions • Improve memory access and footprint with in-situ updates Activation • ReLU: Improve parallelism by branch-free implementation • Sigmoid/Tanh: fast table-lookup instead of exponent computation *Baseline uses CMSIS 1D Conv and Caffe-like Pooling/ReLU 0 1 2 3 4 5 6 Conv Pooling Activation (ReLU) Total Relativethroughtput CNN Runtime improvement Baseline New kernels 0 2 4 6 Conv Pooling Activation (ReLU) Total RelativeOpsperJoule Energy efficiency improvement 4.9x higher eff. 4.6x higher perf.
  • 19. © 2018 Arm Limited21 CNN on Cortex-M7 • CNN with 8-bit weights and 8-bit activations • Total memory footprint: 87 kB weights + 40 kB activations + 10 kB buffers (I/O etc.) • Total operations: 24.7 MOps, run time: 99.1ms • Example code available in CMSIS-NN github. NUCLEO-F746ZG: 216 MHz, 320 KB SRAM Video : https://www.youtube.com/watch?v=PdWi_fvY9Og OpenMV platform with a Cortex-M7
  • 20. © 2018 Arm Limited22 https://youtu.be/PdWi_fvY9Og?t=1m Demo
  • 21. © 2018 Arm Limited23 Summary • ML is moving to the IoT Edge and Cortex-M plays a key role in enabling embedded intelligence. • Case study: keyword spotting on Cortex-M CPU • Hardware-constrained neural network architecture exploration • Deployment flow of model on Cortex-M CPUs • CMSIS-NN: optimized neural network functions
  • 23. © 2018 Arm Limited25 Project Trillium: Arm ML and OD Processors First-generation ML processor targets Mobile market Massive uplift from CPUs, GPUs, DSPs and accelerators Ground-up design for high performance and efficiency Enabled by open-source software
  • 24. © 2018 Arm Limited26 Flexible, Scalable ML Solutions Keyword detection Pattern training Voice and image recognition Object detection Image enhancement Autonomous driving Data center Increasingperformance(ops/second) Mali GPUs ML and OD processors Cortex-M/A CPUs Only Arm can enable ML everywhere 90% of the AI-enabled units shipped today are based on Arm (source: IDC WW Embedded and Intelligent Systems Forecast, 2017-2022 and Arm forecast) Increasing power and cost (Silicon)
  • 25. © 2018 Arm Limited27 Project Trillium: Arm’s ML Computing Platform Arm Hardware IP for AI/ML AI/ML Applications, Algorithms and Frameworks CPU GPUHardware Products Software Products Ecosystem Software Libraries Optimized for Arm Hardware ML and OD Processors Machine Learning Object Detection (OD) Arm NN Compute Library Object Detection LibrariesCMSIS-NN DSPs, FPGAs, Accelerators Partner IP Android NNAPI Armv8 SVE
  • 26. © 2018 Arm Limited28 Resources CMSIS-NN paper: https://arxiv.org/abs/1801.06601 CMSIS-NN blog: https://community.arm.com/processors/b/blog/posts/new-neural- network-kernels-boost-efficiency-in-microcontrollers-by-5x CMSIS-NN Github link: https://github.com/ARM-software/CMSIS_5/ KWS (Keyword Spotting) paper: https://arxiv.org/abs/1711.07128 KWS blog: https://community.arm.com/processors/b/blog/posts/high-accuracy-keyword- spotting-on-cortex-m-processors KWS Github link: https://github.com/ARM-software/ML-KWS-for-MCU/ ArmNN: https://developer.arm.com/products/processors/machine-learning/arm-nn ArmNN SDK blog: https://community.arm.com/tools/b/blog/posts/arm-nn-sdk
  • 28. 3030 © 2018 Arm Limited The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. www.arm.com/company/policies/trademarks