CMSIS-NN, INTRO
新竹碼農
Anthony Liu, 2018/03/08
RESOURCES
• Source: https://github.com/ARM-software/CMSIS_5
• Web 1: https://developer.arm.com/embedded/cmsis
• Web 2: http://www2.keil.com/mdk5/cmsis/
• Paper: https://arxiv.org/abs/1801.06601
• Manual: http://arm-software.github.io/CMSIS_5/NN/html/index.html
CMSIS 5.3.0
• http://www2.keil.com/mdk5/cmsis/
• https://developer.arm.com/embedded/cmsis
• Cortex Microcontroller Software Interface Standard
• CMSIS-NN first appeared in 5.2.1 dev 3
• Components: CMSIS-CORE, CMSIS-RTOS, CMSIS-DSP, CMSIS-Driver, CMSIS-SVD, CMSIS-DAP, CMSIS-Pack, CMSIS-NN, CMSIS-Zone (planned)
CMSIS
https://developer.arm.com/embedded/cmsis
CMSIS-NN
• DSP extension: Cortex-M0 (no) / Cortex-M3 (no) / Cortex-M4 (yes) / Cortex-M7 (yes) / Cortex-M33 (optional)
• For inference only, on devices with limited computation power
• CPU: tens of MHz up to 192 MHz (Cortex-M4) or 400 MHz (Cortex-M7)
• Memory: tens of KB to a few MB
• Kernels support the q7_t and q15_t fractional data types, range [-1.0, 1.0)
• Functions
• Neural Network Convolution Functions
• Neural Network Activation Functions
• Fully-connected Layer Functions
• Neural Network Pooling Functions
• Softmax Functions
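
The fractional range [-1.0, 1.0) follows from storing a value as a signed integer divided by 2^7 (q7) or 2^15 (q15). The sketch below shows one way to convert between float and these formats. The helper names (`quantize`, `float_to_q7`, and so on) are hypothetical and not part of CMSIS-NN, and the typedefs are local stand-ins for the CMSIS ones so the code compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the CMSIS typedefs, so the sketch compiles on its own. */
typedef int8_t  q7_t;
typedef int16_t q15_t;

/* Hypothetical helper: quantize x to a signed fixed-point value with
 * frac_bits fractional bits, rounding to nearest and saturating. */
static int32_t quantize(float x, int frac_bits)
{
    assert(frac_bits == 7 || frac_bits == 15);
    const float scale = (float)(1 << frac_bits);     /* 128 or 32768 */
    float v = x * scale + (x >= 0.0f ? 0.5f : -0.5f);
    if (v > scale - 1.0f) v = scale - 1.0f;          /* +1.0 is not representable */
    if (v < -scale)       v = -scale;
    return (int32_t)v;                               /* truncates toward zero */
}

static q7_t  float_to_q7(float x)  { return (q7_t)quantize(x, 7); }
static q15_t float_to_q15(float x) { return (q15_t)quantize(x, 15); }
static float q7_to_float(q7_t q)   { return (float)q / 128.0f; }
static float q15_to_float(q15_t q) { return (float)q / 32768.0f; }
```

With this scheme 0.5 maps to 64 in q7 and 16384 in q15, while +1.0 saturates to the largest representable value (127 or 32767).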
SUPPORT
• Data conversion
• arm_q7_to_q15_no_shift
• arm_q7_to_q15_reordered_no_shift
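
"No shift" here means the 8-bit value is sign-extended to 16 bits without rescaling, so the integer value is unchanged. A conversion that preserves the fractional value would instead multiply by 2^8. The sketch below contrasts the two in plain C. The function names are hypothetical reference versions, not the library routines, and the reordered variant (which changes the output element order) is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int8_t  q7_t;   /* local stand-ins for the CMSIS typedefs */
typedef int16_t q15_t;

/* Widen q7 to 16 bits, keeping the integer value (sign extension only). */
static void q7_to_q15_no_shift_ref(const q7_t *src, q15_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = (q15_t)src[i];
    }
}

/* For contrast: keep the fractional value, i.e. scale by 2^8. */
static void q7_to_q15_scaled_ref(const q7_t *src, q15_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = (q15_t)(src[i] * 256);   /* multiply avoids shifting a negative value */
    }
}
```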
CONVOLUTION
• arm_convolve_HWC_q7_basic
• arm_convolve_HWC_q15_basic
• arm_convolve_HWC_q7_fast
• arm_convolve_HWC_q7_fast_nonsquare
• arm_convolve_HWC_q7_RGB
• arm_convolve_HWC_q15_fast
• arm_convolve_1x1_HWC_q7_fast_nonsquare
• arm_depthwise_separable_conv_HWC_q7
• arm_depthwise_separable_conv_HWC_q7_nonsquare
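
HWC in these names refers to the tensor layout: height, then width, then channel, with the channel index varying fastest. The sketch below is a naive square q7 convolution over that layout, meant only to illustrate the indexing and the fixed-point shifts. It is not the library code: the function names are hypothetical, the weight layout [out_ch][ky][kx][in_ch] is an assumption of this sketch, and rounding before the output shift is left out.

```c
#include <assert.h>
#include <stdint.h>

typedef int8_t q7_t;   /* local stand-in for the CMSIS typedef */

/* Offset of element (y, x, c) in an HWC tensor of width w with ch channels. */
static int hwc_index(int y, int x, int c, int w, int ch)
{
    return (y * w + x) * ch + c;
}

static q7_t sat_q7(int32_t v)
{
    return (q7_t)(v > 127 ? 127 : (v < -128 ? -128 : v));
}

/* Naive square convolution with zero padding.
 * Assumed weight layout: [out_ch][ky][kx][in_ch]. */
static void conv_hwc_q7_ref(const q7_t *in, int dim_in, int ch_in,
                            const q7_t *wt, int ch_out, int dim_k,
                            int pad, int stride, const q7_t *bias,
                            int bias_shift, int out_shift,
                            q7_t *out, int dim_out)
{
    assert(stride > 0 && bias_shift >= 0 && out_shift >= 0);
    for (int oy = 0; oy < dim_out; oy++) {
        for (int ox = 0; ox < dim_out; ox++) {
            for (int oc = 0; oc < ch_out; oc++) {
                /* Bias is aligned to the accumulator format first. */
                int32_t acc = (int32_t)bias[oc] * (1 << bias_shift);
                for (int ky = 0; ky < dim_k; ky++) {
                    for (int kx = 0; kx < dim_k; kx++) {
                        int iy = oy * stride + ky - pad;
                        int ix = ox * stride + kx - pad;
                        if (iy < 0 || iy >= dim_in || ix < 0 || ix >= dim_in) {
                            continue;   /* zero padding contributes nothing */
                        }
                        for (int ic = 0; ic < ch_in; ic++) {
                            int w_idx = ((oc * dim_k + ky) * dim_k + kx) * ch_in + ic;
                            acc += (int32_t)in[hwc_index(iy, ix, ic, dim_in, ch_in)] * wt[w_idx];
                        }
                    }
                }
                /* Right shift of a negative value is arithmetic on common compilers. */
                out[hwc_index(oy, ox, oc, dim_out, ch_out)] = sat_q7(acc >> out_shift);
            }
        }
    }
}
```

The fast and RGB variants in the list cover the same operation under extra constraints on channel counts, so a reference loop like this is mainly useful for checking results on a host machine.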
ACTIVATION
• ReLU
• arm_relu_q7
• arm_relu_q15
• Sigmoid / Tanh
• arm_nn_activations_direct_q7
• arm_nn_activations_direct_q15
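
ReLU on fixed-point data needs no rescaling: negative values become zero and everything else passes through. A minimal in-place reference in plain C is sketched below. The name `relu_q7_ref` is hypothetical, and sigmoid and tanh are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int8_t q7_t;   /* local stand-in for the CMSIS typedef */

/* In-place ReLU: clamp negative q7 values to zero. */
static void relu_q7_ref(q7_t *data, size_t size)
{
    assert(data != NULL);
    for (size_t i = 0; i < size; i++) {
        if (data[i] < 0) {
            data[i] = 0;
        }
    }
}
```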
POOLING
• Supports max-pooling and average-pooling in 1.7 (q7) format
• arm_maxpool_q7_HWC
• arm_avepool_q7_HWC
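
Max-pooling over an HWC tensor can be sketched as below: each output element is the largest value in its window, taken per channel. This is a naive reference with a hypothetical name (`maxpool_hwc_q7_ref`), assuming square inputs and windows. Padded positions are simply skipped, and average-pooling is not shown.

```c
#include <assert.h>
#include <stdint.h>

typedef int8_t q7_t;   /* local stand-in for the CMSIS typedef */

/* Naive max-pooling on a square HWC tensor; padded positions are ignored. */
static void maxpool_hwc_q7_ref(const q7_t *in, int dim_in, int ch,
                               int dim_k, int pad, int stride,
                               q7_t *out, int dim_out)
{
    assert(stride > 0 && dim_k > 0 && pad >= 0);
    for (int oy = 0; oy < dim_out; oy++) {
        for (int ox = 0; ox < dim_out; ox++) {
            for (int c = 0; c < ch; c++) {
                int best = INT8_MIN;
                for (int ky = 0; ky < dim_k; ky++) {
                    for (int kx = 0; kx < dim_k; kx++) {
                        int iy = oy * stride + ky - pad;
                        int ix = ox * stride + kx - pad;
                        if (iy < 0 || iy >= dim_in || ix < 0 || ix >= dim_in) {
                            continue;
                        }
                        int v = in[(iy * dim_in + ix) * ch + c];
                        if (v > best) {
                            best = v;
                        }
                    }
                }
                out[(oy * dim_out + ox) * ch + c] = (q7_t)best;
            }
        }
    }
}
```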
SOFTMAX
• Softmax computed with base-2 exponentials (2^x) instead of e^x
• arm_softmax_q7
• arm_softmax_q15
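
Using 2^x lets the exponential be computed with bit shifts rather than a math library. The sketch below illustrates that idea only. It is not the library implementation, the name `softmax2_q7_ref` is hypothetical, and the choices here (treating the raw q7 integers as exponents, ignoring inputs more than 8 below the maximum, a 2^20 numerator) are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef int8_t q7_t;   /* local stand-in for the CMSIS typedef */

/* Base-2 softmax sketch: out[i] is roughly 128 * 2^in[i] / sum_j 2^in[j],
 * saturated to 127. Inputs at or below (max - 8) are treated as zero. */
static void softmax2_q7_ref(const q7_t *in, int n, q7_t *out)
{
    assert(n > 0);
    int max = in[0];
    for (int i = 1; i < n; i++) {
        if (in[i] > max) {
            max = in[i];
        }
    }
    const int base = max - 8;

    int32_t sum = 0;
    for (int i = 0; i < n; i++) {
        if (in[i] > base) {
            sum += (int32_t)1 << (in[i] - base);   /* 2^(in[i] - base), exponent in 1..8 */
        }
    }

    /* sum >= 256 because the maximum element always contributes 2^8. */
    const int32_t unit = ((int32_t)1 << 20) / sum;
    for (int i = 0; i < n; i++) {
        if (in[i] > base) {
            int32_t p = unit >> (13 - (in[i] - base));   /* shift amount in 5..12 */
            out[i] = (q7_t)(p > 127 ? 127 : p);
        } else {
            out[i] = 0;
        }
    }
}
```

Two equal inputs each come out as 64 (0.5 in q7), and an input one step below the maximum gets half the maximum's share, which is the base-2 behaviour the bullet describes.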
FULLY-CONNECTED LAYER
• arm_fully_connected_q7
• arm_fully_connected_q7_opt
• arm_fully_connected_q15
• arm_fully_connected_q15_opt
• arm_fully_connected_mat_q7_vec_q15
• arm_fully_connected_mat_q7_vec_q15_opt
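
A fully-connected layer is a matrix-vector product plus bias, followed by a shift back into the output format. The sketch below is a naive q7 reference with a hypothetical name (`fully_connected_q7_ref`). It assumes row-major weights and omits rounding, so it illustrates the arithmetic rather than reproducing the library bit for bit. The `_opt` variants in the list expect a different weight ordering, which is not shown here.

```c
#include <assert.h>
#include <stdint.h>

typedef int8_t q7_t;   /* local stand-in for the CMSIS typedef */

static q7_t sat_q7(int32_t v)
{
    return (q7_t)(v > 127 ? 127 : (v < -128 ? -128 : v));
}

/* out[r] = sat((bias[r] * 2^bias_shift + sum_j wt[r][j] * vec[j]) >> out_shift)
 * Weights are assumed row-major: wt[r * dim_vec + j]. */
static void fully_connected_q7_ref(const q7_t *vec, const q7_t *wt,
                                   int dim_vec, int num_rows,
                                   int bias_shift, int out_shift,
                                   const q7_t *bias, q7_t *out)
{
    assert(bias_shift >= 0 && out_shift >= 0);
    for (int r = 0; r < num_rows; r++) {
        int32_t acc = (int32_t)bias[r] * (1 << bias_shift);
        for (int j = 0; j < dim_vec; j++) {
            acc += (int32_t)wt[r * dim_vec + j] * vec[j];
        }
        /* Right shift of a negative value is arithmetic on common compilers. */
        out[r] = sat_q7(acc >> out_shift);
    }
}
```

With q7 inputs and q7 weights the products carry 14 fractional bits, so an output shift of 7 returns the result to q7, and a bias shift of 7 aligns a q7 bias with the accumulator.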
FOOTPRINT - 9,306 bytes
  text  data  bss   dec  hex  filename
   132     0    0   132   84  ./SoftmaxFunctions/arm_softmax_q15.o
   154     0    0   154   9a  ./SoftmaxFunctions/arm_softmax_q7.o
   544     0    0   544  220  ./PoolingFunctions/arm_pool_q7_HWC.o
  2816     0    0  2816  b00  ./NNSupportFunctions/arm_nntables.o
    84     0    0    84   54  ./NNSupportFunctions/arm_q7_to_q15_no_shift.o
    72     0    0    72   48  ./NNSupportFunctions/arm_q7_to_q15_reordered_no_shift.o
   102     0    0   102   66  ./FullyConnectedFunctions/arm_fully_connected_q15.o
    88     0    0    88   58  ./FullyConnectedFunctions/arm_fully_connected_mat_q7_vec_q15.o
   476     0    0   476  1dc  ./FullyConnectedFunctions/arm_fully_connected_mat_q7_vec_q15_opt.o
   486     0    0   486  1e6  ./FullyConnectedFunctions/arm_fully_connected_q15_opt.o
    86     0    0    86   56  ./FullyConnectedFunctions/arm_fully_connected_q7.o
   532     0    0   532  214  ./FullyConnectedFunctions/arm_fully_connected_q7_opt.o
   266     0    0   266  10a  ./ConvolutionFunctions/arm_convolve_1x1_HWC_q7_fast_nonsquare.o
   404     0    0   404  194  ./ConvolutionFunctions/arm_convolve_HWC_q15_basic.o
   450     0    0   450  1c2  ./ConvolutionFunctions/arm_convolve_HWC_q15_fast.o
   426     0    0   426  1aa  ./ConvolutionFunctions/arm_convolve_HWC_q7_basic.o
   434     0    0   434  1b2  ./ConvolutionFunctions/arm_convolve_HWC_q7_fast.o
   434     0    0   434  1b2  ./ConvolutionFunctions/arm_convolve_HWC_q7_fast_nonsquare.o
   428     0    0   428  1ac  ./ConvolutionFunctions/arm_convolve_HWC_q7_RGB.o
   298     0    0   298  12a  ./ConvolutionFunctions/arm_depthwise_separable_conv_HWC_q7.o
   378     0    0   378  17a  ./ConvolutionFunctions/arm_depthwise_separable_conv_HWC_q7_nonsquare.o
     4     0    0     4    4  ./ConvolutionFunctions/arm_nn_mat_mult_kernel_q7_q15.o
     4     0    0     4    4  ./ConvolutionFunctions/arm_nn_mat_mult_kernel_q7_q15_reordered.o
   104     0    0   104   68  ./ActivationFunctions/arm_nn_activations_q15.o
    48     0    0    48   30  ./ActivationFunctions/arm_nn_activations_q7.o
    28     0    0    28   1c  ./ActivationFunctions/arm_relu_q15.o
    28     0    0    28   1c  ./ActivationFunctions/arm_relu_q7.o
EXAMPLE - CIFAR-10
• arm_convolve_HWC_q7_RGB()
• arm_relu_q7()
• arm_maxpool_q7_HWC()
• arm_convolve_HWC_q7_fast()
• arm_relu_q7()
• arm_avepool_q7_HWC()
• arm_convolve_HWC_q7_fast()
• arm_relu_q7()
• arm_avepool_q7_HWC()
• arm_fully_connected_q7()
• arm_softmax_q7()
• conv1_wt: 2,400
• conv1_bias: 32
• conv2_wt: 12,800
• conv2_bias: 16
• conv3_wt: 12,800
• conv3_bias: 32
• ip1_wt: 10
• ip1_bias: 10
• input_data: 3K
• output_data: 10
• col_buffer: 3,200
• scratch_buffer: 40K
PERFORMANCE
• CIFAR-10: speed showcase
• GRU: power-saving showcase
