SlideShare une entreprise Scribd logo
1  sur  54
Applying Artificial Intelligence on Edge devices using Deep
Learning with Embedded optimizations
User group meeting 3 29-06-2021
ai-edge.be
iot-incubator.be www.eavise.be
VLAIO TETRA HBC.2019.2641
Agenda
1. Introduction
2. Overview of frameworks
3. Academic use case
4. Use cases by the user group
1. E.D.&A.
2. TML & Digipolis
3. Melexis
5. Workshop
6. VLAIO
7. Networking
2
Neural network deployment
• Overview
• Design space model
• Examples
3
Neural network deployment
Edge Impulse flow:
Preprocessing:
• Example Academic use case
Deployment
4
Neural network deployment
• Deployment possibilities in Edge Impulse
• 2 options:
• Choose “library” (framework)
→ Download library for your hardware
• Choose “firmware” (device)
→ Acquires data via data forwarder (CLI tool from Edge
Impulse), runs network on device, reports back to online tool
5
Neural network deployment
Library deployment in Edge Impulse
• Frameworks:
• C++ & Arduino (TFLite)
• Cube.MX CMSIS Pack
• WebAssembly (node.js)
• TensorRT (Nvidia GPU)
• Our Target:
• Sensortile
• STM32L476JGY (Cortex M4)
• 80 MHz
• 128 KB RAM
• 1 MB Flash
6
Neural network deployment
Library deployment in Edge Impulse
• Model:
• TFLite model (2D conv)
• Possible optimizations:
• Quantizing 32 bit float → 8 bit int
• EON Compiler (Edge Optimized Neural)
• Compared to TFLite for µC compiler:
• 25-55% less RAM, 35% less Flash
• Same accuracy
• Compiles model to C++
• No interpreter necessary
• Not loading model @ runtime
7
Neural network deployment
Library deployment in Edge Impulse
• Results with (inference only)
• Quantization: 32 bit float → 8 bit int
• EON Compiler
8
Compiler Quantisation Accuracy (%) Latency (ms) RAM usage (kB) Flash usage (kB)
Normal
32 bit float 85.5 165 90.4 65.4
8 bit int 85.5 35 28.3 53.2
EON
32 bit float 85.5 165 73.0 46.6
8 bit int 85.5 35 20.6 33.5
User group interaction
Questions / remarks?
9
Update Academic use-
case
Detection of writing A or B by using the 3-axis accelerometers of
an STM Sensortile attached to a pen
10
Update Academic use-case: Overview
1. Hardware update
2. Data acquisition update
3. Analysis on optimal inference model
a. No pre-processing (raw data)
b. Spectral analysis - FFT length
c. Spectral analysis - LPF
d. Spectral analysis - HPF
e. Spectrogram - 1D conv
f. Spectrogram - 2D conv
4. Conclusion & overall best
11
Update Academic use-case: Hardware
• Design of Sensortile & pen holder
• Several iterations
• Click & slide mechanism
• Final 3D printed product
• Allows access to:
• On/off switch
• Micro-USB for charging & UART
• SWD-pins (Serial Wire Debug)
• SD-card slot
12
Update Academic use-case: Data
13
• Acquiring more training & test data
• During student project-week
• Colleagues
• Students
• New dataset:
• A-Left
• A-Right
• B-Left
• B-Right
• Idle
• Old dataset:
• A-Left
• B-Left
• Idle
Update Academic use-case: Analysis
14
• Analysis on optimal inference model
• Preprocessing: YES/NO?
• Fully interconnected (dense) layer: how many neurons?
• Which combination yields the best result?
• Minimal loss
• Maximal accuracy
• Latency/RAM usage
Update Academic use-case
Recap of the acquired data: (input)
One letter equals:
• Measurement for 1 second @ 100 Hz (101 datapoints)
• 3-axis accelerometer data (3x101 datapoints)
• Total = 303 datapoints (inputs)
Goal: classify to 5 labels (outputs)
15
Update Academic use-case: Raw data
1. Raw Data (no preprocessing)
Network topology: FFNN
16
303 inputs hidden layer (x neurons)
5 outputs
Update Academic use-case: Raw data
x-axis: number of neurons
y-axes: performance metrics
Full lines: 32 bit float
Dashed lines: 8 bit int (quantized)
Cyan: RAM with EON compiler
(accuracy, loss, latency equal)
17
Update Academic use-case: Raw data
Raw Data Conclusions
- Min. 30 neurons
- Accuracy ~80%
- Loss = high
- RAM: within specs
- EON compiler: < 2 kB
- Latency (7 → 2 ms)
18
Update Academic use-case: Spectrum
2. Spectral Analysis (FFT)
Network topology: FFNN
19
FFT inputs hidden layer (x neurons)
5 outputs
Update Academic use-case: Spectrum
2. Spectral Analysis
Options in Edge Impulse:
• Apply filter: None, LPF, HPF
• Cut-off frequency
• Filter order
• FFT length (power of 2)
20
Extracted features:
• RMS value after filter
• N peaks (value & frequency)
• (M-1) Power buckets
• with M edges
Total features: 1+2N+(M-1)
Update Academic use-case: Spectrum
2. Spectral Analysis
Options in Edge Impulse:
• Apply filter: None, LPF, HPF
• Cut-off frequency
• Filter order
• FFT length (power of 2)
21
Extracted features:
• RMS value after filter
• N peaks (value & frequency)
• (M-1) Power buckets
• with M edges
Total features: 1+2N+(M-1)
Update Academic use-case: Spectrum
2. Spectral Analysis
No filter, 5 peaks, 5 buckets
→ 48 features (inputs)
FFT length 64
x-axis: number of neurons
y-axes: performance metrics
Full lines: 32 bit float
Dashed lines: 8 bit int (quantized)
Cyan: RAM with EON compiler
(accuracy, loss, latency equal)
22
Update Academic use-case: Spectrum
2. Spectral Analysis
No filter, 5 peaks, 5 buckets
→ 48 features (inputs)
FFT length 128
x-axis: number of neurons
y-axes: performance metrics
Full lines: 32 bit float
Dashed lines: 8 bit int (quantized)
Cyan: RAM with EON compiler
(accuracy, loss, latency equal)
23
Update Academic use-case: Spectrum
2. Spectral Analysis
No filter, 5 peaks, 5 buckets
→ 48 features (inputs)
FFT length 1024
x-axis: number of neurons
y-axes: performance metrics
Full lines: 32 bit float
Dashed lines: 8 bit int (quantized)
Cyan: RAM with EON compiler
(accuracy, loss, latency equal)
24
Update Academic use-case: Spectrum
FFT Length Conclusions:
- Length: almost no effect
- Less inputs
- → Lower latency
- 20-30 neurons
- Accuracy ~80%
- Loss = lower
- Compared to raw data
- RAM: within specs
- EON compiler: < 1.5 kB
- Latency (2 → 1 ms)
25
Update Academic use-case: Spectrum
2. Spectral Analysis
Options in Edge Impulse:
• Apply filter: None, LPF, HPF
• Cut-off frequency
• Filter order → 6
• FFT length → 128
26
Extracted features:
• RMS value after filter
• N peaks (value & frequency)
• (M-1) Power buckets
• with M edges
Total features: 1+2N+(M-1)
Update Academic use-case: Spectrum
2. Spectral Analysis
LPF, 5 peaks, 5 buckets
→ 48 features (inputs)
FFT length 128
32 bit float (not quantized)
x-axis: number of neurons
y-axes: cutoff frequency (Hz)
27
Update Academic use-case: Spectrum
2. Spectral Analysis
LPF, 5 peaks, 5 buckets
→ 48 features (inputs)
FFT length 128
8 bit int (quantized)
x-axis: number of neurons
y-axes: cutoff frequency (Hz)
28
Update Academic use-case: Spectrum
2. Spectral Analysis
HPF, 5 peaks, 5 buckets
→ 48 features (inputs)
FFT length 128
32 bit float (not quantized)
x-axis: number of neurons
y-axes: cutoff frequency (Hz)
29
Update Academic use-case: Spectrum
2. Spectral Analysis
HPF, 5 peaks, 5 buckets
→ 48 features (inputs)
FFT length 128
8 bit int (quantized)
x-axis: number of neurons
y-axes: cutoff frequency (Hz)
30
Update Academic use-case: Spectrum
2. Spectral Analysis
Other performance metrics:
• Latency: Independent of cut-off frequency (~1-2 ms)
• Ram usage: similar as other spectral analysis performance (~2 kB)
Conclusions:
• LPF better for this use-case (info in lower frequencies)
• Filter necessary to reach higher accuracies: > 80%
31
Update Academic use-case: Spectrogram
3. Spectrogram = time and frequency information
- Split up time domain in frames
- Take FFT per time frame
Number of network input features:
- FFT L:128, 92 frames
- 5980 inputs per accelerometer axis!
- 17940 inputs total
32
3. Spectrogram = time and frequency information
• Large number of inputs
• Single dense layer: not sufficient
• Solution
• 1D convolutional architecture
• 2D convolutional architecture
Commonly used on images
33
Update Academic use-case: Spectrogram
3. Spectrogram = time and frequency information
Results (with EON compiler, 8 bit quantisation):
Confusion matrix:
34
Update Academic use-case: Spectrogram
Architecture Accuracy (%) Loss Latency (ms) RAM usage (kB) Flash usage (kB)
1D convolutional 84.8 0.39 70 38.2 36.2
2D convolutional 84.2 0.82 948 178.2 121.3
Update Academic use-case
Overall comparison
Y-axis: Accuracy (%)
Versus: Latency, Loss, RAM
One of the best:
• LPF 4 Hz
• FFT 128
• 86 neurons
35
Update Academic use-case
Conclusions:
• All methods yield implementable result
• Used method is application-specific
• Used method depends on requirements
• A lot of variables, use prior knowledge to make the right
decisions
36
User group interaction
Questions / remarks?
37
Use case by the user group: E.D.&A.
• Induction cooker with built-in ventilation unit
• Capacitive touch control
• Increasing/decreasing ventilation
• 2 buttons
• 4 states
• 7 segment display
• Raw ADC values
• 12 Hz sampling rate
• arbitrary idle value
• delta touch +/- 20
• Published on UART @ 9600 baud
38
E.D.&A.: Induction cooker ventilation
control
• Classical signal processing for touch detection
• Hardware and firmware setup fixed
• Setup developed by intern student (2019)
• Labeled data using mechanical finger
• data from a single button
• idle states
• increasing / decreasing induction
• wet touch buttons
• ...
39
Use case by the user group: E.D.&A.
Goals
1. Find out if a NN on microcontroller can
replace classical processing algorithm
a. Best fit model
b. Targets:
i. Cortex M4
ii. Cortex M0
2. Find out if a NN can improve immunity compared to
classical processing algorithm
a. Different levels of induction settings
b. Wet touch buttons
c. Different types of pots and locations
40
User group interaction
Questions / remarks?
41
TML: Introduction
42
TML: Introduction
43
Telraam project
• Easy traffic counting
• Using Raspberry Pi with camera
• Insights about traffic density with user supplied data
How?
• Classic ML & CV:
- Background subtraction: slow
- Classifier for object blobs: difficult and inaccurate
TML: Use Case
Goals:
• Traffic counting at home
• Using Raspberry Pi with camera
• 2 labeled data sets available
• Detecting 5 different classes: pedestrian,
bike, car, truck and other
• Frame rate of +/-5 fps
44
TML: Methodology
Use object detector to detect object class and location
Slow (~ seconds/frame) in normal DL framework
TF Lite is perfect for low power devices!
Combine with Object Detection API
45
TML: Detection
Pre-trained (MS COCO) SSD+MobileNetV2 in TF
Train further on mix of data sets
160x160 resolution
Dynamic range post training quantization
of weights (TF Lite default settings)
Export to TF Lite model
46
TML: Tracking
Passersby should only be counted once
track them!
Using motpy library
Detect when in certain “zone”
47
TML: Setup
Raspberry Pi 4 4GB RAM with Raspberry Pi OS
Python code, includes:
• TF Lite interpreter
• motpy tracker
• OpenCV
Valid .tflite model
.tflite compatible labelmap file
48
TML: Results
59% COCO mAP
85% PASCAL VOC mAP
Average detection: 0.2 seconds +/- 5 fps
49
TML: Improvements
• More data: better generalization!
Retrain model for better results
• In-depth optimization using TF Lite:
various quantization strategies + more to come!
50
TML: Demo
51
User group interaction
Questions / remarks?
52
Workshop
● 14/09/2021 afternoon
● Based on the academic use case
● Edge Impulse
53
Networking
54

Contenu connexe

Dernier

VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
Epec Engineered Technologies
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Kandungan 087776558899
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 

Dernier (20)

COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 

En vedette

Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellGood Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Saba Software
 
Introduction to C Programming Language
Introduction to C Programming LanguageIntroduction to C Programming Language
Introduction to C Programming Language
Simplilearn
 

En vedette (20)

How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
 
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellGood Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
 
Introduction to C Programming Language
Introduction to C Programming LanguageIntroduction to C Programming Language
Introduction to C Programming Language
 

2021-06-29-ai-edge-UG3

  • 1. Applying Artificial Intelligence on Edge devices using Deep Learning with Embedded optimizations User group meeting 3 29-06-2021 ai-edge.be iot-incubator.be www.eavise.be VLAIO TETRA HBC.2019.2641
  • 2. Agenda 1. Introduction 2. Overview of frameworks 3. Academic use case 4. Use cases by the user group 1. E.D.&A. 2. TML & Digipolis 3. Melexis 5. Workshop 6. VLAIO 7. Networking 2
  • 3. Neural network deployment • Overview • Design space model • Examples 3
  • 4. Neural network deployment Edge Impulse flow: Preprocessing: • Example Academic use case Deployment 4
  • 5. Neural network deployment • Deployment possibilities in Edge Impulse • 2 options: • Choose “library” (framework) → Download library for your hardware • Choose “firmware” (device) → Acquires data via data forwarder (CLI tool from Edge Impulse), runs network on device, reports back to online tool 5
  • 6. Neural network deployment Library deployment in Edge Impulse • Frameworks: • C++ & Arduino (TFLite) • Cube.MX CMSIS Pack • WebAssembly (node.js) • TensorRT (Nvidia GPU) • Our Target: • Sensortile • STM32L476JGY (Cortex M4) • 80 MHz • 128 KB RAM • 1 MB Flash 6
  • 7. Neural network deployment Library deployment in Edge Impulse • Model: • TFLite model (2D conv) • Possible optimizations: • Quantizing 32 bit float → 8 bit int • EON Compiler (Edge Optimized Neural) • Compared to TFLite for µC compiler: • 25-55% less RAM, 35% less Flash • Same accuracy • Compiles model to C++ • No interpreter necessary • Not loading model @ runtime 7
  • 8. Neural network deployment Library deployment in Edge Impulse • Results with (inference only) • Quantization: 32 bit float → 8 bit int • EON Compiler 8 Compiler Quantisation Accuracy (%) Latency (ms) RAM usage (kB) Flash usage (kB) Normal 32 bit float 85.5 165 90.4 65.4 8 bit int 85.5 35 28.3 53.2 EON 32 bit float 85.5 165 73.0 46.6 8 bit int 85.5 35 20.6 33.5
  • 10. Update Academic use- case Detection of writing A or B by using the 3-axis accelerometers of an STM Sensortile attached to a pen 10
  • 11. Update Academic use-case: Overview 1. Hardware update 2. Data acquisition update 3. Analysis on optimal inference model a. No pre-processing (raw data) b. Spectral analysis - FFT length c. Spectral analysis - LPF d. Spectral analysis - HPF e. Spectrogram - 1D conv f. Spectrogram - 2D conv 4. Conclusion & overall best 11
  • 12. Update Academic use-case: Hardware • Design of Sensortile & pen holder • Several iterations • Click & slide mechanism • Final 3D printed product • Allows access to: • On/off switch • Micro-USB for charging & UART • SWD-pins (Serial Wire Debug) • SD-card slot 12
  • 13. Update Academic use-case: Data 13 • Acquiring more training & test data • During student project-week • Colleagues • Students • New dataset: • A-Left • A-Right • B-Left • B-Right • Idle • Old dataset: • A-Left • B-Left • Idle
  • 14. Update Academic use-case: Analysis 14 • Analysis on optimal inference model • Preprocessing: YES/NO? • Fully interconnected (dense) layer: how many neurons? • Which combination yields the best result? • Minimal loss • Maximal accuracy • Latency/RAM usage
  • 15. Update Academic use-case Recap of the acquired data: (input) One letter equals: • Measurement for 1 second @ 100 Hz (101 datapoints) • 3-axis accelerometer data (3x101 datapoints) • Total = 303 datapoints (inputs) Goal: classify to 5 labels (outputs) 15
  • 16. Update Academic use-case: Raw data 1. Raw Data (no preprocessing) Network topology: FFNN 16 303 inputs hidden layer (x neurons) 5 outputs
  • 17. Update Academic use-case: Raw data x-axis: number of neurons y-axes: performance metrics Full lines: 32 bit float Dashed lines: 8 bit int (quantized) Cyan: RAM with EON compiler (accuracy, loss, latency equal) 17
  • 18. Update Academic use-case: Raw data Raw Data Conclusions - Min. 30 neurons - Accuracy ~80% - Loss = high - RAM: within specs - EON compiler: < 2 kB - Latency (7 → 2 ms) 18
  • 19. Update Academic use-case: Spectrum 2. Spectral Analysis (FFT) Network topology: FFNN 19 FFT inputs hidden layer (x neurons) 5 outputs
  • 20. Update Academic use-case: Spectrum 2. Spectral Analysis Options in Edge Impulse: • Apply filter: None, LPF, HPF • Cut-off frequency • Filter order • FFT length (power of 2) 20 Extracted features: • RMS value after filter • N peaks (value & frequency) • (M-1) Power buckets • with M edges Total features: 1+2N+(M-1)
  • 21. Update Academic use-case: Spectrum 2. Spectral Analysis Options in Edge Impulse: • Apply filter: None, LPF, HPF • Cut-off frequency • Filter order • FFT length (power of 2) 21 Extracted features: • RMS value after filter • N peaks (value & frequency) • (M-1) Power buckets • with M edges Total features: 1+2N+(M-1)
  • 22. Update Academic use-case: Spectrum 2. Spectral Analysis No filter, 5 peaks, 5 buckets → 48 features (inputs) FFT length 64 x-axis: number of neurons y-axes: performance metrics Full lines: 32 bit float Dashed lines: 8 bit int (quantized) Cyan: RAM with EON compiler (accuracy, loss, latency equal) 22
  • 23. Update Academic use-case: Spectrum 2. Spectral Analysis No filter, 5 peaks, 5 buckets → 48 features (inputs) FFT length 128 x-axis: number of neurons y-axes: performance metrics Full lines: 32 bit float Dashed lines: 8 bit int (quantized) Cyan: RAM with EON compiler (accuracy, loss, latency equal) 23
  • 24. Update Academic use-case: Spectrum 2. Spectral Analysis No filter, 5 peaks, 5 buckets → 48 features (inputs) FFT length 1024 x-axis: number of neurons y-axes: performance metrics Full lines: 32 bit float Dashed lines: 8 bit int (quantized) Cyan: RAM with EON compiler (accuracy, loss, latency equal) 24
  • 25. Update Academic use-case: Spectrum FFT Length Conclusions: - Length: almost no effect - Less inputs - → Lower latency - 20-30 neurons - Accuracy ~80% - Loss = lower - Compared to raw data - RAM: within specs - EON compiler: < 1.5 kB - Latency (2 → 1 ms) 25
  • 26. Update Academic use-case: Spectrum 2. Spectral Analysis Options in Edge Impulse: • Apply filter: None, LPF, HPF • Cut-off frequency • Filter order → 6 • FFT length → 128 26 Extracted features: • RMS value after filter • N peaks (value & frequency) • (M-1) Power buckets • with M edges Total features: 1+2N+(M-1)
  • 27. Update Academic use-case: Spectrum 2. Spectral Analysis LPF, 5 peaks, 5 buckets → 48 features (inputs) FFT length 128 32 bit float (not quantized) x-axis: number of neurons y-axes: cutoff frequency (Hz) 27
  • 28. Update Academic use-case: Spectrum 2. Spectral Analysis LPF, 5 peaks, 5 buckets → 48 features (inputs) FFT length 128 8 bit int (quantized) x-axis: number of neurons y-axes: cutoff frequency (Hz) 28
  • 29. Update Academic use-case: Spectrum 2. Spectral Analysis HPF, 5 peaks, 5 buckets → 48 features (inputs) FFT length 128 32 bit float (not quantized) x-axis: number of neurons y-axes: cutoff frequency (Hz) 29
  • 30. Update Academic use-case: Spectrum 2. Spectral Analysis HPF, 5 peaks, 5 buckets → 48 features (inputs) FFT length 128 8 bit int (quantized) x-axis: number of neurons y-axes: cutoff frequency (Hz) 30
  • 31. Update Academic use-case: Spectrum 2. Spectral Analysis Other performance metrics: • Latency: Independent of cut-off frequency (~1-2 ms) • Ram usage: similar as other spectral analysis performance (~2 kB) Conclusions: • LPF better for this use-case (info in lower frequencies) • Filter necessary to reach higher accuracies: > 80% 31
  • 32. Update Academic use-case: Spectrogram 3. Spectrogram = time and frequency information - Split up time domain in frames - Take FFT per time frame Number of network input features: - FFT L:128, 92 frames - 5980 inputs per accelerometer axis! - 17940 inputs total 32
  • 33. 3. Spectrogram = time and frequency information • Large number of inputs • Single dense layer: not sufficient • Solution • 1D convolutional architecture • 2D convolutional architecture Commonly used on images 33 Update Academic use-case: Spectrogram
  • 34. 3. Spectrogram = time and frequency information Results (with EON compiler, 8 bit quantisation): Confusion matrix: 34 Update Academic use-case: Spectrogram Architecture Accuracy (%) Loss Latency (ms) RAM usage (kB) Flash usage (kB) 1D convolutional 84.8 0.39 70 38.2 36.2 2D convolutional 84.2 0.82 948 178.2 121.3
  • 35. Update Academic use-case Overall comparison Y-axis: Accuracy (%) Versus: Latency, Loss, RAM One of the best: • LPF 4 Hz • FFT 128 • 86 neurons 35
  • 36. Update Academic use-case Conclusions: • All methods yield implementable result • Used method is application-specific • Used method depends on requirements • A lot of variables, use prior knowledge to make the right decisions 36
  • 38. Use case by the user group: E.D.&A. • Induction cooker with built-in ventilation unit • Capacitive touch control • Increasing/decreasing ventilation • 2 buttons • 4 states • 7 segment display • Raw ADC values • 12 Hz sampling rate • arbitrary idle value • delta touch +/- 20 • Published on UART @ 9600 baud 38
  • 39. E.D.&A.: Induction cooker ventilation control • Classical signal processing for touch detection • Hardware and firmware setup fixed • Setup developed by intern student (2019) • Labeled data using mechanical finger • data from a single button • idle states • increasing / decreasing induction • wet touch buttons • ... 39
  • 40. Use case by the user group: E.D.&A. Goals 1. Find out if a NN on microcontroller can replace classical processing algorithm a. Best fit model b. Targets: i. Cortex M4 ii. Cortex M0 2. Find out if a NN can improve immunity compared to classical processing algorithm a. Different levels of induction settings b. Wet touch buttons c. Different types of pots and locations 40
  • 43. TML: Introduction 43 Telraam project • Easy traffic counting • Using Raspberry Pi with camera • Insights about traffic density with user supplied data How? • Classic ML & CV: - Background subtraction: slow - Classifier for object blobs: difficult and inaccurate
  • 44. TML: Use Case Goals: • Traffic counting at home • Using Raspberry Pi with camera • 2 labeled data sets available • Detecting 5 different classes: pedestrian, bike, car, truck and other • Frame rate of +/-5 fps 44
  • 45. TML: Methodology Use object detector to detect object class and location Slow (~ seconds/frame) in normal DL framework TF Lite is perfect for low power devices! Combine with Object Detection API 45
  • 46. TML: Detection Pre-trained (MS COCO) SSD+MobileNetV2 in TF Train further on mix of data sets 160x160 resolution Dynamic range post training quantization of weights (TF Lite default settings) Export to TF Lite model 46
  • 47. TML: Tracking Passersby should only be counted once track them! Using motpy library Detect when in certain “zone” 47
  • 48. TML: Setup Raspberry Pi 4 4GB RAM with Raspberry Pi OS Python code, includes: • TF Lite interpreter • motpy tracker • OpenCV Valid .tflite model .tflite compatible labelmap file 48
  • 49. TML: Results 59% COCO mAP 85% PASCAL VOC mAP Average detection: 0.2 seconds +/- 5 fps 49
  • 50. TML: Improvements • More data: better generalization! Retrain model for better results • In-depth optimization using TF Lite: various quantization strategies + more to come! 50
  • 53. Workshop ● 14/09/2021 afternoon ● Based on the academic use case ● Edge Impulse 53