Track 1, Session 3 - ST Dev Con 2016 - Smart Home and Building
1. October 4, 2016
Santa Clara Convention Center
Mission City Ballroom
Smart Home & Building: voice remote controls, source localization, beamforming, ASR
Roberto Sannino
3. IoT evolution of Voice Automation: the IoT voice assistant
From professional PC applications
…to Smart Mobiles
…to Home / Office Terminals
…to “Anything Connectable”
“How can I help you?”
4. Voice Terminal
• Audio capture & render
• Signal processing
• Low power
• Constrained geometry
MEMS microphones and audio quality at system level
Gateway
• Voice & data
• Seamless connectivity
Cloud
• Natural Language Processing
• Dialogue Management
• Services
Play Music
Control Lighting, heating, …
News, sport, traffic, weather, …
Answer questions, create to-do lists, shopping lists, …
Place orders online, use other online services: taxi, pizza, …
5. Digital MEMS Microphones
Digital MEMS microphones:
• ultra-compact, low-power, omnidirectional
• built with a capacitive sensing element and an IC interface
Structure (figure): sound inlet, sensor, ASIC
Sensor:
• Capacitive membrane
• Omnidirectional
• Analog output
ASIC: sensing, A/D conversion and digital interface
PDM (Pulse Density Modulation) interface:
• 1 to 3 MHz
• 1-bit resolution
• Fully digital
Package options: bottom port, top port, top port metallic, bottom port metallic
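To make the PDM idea concrete, here is a minimal first-order sigma-delta modulator in Python. It is a simplified illustration under our own assumptions (first order, ideal arithmetic, bit 1 standing for +1 and bit 0 for -1); the modulator inside a real MEMS microphone ASIC is not described on this slide and may be of higher order. The point it shows is that the density of 1s in the 1-bit stream tracks the input amplitude.

```python
def pdm_modulate(samples):
    """First-order sigma-delta modulator: input in [-1, 1] -> list of PDM bits (0/1).

    Simplified illustration (assumption): bit 1 stands for +1, bit 0 for -1.
    """
    bits = []
    integrator = 0.0
    feedback = 0.0
    for x in samples:
        integrator += x - feedback           # accumulate the quantization error
        bit = 1 if integrator >= 0.0 else 0  # 1-bit quantizer
        feedback = 1.0 if bit else -1.0      # 1-bit DAC value fed back
        bits.append(bit)
    return bits


def pulse_density(bits):
    """Fraction of 1s: approximately (x + 1) / 2 for a constant input x."""
    return sum(bits) / len(bits)


stream = pdm_modulate([0.5] * 1000)
print(pulse_density(stream))  # prints 0.75
```

A constant input of 0.5 settles into the repeating pattern 1, 1, 1, 0, i.e. a pulse density of (0.5 + 1) / 2.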
6. Microphone to STM32 Architecture
• Serial: SAI/I2S/SPI: 1 or 2 microphones share CLK and data line
• Parallel: GPIO: up to 16 (or 32) microphones
• DFSDM (Digital Filter for Sigma Delta Modulator) dedicated interface [only on selected STM32 devices]
Signal chain (figure): PDM audio IN → 2-stage decimation filter (Sinc3, then FIR-LP) → IIR signal conditioning (IIR-HP, IIR-LP) → gain control → 16-bit PCM digital audio OUT
Sinc3 decimation factors shown in the figure: dec=8 and dec=8/10/16
• Software: PDM to PCM filter SW library for STM32Cube software
• Hardware: direct acquisition of digital MEMS microphones
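The first stage of the decimation chain above is a Sinc3 filter. The sketch below is a generic third-order CIC (Sinc3) decimator in Python, assuming PDM bits map to ±1. It is not the STM32Cube PDM to PCM library or the DFSDM implementation, and it leaves out the FIR-LP stage, the IIR conditioning and the gain control.

```python
def sinc3_decimate(bits, dec=8):
    """Third-order CIC (Sinc3) decimator for a PDM bit stream.

    Three integrators run at the PDM rate, the signal is downsampled by `dec`,
    then three combs run at the output rate. The result is normalized by the
    filter's DC gain dec**3, so a stream of all 1s settles at +1.0.
    """
    i1 = i2 = i3 = 0
    integrated = []
    for b in bits:
        i1 += 1 if b else -1                 # map bit 1 -> +1, bit 0 -> -1 (assumption)
        i2 += i1
        i3 += i2
        integrated.append(i3)
    c1_prev = c2_prev = c3_prev = 0
    out = []
    for v in integrated[dec - 1::dec]:       # keep every dec-th integrator output
        c1, c1_prev = v - c1_prev, v
        c2, c2_prev = c1 - c2_prev, c1
        c3, c3_prev = c2 - c3_prev, c2
        out.append(c3 / dec ** 3)
    return out


pcm = sinc3_decimate([1, 1, 1, 0] * 64, dec=8)  # pulse density 0.75 -> mean value 0.5
```

After a short start-up transient the output settles at 0.5. In fixed-point C the integrators are usually allowed to wrap around, which works because CIC arithmetic is modular; Python integers sidestep the issue.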
7. BlueCoin: the Robotic Ear
Augmented hearing and motion sensing
• Sound localization
• Acoustic beamforming
• Embedded processing
• Motion, activity and balance
• Bluetooth Low Energy
9. Indoor Voice Capture: the Problem
Figure labels:
• Audio input (e.g. music, or far-end speaker)
• Reference signal (same as audio input)
• Captured sound: voice, acoustic echo (reflections, diffusion, …), background noise
• Audio output (e.g. speaker’s voice, clean)
11. Audio Front End: Example of Signal Processing Architecture
Input: MEMS microphone array
Processing blocks:
• Source Localization
• Beamforming
• Acoustic Echo Cancellation (with reference audio)
• Statistical Dereverberation
• Noise Reduction
• Auto Gain Control
• Audio Analytics, which triggers ASR:
  - Voice Activity Detection
  - Statistical moments
  - Noise estimation
  - ...
Output: Speech Recognition (embedded or cloud)
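The Audio Analytics block (voice activity detection, statistical moments, noise estimation) can be sketched as a per-frame loop that decides when to trigger ASR. The noise-floor rule (follow quieter frames immediately, rise slowly otherwise), the 10 dB margin and the `rise` factor are hypothetical choices for illustration, not the method used in the ST libraries.

```python
import math


def frame_moments(frame):
    """Mean and variance (first two statistical moments) of one frame."""
    n = len(frame)
    mean = sum(frame) / n
    var = sum((s - mean) ** 2 for s in frame) / n
    return mean, var


def analyze(frames, margin_db=10.0, rise=0.05):
    """Return one ASR-trigger flag per frame.

    Hypothetical rule: the noise floor drops immediately to quieter frames and
    otherwise rises by `rise` of the gap; a frame triggers when its energy
    exceeds the floor by more than margin_db.
    """
    noise_db = None
    flags = []
    for frame in frames:
        _, var = frame_moments(frame)
        energy_db = 10.0 * math.log10(var + 1e-12)
        if noise_db is None or energy_db < noise_db:
            noise_db = energy_db                          # noise estimate: track minima
        else:
            noise_db += rise * (energy_db - noise_db)     # ...and creep upward slowly
        flags.append(energy_db - noise_db > margin_db)    # voice activity -> trigger ASR
    return flags


quiet = [0.01, -0.01] * 80
speech_like = [0.5, -0.5] * 80
flags = analyze([quiet] * 5 + [speech_like] * 3)
```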
13. Software IP and ST Eco-system
Open Software Design Environment
Algorithms and system demonstrators for the Internet of Things.
Unleashing the power of embedded software
Bring your ideas to now!
STM32 ODE:
• STM32 Nucleo development boards
• STM32 Nucleo expansion boards
• STM32Cube software
• STM32Cube expansion software
Software libraries and example projects: BlueMicroSystem, BlueVoiceLink, SmartAcoustics
14. Audio SW IP and Eco-system
Input: MEMS microphone array
• Source Localization: osxAcousticSL
• Beamforming: osxAcousticBF
• Acoustic Echo Cancellation (with reference audio): osxAcousticEC
• Statistical Dereverberation, Noise Reduction, Auto Gain Control
• Audio Analytics, which triggers ASR:
  - Voice Activity Detection
  - Statistical moments
  - Noise estimation
  - ...
• 3rd party ASR: embedded or cloud
Each osxAcoustic library can easily be replaced by third-party SW IP.
All are released under free evaluation and production licensing.
15. Spatial Audio Processing
Diagram: MEMS microphone array → Source Localization and Beamforming (with audio analytics: voice activity detection, statistical moments, noise estimation, ...)
Sound Localization: osxAcousticSL
• Estimates the Direction of Arrival of the main sound source
• Independent from beamforming
• May control the beam direction
Beamforming: osxAcousticBF
• Spatial filter
• Outputs the audio that comes from a given direction
• Adaptively cancels audio signals coming from other directions
Freely licensed FW libraries for STM32: http://goo.gl/4nXh8W
17. First Order Beam Patterns
• Figure of 8: simple subtraction of 2 microphone outputs
• Cardioid: subtraction of 2 microphone outputs, after one digital delay ∆, where ∆ = acoustic latency from [m1] to [m2]
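Both patterns follow from the same delay-and-subtract structure. The Python sketch below evaluates y(t) = m1(t) − m2(t − ∆) for a plane wave arriving from angle θ, where m2 hears the wave τ·cos θ later than m1 (τ = d/c). With ∆ = 0 it gives the figure of 8, with ∆ = τ the cardioid. The 343 m/s speed of sound and the 1 kHz test tone are our assumptions; the 4 mm spacing is taken from the polar-pattern test setup.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C


def delay_and_subtract_gain(theta_deg, freq_hz, d_m, delay_s):
    """|Y/S| for y(t) = m1(t) - m2(t - delay_s) with a plane wave from theta_deg.

    theta = 0 means the wave reaches m1 first and m2 after tau = d/c.
    """
    tau = d_m / SPEED_OF_SOUND
    omega = 2.0 * math.pi * freq_hz
    phase = omega * (tau * math.cos(math.radians(theta_deg)) + delay_s)
    return 2.0 * abs(math.sin(phase / 2.0))


def beam_pattern(freq_hz, d_m, delay_s, step_deg=10):
    """Response normalized to its maximum, sampled every step_deg degrees."""
    angles = range(0, 360, step_deg)
    gains = [delay_and_subtract_gain(a, freq_hz, d_m, delay_s) for a in angles]
    peak = max(gains)
    return {a: g / peak for a, g in zip(angles, gains)}


d = 0.004                                # 4 mm spacing
tau = d / SPEED_OF_SOUND
figure8 = beam_pattern(1000.0, d, 0.0)   # no delay: nulls at 90 and 270 degrees
cardioid = beam_pattern(1000.0, d, tau)  # delay = tau: null at 180 degrees
```

Before normalization, the gain at low frequencies grows roughly in proportion to frequency, which is why first-order differential arrays are usually followed by equalization.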
18. ST Beamforming Solution: osxAcousticBF
End-fire cardioid beamforming based on two digital MEMS microphones
• Fine-tuned for ST digital MEMS microphones
• Scalable performance vs. MIPS to fit application requirements
• 4 algorithm options, for example:
  - Strong BF: endfire, ± 35° around the microphone axis (≈ 60° in the figure), ≈ 84 MIPS of STM32F4
  - Basic Cardioid: endfire, ± 85° around the microphone axis (≈ 170° in the figure), ≈ 11 MIPS of STM32F4
19. osxAcousticBF – Algorithm Options
• Strong: back-to-back cardioids and an adaptive noise removal filter (diagram branches: Enhance, Remove, Denoise)
• Cardioid basic: 1st-order Differential Microphone Array (DMA)
• Cardioid denoise: a denoise filter is added to the end-fire beamforming output
• ASR ready: same as Strong, without the denoise filter. Best performance for Automatic Speech Recognition applications.
In each diagram, the output of one microphone is delayed by ∆ and subtracted (+/-) from the other, with
$$\Delta = \frac{d}{c}$$
where d is the inter-microphone distance and c is the speed of sound.
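The «Strong» option starts from back-to-back cardioids. The Python sketch below shows how two end-fire microphones yield a front-facing and a back-facing cardioid by delay-and-subtract, assuming that ∆ = d/c equals exactly one sample (a simplification; with millimetre spacing ∆ is a small fraction of a PCM sample). Reading «Enhance» as the front cardioid and «Remove» as the back one, a possible noise reference for the adaptive filter, is our interpretation of the diagram labels; the adaptive noise-removal filter itself is not shown.

```python
def delayed(x, n):
    """Delay sequence x by n >= 0 samples, padding with zeros."""
    return [0.0] * n + list(x[:len(x) - n])


def back_to_back_cardioids(m1, m2, n_delay):
    """Front- and back-facing cardioids from two end-fire microphones.

    m1 is the front microphone, m2 the rear one; n_delay is d/c expressed in
    samples and is assumed here to be an integer.
    """
    front = [a - b for a, b in zip(m1, delayed(m2, n_delay))]  # null towards the rear
    back = [b - a for a, b in zip(delayed(m1, n_delay), m2)]   # null towards the front
    return front, back


# Simulation with d/c equal to exactly one sample (an assumption made for simplicity).
source = [0.3, -1.0, 0.7, 0.2, -0.5, 0.9, -0.1, 0.4]
front_src = (source, delayed(source, 1))  # wave from the front: m2 hears it 1 sample later
rear_src = (delayed(source, 1), source)   # wave from the rear: m1 hears it 1 sample later

f_front, b_front = back_to_back_cardioids(*front_src, 1)
f_rear, b_rear = back_to_back_cardioids(*rear_src, 1)
```

A source in front disappears from the back cardioid, and a source behind disappears from the front cardioid.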
20. Microphone Sensitivity Matching
• Key to optimal performance
  - Best directivity results
  - Best noise rejection
• Gain compensation API
  - Adjusts the amplitude of one microphone to match the other’s
• Gain compensation options
  - Static gain offline computation
  - Dynamic gain compensation
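The two gain compensation options can be sketched in Python as follows. This does not reproduce the ST gain compensation API; the function and class names are hypothetical, and the approach assumes both microphones receive the same sound level, which is reasonable for closely spaced microphones and a distant source. The static option computes one RMS ratio on a calibration recording; the dynamic option tracks that ratio with exponential smoothing.

```python
import math


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def static_gain(ref_mic, other_mic):
    """Offline: gain that makes other_mic match ref_mic on a calibration recording."""
    return rms(ref_mic) / rms(other_mic)


class DynamicGain:
    """Online: gain tracked as an exponentially smoothed RMS ratio.

    alpha (smoothing factor) is a hypothetical tuning value.
    """

    def __init__(self, alpha=0.01):
        self.alpha = alpha
        self.gain = 1.0

    def process(self, ref_frame, other_frame):
        target = rms(ref_frame) / max(rms(other_frame), 1e-12)
        self.gain += self.alpha * (target - self.gain)
        return [v * self.gain for v in other_frame]


# Example: the second microphone delivers half the amplitude (about 6 dB less).
ref = [0.2, -0.2] * 64
other = [0.1, -0.1] * 64
g_static = static_gain(ref, other)
dyn = DynamicGain()
for _ in range(1000):
    matched = dyn.process(ref, other)
```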
21. Polar Pattern Tests
Test setup (BlueCoin eval platform, integrated MEMS micro-array):
• Microphone array mounted on a rotating support
• Inter-microphone distance: 4 mm
• Rotation in steps of 10 degrees
• Gaussian white noise played by a high-quality loudspeaker
Resulting beam pattern:
• Blue: omnidirectional microphone
• Red: «Basic cardioid» mode
• Green: «Strong» mode
22. Beamforming: ASR Test
Test setup (BlueCoin eval platform, integrated MEMS micro-array):
Inputs:
• Words: male and female spoken words at 0°
• Noise: Gaussian white noise at 90°
Output: 4 synchronous output channels:
• Omnidirectional microphone
• Basic Cardioid
• ASR Ready
• Strong Cardioid
Recorded words are sent to Google ASR and recognition data are collected.
23. osxAcousticBF: ASR Test Results
[Chart: ASR confidence vs. signal-to-noise ratio; curves: omnidirectional, cardioid, ASR, strong]
24. Evaluation Systems
X-NUCLEO-CCA02M1 supports beamforming based on its 2 onboard MP34DT01-M microphones.
Beam steering can be implemented in architectures with more than 2 microphones by choosing a different ordered couple of microphones each time; e.g. 4-microphone configurations enable the implementation of 8 different cardioid beams.
µ4 array: MEMS microphones side by side, the smallest array you can build (4 x MP23DB01MM)
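Beam steering by ordered couples can be enumerated in a few lines of Python. The square layout with a 4 mm side is our assumption for illustration; the slide does not give the geometry. Each ordered (front, rear) couple forms an end-fire cardioid looking from the rear microphone towards the front one. In a square, the 12 ordered couples point in 8 distinct directions (the side couples share directions pairwise, the diagonals add four more), which is one way to arrive at the slide's count of 8. Note that diagonal couples have a √2 larger spacing, so their delay ∆ = d/c differs.

```python
import math
from itertools import permutations

# Hypothetical square layout: 4 microphones at the corners (coordinates in mm).
MICS = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]


def ordered_couples(mics):
    """All ordered (front, rear) couples and the look direction (degrees) of the
    end-fire cardioid they form: from the rear microphone towards the front one."""
    couples = []
    for front, rear in permutations(range(len(mics)), 2):
        dx = mics[front][0] - mics[rear][0]
        dy = mics[front][1] - mics[rear][1]
        angle = round(math.degrees(math.atan2(dy, dx)) % 360, 6)
        couples.append((front, rear, angle))
    return couples


couples = ordered_couples(MICS)
directions = sorted({a for _, _, a in couples})
print(len(couples), directions)
```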
25. Sound Source Localization
Signals are acquired by one or two couples of microphones in order to estimate the sound Direction of Arrival (DoA), shown in the figure as the angle α.
26. osxAcousticSL Sound Source Localization Library
• Scalable library allows a MIPS vs. resolution trade-off
• Selectable angle resolution, up to 1 degree (theoretical)
• Selectable algorithm; two algorithms implemented:
  - XCORR: supports cm-sized microphone arrays; low MIPS and low resolution
  - GCC-PHAT: supports mm-sized differential arrays
• A simple Voice Activity Detector is included, based on an energy threshold
  - Avoids false recognitions in case of low signal energy
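A minimal Python sketch of cross-correlation (XCORR-style) localization for one microphone couple, gated by an energy-threshold VAD, is shown below. It is not the osxAcousticSL implementation: the brute-force time-domain search, the integer-lag resolution, the 1e-4 energy threshold and the 343 m/s speed of sound are our assumptions, and GCC-PHAT is not implemented. The lag k that maximizes the correlation gives cos α = c·k / (fs·d).

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed


def frame_energy(x):
    return sum(v * v for v in x) / len(x)


def xcorr_lag(x1, x2, max_lag):
    """Integer lag k maximizing sum_n x2[n] * x1[n - k], for |k| <= max_lag.
    A positive k means the sound reached microphone 1 first."""
    n = len(x1)
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(x2[i] * x1[i - lag] for i in range(max(0, lag), min(n, n + lag)))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def estimate_doa(x1, x2, fs, d, energy_threshold=1e-4):
    """Angle (degrees, 0..180) between the arrival direction and the m2->m1 axis,
    or None when the frame energy is below the (hypothetical) VAD threshold."""
    if frame_energy(x1) < energy_threshold:
        return None                      # energy-based VAD: skip quiet frames
    max_lag = math.ceil(d / SPEED_OF_SOUND * fs)
    lag = xcorr_lag(x1, x2, max_lag)
    cos_alpha = max(-1.0, min(1.0, SPEED_OF_SOUND * lag / (fs * d)))
    return math.degrees(math.acos(cos_alpha))


rng = random.Random(0)
mic1 = [rng.uniform(-1.0, 1.0) for _ in range(1024)]
mic2 = [0.0] * 3 + mic1[:-3]             # mic2 hears the source 3 samples later
angle = estimate_doa(mic1, mic2, fs=48000, d=0.1)
```

With 4 mm spacing at 16 kHz, d/c·fs is only about 0.19 samples, so an integer-lag search like this one cannot resolve the angle; this illustrates why plain time-domain cross-correlation suits cm-sized arrays.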
27. Source Localization: Application Considerations
Range (due to spatial symmetry):
• 2 microphones cover a range of 180°
• 4 microphones cover a range of 360°
MIPS performance:
• In a typical home application, source localization may run as a low-priority task
• Depending on the use case, localization info may not require continuous updates (e.g. a few times per second)
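The 180° versus 360° range can be checked numerically. The Python sketch below assumes ideal time differences of arrival (TDOA) and two orthogonal microphone pairs with a hypothetical 5 cm spacing. A single pair only measures cos α, so α and 360° − α are indistinguishable; a second, orthogonal pair supplies sin α, and atan2 resolves the full circle.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def pair_tdoa(alpha_deg, d, axis_deg):
    """Ideal TDOA (s) on a pair of spacing d whose axis points to axis_deg."""
    return d * math.cos(math.radians(alpha_deg - axis_deg)) / SPEED_OF_SOUND


def doa_one_pair(tdoa, d):
    """One pair only: angle from the pair axis, 0..180 degrees (mirror ambiguity)."""
    c = max(-1.0, min(1.0, tdoa * SPEED_OF_SOUND / d))
    return math.degrees(math.acos(c))


def doa_two_pairs(tdoa_x, tdoa_y):
    """Two orthogonal pairs: cos(alpha) and sin(alpha) together give 0..360 degrees."""
    return math.degrees(math.atan2(tdoa_y, tdoa_x)) % 360.0


d = 0.05  # hypothetical 5 cm spacing for both pairs
for true_deg in (30.0, 330.0):
    tx = pair_tdoa(true_deg, d, 0.0)
    ty = pair_tdoa(true_deg, d, 90.0)
    print(true_deg, round(doa_one_pair(tx, d), 3), round(doa_two_pairs(tx, ty), 3))
```

Sources at 30° and 330° give the same single-pair reading of 30°; only the two-pair estimate tells them apart.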
28. Acoustic Echo Cancellation: osxAcousticEC
Removes the echo of playback audio in speech capture applications
• Single microphone application
• Known audio source (e.g. music / voice) played in a reverberant room
• The AEC estimates the room reverberation
• STM32 is connected to both the microphone and the loudspeaker
The Open.AUDIO AEC library is an optimized STM32 port based on the open-source project Speex: http://www.speex.org/
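The principle of a reference-based echo canceller fits in a short Python loop. Speex's echo canceller uses a frequency-domain (multidelay block) adaptive filter; the sketch below swaps in the simpler time-domain NLMS algorithm to show the idea, and it is not the osxAcousticEC code. The filter length, step size and the synthetic echo path are hypothetical. The filter learns the loudspeaker-to-microphone path from the known reference and subtracts the estimated echo.

```python
import random


def nlms_echo_canceller(reference, mic, taps=16, mu=0.5, eps=1e-6):
    """Time-domain NLMS echo canceller (illustrative, not the Speex algorithm).

    reference: samples sent to the loudspeaker (the known audio source)
    mic:       microphone samples (echo + near-end speech + noise)
    Returns the residual e = mic - estimated_echo, i.e. the cleaned capture.
    """
    w = [0.0] * taps    # adaptive FIR estimate of the loudspeaker-to-mic path
    buf = [0.0] * taps  # last `taps` reference samples, newest first
    out = []
    for x, d in zip(reference, mic):
        buf = [x] + buf[:-1]
        echo_est = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - echo_est
        step = mu * e / (eps + sum(b * b for b in buf))  # normalized step size
        w = [wi + step * bi for wi, bi in zip(w, buf)]
        out.append(e)
    return out


# Synthetic test: echo only (no near-end talker), through a hypothetical short echo path.
rng = random.Random(1)
ref = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
h = [0.6, 0.0, -0.3, 0.1]
mic = [sum(h[k] * ref[n - k] for k in range(len(h)) if n >= k) for n in range(len(ref))]
cleaned = nlms_echo_canceller(ref, mic)
```

Real rooms need much longer filters (the echo tail can last hundreds of milliseconds) and robustness when the near-end talker and the loudspeaker are active at the same time; this sketch addresses neither.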
29. Putting together SW libraries: SmartAcoustic1
Diagram: 4-MEMS microphones array → Source Localization and Beamforming; Acoustic Echo Cancellation with reference audio
• Example project in source code built on STM32Cube software technology
• Includes acoustic Beam Forming, Echo Cancellation, and Source Localization
• Immediate test and performance evaluation
Source Localization:
• User-selectable angle resolution
• User-selectable activation threshold
• Based on 4 MEMS microphones
• 360° localization range
Beamforming:
• User-selectable beam direction
• User-selectable beamforming algorithm
• Based on 4 MEMS microphones
• GUI highlights the chosen microphone couple
Acoustic Echo Cancellation:
• Based on a single MEMS microphone
• Reference audio is stored on STM32 FLASH
• Uses Audio OUT to play back audio while streaming cleaned speech on USB
30. SmartAcoustic1
Evaluation system and software reference design
Multi-platform support:
• STM32 Nucleo expansion boards X-NUCLEO-CCA01M1 and X-NUCLEO-CCA02M1, connected to a NUCLEO-F446RE board
• BlueCoin integrated audio and sensors platform
31. Smart Home Use Case Discussion
The Internet Voice Assistant for Smart Home
Typical features:
• Audio capture and playback
• Automatic voice dialogue
  - Cloud based
  - Mixed embedded/cloud
• Internet connection
• Powered
  - Plugged to mains
  - Battery operated
32. The Problem: Indoor Voice, Audio, Noise
Figure labels:
• Audio input (e.g. music, or far-end speaker)
• Direct acoustic echo, with reflections, diffusion, …
• Voice
• Background noise
• Audio output (e.g. speaker’s voice, clean)
33. Beamforming vs. AEC
Beamforming:
• requires two (or more) microphones
• is independent from the loudspeaker
• (tries to) cancel every signal that is not «on the beam»
AEC:
• requires a single microphone
• must also connect to the audio OUT path (reference audio)
• (tries to) cancel the direct acoustic echo and its reflections
34. Combining Beamforming and AEC
Alternative solution, based on ASR confidence ranking:
• Acoustic Echo Cancellation (with reference audio) on one of the microphones → ASR
• Beamforming on all microphones → ASR
• The output with the best ASR score is chosen
35. Combined Beamforming and Localization in noisy environments
• Multiple beamforming in parallel, each feeding its own ASR (embedded or cloud)
• Select based on ASR score ranking
• Source localization may be an implicit result of multiple beamforming & ASR ranking
NOTE: the osxAcousticSL acoustic source localization library is not effective in the presence of strong noise, reflections and reverberation.
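Selection by ASR score ranking, and the implicit localization it provides, can be sketched in Python as below. The `BeamResult` record, the 0.5 minimum confidence and the scores are hypothetical; real ASR engines report confidence in different ways. The look direction of the winning beam doubles as a coarse direction-of-arrival estimate.

```python
from typing import List, NamedTuple, Optional


class BeamResult(NamedTuple):
    direction_deg: float  # look direction of the beamformer feeding this ASR instance
    transcript: str       # text returned by the ASR engine
    confidence: float     # ASR confidence, assumed to be in [0, 1]


def select_by_asr_ranking(results: List[BeamResult],
                          min_confidence: float = 0.5) -> Optional[BeamResult]:
    """Pick the beam whose ASR result ranks highest; None if nothing is reliable."""
    best = max(results, key=lambda r: r.confidence, default=None)
    if best is None or best.confidence < min_confidence:
        return None
    return best


# Hypothetical scores from four parallel beamforming + ASR chains.
results = [
    BeamResult(0.0, "turn on the lights", 0.41),
    BeamResult(90.0, "turn on the light", 0.88),
    BeamResult(180.0, "turn of the night", 0.23),
    BeamResult(270.0, "", 0.05),
]
chosen = select_by_asr_ranking(results)
```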
36. Example of System Implementation
Concurrent execution of multiple beamforming, AEC, and ASR:
• One microphone → Acoustic Echo Cancellation (with reference audio OUT) → ASR
• Multiple beamforming chains, each → ASR
• Select based on ASR score ranking; ASR may be embedded or cloud
Hint: consider sensing the loudness level to switch off algorithms when they are not needed!
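Following the hint, a loudness gate can decide per frame whether the heavy processing chains should run. The Python sketch below uses two thresholds (hysteresis) and a hold time so the front end does not toggle on every frame; the -45/-55 dB thresholds (relative to a full scale of 1.0), the 20-frame hold and the class name are hypothetical.

```python
import math


class LoudnessGate:
    """Decide per frame whether the heavy front end (beamforming, AEC, ASR) should run.

    Two thresholds (hysteresis) plus a hold time avoid toggling on every frame.
    All parameter values are hypothetical.
    """

    def __init__(self, on_db=-45.0, off_db=-55.0, hold_frames=20):
        self.on_db = on_db
        self.off_db = off_db
        self.hold_frames = hold_frames
        self.hold = 0
        self.active = False

    def update(self, frame):
        level_db = 10.0 * math.log10(sum(v * v for v in frame) / len(frame) + 1e-12)
        if level_db >= self.on_db:
            self.active = True
            self.hold = self.hold_frames   # re-arm the hold timer on loud frames
        elif self.active and level_db < self.off_db:
            self.hold -= 1
            if self.hold <= 0:
                self.active = False        # quiet for hold_frames frames: switch off
        return self.active


quiet = [1e-4] * 160  # about -80 dB relative to full scale 1.0
loud = [0.1] * 160    # about -20 dB
gate = LoudnessGate()
states = [gate.update(quiet), gate.update(loud)] + [gate.update(quiet) for _ in range(20)]
```

Levels between the two thresholds keep the current state, which is what prevents rapid on/off switching around a single threshold.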
37. MEMS Microphone Array to Cloud Architecture
Integrated terminal (audio front-end signal processing, communication interface) → Gateway → 3rd-party cloud-based services