Interactive Voice Con
Successful smart speakers &
voice-enabled products
Platinum Sponsor
Introducing the Speakers
Paul Beckmann
• PhD, MS, and BS from MIT. All in EE.
• Technical specialties: signal
processing, audio product
development, and tools.
Mike Klasco
• Combined MS/PhD ABT NYU
• Audio product development,
acoustics, transducers, materials
and sourcing
2020 Interactive Voice Con
Paul Beckmann: Founder and CTO of DSP Concepts
Mike Klasco: Founder and CEO, Menlo Scientific
Outline
• Kickoff
• Voice Processing Theory [45 minutes]
• Algorithms
• Measuring performance
• Processor requirements
• Product design guidelines
• Break [15 minutes]
• Demos? [15 minutes]
• What Happens in Practice [30 minutes]
• Microphone integration issues
• The enclosure – the space between the mics and speakers
• Loudspeakers, acoustic
• Q&A [15 minutes]
Speech Recognition
Types of Voice Recognition Algorithms
• Voice trigger
• Identifies a single word or phrase like “Alexa” or “Hey Siri”
• Small vocabulary voice recognition
• Fixed vocabulary set for embedded applications. Tens of words.
• “Turn on the lights”, “Next track”, etc.
• Full voice recognition
• Large vocabulary set. Thousands of words.
• “Play Beatles”
• Natural language understanding (NLU)
• Combines application-specific information for a more flexible user interface
• “Play Music by the Beatles”, “Give me Beatles Music”, “I want to listen to music by the Beatles”
• Can be combined with small vocabulary set
Audio Front End = Microphone Cleaner
The Audio Front End (AFE) cleans up signals to improve the performance of the voice recognition. It is like glasses for a camera.
Signal flow: Mic Array (N channels) → Audio Front End → (1 channel) → Voice Recognition
Sounds reaching the mic array: desired speech, interfering noise, device playback
Audio Front End Details
Mic Array (N channels) → Audio Front End → 1 channel → Trigger Word & Voice Recognition
• Echo Canceler: Eliminates loudspeaker sound during device playback.
• Direction of Arrival: Determines location of sound source. Used to steer beamformer.
• Beamformer: Combines multiple microphone signals to improve signal quality.
• Noise Reduction: Removes various types of noise.
Comparing Amazon and Google
Google
Signal flow: Google AFE and Trigger Word → ASR
• 2 microphones only
• 65 to 71 mm spacing
• Mono or stereo
• High-end application processor required
• No variation in products
• No variation in performance
• Performance lags behind AVS
Amazon
Signal flow: 3rd Party AFE → Amazon Trigger Word → ASR
• Any number of microphones
• Any spacing
• Any number of playback channels
• Application processor or MCU solutions
• Wide variety of designs
• 2 to 7 microphones
• Different form factors
• Better performance
• Low cost designs possible
AVS Integration for AWS IoT (a.k.a. “Alexa for Microcontrollers”)
• Cost-effective way to add Alexa voice features
• Connects to the cloud
• Uses an RTOS and lightweight MQTT network stack
• Suitable for low-cost microcontrollers
• Will expand voice to a much larger number of products
https://docs.aws.amazon.com/iot/latest/developerguide/avs-integration-aws-iot.html
Trigger Word
• Voice recognition algorithm trained for a single word or phrase
• “Alexa”, “OK Google”, “Bixby”, “Siri”, “Cortana”, etc.
• Available from multiple suppliers
• Amazon, Google, Baidu, etc.
• Sensory “Truly Handsfree”
• PicoVoice / SoundHound / Cyberon / etc.
• They all use machine learning
• Often optimized for low power consumption
• Sound → Voice Activity Detector → Key word detector
• Large models perform better
• Sensory: 17 kbyte → 1 Mbyte
Characterizing Trigger Performance
• Probability of False Alarm
• How many times does the algorithm accidentally trigger over a 24-hour period?
• Probability of Miss
• What % of trigger words are not detected by the algorithm?
• Trigger word algorithms have an adjustable “sensitivity” setting that allows you to trade off false alarms and misses.
• Amazon requires <3 false alarms per 24 hours of continuous speech
Plot: Probability of Detection (up to 100%) vs. False Alarm Rate, with the ideal operating point marked.
Tune sensitivity based on allowable false alarm rate.
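The tuning step can be sketched in Python. This is a minimal illustration, assuming the detector emits one score per candidate event and that a score threshold plays the role of the “sensitivity” setting. The function names, scores, and threshold grid are hypothetical; only the budget of fewer than 3 false alarms per 24 hours comes from the slide.

```python
import numpy as np

def sweep_sensitivity(trigger_scores, background_scores, background_hours, thresholds):
    """Return (threshold, miss_rate, false_alarms_per_24h) for each threshold.

    trigger_scores: detector scores for real trigger-word utterances.
    background_scores: detector scores for events in speech without the trigger word.
    background_hours: duration of that background recording, in hours.
    """
    trigger_scores = np.asarray(trigger_scores, dtype=float)
    background_scores = np.asarray(background_scores, dtype=float)
    results = []
    for th in thresholds:
        miss_rate = np.mean(trigger_scores < th)        # true triggers not detected
        false_alarms = np.sum(background_scores >= th)  # accidental triggers
        fa_per_24h = false_alarms * 24.0 / background_hours
        results.append((float(th), float(miss_rate), float(fa_per_24h)))
    return results

def pick_operating_point(results, max_fa_per_24h):
    """Lowest miss rate among thresholds that stay under the false alarm budget."""
    allowed = [r for r in results if r[2] < max_fa_per_24h]
    return min(allowed, key=lambda r: r[1]) if allowed else None

# Hypothetical scores, for illustration only.
results = sweep_sensitivity(
    trigger_scores=[0.9, 0.8, 0.7, 0.6, 0.4],
    background_scores=[0.1, 0.2, 0.3, 0.5, 0.55, 0.65, 0.75],
    background_hours=24.0,
    thresholds=[0.3, 0.5, 0.7],
)
best = pick_operating_point(results, max_fa_per_24h=3.0)
print(results)
print("chosen (threshold, miss rate, false alarms / 24 h):", best)
```

Raising the threshold lowers the false alarm count and raises the miss rate, which is the tradeoff the sensitivity setting exposes.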
Wake Word Performance in Noise
SNR at microphone is main driver
of wake word performance
• Independent of distance
• Independent of room
reflections / reverb (for normal
household environments)
Improve your SNR to improve your wake
word performance.
Beamforming
Beamforming Principles
• Beamformers are spatial filters. They
pass signals from certain directions and
reduce signals from other directions.
• Performance depends heavily upon the
geometry of the microphone array
• Fixed beamformers utilize FIR filters
• Time domain or frequency domain
• There are many ways to compute the filter
coefficients (MVDR, DAS, etc.)
Diagram: FIR filters h1[n], h2[n], h3[n], h4[n], one per microphone.
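As a rough illustration of a spatial filter, here is a delay-and-sum (DAS) beamformer evaluated at a single frequency. It is a sketch of the simplest of the methods named above, not the DSPC design procedure. The 4-mic line with 25 mm spacing, the 2 kHz test frequency, the speed of sound, and all function names are assumptions chosen for the example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def steering_vector(mic_positions, direction, freq_hz):
    """Relative phase of a far-field plane wave at each mic.

    mic_positions: (M, 2) array in metres.
    direction: unit vector pointing from the array toward the source.
    """
    # A mic displaced toward the source hears the wave earlier (negative delay).
    delays = -(mic_positions @ direction) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)

def das_weights(mic_positions, look_direction, freq_hz):
    """Delay-and-sum weights: align the look direction, then average."""
    return steering_vector(mic_positions, look_direction, freq_hz) / len(mic_positions)

def array_response(weights, mic_positions, arrival_direction, freq_hz):
    """Magnitude of the beamformer output for a unit plane wave."""
    arriving = steering_vector(mic_positions, arrival_direction, freq_hz)
    return float(abs(np.vdot(weights, arriving)))  # vdot conjugates the weights

# Assumed geometry: 4 mics on a line, 25 mm apart, steered along the line.
mics = np.array([[0.000, 0.0], [0.025, 0.0], [0.050, 0.0], [0.075, 0.0]])
look = np.array([1.0, 0.0])
side = np.array([0.0, 1.0])

w = das_weights(mics, look, 2000.0)
on_axis = array_response(w, mics, look, 2000.0)
off_axis = array_response(w, mics, side, 2000.0)
print(f"gain toward look direction: {on_axis:.3f}")
print(f"gain from 90 degrees off:   {off_axis:.3f}")
```

Sound from the look direction passes at unity gain, while sound from the side is attenuated. Repeating this over frequency shows how strongly the result depends on array geometry.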
DSPC Design Method: Maximize SNR
• Inputs to design
• Microphone geometry
• Look angle and beam width
• Diffuse field noise level
• Microphone SNR
• Signal is person’s voice in specified beam
• Noise = diffuse field noise + microphone self
noise
• Iterative design procedure maximizes SNR
SNR vs. Frequency
Optimal Array Geometries
Far Field Products
• 180 or 360 Degree: smart speakers, middle of the room
• 180 Degree: set-top box, side of the room
• Flat Line Array: TVs, appliances, on a wall
Tiers shown: High-End, Standard, Low-Cost
• 40 to 70 mm diameter works. 70 mm works the best.
• 25 mm spacing between mics, 75 mm total length
SNR improvements labeled in the figure: +7 dB, +6.5 dB, +5 dB, +2 dB, +3 dB, +2 dB, +4 dB
SNR vs. Mic Geometry
Assumptions:
• 71 mm diameter
• Microphone array is in
diffuse field noise with SNR
= 50 dB
• Speech is at 60 dB in the
direction of the beam
• Beam width is 45 degrees
• Microphone SNR = 65 dB
• Look angle = 0 degrees
Linear Arrays
• Linear arrays work well when in an end-fire
configuration.
• Requires person to be in a specified location.
• Provides 4 to 5 dB SNR improvement
• Broadside arrays work poorly and should be
avoided.
• Very little SNR improvement at low frequencies, where the bulk of speech energy lies
• Use broadside arrays only as a last resort when the
industrial design dictates no other options
• Television
• Wall panel
Figure: end-fire and broadside orientations.
Intuition: beamformers use time differences to steer the beam. In broadside, voice arrives at the same time at both mics.
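The intuition can be checked with a few lines of arithmetic. The sketch below assumes a far-field source, two mics, a 71 mm spacing, and a speed of sound of 343 m/s; the function name is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def arrival_time_difference(spacing_m, angle_deg):
    """Time difference of arrival between two mics for a far-field source.

    angle_deg is measured from the line joining the mics:
    0 degrees is end-fire, 90 degrees is broadside.
    """
    return spacing_m * math.cos(math.radians(angle_deg)) / SPEED_OF_SOUND

end_fire = arrival_time_difference(0.071, 0.0)
broadside = arrival_time_difference(0.071, 90.0)
print(f"end-fire:  {end_fire * 1e6:.0f} microseconds")
print(f"broadside: {broadside * 1e6:.3f} microseconds")
```

With the talker broadside to the pair there is essentially no time difference to exploit, so that orientation gives the beamformer nothing to work with for that source.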
Noise Reduction
Stationary Noise Reduction
Before
After
Example demonstrates improvement
in automotive environments
• Effective against:
• Fan noise
• Automotive road noise
• Microphone self noise
• Creates a model of the background
noise and then removes in real-time
• Improves ASR performance by 2 to 3
dB
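The slide does not say which algorithm builds the noise model. As one textbook way to “model the background noise and then remove it”, here is a sketch of magnitude spectral subtraction on a single frame. The frame size, spectral floor, and synthetic noise are assumptions, and a real-time version would add windowing with overlap-add and a running noise estimate.

```python
import numpy as np

def estimate_noise_magnitude(noise_frames):
    """Background model: average magnitude spectrum over noise-only frames."""
    return np.mean(np.abs(np.fft.rfft(noise_frames, axis=1)), axis=0)

def suppress_frame(frame, noise_mag, floor=0.1):
    """Subtract the noise model from one frame, keeping the original phase."""
    spectrum = np.fft.rfft(frame)
    mag = np.abs(spectrum)
    # The floor keeps a fraction of the original magnitude to limit artifacts.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    clean_spectrum = clean_mag * np.exp(1j * np.angle(spectrum))
    return np.fft.irfft(clean_spectrum, n=len(frame))

# Synthetic stationary noise stands in for fan or road noise.
rng = np.random.default_rng(0)
noise_frames = 0.1 * rng.standard_normal((50, 256))
noise_mag = estimate_noise_magnitude(noise_frames)

noisy = 0.1 * rng.standard_normal(256)
cleaned = suppress_frame(noisy, noise_mag)
reduction_db = 10 * np.log10(np.sum(noisy ** 2) / np.sum(cleaned ** 2))
print(f"noise-only frame reduced by {reduction_db:.1f} dB")
```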
Interference Canceler
• Effective against noise from:
• TVs
• Appliance self noise
• Air conditioners
• Requires a minimum of 2 microphones
• Combines beamforming, adaptive filtering,
and other statistical signal processing
techniques
• Effective for music and speech interferers
• Improves ASR performance by up to 30 dB!
2 Microphone Example
Adaptive Interference Canceler Performance
• Measured in a typical living room environment
• Interfering music noise played
• Speech at constant level (62 dBC) at DUT
• Varied music level
• Speech and noise 2 meters from DUT
Plot series, relative to Amazon Echo Plus and Echo 2: Echo Plus 7-mic, Echo 2 7-mic, DSPC 2-mic, DSPC 4-mic (8 dB better), DSPC 6-mic (11 dB better)
AEC
Acoustic Echo Cancellers (AEC)
• Eliminates loudspeaker sound at the microphone
• Enables Voice UI to function while music or text-to-
speech is active
• Music is usually ducked after the wake word is detected
• Best algorithms operate in the frequency domain
• Better cancellation
• Faster convergence
• Lower computation
• ERL = Echo Return Loss quantifies performance = How
many dB of loudspeaker signal is canceled by the AEC
Demo Setup
Single microphone with
loudspeaker close to the mic.
Mono playback in home
environment.
Factors Affecting AEC Performance
• What type of algorithm are you using?
• Time domain vs frequency domain
• LMS vs Kalman vs Other?
• Echo tail length
• How many msec of audio can you cancel?
• Longer is better but requires more processing
and memory
• Far-field smart speakers require 150 to 200
msec of echo tail
• Reverberation time of the room (lower
is better)
• Linearity of your loudspeakers
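To make the echo tail and algorithm choices concrete, here is a sketch of a time-domain NLMS canceler, a normalized variant of the LMS family named above. It is not the frequency-domain design the talk recommends, and the 16 kHz sample rate, step size, and toy 4-tap echo path are assumptions for illustration.

```python
import numpy as np

def echo_tail_taps(tail_ms, sample_rate_hz):
    """Number of filter taps needed to cover a given echo tail."""
    return int(round(tail_ms * sample_rate_hz / 1000.0))

def nlms_echo_canceler(reference, mic, num_taps, step=0.5, eps=1e-8):
    """Return the mic signal with the estimated loudspeaker echo removed."""
    weights = np.zeros(num_taps)
    history = np.zeros(num_taps)  # most recent reference samples, newest first
    residual = np.zeros(len(mic))
    for n in range(len(mic)):
        history = np.roll(history, 1)
        history[0] = reference[n]
        residual[n] = mic[n] - weights @ history
        # Normalized update: step size scaled by the reference energy.
        weights += step * residual[n] * history / (history @ history + eps)
    return residual

# A 200 ms tail at an assumed 16 kHz sample rate needs this many taps:
print("taps for 200 ms at 16 kHz:", echo_tail_taps(200, 16000))

# Toy demo: white-noise playback through a short, perfectly linear echo path.
rng = np.random.default_rng(1)
reference = rng.standard_normal(4000)
echo_path = np.array([0.5, -0.3, 0.2, 0.1])
mic = np.convolve(reference, echo_path)[: len(reference)]

residual = nlms_echo_canceler(reference, mic, num_taps=8)
tail = slice(-1000, None)
cancellation_db = 10 * np.log10(
    np.sum(mic[tail] ** 2) / (np.sum(residual[tail] ** 2) + 1e-20)
)
print(f"cancellation after convergence: {cancellation_db:.0f} dB")
```

The toy path is linear and noise-free, so the canceler converges almost perfectly. The tap count is what drives the processing and memory cost mentioned above: every tap is one multiply-accumulate per sample in this time-domain form.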
Speaker Distortion Affects AEC
• This is usually the limiting factor for AEC performance
• Loudspeakers distort when playing loud or low frequencies
• Speakers need to be tuned to minimize distortion
• Rule of thumb:
1% THD → AEC up to 40 dB
2% THD → AEC up to 34 dB
3% THD → AEC up to 30 dB
5% THD → AEC up to 26 dB
10% THD → AEC up to 20 dB
• Product developers must trade off low frequency sound quality vs. voice performance
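The rule-of-thumb figures are consistent with treating the distortion products as a residual that a linear canceler cannot remove, which gives a limit of about -20·log10(THD). That reading is an assumption, not something stated on the slide, but the numbers line up:

```python
import math

def max_cancellation_db(thd_percent):
    """Upper bound on echo cancellation if distortion is left uncancelled."""
    return -20.0 * math.log10(thd_percent / 100.0)

for thd in (1, 2, 3, 5, 10):
    print(f"{thd:>2}% THD -> about {max_cancellation_db(thd):.1f} dB")
```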
Rule of Thumb for Speaker Distortion
1. Play a low frequency sine wave through
your loudspeaker and plot the
spectrum
2. You’ll see harmonics at multiples of the
fundamental frequency
3. The largest harmonic determines the
absolute limit of the echo canceler
4. ERLE performance is based on the difference between the fundamental and the largest harmonic
5. Repeat at different output levels and
frequencies
OK. 30 dB down = 30 dB max ERLE.
Bad. 15 dB down = 15 dB max ERLE.
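The measurement steps can be sketched with an FFT. The example below uses a synthetic signal in place of a loudspeaker recording: a 100 Hz tone with a third harmonic placed 30 dB down. The sample rate, tone frequency, and function name are assumptions.

```python
import numpy as np

def harmonic_margin_db(signal, sample_rate_hz, fundamental_hz, num_harmonics=5):
    """dB difference between the fundamental and the largest harmonic."""
    windowed = signal * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed))
    bin_hz = sample_rate_hz / len(signal)

    def peak_near(freq_hz):
        # Search a few bins around the expected frequency.
        k = int(round(freq_hz / bin_hz))
        return spectrum[max(k - 2, 0): k + 3].max()

    fundamental = peak_near(fundamental_hz)
    harmonics = [
        peak_near(fundamental_hz * h)
        for h in range(2, num_harmonics + 2)
        if fundamental_hz * h < sample_rate_hz / 2
    ]
    return 20.0 * np.log10(fundamental / max(harmonics))

# Synthetic stand-in for a recording of the loudspeaker playing a sine wave.
fs = 48000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 100 * t) + 10 ** (-30 / 20) * np.sin(2 * np.pi * 300 * t)

margin = harmonic_margin_db(tone, fs, 100.0)
print(f"largest harmonic is {margin:.1f} dB down")
```

In practice `tone` would be a microphone capture of the loudspeaker, and the measurement would be repeated at different output levels and frequencies as in step 5.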
AECs and Speaker Processing
Reference signal must be taken after nonlinear processing.
DRC = Dynamic range compression. This includes nonlinear processing like compressors and limiters.
Chain 1: EQ → DRC → DAC → AMP, with the reference (Ref) tapped after the DRC.
Chain 2: the same chain with a crossover after the DRC.
Crossovers after the DRC are allowed. Higher-order crossovers perform better.
Multichannel Echo Cancelers
• Some applications require multichannel echo cancelers (e.g., soundbars)
• For optimal performance, you need to cancel all the channels. Downmixing reduces performance.
• The example shows what happens when you have a 3 channel product and apply a 2 channel AEC
Full performance when using a 3 channel AEC to cancel L, R, and C speakers.
Reduced performance when downmixing to 2 channels and using a stereo echo canceler:
L’ = L + 0.5 * C
R’ = R + 0.5 * C
Performance reduced by 5 to 10 dB
Woofer Reference Mic
• Work done in conjunction with Vesper
• Uses a new high AOP microphone
placed directly in front of the woofer
• Advanced processing improves ERL by
up to 15 dB
• Trigger word performance at max
playback level:
• Standard processing: 63%
• Advanced processing: 91%
• Similar feature used in the HomePod
Performance Testing
Amazon
Amazon Test Setups
Used for
most tests
Used for AEC test only
Understanding Amazon Results
• False Alarm Tests
• Number of false alarms using Amazon’s 24-hour continuous talking test track
• The lower the better
• Trigger Detection
• % of time that the device wakes up when “Alexa” is spoken
• Tested in silence, kitchen noise, music noise, and during music playback
• The higher the better
• Response Accuracy Rate (RAR)
• % of time that the cloud accurately understood the question (e.g., “Alexa, what is the capital of China”)
• Tested in silence, kitchen noise, and music noise
• The higher the better
Testing Scenarios
Silence
No interfering sound, uttering “Alexa” at 62 dBC
Kitchen Noise (0, -3 dB, -6 dB)
Alexa utterance at 62 dBC / Noise at 62, 65, and 68 dBC
Music Noise (0, -3 dB, -6 dB)
Alexa utterance at 62 dBC / Music at 62, 65, and 68 dBC
Acoustic Echo Canceler
Music playback at 90 dBC while trigger words are played at 62 dBC.
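The levels above translate into signal-to-noise ratios by simple subtraction, which is how the “(0, -3 dB, -6 dB)” labels arise. A small worked example follows; treating the dBC level difference as the SNR is an assumption consistent with those labels, and the dictionary layout is illustrative only.

```python
SPEECH_LEVEL_DBC = 62  # "Alexa" utterance level in every scenario

# Interfering sound levels in dBC for each scenario (None means no interferer).
SCENARIOS = {
    "silence": None,
    "kitchen noise": [62, 65, 68],
    "music noise": [62, 65, 68],
    "acoustic echo canceler (device playback)": [90],
}

def snr_db(speech_dbc, interferer_dbc):
    """Speech level minus interferer level."""
    return speech_dbc - interferer_dbc

for name, levels in SCENARIOS.items():
    if levels is None:
        print(f"{name}: no interferer")
    else:
        snrs = [snr_db(SPEECH_LEVEL_DBC, level) for level in levels]
        print(f"{name}: SNR {snrs} dB")
```

The echo canceler case stands out: the trigger word sits 28 dB below the device's own playback.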
Living Room Results – Trigger Detection
Living Room Results - RAR
Processor Requirements
Many Performance Levels
Low Power / Near-field: 1 or 2 mics. ARM Cortex-M4, 20 to 30 MHz
Basic Far-Field: 2 mics, mono. ARM Cortex-M7 or Cortex-A53, 200 MHz
High-Performance Far-Field: 4+ mics, stereo. ARM Cortex-A53, 350 to 600 MHz
High-Performance Far-Field: 4+ mics, multichannel. ARM Cortex-A53, 900 to 1200 MHz
Processor Comparisons
Processor efficiency per MHz. The larger the better.
ARM Cortex-M4: 0.26 (ST, NXP, Renesas, Ambiq, Quicklogic)
ARM Cortex-M7: 0.45 (ST, NXP)
ARM Cortex-A35: 0.37 (Mediatek)
ARM Cortex-A53: 0.48 (NXP, Amlogic, Qualcomm)
ARM Cortex-A72: 0.98 (Coming soon!)
Tensilica HiFi 4: 1.00 (NXP, Mediatek, Amlogic)
ARM Cortex-A53 is the sweet
spot for smart speakers.
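The efficiency figures can be used for a first-pass estimate of how a clock budget moves between cores. The sketch below assumes the six values pair with the six processors in the order listed and that load scales linearly with efficiency per MHz. Both are simplifying assumptions, and the function name is hypothetical.

```python
# Relative efficiency per MHz, paired with processors in slide order (assumed).
EFFICIENCY_PER_MHZ = {
    "ARM Cortex-M4": 0.26,
    "ARM Cortex-M7": 0.45,
    "ARM Cortex-A35": 0.37,
    "ARM Cortex-A53": 0.48,
    "ARM Cortex-A72": 0.98,
    "Tensilica HiFi 4": 1.00,
}

def equivalent_mhz(mhz, from_core, to_core):
    """Clock rate on to_core doing the same work as mhz on from_core."""
    return mhz * EFFICIENCY_PER_MHZ[from_core] / EFFICIENCY_PER_MHZ[to_core]

# A 600 MHz Cortex-A53 budget expressed on other cores.
for core in EFFICIENCY_PER_MHZ:
    estimate = equivalent_mhz(600, "ARM Cortex-A53", core)
    print(f"{core}: about {estimate:.0f} MHz")
```

Estimates like this ignore memory bandwidth, cache size, and vectorization differences, so they are only a starting point for sizing.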
Recommended Designs
Smart Speaker Designs
• 360-degree operation
• Microphones on top of product
• 40 to 75 mm diameter
• Physically separate microphones and
loudspeakers for best performance
• Mono or stereo playback
High-End
Standard
Sound Bar Designs
• 180-degree operation
• Microphones on top of product near center of device
• 60 to 75 mm design
• Physically separate microphones and loudspeakers
for best performance
• Stereo or multichannel playback (up to 7 reference
channels)
• Compatible with Dolby Atmos
High-End
Standard
TV Designs
Placement options
• Top is better than bottom
• Further away from speakers
• Bottom usually wins out because of
lower cost
• Mics do not have to be centered
• 2 mics sufficient
Good
Better
Set-Top Box Designs
• Top of Device
• 180-degree operation
• Microphones on top of product
• Tethered “puck”
• 360-degree operation
• Microphones on top of product
• Support for optional internal
speaker for voice playback
• Audio playback through HDMI
High-End
Standard
Appliance / Tablet Designs
• 180-degree operation
• 2 or 4 microphone linear array
• 25 to 75 mm design
• Physically separate microphones
and loudspeakers for best
performance
• Mono or stereo playback
Good
Better
Design Guidelines – Microphones
Far Field Products
• Microphones should be placed on the top of the product, if possible.
• Microphones should be on a flat horizontal surface
• Microphones should be visible to the user (not occluded)
• Flat line arrays are not recommended. Use them only as a last resort. (Microphone arrays work best if the microphones are displaced in the horizontal plane.)
• Microphones need to be properly ported (see design guidelines from microphone vendor)
• 4 microphones are sufficient for most products
Design Guidelines – Microphones
Far Field Products
• SNR of 65 dB. Higher SNRs provide no benefit for voice recognition but do benefit voice communication
• Gain matching:
• +/- 1 dB from 200 Hz to 6 kHz (recommended)
• +/- 1 dB from 200 Hz to 4 kHz and +/- 3 dB from 4 kHz to 7 kHz (required)
• Microphone AOP must be high enough so that the system doesn’t clip when loudspeakers are played at full volume. Recommendations:
• 120 dB for smart speakers
• 130 dB for sound bars
• 40 to 70 mm microphone spacing is recommended. As small as 20 mm is possible with some degradation in performance.
Microphone Acoustical Porting
(No Common Cavity)
Diagram labels: MEMS mic, vent hole, gasket, case, PCB.
You need individual gaskets to make a direct connection between each mic and its vent hole.
If you block a microphone hole with putty, you should see the level drop by at least 30 dB.
A design with a common cavity shared by all microphones won’t work.
Design Guidelines – Microphones (A)
In Ear Products
• 2 microphones are sufficient for most products
• Use 2 microphones in an end-fire configuration pointing towards the mouth
• Space microphones as far apart as possible. 10 mm is the minimum spacing. 20 mm is preferred.
• Microphone on end of “boom” improves performance
End of first session
Overview
What Happens in Practice
• Microphone selection
• The Physical world in front of the
mic
• No Man’s Land between the mic
and speaker (leakage)
• Loudspeakers – good, bad and ugly
• Software integration issues
MEMS Microphone Selection Cheat Sheet
• Analog or digital?
• Analog single-ended or balanced?
• Top or bottom port?
• Standard size or compact?
• AOP (Acoustic Overload Point)?
• S/N (Signal-to-Noise)?
• Sensitivity (ASIC gain)?
• Robustness (IPXX)?
MEMS Microphones – what is inside?
• MEMS mic element + ASIC in a package
• Wiring between MEMS mic die and ASIC
• Typical package envelope of 3.50 mm x 2.65 mm x 0.98 mm
• Smaller footprint on some models, but reduced back volume = reduced S/N
• Faraday shield on some models
Microphones – Analog vs. digital?
What are the mic inputs on the codec or SoC (System on Chip)?
• Analog single-ended
• Analog pseudo-balanced
• Digital – PDM
Microphones – Top or Bottom port?
• The MEMS SMT package can have the sound aperture either on the top or bottom
• If on the bottom, then the circuit board it is soldered to (the flex PCB) must have a hole that aligns with the MEMS mic port
• Bottom port warning
• Sealing: back port SMT seal eyelet
Microphones – Signal to Noise
• S/N was once a deal killer for most serious applications, but MEMS mics have caught up with ECMs, with commodity analog and digital MEMS reaching beyond 60 dB S/N.
• Active noise canceling headphones, hearing aids, and voice command call for 65 dB S/N or better
• 70+ dB from a few vendors by the start of 2021 (but this keeps slipping!)
• Better S/N = fewer mics? Some discussion that higher S/N enables a reduction in the number of mics required
Microphones
• Analog MEMS mics: single-ended or balanced differential outputs?
• Balanced analog output is good defensive engineering if your product will have longer wire runs, digital noise, or EMI/RF floating around
• How differential is the MEMS mic topology? True differential capacitive MEMS mics use dual grids for improved noise immunity over single-ended designs
Microphones – Digital
• Digital MEMS mics offer greater immunity to interference than analog MEMS
• If time to market matters and you want to avoid tweaking and reworking your board layout when noise problems appear, digital is the way to go
• If mic performance is critical for your type and class of product, analog may be better with an external premium codec (both AOP and noise floor)
Microphones – Acoustic Overload Point (AOP)
• Is AOP due to mic element saturation or ASIC overload clipping?
• Analog MEMS mics typically have a better acoustic overload point (AOP), which is where serious distortion sets in (codec overload before MEMS mic element)
• Analog MEMS mics overload a bit more gracefully than digital ones: when an A/D codec overloads, it is a line in the sand and nasty.
• Digital MEMS AOP can be as low as 116 dB and more typically 120 dB. Analog AOP tends to be over 120 dB and can be 130+ dB on some MEMS mics.
• Vesper’s piezo MEMS mics have versions with very high AOP.
Microphones – Directivity
• MEMS mics are omnidirectional
• To achieve directional characteristics, they are used in arrays
• One requirement for mic arrays is that the mics are closely matched in sensitivity and response and maintain that uniformity over time
The physical world in front of
the mic
Microphones – the world around the mic
Key topics
• MEMS mics are mounted to flex PCB using SMT reflow along with the rest of the SMT components
• Port Helmholtz resonance – moving it out of band
• The port and wind noise
• Laminar entry
• Acoustic mesh
Microphones – What are membranes for?
Woven and non-woven membranes are used for:
• Wind noise, water blocking
• Acoustic resistance determines crossover to DSP wind noise filtering
• Dust problems – internal membrane (within package) blocks SMT reflow gases
• Field use issue – shift over time, e.g. gunk in the membrane over mics facing a stove top
Microphones – the world around the mic
Wind noise blocking/acoustic mesh
• Mic element overloaded/saturated by wind
• Wind pressure must be blocked acoustically (acoustic resistance membrane)
• Mic overload cannot be fixed by DSP (but some turbulence can be filtered out)
• Acoustic mesh can also block liquids (hydrophobic & oleophobic)
Microphones – the world around the mic
Port and wind noise
• Laminar entry (flared aperture)
• (turbulence in port to be
avoided)
• Port Helmholtz resonance peak
– moving it out of band
• Acoustic mesh damps peak Q
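The port resonance can be estimated with the standard Helmholtz resonator formula, f = (c / 2π) · sqrt(A / (V · L_eff)). The sketch below uses a common end-correction approximation (L_eff = L + 1.7·r for a port flanged at both ends). The port dimensions and cavity volume are made-up values for illustration, not taken from any datasheet.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def port_resonance_hz(port_radius_m, port_length_m, cavity_volume_m3):
    """Helmholtz resonance of a round port feeding a small front cavity."""
    area = math.pi * port_radius_m ** 2
    # End correction: roughly 0.85 * radius per flanged end (approximation).
    effective_length = port_length_m + 1.7 * port_radius_m
    return SPEED_OF_SOUND / (2 * math.pi) * math.sqrt(
        area / (cavity_volume_m3 * effective_length)
    )

# Hypothetical geometry: 1 mm diameter port, 2 cubic millimetre cavity.
long_port = port_resonance_hz(0.5e-3, 1.5e-3, 2e-9)
short_port = port_resonance_hz(0.5e-3, 0.5e-3, 2e-9)
print(f"1.5 mm long port: {long_port / 1000:.1f} kHz")
print(f"0.5 mm long port: {short_port / 1000:.1f} kHz")
```

Shortening the port, widening it, or shrinking the cavity all push the peak upward, which is how the resonance is moved out of band.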
The physical world between the
mic and speaker
Leakage between the mic & speaker
Audio output leakage is both airborne and through the enclosure structure
• Minimizing Airborne leakage
• keep the mic(s) and speakers as far apart as possible
• avoid overlapping the mic(s) pickup pattern and speaker radiation pattern
• Structural transconduction (microphonics)
• Enclosure housing – ribs, joints, wall thickness
• Plastics are not all equal
• speaker sub-enclosure isolation mounts (grommets or gaskets)
• mic isolation
Construction and Materials
• Plastics have different
acoustical characteristics
• Stiffness and damping are
key factors
• Compatibility considerations
• Shrink
• Tool temperature
• Flow
• Impact strength
• Sink marks/wall thickness
Construction and Materials
The Incumbent plastics
• ABS
• PC
• ABS+PC
• PP
Construction and Materials
• TreBlend (Ineos) PA/SAN
• Cellulose Plastics
• Treva (Eastman)
• Symbio (Sappi)
• Thicker walls/ ribs without sink marks
Acoustically engineered plastics
Genelec M040 – NCE enclosure
The physical world of speakers
and the AEC Achilles heel -
distortion
Enclosure Mechanical Engineering
• Open the window more and more bugs come in
• More power and more bass = no gain without pain
• Increase acoustic output before feedback and AEC breakdown by
reducing the cabinet resonance peak
• Extending low-end response of product will shake things up more
Speaker Nonlinearities: AEC issues
• Speaker distortion nonlinearities are the enemy of AEC
• Loudspeaker nonlinearities affect AEC
• Low-end distortion impacts AEC even when it is not audible to listeners
• Fine tuning of suspension and motor nonlinearities is critical
• Or source off-the-shelf application-specific speakers optimized for AEC and ANC
50 mm AEC / ANC optimized speakers
• Application-specific ANC and AEC high
linearity /lower distortion speakers to meet
TIA 930
• Typically around 50 mm diameter
• SEAS
• Tymphany
• Stetron
subVo servo feedback correction
Next generation solution for increased AEC headroom
• subVo bend-sensor provides distortion reduction at
the lower octaves enabling increased AEC
headroom
• Precision position sensor provides error correction
feedback
• 10 dB of feedback = 10 dB of piston range distortion
reduction
Software Integration Issues
Software Integration Challenges
• Real-time CPU load
• Wrong interrupt levels
• Dropping samples / blocks
• Non-constant latency between mics and reference signals
• Misconfigured PDM filters
• Different clocks for mics and reference signals
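The clock issue is easy to quantify. If the microphone and reference paths run from different oscillators, their alignment drifts steadily, so the echo canceler sees a latency that never settles. The sample rate and 50 ppm mismatch below are assumed values for illustration.

```python
def drift_samples(sample_rate_hz, ppm_mismatch, seconds):
    """Samples of misalignment accumulated between two unsynchronized clocks."""
    return sample_rate_hz * ppm_mismatch * 1e-6 * seconds

# Assumed: 16 kHz audio, oscillators differing by 50 parts per million.
for seconds in (1, 60, 600):
    drift = drift_samples(16000, 50, seconds)
    print(f"after {seconds:>3} s: {drift:.1f} samples of drift")
```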
Example #1: Noisy PDM Microphones
PDM Bitstream → PDM to PCM Converter → PCM Samples
Problem Statement
• ASR accuracy only 72% in quiet speech conditions
• High quality microphone:
• -41 dB sensitivity / 66 dB SNR
• Noise floor expected at 28 dBA
• Noise floor measured at 39 dBA
• Root cause
• PDM to PCM converter was implemented with
16-bit math
• Generated noise floor was at -96 dBFS → 39
dBA
• Solution
• Implement PDM to PCM conversion in software
• ASR accuracy improved to 94%
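The numbers in this example can be reproduced with a short calculation. It assumes the -41 dB sensitivity is specified in dBFS at the usual 94 dB SPL calibration level, and it treats dBA and dB SPL as interchangeable for this rough check. Both are assumptions, and the function names are hypothetical.

```python
import math

REFERENCE_SPL_DB = 94.0  # calibration level commonly used for mic sensitivity

def full_scale_spl(sensitivity_dbfs):
    """Sound pressure level that corresponds to 0 dBFS."""
    return REFERENCE_SPL_DB - sensitivity_dbfs

def noise_floor_spl(noise_dbfs, sensitivity_dbfs):
    """Acoustic equivalent of a digital noise floor."""
    return full_scale_spl(sensitivity_dbfs) + noise_dbfs

def expected_noise_floor_spl(mic_snr_db):
    """Noise floor implied by the microphone's own SNR specification."""
    return REFERENCE_SPL_DB - mic_snr_db

sixteen_bit_lsb_dbfs = 20 * math.log10(2 ** -16)  # about -96 dBFS

print(f"0 dBFS corresponds to {full_scale_spl(-41):.0f} dB SPL")
print(f"expected floor from 66 dB SNR: {expected_noise_floor_spl(66):.0f} dB")
print(f"16-bit LSB level: {sixteen_bit_lsb_dbfs:.1f} dBFS")
print(f"floor set by -96 dBFS: {noise_floor_spl(-96, -41):.0f} dB")
```

The 16-bit converter raised the floor from the microphone's 28 dB to 39 dB, which is the 11 dB gap behind the reduced ASR accuracy.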
Example #2: Incorrect thread priorities
CPU Load Problems
Audio processing was taking 18% on average but there were large spikes. Bluetooth thread priority was incorrectly set higher than real-time audio processing.
Corrected Thread Priorities
Steady and consistent CPU load
Charts: CPU Load over Time (peak and average), before and after correcting thread priorities.
Q&A
 
AudioCodes Session Border Controller Update
AudioCodes Session Border Controller UpdateAudioCodes Session Border Controller Update
AudioCodes Session Border Controller UpdateJohn D'Annunzio
 
Frequency toutorial
Frequency toutorial Frequency toutorial
Frequency toutorial ruwaghmare
 
TomW_Radio-Romania-lecture
TomW_Radio-Romania-lectureTomW_Radio-Romania-lecture
TomW_Radio-Romania-lectureTom Williams
 
A COMPLETE GUIDE FOR BUILDING AN INTERVIEW RECORDING.pptx
A COMPLETE GUIDE FOR BUILDING AN INTERVIEW RECORDING.pptxA COMPLETE GUIDE FOR BUILDING AN INTERVIEW RECORDING.pptx
A COMPLETE GUIDE FOR BUILDING AN INTERVIEW RECORDING.pptxJessicaWein1
 
Polycom soundstation vtx1000 data sheet
Polycom soundstation vtx1000 data sheetPolycom soundstation vtx1000 data sheet
Polycom soundstation vtx1000 data sheetbest4systems
 
RTASC Lite - Real Time Audio System Check Lite
RTASC Lite - Real Time Audio System Check LiteRTASC Lite - Real Time Audio System Check Lite
RTASC Lite - Real Time Audio System Check LiteDru Wynings
 
Has video really killed the audio star?
Has video really killed the audio star?Has video really killed the audio star?
Has video really killed the audio star?Cisco Canada
 

Similaire à Interactive Voice Con: Optimizing Voice Processing for Smart Speakers & Devices (20)

The Next-Gen Technologies Driving Immersion
The Next-Gen Technologies Driving ImmersionThe Next-Gen Technologies Driving Immersion
The Next-Gen Technologies Driving Immersion
 
Polycom soundstation premier data sheet
Polycom soundstation premier data sheetPolycom soundstation premier data sheet
Polycom soundstation premier data sheet
 
SPEECH CODING
SPEECH CODINGSPEECH CODING
SPEECH CODING
 
Audio Low Power and Closed Lid Enhancements for Intel Platforms
Audio Low Power and Closed Lid Enhancements for Intel PlatformsAudio Low Power and Closed Lid Enhancements for Intel Platforms
Audio Low Power and Closed Lid Enhancements for Intel Platforms
 
Understanding Voice Intelligibility
Understanding Voice IntelligibilityUnderstanding Voice Intelligibility
Understanding Voice Intelligibility
 
CantataCS
CantataCSCantataCS
CantataCS
 
Sound Matters in Multiscreen Entertainment Delivery - TVNext 2012
Sound Matters in Multiscreen Entertainment Delivery - TVNext 2012Sound Matters in Multiscreen Entertainment Delivery - TVNext 2012
Sound Matters in Multiscreen Entertainment Delivery - TVNext 2012
 
"Embracing Web 2.0 and New Media Communications"
"Embracing Web 2.0 and New Media Communications""Embracing Web 2.0 and New Media Communications"
"Embracing Web 2.0 and New Media Communications"
 
HD Voice: The Hurdles and how to overcome the codec war
HD Voice: The Hurdles and how to overcome the codec warHD Voice: The Hurdles and how to overcome the codec war
HD Voice: The Hurdles and how to overcome the codec war
 
HD Voice, telecom operators
HD Voice, telecom operatorsHD Voice, telecom operators
HD Voice, telecom operators
 
Track 1 session 2 - st dev con 2016 - dsp concepts - innovating iot+wearab...
Track 1   session 2 - st dev con 2016 -  dsp concepts - innovating iot+wearab...Track 1   session 2 - st dev con 2016 -  dsp concepts - innovating iot+wearab...
Track 1 session 2 - st dev con 2016 - dsp concepts - innovating iot+wearab...
 
AudioCodes Session Border Controller Update
AudioCodes Session Border Controller UpdateAudioCodes Session Border Controller Update
AudioCodes Session Border Controller Update
 
Frequency toutorial
Frequency toutorial Frequency toutorial
Frequency toutorial
 
TomW_Radio-Romania-lecture
TomW_Radio-Romania-lectureTomW_Radio-Romania-lecture
TomW_Radio-Romania-lecture
 
A COMPLETE GUIDE FOR BUILDING AN INTERVIEW RECORDING.pptx
A COMPLETE GUIDE FOR BUILDING AN INTERVIEW RECORDING.pptxA COMPLETE GUIDE FOR BUILDING AN INTERVIEW RECORDING.pptx
A COMPLETE GUIDE FOR BUILDING AN INTERVIEW RECORDING.pptx
 
Demo consolas eng
Demo consolas engDemo consolas eng
Demo consolas eng
 
Polycom soundstation vtx1000 data sheet
Polycom soundstation vtx1000 data sheetPolycom soundstation vtx1000 data sheet
Polycom soundstation vtx1000 data sheet
 
RTASC Lite - Real Time Audio System Check Lite
RTASC Lite - Real Time Audio System Check LiteRTASC Lite - Real Time Audio System Check Lite
RTASC Lite - Real Time Audio System Check Lite
 
Has video really killed the audio star?
Has video really killed the audio star?Has video really killed the audio star?
Has video really killed the audio star?
 
Multi-Hall_Brochure_LowRes
Multi-Hall_Brochure_LowResMulti-Hall_Brochure_LowRes
Multi-Hall_Brochure_LowRes
 

Dernier

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Dernier (20)

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Interactive Voice Con: Optimizing Voice Processing for Smart Speakers & Devices

  • 1. Interactive Voice Con Successful smart speakers & voice-enabled products Platinum Sponsor
  • 2. Introducing the Speakers • Paul Beckmann, Founder and CTO of DSP Concepts: PhD, MS, and BS from MIT, all in EE. Technical specialties: signal processing, audio product development, and tools. • Mike Klasco, Founder and CEO, Menlo Scientific: Combined MS/PhD ABT NYU. Audio product development, acoustics, transducers, materials and sourcing. 2020 Interactive Voice Con
  • 3. Outline • Kickoff • Voice Processing Theory [45 minutes]: algorithms, measuring performance, processor requirements, product design guidelines • Break [15 minutes] • Demos? [15 minutes] • What Happens in Practice [30 minutes]: microphone integration issues; the enclosure (the space between the mics and speakers); loudspeakers, acoustic • Q&A [15 minutes]
  • 5. Types of Voice Recognition Algorithms • Voice trigger: identifies a single word or phrase like “Alexa” or “Hey Siri” • Small vocabulary voice recognition: fixed vocabulary set for embedded applications, tens of words (“Turn on the lights”, “Next track”, etc.) • Full voice recognition: large vocabulary set, thousands of words (“Play Beatles”) • Natural language understanding (NLU): combines application-specific information for a more flexible user interface (“Play Music by the Beatles”, “Give me Beatles Music”, “I want to listen to music by the Beatles”); can be combined with a small vocabulary set
  • 6. Audio Front End = Microphone Cleaner • The Audio Front End (AFE) cleans up signals to improve the performance of the voice recognition. It is like glasses for a camera. • Diagram: Mic Array → (N channels) → Audio Front End → (1 channel) → Voice Recognition • Sounds arriving at the mic array: desired speech, interfering noise, device playback
  • 7. Audio Front End Details • Diagram: Mic Array (N channels) → Audio Front End blocks → (1 channel) → Trigger Word & Voice Recognition • Echo Canceler: eliminates loudspeaker sound during device playback • Direction of Arrival: determines location of sound source; used to steer the beamformer • Beamformer: combines multiple microphone signals to improve signal quality • Noise Reduction: removes various types of noise
  • 8. Comparing Amazon and Google • Google (Google AFE and Trigger Word → ASR): 2 microphones only; 65 to 71 mm spacing; mono or stereo; high-end application processor required; no variation in products; no variation in performance; performance lags behind AVS • Amazon (3rd Party AFE → Amazon Trigger Word → ASR): any number of microphones; any spacing; any number of playback channels; application processor or MCU solutions; wide variety of designs (2 to 7 microphones, different form factors); better performance; low cost designs possible
  • 9. AVS Integration for AWS IoT (a.k.a. “Alexa for Microcontrollers”) • Cost effective way to add Alexa voice features • Connects to the cloud • Uses an RTOS and lightweight MQTT network stack • Suitable for low cost microcontrollers • Will expand voice to a much larger number of products • https://docs.aws.amazon.com/iot/latest/developerguide/avs-integration-aws-iot.html
  • 10. Trigger Word • Voice recognition algorithm trained for a single word or phrase: “Alexa”, “OK Google”, “Bixby”, “Siri”, “Cortana”, etc. • Available from multiple suppliers: Amazon, Google, Baidu, etc.; Sensory “Truly Handsfree”; PicoVoice / SoundHound / Cyberon / etc. • They all use machine learning • Often optimized for low power consumption: Sound → Voice Activity Detector → Key word detector • Large models perform better (Sensory: 17 kbyte → 1 Mbyte)
  • 11. Characterizing Trigger Performance • Probability of False Alarm: how many times does the algorithm accidentally trigger over a 24-hour period? • Probability of Miss: what % of trigger words are not detected by the algorithm? • Trigger word algorithms have an adjustable “sensitivity” setting that allows you to trade off false alarms against misses. • Amazon requires <3 false alarms per 24 hours of continuous speech • Plot: Probability of Detection (up to 100%) vs. False Alarm Rate, with the ideal operating point marked. Tune sensitivity based on the allowable false alarm rate.
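The sensitivity trade-off on this slide can be sketched as a threshold sweep. This is only an illustration under stated assumptions: the detector scores, the `pick_threshold` helper, and the idea that "sensitivity" maps to a single score threshold are all hypothetical and not taken from any vendor's trigger word engine. The only number from the slide is the limit of fewer than 3 false alarms per 24 hours.

```python
def pick_threshold(speech_scores, noise_scores, noise_hours, max_fa_per_24h=3.0):
    """Sweep candidate thresholds and keep the one with the lowest miss rate
    whose false alarm rate stays below the allowed limit.

    speech_scores: detector scores for clips that contain the trigger word.
    noise_scores:  detector scores for clips that do not (hypothetical data).
    noise_hours:   duration of the non-trigger material, in hours.
    Returns (threshold, miss_rate, false_alarms_per_24h) or None.
    """
    best = None
    for thr in sorted(set(speech_scores) | set(noise_scores)):
        misses = sum(1 for s in speech_scores if s < thr)
        false_alarms = sum(1 for s in noise_scores if s >= thr)
        miss_rate = misses / len(speech_scores)
        fa_per_24h = false_alarms * 24.0 / noise_hours
        if fa_per_24h < max_fa_per_24h and (best is None or miss_rate < best[1]):
            best = (thr, miss_rate, fa_per_24h)
    return best


# Made-up scores, purely for illustration.
speech = [0.91, 0.85, 0.78, 0.66, 0.95, 0.88, 0.72, 0.99]
noise = [0.10, 0.35, 0.70, 0.81, 0.20, 0.55]
result = pick_threshold(speech, noise, noise_hours=24.0)
print(result)
```

Lowering the threshold (higher sensitivity) reduces misses but raises false alarms, so the sweep stops at the most sensitive setting that still meets the false alarm budget.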
  • 12. Wake Word Performance in Noise • SNR at the microphone is the main driver of wake word performance • Independent of distance • Independent of room reflections / reverb (for normal household environments) • Improve your SNR to improve your wake word performance.
  • 14. Beamforming Principles • Beamformers are spatial filters. They pass signals from certain directions and reduce signals from other directions. • Performance depends heavily upon the geometry of the microphone array • Fixed beamformers utilize FIR filters • Time domain or frequency domain • There are many ways to compute the filter coefficients (MVDR, DAS, etc.) • Diagram: FIR filters h1[n], h2[n], h3[n], h4[n]
  • 15. DSPC Design Method: Maximize SNR • Inputs to design • Microphone geometry • Look angle and beam width • Diffuse field noise level • Microphone SNR • Signal is person’s voice in specified beam • Noise = diffuse field noise + microphone self noise • Iterative design procedure maximizes SNR
  • 16. SNR vs. Frequency
  • 17. Optimal Array Geometries (Far Field Products) • 180 or 360 Degree: smart speakers, middle of the room; 40 to 70 mm diameter works, 70 mm works the best • 180 Degree: set-top box, side of the room • Flat Line Array: TVs, appliances, on a wall; 25 mm spacing between mics, 75 mm total length • Tiers shown: High-End, Standard, Low-Cost • SNR improvements labelled on the array drawings: +7 dB, +6.5 dB, +5 dB, +2 dB, +3 dB, +2 dB, +4 dB
  • 18. SNR vs. Mic Geometry Assumptions: • 71 mm diameter • Microphone array is in diffuse field noise with SNR = 50 dB • Speech is at 60 dB in the direction of the beam • Beam width is 45 degrees • Microphone SNR = 65 dB • Look angle = 0 degrees
  • 19. Linear Arrays • Linear arrays work well when in an end-fire configuration. • Requires person to be in a specified location. • Provides 4 to 5 dB SNR improvement • Broadside arrays work poorly and should be avoided. • Very little SNR improvement at low frequencies, where the bulk of speech energy is • Use broadside arrays only as a last resort when the industrial design dictates no other options (e.g., television, wall panel) • Drawings: End-fire, Broadside • Intuition: beamformers use time differences to steer the beam. In broadside, voice arrives at the same time at both mics.
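The intuition on this slide can be checked with a small far-field calculation. This is a minimal sketch, assuming a plane wave, a speed of sound of 343 m/s, and the 25 mm mic spacing quoted for flat line arrays on slide 17; the function name and angle convention are illustrative choices, not from the deck.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly room temperature


def arrival_delay_us(spacing_mm, angle_deg):
    """Time difference of arrival between two mics, in microseconds.

    Assumes a far-field (plane wave) source. angle_deg is measured from the
    array axis: 0 degrees = end-fire, 90 degrees = broadside.
    """
    spacing_m = spacing_mm / 1000.0
    path_difference_m = spacing_m * math.cos(math.radians(angle_deg))
    return path_difference_m / SPEED_OF_SOUND_M_S * 1e6


endfire_us = arrival_delay_us(25.0, 0.0)     # talker on the array axis
broadside_us = arrival_delay_us(25.0, 90.0)  # talker straight in front
print(f"end-fire: {endfire_us:.1f} us, broadside: {broadside_us:.6f} us")
```

With the talker broadside, the delay collapses to essentially zero, so the beamformer has no time difference to work with; in end-fire the full spacing contributes to the delay.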
  • 21. Stationary Noise Reduction • Effective against: fan noise, automotive road noise, microphone self noise • Creates a model of the background noise and then removes it in real time • Improves ASR performance by 2 to 3 dB • Before/after example demonstrates improvement in automotive environments
  • 22. Interference Canceler • Effective against noise from: TVs, appliance self noise, air conditioners • Requires a minimum of 2 microphones • Combines beamforming, adaptive filtering, and other statistical signal processing techniques • Effective for music and speech interferers • Improves ASR performance up to 30 dB! • Figure: 2-microphone example
  • 23. Adaptive Interference Canceler Performance • Measured in a typical living room environment • Interfering music noise played • Speech at constant level (62 dBC) at DUT • Varied music level • Speech and noise 2 meters from DUT • Plot curves, relative to Amazon Echo Plus and Echo 2: Echo Plus (7-mic), Echo 2 (7-mic), DSPC 2-mic, DSPC 4-mic (8 dB better), DSPC 6-mic (11 dB better)
  • 25. Acoustic Echo Cancellers (AEC) • Eliminates loudspeaker sound at the microphone • Enables Voice UI to function while music or text-to-speech is active • Music is usually ducked after the wake word is detected • Best algorithms operate in the frequency domain: better cancellation, faster convergence, lower computation • ERL (Echo Return Loss) quantifies performance: how many dB of loudspeaker signal is canceled by the AEC • Demo setup: single microphone with loudspeaker close to the mic; mono playback in home environment.
  • 26. Factors Affecting AEC Performance • What type of algorithm are you using? • Time domain vs frequency domain • LMS vs Kalman vs Other? • Echo tail length • How many msec of audio can you cancel? • Longer is better but requires more processing and memory • Far-field smart speakers require 150 to 200 msec of echo tail • Reverberation time of the room (lower is better) • Linearity of your loudspeakers
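To see why a longer echo tail costs processing and memory, the tail length can be converted to adaptive filter taps. This is a rough sketch under assumptions not stated in the deck: a 16 kHz sample rate, one real-valued coefficient per tap, and 4-byte coefficients. A frequency-domain AEC organizes its state differently, so treat the numbers as order-of-magnitude only.

```python
def echo_tail_taps(tail_ms, sample_rate_hz=16000):
    """Number of filter taps needed to span an echo tail of tail_ms.

    The 16 kHz default sample rate is an assumption for illustration.
    """
    return round(tail_ms * sample_rate_hz / 1000)


def filter_memory_bytes(taps, bytes_per_coeff=4, reference_channels=1):
    """Coefficient storage for a time-domain filter (hypothetical layout)."""
    return taps * bytes_per_coeff * reference_channels


for tail_ms in (150, 200):
    taps = echo_tail_taps(tail_ms)
    print(tail_ms, "ms ->", taps, "taps,", filter_memory_bytes(taps), "bytes per mic")
```

Each additional reference channel and each additional microphone multiplies this storage, which is one reason multichannel products sit at the high end of the processor requirements.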
  • 27. Speaker Distortion Affects AEC • This is usually the limiting factor for AEC performance • Loudspeakers distort when playing loud or low frequencies • Speakers need to be tuned to minimize distortion • Rule of thumb: 1% THD: AEC up to 40 dB; 2% THD: AEC up to 34 dB; 3% THD: AEC up to 30 dB; 5% THD: AEC up to 26 dB; 10% THD: AEC up to 20 dB • Product developers must trade off low frequency sound quality against voice performance
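The rule-of-thumb values on this slide line up with expressing THD as a level in dB, that is, a limit of about -20·log10(THD). The sketch below only checks that arithmetic; the deck does not state the formula, so treat it as an observation about the table rather than the presenters' derivation.

```python
import math


def erle_limit_db(thd_percent):
    """Approximate AEC limit implied by a THD figure, as -20*log10(THD).

    This formula is inferred from the slide's table, not quoted from it.
    """
    return -20.0 * math.log10(thd_percent / 100.0)


limits = {thd: round(erle_limit_db(thd)) for thd in (1, 2, 3, 5, 10)}
for thd, limit in limits.items():
    print(f"{thd}% THD -> AEC up to about {limit} dB")
```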
  • 28. Rule of Thumb for Speaker Distortion 1. Play a low frequency sine wave through your loudspeaker and plot the spectrum 2. You’ll see harmonics at multiples of the fundamental frequency 3. The largest harmonic determines the absolute limit of the echo canceler 4. ERLE performance is based on the difference between the fundamental and the largest harmonic 5. Repeat at different output levels and frequencies • Examples: OK: largest harmonic 30 dB down = 30 dB max ERLE. Bad: 15 dB down = 15 dB max ERLE.
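The measurement steps above can be sketched on a synthetic signal. This is a minimal illustration, assuming a recording that contains a whole number of cycles of the test tone so that a single-bin DFT reads each level exactly; the helper names, the 100 Hz tone, and the 8 kHz sample rate are made up for the example. A real measurement would use a windowed FFT of the microphone capture.

```python
import cmath
import math


def tone_level_db(samples, freq_hz, sample_rate_hz):
    """Level in dB (re amplitude 1.0) at one frequency via a single-bin DFT."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate_hz)
              for i, x in enumerate(samples))
    amplitude = max(2.0 * abs(acc) / n, 1e-12)  # floor avoids log10(0)
    return 20.0 * math.log10(amplitude)


def max_erle_db(samples, fundamental_hz, sample_rate_hz, num_harmonics=5):
    """Difference between the fundamental and the largest harmonic, in dB."""
    fundamental = tone_level_db(samples, fundamental_hz, sample_rate_hz)
    largest = max(tone_level_db(samples, k * fundamental_hz, sample_rate_hz)
                  for k in range(2, num_harmonics + 2))
    return fundamental - largest


# Synthetic "loudspeaker" output: 100 Hz tone, 2nd harmonic 30 dB down,
# 3rd harmonic 40 dB down. 800 samples at 8 kHz = exactly 10 cycles.
FS, F0, N = 8000, 100, 800
signal = [math.sin(2 * math.pi * F0 * i / FS)
          + 10 ** (-30 / 20) * math.sin(2 * math.pi * 2 * F0 * i / FS)
          + 10 ** (-40 / 20) * math.sin(2 * math.pi * 3 * F0 * i / FS)
          for i in range(N)]
estimate = max_erle_db(signal, F0, FS)
print(f"largest harmonic is {estimate:.1f} dB down")
```

Repeating this at several playback levels and tone frequencies, as step 5 suggests, gives a map of where the loudspeaker limits the echo canceler.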
  • 29. AECs and Speaker Processing • The reference signal must be taken after nonlinear processing • DRC = Dynamic range compression. This includes nonlinear processing like compressors and limiters • Diagram 1: EQ → DRC → DAC → AMP, with the reference (Ref) taken after the DRC • Diagram 2: the same chain with a crossover after the DRC. Crossovers after the DRC are allowed. Higher-order crossovers perform better.
  • 30. Multichannel Echo Cancelers • Some applications require multichannel echo cancelers (e.g., soundbars) • For optimal performance, you need to cancel all the channels. Downmixing reduces performance. • The example shows what happens when you have a 3-channel product and apply a 2-channel AEC • Full performance when using a 3-channel AEC to cancel L, R, and C speakers. • Reduced performance when downmixing to 2 channels and using a stereo echo canceler: L’ = L + 0.5 * C, R’ = R + 0.5 * C. Performance reduced by 5 to 10 dB
  • 31. Woofer Reference Mic • Work done in conjunction with Vesper • Uses a new high AOP microphone placed directly in front of the woofer • Advanced processing improves ERL by up to 15 dB • Trigger word performance at max playback level: • Standard processing: 63% • Advanced processing: 91% • Similar feature used in the HomePod
  • 33. Amazon Test Setups • Two setups pictured: one used for most tests, one used for the AEC test only
  • 34. Understanding Amazon Results • False Alarm Tests: number of false alarms using Amazon’s 24-hour continuous talking test track; the lower the better • Trigger Detection: % of time that the device wakes up when “Alexa” is spoken; tested in silence, kitchen noise, music noise, and during music playback; the higher the better • Response Accuracy Rate (RAR): % of time that the cloud accurately understood the question (e.g., “Alexa, what is the capital of China”); tested in silence, kitchen noise, and music noise; the higher the better
  • 35. Testing Scenarios • Silence: no interfering sound, uttering “Alexa” at 62 dBC • Kitchen Noise (0, -3 dB, -6 dB): Alexa utterance at 62 dBC / noise at 62, 65, and 68 dBC • Music Noise (0, -3 dB, -6 dB): Alexa utterance at 62 dBC / music at 62, 65, and 68 dBC • Acoustic Echo Canceler: music playback at 90 dBC while trigger words are played at 62 dBC.
  • 36. Living Room Results – Trigger Detection
  • 37. Living Room Results - RAR
  • 39. Many Performance Levels • Low Power / Near-field: 1 or 2 mics; ARM Cortex-M4; 20 to 30 MHz • Basic Far-Field: 2 mics, mono; ARM Cortex-M7 or Cortex-A53; 200 MHz • High-Performance Far-Field: 4+ mics, stereo; ARM Cortex-A53; 350 to 600 MHz • High-Performance Far-Field: 4+ mics, multichannel; ARM Cortex-A53; 900 to 1200 MHz
  • 40. Processor Comparisons • Processor efficiency per MHz (the larger the better): ARM Cortex-M4: 0.26 (ST, NXP, Renesas, Ambiq, Quicklogic) • ARM Cortex-M7: 0.45 (ST, NXP) • ARM Cortex-A35: 0.37 (Mediatek) • ARM Cortex-A53: 0.48 (NXP, Amlogic, Qualcomm) • ARM Cortex-A72: 0.98 (coming soon!) • Tensilica HiFi 4: 1.00 (NXP, Mediatek, Amlogic) • ARM Cortex-A53 is the sweet spot for smart speakers.
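One way to read the efficiency figures is as a scaling factor between cores. The sketch below assumes that the same workload scales linearly with the per-MHz efficiency, which the deck does not claim outright; real ports also depend on memory, caches, and available optimized libraries, so the result is a first estimate only.

```python
# Relative efficiency per MHz, copied from the slide.
EFFICIENCY_PER_MHZ = {
    "Cortex-M4": 0.26,
    "Cortex-M7": 0.45,
    "Cortex-A35": 0.37,
    "Cortex-A53": 0.48,
    "Cortex-A72": 0.98,
    "HiFi 4": 1.00,
}


def equivalent_mhz(mhz, source, target):
    """Estimate the clock needed on `target` for a load measured on `source`.

    Assumes linear scaling with the efficiency figures (an assumption).
    """
    return mhz * EFFICIENCY_PER_MHZ[source] / EFFICIENCY_PER_MHZ[target]


# Example: a 200 MHz Cortex-A53 load expressed on two other cores.
print(round(equivalent_mhz(200, "Cortex-A53", "HiFi 4"), 1))
print(round(equivalent_mhz(200, "Cortex-A53", "Cortex-M7"), 1))
```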
  • 42. Smart Speaker Designs • 360-degree operation • Microphones on top of product • 40 to 75 mm diameter • Physically separate microphones and loudspeakers for best performance • Mono or stereo playback • Drawings labelled: High-End, Standard
  • 43. Sound Bar Designs • 180-degree operation • Microphones on top of product near center of device • 60 to 75 mm design • Physically separate microphones and loudspeakers for best performance • Stereo or multichannel playback (up to 7 reference channels) • Compatible with Dolby Atmos • Drawings labelled: High-End, Standard
  • 44. TV Designs • Placement options • Top is better than bottom (further away from speakers) • Bottom usually wins out because of lower cost • Mics do not have to be centered • 2 mics sufficient • Drawings labelled: Good, Better
  • 45. Set-Top Box Designs • Top of device: 180-degree operation; microphones on top of product • Tethered “puck”: 360-degree operation; microphones on top of product • Support for optional internal speaker for voice playback • Audio playback through HDMI • Drawings labelled: High-End, Standard
  • 46. Appliance / Tablet Designs • 180-degree operation • 2 or 4 microphone linear array • 25 to 75 mm design • Physically separate microphones and loudspeakers for best performance • Mono or stereo playback • Drawings labelled: Good, Better
  • 47. Design Guidelines – Microphones (Far Field Products) • Microphones should be placed on the top of the product, if possible. • Microphones should be on a flat horizontal surface • Microphones should be visible to the user (not occluded) • Flat line arrays are not recommended. Use them only as a last choice, if necessary. (Microphone arrays work best if the microphones are displaced in the horizontal plane) • Microphones need to be properly ported (see design guidelines from microphone vendor) • 4 microphones are sufficient for most products
  • 48. Design Guidelines – Microphones (Far Field Products) • SNR of 65 dB. Higher SNRs provide no benefit for voice recognition but have benefits for voice communication • Gain matching: • ±1 dB in the range 200 Hz to 6 kHz (recommended) • ±1 dB from 200 Hz to 4 kHz and ±3 dB from 4 kHz to 7 kHz (required) • Microphone AOP must be high enough so that the system doesn’t clip when loudspeakers are played at full volume. Recommendations: • 120 dB for smart speakers • 130 dB for sound bars • 40 to 70 mm microphone spacing is recommended. As small as 20 mm is possible with some degradation in performance.
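The gain matching limits above can be turned into a simple pass/fail check. This is a sketch under assumptions: the tolerance is interpreted as each microphone's deviation from the across-microphone mean at each frequency, and the measured responses below are invented. The deck does not say what the ± figure is measured against, so adjust the reference if your test plan defines it differently.

```python
def gain_matched(responses_db, freqs_hz, lo_hz, hi_hz, tol_db):
    """True if every mic stays within tol_db of the mean inside [lo_hz, hi_hz].

    responses_db: one list of dB values per microphone, aligned with freqs_hz.
    """
    for i, freq in enumerate(freqs_hz):
        if lo_hz <= freq <= hi_hz:
            column = [mic[i] for mic in responses_db]
            mean = sum(column) / len(column)
            if any(abs(value - mean) > tol_db for value in column):
                return False
    return True


def meets_recommended(responses_db, freqs_hz):
    return gain_matched(responses_db, freqs_hz, 200, 6000, 1.0)


def meets_required(responses_db, freqs_hz):
    return (gain_matched(responses_db, freqs_hz, 200, 4000, 1.0)
            and gain_matched(responses_db, freqs_hz, 4000, 7000, 3.0))


# Invented sensitivity measurements (dB) for two microphones.
freqs = [200, 1000, 4000, 5000, 7000]
mics = [[-38.0, -38.0, -38.0, -38.0, -38.0],
        [-38.5, -37.6, -38.2, -35.5, -36.0]]
print(meets_recommended(mics, freqs), meets_required(mics, freqs))
```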
  • 49. Microphone Acoustical Porting (No Common Cavity) • You need individual gaskets to make a direct connection between each mic and its vent hole. • If you block a microphone hole with putty, you should see the level drop by at least 30 dB. • A design with a common cavity shared by all microphones won’t work. • Figure labels: MEMS mic, vent hole, case, PCB, gasket
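The putty test on this slide reduces to comparing two level measurements. Below is a minimal sketch that assumes two recordings of the same test signal from one microphone, one with the port open and one with it blocked, given as lists of samples normalized to full scale. The function names are hypothetical; only the 30 dB threshold comes from the slide.

```python
import math


def rms_db(samples):
    """RMS level in dB relative to a full-scale value of 1.0."""
    mean_square = sum(s * s for s in samples) / len(samples)
    # Floor the value so that digital silence does not produce log10(0).
    return 10.0 * math.log10(max(mean_square, 1e-20))


def port_seal_check(open_samples, blocked_samples, min_drop_db=30.0):
    """Return (passed, drop_db) for the putty test.

    A drop smaller than min_drop_db suggests sound is reaching the mic by
    another path, for example a leaking gasket or a shared cavity.
    """
    drop_db = rms_db(open_samples) - rms_db(blocked_samples)
    return drop_db >= min_drop_db, drop_db


# Synthetic stand-ins for real recordings: the same tone at three levels.
tone = [math.sin(2.0 * math.pi * k / 48.0) for k in range(480)]
open_port = [0.5 * s for s in tone]
well_sealed = [0.005 * s for s in tone]   # 40 dB below the open-port level
leaky = [0.05 * s for s in tone]          # only 20 dB below
```

Running the check per microphone, one port at a time, also helps reveal a shared cavity, since blocking one port would then barely change that microphone's level.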
  • 50. Design Guidelines – Microphones (A): In-Ear Products • 2 microphones are sufficient for most products • Use 2 microphones in an end-fire configuration pointing towards the mouth • Space microphones as far apart as possible. 10 mm is the minimum spacing; 20 mm is preferred • A microphone on the end of a “boom” improves performance
  • 51. End of first session
  • 52. Overview: What Happens in Practice • Microphone selection • The physical world in front of the mic • No man’s land between the mic and speaker (leakage) • Loudspeakers – good, bad and ugly • Software integration issues
  • 53. MEMS microphone selection cheat sheet • Analog or digital? • Analog single-ended or balanced? • Top or bottom port? • Standard size or compact? • AOP – Acoustic Overload Point? • S/N – Signal to Noise? • Sensitivity (ASIC gain)? • Robustness (IPXX)?
  • 54. MEMS Microphones – what is inside? • MEMS mic element + ASIC in a package • Wiring between the MEMS mic die and the ASIC • Typical package envelope of 3.50 mm x 2.65 mm x 0.98 mm • Smaller footprint on some models, but reduced back volume = reduced S/N • Faraday shield on some models
  • 55. Microphones - Analog vs digital? What are the mic inputs on the codec or SoC (System on Chip)? • Analog single-ended • Analog pseudo-balanced • Digital – PDM
  • 56. Microphones – Top or Bottom port? • The MEMS SMT package can have the sound aperture on either the top or the bottom • If it is on the bottom, the circuit board it is reflow-soldered to (the flex PCB) must have a hole that aligns with the MEMS mic port • Bottom port warning • Sealing - back port SMT seal eyelet
  • 57. Microphones – signal to noise • S/N was once a deal killer for most serious applications, but MEMS mics have caught up with ECMs, with commodity analog and digital MEMS mics reaching beyond 60 dB S/N. • Active noise canceling headphones, hearing aids and voice command applications want 65 dB S/N or better • 70+ dB from a few vendors by the start of 2021 (but this keeps slipping!) • Better S/N = fewer mics? There is some discussion that higher S/N enables a reduction in the number of mics required
  • 58. Microphones • Analog MEMS mics - single-ended or balanced differential outputs? • Balanced analog output is good defensive engineering if your product will have longer wire runs, digital noise, or EMI/RF floating around • How differential is the MEMS mic topology? True differential capacitive MEMS mics use dual grids for higher noise immunity than single-ended designs
  • 59. Microphones - Digital • Digital MEMS mics offer greater immunity to interference than analog MEMS mics • Time-to-market considerations: if you want to avoid tweaking and reworking your board layout when noise problems appear, digital is the way to go • If mic performance is critical for your type and class of product, analog with an external premium codec may be better (for both AOP and noise floor)
  • 60. Microphones – Acoustic Overload Point (AOP) • Is AOP set by mic element saturation or by ASIC overload clipping? • Analog MEMS mics typically have a better acoustic overload point (AOP), which is where serious distortion sets in (the codec overloads before the MEMS mic element does) • Analog MEMS mics overload a bit more gracefully than digital ones: when an A/D codec overloads, it is a line in the sand and nasty. • Digital MEMS AOP can be as low as 116 dB and is more typically 120 dB. Analog AOP tends to be over 120 dB and can be 130+ dB on some MEMS mics. • Vesper’s piezo MEMS mics have versions with very high AOP.
  • 61. Microphones - Directivity • MEMS mics are omnidirectional • To achieve directional characteristics they are used in arrays • One requirement for mic arrays is that the mics are closely matched in sensitivity and response and will be able to maintain that uniformity over time
  • 62. The physical world in front of the mic
  • 63. Microphones – the world around the mic: Key topics • MEMS mics are mounted to the flex PCB using SMT reflow along with the rest of the SMT components • Port Helmholtz resonance – moving it out of band • The port and wind noise • Laminar entry • Acoustic mesh
  • 64. Microphones - What are membranes for? Woven and non-woven membranes are used for: • Wind noise and water blocking • Acoustic resistance, which determines the crossover to DSP wind noise filtering • Dust problems – an internal membrane (within the package) blocks SMT reflow gases • Field use issue - response shifts over time as gunk builds up in the membrane, for example over mics facing a stove top
  • 65. Microphones – the world around the mic: Wind noise blocking / acoustic mesh • Mic element overloaded / saturated by wind • Wind pressure must be blocked acoustically (acoustic resistance membrane) • Mic overload cannot be fixed by DSP (but some turbulence can be filtered out) • Acoustic mesh can also block liquids (hydrophobic & oleophobic)
  • 66. Microphones – the world around the mic: Port and wind noise • Laminar entry (flared aperture); turbulence in the port is to be avoided • Port Helmholtz resonance peak – moving it out of band • Acoustic mesh damps peak Q
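To judge whether a port's Helmholtz resonance sits out of band, a first estimate can be made with the textbook lumped-element formula f = (c / 2π) · sqrt(A / (V · L_eff)), where A is the port cross-section, V the cavity volume in front of the mic, and L_eff the port length plus an end correction. The sketch below is only a rough estimate: the dimensions are hypothetical, the end-correction factor of 1.7 times the radius (both ends flanged) is an assumption, and real ports with gaskets and mesh usually need measurement or simulation.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # dry air at roughly 20 degrees C


def helmholtz_resonance_hz(port_radius_m, port_length_m, cavity_volume_m3,
                           end_correction_factor=1.7):
    """Lumped-element estimate of a mic port's Helmholtz resonance.

    end_correction_factor scales the port radius to give the extra effective
    length of the air plug; 1.7 is an assumed value for two flanged ends.
    """
    area = math.pi * port_radius_m ** 2
    effective_length = port_length_m + end_correction_factor * port_radius_m
    return (SPEED_OF_SOUND_M_S / (2.0 * math.pi)) * math.sqrt(
        area / (cavity_volume_m3 * effective_length))


# Hypothetical geometry: 0.5 mm diameter port, 1.5 mm long, 1 mm^3 cavity.
short_port = helmholtz_resonance_hz(0.25e-3, 1.5e-3, 1.0e-9)
# A longer port through a thicker case wall lowers the resonance.
long_port = helmholtz_resonance_hz(0.25e-3, 3.0e-3, 1.0e-9)
```

The formula shows the available levers: a wider or shorter port and a smaller cavity all push the peak higher in frequency, while the acoustic mesh mentioned on the slide lowers the Q of whatever peak remains.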
  • 67. The physical world between the mic and speaker
  • 68. Leakage between the mic & speaker: Audio output leakage is both airborne and through the enclosure structure • Minimizing airborne leakage • Keep the mic(s) and speakers as far apart as possible • Avoid overlapping the mic(s) pickup pattern and the speaker radiation pattern • Structural transconduction (microphonics) • Enclosure housing – ribs, joints, wall thickness • Plastics are not all equal • Speaker sub-enclosure isolation mounts (grommets or gaskets) • Mic isolation
  • 69. Construction and Materials • Plastics have different acoustical characteristics • Stiffness and damping are key factors • Compatibility considerations • Shrink • Tool temperature • Flow • Impact strength • Sink marks/wall thickness
  • 70. Construction and Materials: The incumbent plastics • ABS • PC • ABS+PC • PP
  • 71. Construction and Materials: Acoustically engineered plastics • TreBlend (Ineos) PA/SAN • Cellulose Plastics • Treva (Eastman) • Symbio (Sappi) • Thicker walls/ribs without sink marks • Figure: Genelec M040 – NCE enclosure
  • 72. The physical world of speakers and the AEC Achilles heel - distortion
  • 73. Enclosure Mechanical Engineering • Open the window more, and more bugs come in • More power and more bass = no gain without pain • Increase acoustic output before feedback and AEC breakdown by reducing the cabinet resonance peak • Extending the low-end response of the product will shake things up more
  • 74. Speaker Nonlinearities: AEC issues • Speaker distortion and nonlinearities are the enemy of AEC • Loudspeaker nonlinearities affect AEC • Low-end distortion impacts AEC even when it is not audible to a listener • Fine tuning of suspension and motor nonlinearities is critical • Or source off-the-shelf application-specific speakers optimized for AEC and ANC
  • 75. 50 mm AEC / ANC optimized speakers • Application-specific ANC and AEC high linearity / lower distortion speakers to meet TIA 930 • Typically around 50 mm diameter • SEAS • Tymphany • Stetron
  • 76. subVo servo feedback correction: Next generation solution for increased AEC headroom • subVo bend-sensor provides distortion reduction at the lower octaves, enabling increased AEC headroom • Precision position sensor provides error correction feedback • 10 dB of feedback = 10 dB of piston range distortion reduction
  • 77. Software Integration Issues
  • 78. Software Integration Challenges • Real-time CPU load • Wrong interrupt levels • Dropping samples / blocks • Non-constant latency between mics and reference signals • Misconfigured PDM filters • Different clocks for mics and reference signals
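The last item, different clocks for mics and reference signals, matters because even a small rate mismatch makes the delay between the two streams wander, which is the same "non-constant latency" problem listed above. The arithmetic can be sketched as follows; the 16 kHz sample rate and 50 ppm mismatch are hypothetical values chosen for illustration, not figures from the talk.

```python
import math


def drift_samples(sample_rate_hz, ppm_mismatch, seconds):
    """Samples of relative drift between two nominally equal-rate streams
    whose clocks differ by ppm_mismatch parts per million."""
    return sample_rate_hz * ppm_mismatch * 1e-6 * seconds


def seconds_until_drift(sample_rate_hz, ppm_mismatch, max_samples):
    """Time until the streams have slipped by max_samples relative to
    each other; infinite if the clocks match exactly."""
    rate = sample_rate_hz * ppm_mismatch * 1e-6  # samples of drift per second
    return math.inf if rate == 0 else max_samples / rate


# Hypothetical case: 16 kHz audio, crystals 50 ppm apart.
one_minute = drift_samples(16_000, 50, 60)            # 48 samples of slip
first_sample = seconds_until_drift(16_000, 50, 1)     # 1.25 s to slip 1 sample
```

In this hypothetical case the two streams slip by a full sample in just over a second, so the usual options are to drive both paths from one clock or to resample one stream to track the other.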
  • 79. Example #1: Noisy PDM Microphones • Signal chain (figure): PDM bitstream → PDM to PCM converter → PCM samples • Problem statement • ASR accuracy only 72% in quiet speech conditions • High-quality microphone: -41 dB sensitivity / 66 dB SNR • Noise floor expected at 28 dBA • Noise floor measured at 39 dBA • Root cause • PDM to PCM converter was implemented with 16-bit math • Generated noise floor was at -96 dBFS → 39 dBA • Solution • Implement PDM to PCM conversion in software • ASR accuracy improved to 94%
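The numbers in Example #1 can be reproduced with a few lines of arithmetic. The sketch below assumes the usual convention that microphone sensitivity and SNR are specified against a 94 dB SPL (1 Pa) reference tone, reads the slide's "-41 dB sensitivity" as -41 dBFS at that reference, and uses the rule of thumb of about 6 dB of dynamic range per bit. The function names are hypothetical, and the distinction between A-weighted and unweighted levels is ignored.

```python
import math

REFERENCE_SPL_DB = 94.0  # 1 Pa reference used for mic sensitivity and SNR


def expected_noise_floor_db(snr_db):
    """Equivalent input noise of the mic: reference level minus SNR."""
    return REFERENCE_SPL_DB - snr_db


def dbfs_to_spl(level_dbfs, sensitivity_dbfs):
    """Map a digital level to the acoustic level that would produce it,
    given the mic's output in dBFS for a 94 dB SPL tone."""
    return REFERENCE_SPL_DB + (level_dbfs - sensitivity_dbfs)


def fixed_point_floor_dbfs(bits):
    """Rule-of-thumb noise floor of n-bit math: about -6.02 dB per bit."""
    return -20.0 * math.log10(2 ** bits)


mic_floor = expected_noise_floor_db(66.0)       # 28.0, the expected value
converter_floor = dbfs_to_spl(-96.0, -41.0)     # 39.0, the measured value
sixteen_bit = fixed_point_floor_dbfs(16)        # about -96.3 dBFS
```

Under these assumptions the 16-bit converter, not the microphone, sets the system noise floor, about 11 dB above what the 66 dB SNR part could deliver, which matches the gap reported on the slide.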
  • 80. Example #2: Incorrect thread priorities • CPU load problems: audio processing was taking 18% on average but there were large spikes. Bluetooth thread priority was incorrectly set higher than real-time audio processing. • Corrected thread priorities: steady and consistent CPU load • [Figures: two “CPU Load over Time” plots, each showing Peak and Average load, one for the CPU load problems and one for the corrected thread priorities]