VALIDATION OF A REAL-TIME VIRTUAL AUDITORY SYSTEM FOR DYNAMIC SOUND STIMULI AND ITS APPLICATION TO SOUND LOCALIZATION

Brett Rinehold
Outline
- Motivation
- Introduction
- Background
- Loudspeaker Presentation
- HRTF Interpolation
- Acoustic Waveform Comparison
  - Static Sound Presentation
  - Dynamic Sound Presentation
  - Static Sound with a Dynamic Head Presentation
- Psychophysical Experiment
- Discussion
Motivation
- To validate a real-time system that updates head-related impulse responses.
- Goal is to show that the acoustic waveforms measured on KEMAR match between real and virtual presentations.
- Applications:
  - Explore the effects of dynamic sound presentation on sound localization.
Introduction: What is Real/Virtual Audio?
- Real audio consists of presenting sounds over loudspeakers.
- Virtual audio consists of presenting acoustic waveforms over headphones.
  - Advantages: cost-effective; portable; does not depend on room effects.
  - Disadvantages: can sound unrealistic.
Introduction: Sound Localization
- Interaural Time Difference (ITD): difference between sound arrival times at the two ears.
  - Predominant cue at low frequencies (< 2 kHz).
- Interaural Level Difference (ILD): difference between sound levels at the two ears.
  - Predominant cue at higher frequencies (> ~2 kHz) due to head shadowing.
- Both cues are encoded in the Head-Related Transfer Function (HRTF):
  - ILD in the magnitude
  - ITD in the phase
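Both cues can be read off a measured impulse-response pair. A minimal sketch (a hypothetical helper, not code from the talk) that estimates the ITD from the cross-correlation peak and a broadband ILD from the energy ratio:

```python
import numpy as np

def binaural_cues(hrir_left, hrir_right, fs):
    """Estimate ITD (seconds, positive when the left ear leads) and a
    broadband ILD (dB) from a left/right head-related impulse response
    pair sampled at fs Hz."""
    # ITD: lag of the cross-correlation peak between the two ear signals
    xcorr = np.correlate(hrir_left, hrir_right, mode="full")
    lag = np.argmax(xcorr) - (len(hrir_right) - 1)  # left position re. right
    itd = -lag / fs  # left arrives earlier => right is delayed => positive ITD
    # ILD: ratio of total energies at the two ears, in dB
    ild = 10.0 * np.log10(np.sum(hrir_left**2) / np.sum(hrir_right**2))
    return itd, ild
```

A frequency-resolved version would apply the same idea per band; this broadband form is just the simplest illustration.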
      
Background of the RTVAS System
- Developed by Jacob Scarpaci (2006).
- Uses a real-time kernel in Linux to update HRTF filters.
- Key to the system: the HRTF convolved with the input signal corresponds to the difference between where the sound should be and where the subject's head is pointing.
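The "difference" selection just described reduces to a relative-angle computation on each update. A sketch under that reading (hypothetical helper; the actual system is Scarpaci's real-time kernel module):

```python
def rendering_azimuth(source_az_deg, head_az_deg):
    """Azimuth the renderer should use: where the sound should be, minus
    where the head currently points, wrapped into [-180, 180) degrees.
    The HRTF filter for this relative angle is what gets convolved with
    the input signal."""
    return (source_az_deg - head_az_deg + 180.0) % 360.0 - 180.0
```

For example, a source at 30 degrees with the head turned to 10 degrees should be rendered at 20 degrees; the wrapping handles sources behind the listener consistently.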
Project Motivation/Aims
- Goal is to validate that the Real-Time Virtual Auditory System, developed by Jacob Scarpaci (2006), correctly updates HRTFs in accordance with head location relative to sound location.
- Approach to validation:
  - Compare acoustic waveforms measured on KEMAR when sound is presented over headphones to those presented over loudspeakers (a mathematical, signals approach).
  - Perform a behavioral task in which subjects track a dynamic sound played over headphones or loudspeakers (a perceptual approach).
Methods: Real Presentation - Panning
- Loudspeaker setup to create a virtual speaker (shown as a dashed outline in the figure) by interpolating between two speakers located symmetrically about 0 degrees azimuth.
- Nonlinear panning law (Leakey, 1959), for a target azimuth theta between speakers at +/- theta_pos:
    CH1 = 1/2 - sin(theta) / (2 sin(theta_pos))
    CH2 = 1/2 + sin(theta) / (2 sin(theta_pos))
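The Leakey (1959) law above can be sketched directly; which channel feeds which physical speaker is an assumption here, since the talk does not label them:

```python
import numpy as np

def panning_gains(theta_deg, theta_pos_deg):
    """Channel gains for the nonlinear (Leakey, 1959) panning law: place a
    phantom source at theta between two speakers at +/- theta_pos degrees."""
    s = np.sin(np.radians(theta_deg)) / (2.0 * np.sin(np.radians(theta_pos_deg)))
    return 0.5 - s, 0.5 + s  # (CH1, CH2); the two gains always sum to 1
```

At theta = 0 both channels get 0.5; at theta = theta_pos all of the signal goes to one speaker, as the law requires.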
HRTF Measurement
- Empirical KEMAR:
  - 17th-order MLS used to measure the HRTF at every degree from -90 to 90 degrees.
- All measurements were windowed to 226 coefficients using a modified Hanning window to remove reverberations.
- Minimum-phase plus linear-phase interpolation:
  - Interpolated from empirical measurements taken every 5 degrees.
  - The magnitude function was derived as a linearly weighted average of the log-magnitude functions from the empirical measurements.
  - The minimum-phase function was derived from the magnitude function.
  - A linear-phase component was added, corresponding to the ITD calculated for that position.
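The interpolation steps above can be sketched with the standard real-cepstrum construction of a minimum-phase response. This is a simplification of whatever the thesis actually implemented: the magnitude spectra are assumed to be full-length FFT bins, and the ITD is applied as an integer-sample delay rather than a fractional one.

```python
import numpy as np

def interp_hrtf_minphase(mag_a, mag_b, w, itd_samples):
    """Interpolate two HRTF magnitude spectra with weight w, rebuild a
    minimum-phase impulse response via the real cepstrum, then append a
    linear-phase component for the target ITD."""
    n = len(mag_a)  # even FFT length assumed
    # linearly weighted average of the log-magnitude functions
    log_mag = (1.0 - w) * np.log(mag_a) + w * np.log(mag_b)
    # real cepstrum of the interpolated magnitude
    cep = np.fft.ifft(log_mag).real
    # fold the cepstrum to obtain the minimum-phase spectrum
    fold = np.zeros(n)
    fold[0] = cep[0]
    fold[1:n // 2] = 2.0 * cep[1:n // 2]
    fold[n // 2] = cep[n // 2]
    h_min = np.fft.ifft(np.exp(np.fft.fft(fold))).real
    # linear-phase component: delay by the ITD for this position
    return np.roll(h_min, itd_samples)
```

With a flat magnitude the result is simply an impulse delayed by the ITD, which is a quick sanity check on the construction.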


      
Acoustic Waveform Comparison: Static Sound/Static Head Methods
- Presented either a speech waveform or a noise waveform at three static locations: 5, 23, and -23 degrees.
- During the free-field presentation, the positions were created from the loudspeakers using the panning technique outlined previously.
- Used 4 different KEMAR HRTF sets in the virtual presentation: Empirical, Min-Phase Interpolated, Empirical + Headphone TF, Min-Phase + Headphone TF.
- Recorded sounds on KEMAR with microphones located at the position corresponding to the human eardrum.
Static Sound/Static Head: Analysis
- Correlated the waveforms recorded over loudspeakers with the waveforms recorded over headphones for a given set of HRTFs.
  - Correlated the time, magnitude, and phase functions.
  - Allowed a maximum delay of 4 ms in time to account for transmission delays.
- Broke the signals into third-octave bands with the following center frequencies (Hz):
  200, 250, 315, 400, 500, 630, 800, 1000, 1250, 1600, 2000, 2500, 3150, 4000, 5000, 6300, 8000, 10000
  - Correlated time, magnitude, and phase within each band and calculated the delay (lag) that must be imposed on one signal to achieve maximum correlation.
  - Looked at differences in binaural cues within each band.
Across Time/Frequency Correlations of Static Noise (figure)
Acoustic Waveform Comparisons: Static Sound/Static Head Results (figures)
Difference in ITDs from Free-Field and Headphones for Static Noise (figure)
Difference in ILDs from Free-Field and Headphones for Static Noise (figure)
Dynamic Sound/Static Head: Methods
- Presented a speech or noise waveform over loudspeakers (panning) or headphones (convolution algorithm).
- Sound was moved from 0 to 30 degrees.
- Used the same 4 HRTF sets.
Across Time/Frequency Correlation of Dynamic Noise (figure)
Acoustic Waveform Comparison: Dynamic Sound/Static Head Noise Results (figures)
Difference in ITDs from Free-Field and Headphones for Dynamic Noise (figure)
Difference in ILDs from Free-Field and Headphones for Dynamic Noise (figure)
Static Sound/Dynamic Head: Methods
- A speech or noise waveform was presented over loudspeakers or headphones at a fixed position, 30 degrees.
- 4 HRTF sets were used.
- KEMAR was moved from the 30-degree to the 0-degree position while the sound was presented.
- Head position was monitored using an Intersense IS-900 VWT head tracker.
Static Sound/Dynamic Head: Analysis
- Similar data analysis was performed as in the previous two cases.
- Only tracks that followed the same trajectory were correlated.
  - Acceptance criterion: less than a 1 or 1.5 degree difference between the tracks.
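Assuming the criterion above is a pointwise maximum deviation between the two head trajectories (the slides do not spell out the metric), it reduces to a one-line check:

```python
import numpy as np

def tracks_match(track_a, track_b, tol_deg=1.5):
    """Accept two head trajectories for waveform correlation only if they
    never differ by more than tol_deg degrees at any sample (assumed
    interpretation of the 1 / 1.5 degree acceptance criterion)."""
    a = np.asarray(track_a, dtype=float)
    b = np.asarray(track_b, dtype=float)
    return bool(np.max(np.abs(a - b)) < tol_deg)
```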
Across Time/Frequency Correlation for Dynamic Head/Static Noise (figure)
Acoustic Waveform Comparison: Static Sound/Dynamic Head Noise Results (figures)
Difference in ITDs from Free-Field and Headphones for Static Noise/Dynamic Head (figure)
Difference in ILDs from Free-Field and Headphones for Static Noise/Dynamic Head (figure)
Waveform Comparison Discussion
- Interaural cues match up very well across the different conditions, as well as between loudspeakers and headphones.
  - This follows from the high correlations in the magnitude and phase functions.
- Differences (in correlation) between waveforms may not matter perceptually if the listener receives the same binaural cues.
- The output algorithm in the RTVAS appears to present correctly oriented directional sounds and to adjust correctly to head movement.
Psychophysical Experiment: Details
- 6 normal-hearing subjects (4 male, 2 female).
- Sound was presented over headphones or loudspeakers.
- Task was to track, using their head, a moving sound source.
- HRTFs tested: Empirical KEMAR, Minimum-Phase KEMAR, Individual (interpolated using minimum phase).
Psychophysical Experiment: Details cont.
- Sound details:
  - White noise
  - Frequency content: 200 Hz to 10 kHz
  - Presented at 65 dB SPL
  - 5 seconds in duration
- Track details:
  - 15 (sin((2*pi/5) t) + sin((2*pi/2) t * rand))
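The track expression above can be sketched as follows, assuming rand is a single uniform draw per trial (so the second component's rate varies from trial to trial), the output is azimuth in degrees, and a 100 Hz trajectory rate (the tracker's actual rate is not stated):

```python
import numpy as np

def make_track(duration_s=5.0, fs=100.0, rng=None):
    """Target azimuth trajectory 15*(sin((2*pi/5)*t) + sin((2*pi/2)*t*r)),
    in degrees, with r drawn once per trial."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(0.0, duration_s, 1.0 / fs)
    r = rng.random()  # assumed: one uniform draw in [0, 1) per trial
    az = 15.0 * (np.sin(2.0 * np.pi / 5.0 * t) + np.sin(2.0 * np.pi / 2.0 * t * r))
    return t, az
```

Both sine components start at zero, so every trial begins with the source straight ahead, consistent with the centering step in the training procedure.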
      
Psychophysical Experiment: Virtual Setup
- Head-movement training - subjects just moved their head (no sound):
  - 5 repetitions in which the subjects' task was to put a square (representing the head) inside another box.
  - Also centers the head.
- Training - all using Empirical KEMAR:
  - 10 trials in which the subject was shown, via a plot, the path of the sound before it played.
  - 10 trials in which the same track as before was presented but no visual cue was available.
  - 10 trials in which the subject was shown the path via a plot, but the path was random from trial to trial.
  - 10 trials with random tracks and no visualization.
Psychophysical Experiment: Setup cont.
- Experiment (headphones):
  - 10 trials using Empirical KEMAR HRTFs
  - 10 trials using Minimum-Phase KEMAR HRTFs
  - 10 trials using Individual HRTFs
  - Repeated 3 times
- Loudspeaker training:
  - Same as headphones, but trials were reduced to 5.
- Loudspeaker experiment:
  - 30 trials, repeated only once.
  - Subjects were instructed to press a button as soon as they heard the sound; this started the head tracking.
Individual Tracking Results
Individual RMS/RMS Error
Individual Response to Complexity of Tracks
Overall Coherence in Performance
Overall Latency in Tracking
RMS/RMS Error of Tracking
Complexity of Track Analysis
Deeper Look into Individual HRTF Case
Psychophysical Experiment: Discussion
- Coherence:
  - The coherence (correlation) measure in the empirical and minimum-phase interpolation cases is not statistically different from that over loudspeakers.
  - Coherence with individual HRTFs was, surprisingly, worse.
  - Coherence also stays strong as the complexity of the track varies.
- Latency:
  - Individual HRTFs show more variability in latency.
    - Subjects might be able to track changes more quickly using their own HRTFs.
  - Loudspeaker latency is negative, which means that subjects are anticipating the path.
    - This could be because the sound always goes to the right first, as well as a result of the delay in pressing the button.
Psychophysical Experiment: Discussion Cont.
- RMS:
  - No significant difference in total RMS error, or in RMS undershoot error, between the Empirical and Minimum-Phase HRTFs and loudspeakers.
  - Subjects generally undershoot the path of the sound.
    - Could be a motor effect (i.e., laziness) as well as perception.
Overall Conclusions
- Coherence of the acoustic recordings may not be the best measure for validation.
  - Reverberation or the panning technique may limit it.
- If perception is the only thing that matters, then we have to conclude that the algorithm works.
Future Work
- Look at different methods for presenting dynamic sound over loudspeakers.
- Try different room environments.
- Take a closer look at differences between headphones.
  - Particularly open-canal tube-phones, to see if subjects could distinguish between real and virtual sources.
- Various psychophysical experiments that involve dynamic sound (speech, masking):
  - Sound localization
  - Source separation
Acknowledgements
- Committee: Steven Colburn, Barb Shinn-Cunningham, Nathaniel Durlach
- Binaural Gang: Todd Jennings, Le Wang, Tim Streeter, Varun Parmar, Akshay Navaladi, Antje Ihlefeld
- Other: Dave Freedman, Jake Scarpaci
- My Subjects
- All in Attendance
THANK YOU
Backup Slides
Methods: Real Presentation Continued
- Input stimulus was a 17th-order MLS sequence sampled at 50 kHz.
  - Corresponds to a duration of ~2.6 s.
- Waveforms were recorded on KEMAR (Knowles Electronic Manikin for Acoustic Research).

Table: Speaker Presentation
Source Speaker   Created Position (degrees)
10               -5, 0, 5
15               -10, 0, 10
30               -20, -10, 0, 10, 20
45               -40, -30, -10, 0, 10, 30, 40
Results: Real Presentation
- HRTFs measured when sound was presented over loudspeakers using the linear and nonlinear interpolation functions (figure: linear vs. nonlinear panels).
Results: Correlation Coefficients at all Spatial Locations for Interpolated Sound over Loudspeakers

Correlation between a virtual point source and a real source:

Table: Correlation Coefficients
                         Linear Function        Non-linear Function
Speaker    Virtual
Location   Position      Left       Right       Left       Right
45         -40           0.98799    0.9758      0.98655    0.97769
           -30           0.97427    0.96611     0.97534    0.96777
           -10           0.96842    0.94612     0.96858    0.9466
             0           0.95736    0.91602     0.95693    0.91709
            10           0.96374    0.95282     0.96384    0.95276
            30           0.97532    0.97095     0.97644    0.97084
            40           0.98397    0.98194     0.98268    0.98177
30         -20           0.98372    0.97316     0.98385    0.97357
           -10           0.98054    0.9564      0.98054    0.95649
             0           0.97184    0.93755     0.97171    0.93774
            10           0.97151    0.96414     0.97147    0.96448
            20           0.97844    0.97768     0.97883    0.97762
15         -10           0.993      0.97775     0.99301    0.97787
             0           0.97821    0.95517     0.97817    0.95503
            10           0.98406    0.98576     0.98412    0.98572
10          -5           0.99326    0.97585     0.99328    0.97601
             0           0.98927    0.96086     0.98924    0.96077
             5           0.99319    0.98977     0.99312    0.98977

- Very strong correlation, generally, at all spatial locations.
- Weaker correlation as the speakers become more spatially separated.
- Weakest correlation when the created sound is furthest from both speakers (0 degrees).
Spatial Separation of Loudspeakers
- Correlation coefficients for a virtually created sound source at -10 degrees, at various spatial separations of the loudspeakers (figure).
- Correlation declines as the loudspeakers become more spatially separated.
Example of Pseudo-Anechoic HRTFs
- Correlation coefficients are slightly better when reverberations are removed from the impulse responses:
  - Linear, reverberant: 0.98054, 0.9564 (left, right ears)
  - Linear, pseudo-anechoic: 0.98545, 0.96019 (left, right ears)
  - Nonlinear, reverberant: 0.98054, 0.95649 (left, right ears)
  - Nonlinear, pseudo-anechoic: 0.9855, 0.96007 (left, right ears)
Correlation Coefficients at all Spatial Locations for Interpolated Sound over Loudspeakers (Pseudo-Anechoic)

Table 3. Correlation Coefficients for Pseudo-Anechoic HRTFs
                         Linear Function        Non-linear Function
Speaker    Virtual
Location   Position      Left       Right       Left       Right
45         -40           0.96567    0.99168     0.96416    0.98421
           -30           0.96223    0.95356     0.96138    0.95815
           -10           0.96348    0.93433     0.96299    0.93902
             0           0.95471    0.89491     0.95436    0.89968
            10           0.95856    0.93652     0.95913    0.93953
            30           0.97678    0.945       0.97825    0.94013
            40           0.99563    0.9814      0.99       0.98018
30         -20           0.98762    0.97555     0.98767    0.97663
           -10           0.98545    0.96019     0.9855     0.96007
             0           0.97281    0.93616     0.97284    0.93623
            10           0.97927    0.96945     0.97912    0.96968
            20           0.97904    0.98188     0.97846    0.98183
15         -10           0.99608    0.98114     0.99592    0.98167
             0           0.97891    0.95475     0.9788     0.95461
            10           0.9928     0.98922     0.99287    0.9892
10          -5           0.99738    0.98141     0.99736    0.98162
             0           0.99329    0.96323     0.99333    0.9632
             5           0.99731    0.9946      0.99736    0.99462

- Correlations are generally better when the reverberant energy is removed from the impulse responses.
HRTF Window Function
HRTF Magnitude Comparison
Headphone Transfer Function

 
Kyryl Truskovskyi: Training and Serving Open-Sourced Foundational Models (UA)
Kyryl Truskovskyi: Training and Serving Open-Sourced Foundational Models (UA)Kyryl Truskovskyi: Training and Serving Open-Sourced Foundational Models (UA)
Kyryl Truskovskyi: Training and Serving Open-Sourced Foundational Models (UA)
 
Andrii Rodionov: What can go wrong in a distributed system – experience from ...
Andrii Rodionov: What can go wrong in a distributed system – experience from ...Andrii Rodionov: What can go wrong in a distributed system – experience from ...
Andrii Rodionov: What can go wrong in a distributed system – experience from ...
 
Entrepreneurial ecosystem- Wider context
Entrepreneurial ecosystem- Wider contextEntrepreneurial ecosystem- Wider context
Entrepreneurial ecosystem- Wider context
 
Intermediate Accounting, Volume 2, 13th Canadian Edition by Donald E. Kieso t...
Intermediate Accounting, Volume 2, 13th Canadian Edition by Donald E. Kieso t...Intermediate Accounting, Volume 2, 13th Canadian Edition by Donald E. Kieso t...
Intermediate Accounting, Volume 2, 13th Canadian Edition by Donald E. Kieso t...
 
Customizable Contents Restoration Training
Customizable Contents Restoration TrainingCustomizable Contents Restoration Training
Customizable Contents Restoration Training
 
NAB Show Exhibitor List 2024 - Exhibitors Data
NAB Show Exhibitor List 2024 - Exhibitors DataNAB Show Exhibitor List 2024 - Exhibitors Data
NAB Show Exhibitor List 2024 - Exhibitors Data
 
Simplify Your Funding: Quick and Easy Business Loans
Simplify Your Funding: Quick and Easy Business LoansSimplify Your Funding: Quick and Easy Business Loans
Simplify Your Funding: Quick and Easy Business Loans
 
How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...
How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...
How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...
 
WAM Corporate Presentation April 12 2024.pdf
WAM Corporate Presentation April 12 2024.pdfWAM Corporate Presentation April 12 2024.pdf
WAM Corporate Presentation April 12 2024.pdf
 
WSMM Media and Entertainment Feb_March_Final.pdf
WSMM Media and Entertainment Feb_March_Final.pdfWSMM Media and Entertainment Feb_March_Final.pdf
WSMM Media and Entertainment Feb_March_Final.pdf
 
Exploring Elite Translation Services in Your Vicinity
Exploring Elite Translation Services in Your VicinityExploring Elite Translation Services in Your Vicinity
Exploring Elite Translation Services in Your Vicinity
 
WSMM Technology February.March Newsletter_vF.pdf
WSMM Technology February.March Newsletter_vF.pdfWSMM Technology February.March Newsletter_vF.pdf
WSMM Technology February.March Newsletter_vF.pdf
 
Rakhi sets symbolizing the bond of love.pptx
Rakhi sets symbolizing the bond of love.pptxRakhi sets symbolizing the bond of love.pptx
Rakhi sets symbolizing the bond of love.pptx
 
Vladyslav Fliahin: Applications of Gen AI in CV (UA)
Vladyslav Fliahin: Applications of Gen AI in CV (UA)Vladyslav Fliahin: Applications of Gen AI in CV (UA)
Vladyslav Fliahin: Applications of Gen AI in CV (UA)
 
5-Step Framework to Convert Any Business into a Wealth Generation Machine.pdf
5-Step Framework to Convert Any Business into a Wealth Generation Machine.pdf5-Step Framework to Convert Any Business into a Wealth Generation Machine.pdf
5-Step Framework to Convert Any Business into a Wealth Generation Machine.pdf
 

Thesis Defense Presentation

  • 1. VALIDATION OF A REAL-TIME VIRTUAL AUDITORY SYSTEM FOR DYNAMIC SOUND STIMULI AND ITS APPLICATION TO SOUND LOCALIZATION Brett Rinehold
  • 2. Outline
     - Motivation
     - Introduction
     - Background
     - Loudspeaker Presentation
     - HRTF Interpolation
     - Acoustic Waveform Comparison
       - Static Sound Presentation
       - Dynamic Sound Presentation
       - Static Sound with a Dynamic Head Presentation
     - Psychophysical Experiment
     - Discussion
  • 3. Motivation
     - To validate a real-time system that updates head-related impulse responses.
     - Goal: show that the acoustic waveforms measured on KEMAR match between real and virtual presentations.
     - Applications: explore the effects of presenting dynamic sound on sound localization.
  • 4. Introduction: What is Real/Virtual Audio?
     - Real audio: sounds presented over loudspeakers.
     - Virtual audio: acoustic waveforms presented over headphones.
       - Advantages: cost-effective; portable; does not depend on room effects.
       - Disadvantages: unrealistic.
  • 5. Introduction: Sound Localization
     - Interaural Time Difference (ITD): difference between sound arrival times at the two ears.
       - Predominant cue at low frequencies (< 2 kHz).
     - Interaural Level Difference (ILD): difference between sound levels at the two ears.
       - Predominant cue above ~2 kHz, due to head shadowing.
     - Both cues are encoded in the Head-Related Transfer Function (HRTF): ILD in its magnitude, ITD in its phase.
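The two cues above can be estimated directly from a measured left/right impulse-response pair. The following is an illustrative sketch (not the thesis code), using a broadband energy ratio for ILD and the cross-correlation peak for ITD:

```python
import numpy as np

def itd_ild_from_hrir(h_left, h_right, fs):
    """Estimate binaural cues from a head-related impulse response pair."""
    # ILD: broadband level difference in dB (left re right)
    ild_db = 10 * np.log10(np.sum(h_left**2) / np.sum(h_right**2))
    # ITD: lag of the cross-correlation peak between the two ears.
    # With numpy's convention, a positive lag here means the
    # left-ear signal lags the right (source toward the right).
    xcorr = np.correlate(h_left, h_right, mode="full")
    lag = int(np.argmax(np.abs(xcorr))) - (len(h_right) - 1)
    itd_s = lag / fs
    return itd_s, ild_db
```

For example, a right-ear response that is a 10-sample-delayed, half-amplitude copy of the left yields a negative ITD of 10/fs and an ILD of about +6 dB.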
  • 6. Background of RTVAS System
     - Developed by Jacob Scarpaci (2006).
     - Uses a real-time kernel in Linux to update HRTF filters.
     - Key to the system: the HRTF convolved with the input signal corresponds to the difference between where the sound should be and the subject's current head position.
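The relative-angle idea at the heart of the update loop can be sketched as follows. This is a minimal illustration, not the RTVAS implementation; the clamping to ±90 degrees and the bank layout are assumptions of the sketch (the thesis HRTFs were measured from -90 to 90 degrees at 1-degree spacing):

```python
def select_hrtf(target_az_deg, head_az_deg, hrtf_bank, step_deg=1):
    """Pick the HRIR for the sound's position *relative to the head*."""
    # relative angle, wrapped to (-180, 180]
    rel = (target_az_deg - head_az_deg + 180) % 360 - 180
    # clamp to the measured range and quantize to the bank resolution
    rel = max(-90, min(90, rel))
    idx = int(round((rel + 90) / step_deg))
    return hrtf_bank[idx]
```

For instance, with a 181-entry bank (one filter per degree), a target at 30 degrees and the head at 10 degrees selects the filter for +20 degrees relative azimuth.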
  • 7. Project Motivation/Aims
     - Goal: validate that the Real-Time Virtual Auditory System, developed by Jacob Scarpaci (2006), correctly updates HRTFs in accordance with head location relative to sound location.
     - Approach to validation:
       - Compare acoustic waveforms measured on KEMAR when sound is presented over headphones versus over loudspeakers (a mathematical, signals approach).
       - Perform a behavioral task in which subjects track a dynamic sound played over headphones or loudspeakers (a perceptual approach).
  • 8. Methods: Real Presentation - Panning
     - Loudspeaker setup creates a virtual speaker (shown as a dashed outline in the slide figure) by nonlinear interpolation (Leakey, 1959) between two speakers located symmetrically about 0 degrees azimuth:
       CH1 = 1/2 - sin(θ) / (2 sin(θ_pos))
       CH2 = 1/2 + sin(θ) / (2 sin(θ_pos))
     where θ is the desired virtual source angle and ±θ_pos are the loudspeaker positions.
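The sine-law panning gains above can be computed as below. This is a sketch of the formula only; which channel corresponds to which physical side is an assumption here:

```python
import numpy as np

def panning_gains(theta_deg, theta_pos_deg):
    """Channel gains for a phantom source at theta between two
    loudspeakers at +/- theta_pos (sine-law panning, after Leakey 1959)."""
    s = np.sin(np.radians(theta_deg)) / (2 * np.sin(np.radians(theta_pos_deg)))
    ch1 = 0.5 - s
    ch2 = 0.5 + s
    return ch1, ch2
```

Note the gains always sum to 1: a centered source (θ = 0) gets 0.5 in each channel, and a source at θ = ±θ_pos sends all signal to one speaker.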
  • 9. HRTF Measurement
     - Empirical KEMAR
       - 17th-order MLS used to measure the HRTF at every degree from -90 to 90 degrees.
       - All measurements were windowed to 226 coefficients using a modified Hanning window to remove reverberations.
     - Minimum-phase plus linear-phase interpolation
       - Interpolated from empirical measurements taken every 5 degrees.
       - The magnitude function was derived as a linearly weighted average of the log-magnitude functions of the empirical measurements.
       - The minimum-phase function was derived from the magnitude function.
       - A linear-phase component was added, corresponding to the ITD calculated for that position.
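One standard way to realize the minimum-phase-plus-linear-phase interpolation described above is the real-cepstrum method. The sketch below is illustrative, not the thesis code, and handles only integer-sample ITD delays:

```python
import numpy as np

def min_phase_from_magnitude(mag):
    """Minimum-phase impulse response for a full-length (symmetric)
    FFT magnitude, via folding of the real cepstrum."""
    n = len(mag)
    cep = np.fft.ifft(np.log(np.maximum(mag, 1e-12))).real
    w = np.zeros(n)          # causal-folding weights
    w[0] = 1.0
    w[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        w[n // 2] = 1.0
    return np.fft.ifft(np.exp(np.fft.fft(cep * w))).real

def interpolate_hrtf(mag_a, mag_b, frac, itd_samples):
    """Linearly weight the log-magnitudes of two measured positions,
    rebuild a minimum-phase response, then add the linear-phase
    component as a pure delay (integer samples in this sketch)."""
    log_mag = (1 - frac) * np.log(mag_a) + frac * np.log(mag_b)
    h_min = min_phase_from_magnitude(np.exp(log_mag))
    return np.roll(h_min, int(round(itd_samples)))
```

A flat magnitude yields a unit impulse, and adding an ITD of 3 samples simply shifts that impulse by 3, which makes the two pieces of the construction easy to verify separately.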
  • 10. Acoustic Waveform Comparison: Static Sound/Static Head Methods
     - Presented either a speech or a noise waveform at three static locations: 5, 23, and -23 degrees.
     - In the free-field presentation, the positions were created from the speakers using the panning technique outlined previously.
     - Used 4 different KEMAR HRTF sets in the virtual presentation: Empirical, Min-Phase Interp., Empirical Headphone TF, Min-Phase Headphone TF.
     - Recorded sounds on KEMAR with microphones located at the position corresponding to the human eardrum.
  • 11. Static Sound/Static Head: Analysis
     - Correlated the waveforms recorded over loudspeakers with the waveforms recorded over headphones for a given set of HRTFs.
     - Correlated time, magnitude, and phase functions.
       - Allowed a maximum delay of 4 ms in time to account for transmission delays.
     - Broke the signals into third-octave bands with the following center frequencies (Hz): 200, 250, 315, 400, 500, 630, 800, 1000, 1250, 1600, 2000, 2500, 3150, 4000, 5000, 6300, 8000, 10000.
       - Correlated time, magnitude, and phase within each band, and calculated the delay (lag) that must be imposed on one signal to achieve maximum correlation.
     - Examined differences in binaural cues within each band.
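The band-by-band correlation with a bounded lag search can be sketched as below. This is a simplified illustration of the analysis (brick-wall FFT bands and circular shifts stand in for whatever filtering and alignment the thesis actually used):

```python
import numpy as np

def third_octave_band(fc):
    # band edges one-sixth octave either side of the center frequency
    return fc / 2**(1 / 6), fc * 2**(1 / 6)

def bandpass_fft(x, fs, f_lo, f_hi):
    # crude brick-wall band-pass: zero FFT bins outside the band
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1 / fs)
    X[(f < f_lo) | (f > f_hi)] = 0
    return np.fft.irfft(X, len(x))

def best_lag_corr(a, b, fs, max_lag_ms=4.0):
    # correlation between a and circularly shifted b, searching lags
    # within +/- max_lag_ms (the 4 ms window used in the analysis)
    max_lag = int(fs * max_lag_ms / 1000)
    best_r, best_lag = -1.0, 0
    for lag in range(-max_lag, max_lag + 1):
        r = np.corrcoef(a, np.roll(b, lag))[0, 1]
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, best_lag
```

Running `best_lag_corr` on each pair of band-passed recordings gives both the per-band correlation coefficient and the delay at which it peaks.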
  • 13. Acoustic Waveform Comparisons: Static Sound/Static Head Results Cont.
  • 14. Acoustic Waveform Comparisons: Static Sound/Static Head Results Cont.
  • 15. Acoustic Waveform Comparisons: Static Sound/Static Head Results Cont.
  • 16. Difference in ITDs from Free-Field and Headphones for Static Noise
  • 17. Difference in ILDs from Free-Field and Headphones for Static Noise
  • 18. Dynamic Sound/Static Head: Methods
     - Presented a speech or a noise waveform either over loudspeakers (using panning) or over headphones (using the convolution algorithm).
     - The sound was presented moving from 0 to 30 degrees.
     - Used the same 4 HRTF sets.
  • 20. Acoustic Waveform Comparison: Dynamic Sound/Static Head Noise Results Cont.
  • 21. Acoustic Waveform Comparison: Dynamic Sound/Static Head Noise Results Cont.
  • 22. Acoustic Waveform Comparison: Dynamic Sound/Static Head Noise Results Cont.
  • 23. Difference in ITDs from Free-Field and Headphones for Dynamic Noise
  • 24. Difference in ILDs from Free-Field and Headphones for Dynamic Noise
  • 25. Static Sound/Dynamic Head: Methods
     - A speech or noise waveform was presented over loudspeakers or headphones at a fixed position, 30 degrees.
     - 4 HRTF sets were used.
     - KEMAR was moved from the 30-degree to the 0-degree position while the sound was presented.
     - Head position was monitored using an Intersense® IS900 VWT head tracker.
  • 26. Static Sound/Dynamic Head: Analysis
     - The same data analysis was performed as in the previous two cases.
     - Only tracks that followed the same trajectory were correlated.
       - The acceptance criterion was a difference of less than 1 (or 1.5) degrees between the tracks.
  • 27. Across Time/Frequency Correlation for Dynamic Head/Static Noise
  • 28. Acoustic Waveform Comparison: Static Sound/Dynamic Head Noise Results Cont.
  • 29. Acoustic Waveform Comparison: Static Sound/Dynamic Head Noise Results Cont.
  • 30. Acoustic Waveform Comparison: Static Sound/Dynamic Head Noise Results Cont.
  • 31. Difference in ITDs from Free-Field and Headphones for Static Noise/Dynamic Head
  • 32. Difference in ILDs from Free-Field and Headphones for Static Noise/Dynamic Head.
  • 33. Waveform Comparison Discussion
     - Interaural cues match very well across the different conditions, as well as between loudspeakers and headphones.
       - This follows from the high correlations in the magnitude and phase functions.
     - Differences (in waveform correlation) may not matter perceptually if the listener receives the same binaural cues.
     - The output algorithm of the RTVAS appears to present correctly oriented directional sound and to adjust correctly to head movement.
  • 34. Psychophysical Experiment: Details
     - 6 normal-hearing subjects (4 male, 2 female).
     - Sound was presented over headphones or loudspeakers.
     - Task: track a moving sound source with the head.
     - HRTFs tested: Empirical KEMAR, Minimum-Phase KEMAR, and Individual (interpolated using minimum phase).
  • 35. Psychophysical Experiment: Details cont.
     - Sound details
       - White noise, frequency content 200 Hz to 10 kHz.
       - Presented at 65 dB SPL.
       - 5 seconds in duration.
     - Track details
       - Trajectory: 15·(sin((2π/5)t) + sin((2π/2)·t·rand))
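The trajectory formula above can be generated as follows. The slide writes "rand" without further detail, so treating it as a per-trial uniform random scalar, as well as the 100 Hz track sampling rate, are assumptions of this sketch:

```python
import numpy as np

def make_track(duration_s=5.0, rate_hz=100, rng=None):
    """Azimuth trajectory (degrees): 15*(sin((2*pi/5)*t) + sin((2*pi/2)*t*r)),
    where r is assumed here to be a per-trial uniform random scalar."""
    rng = rng if rng is not None else np.random.default_rng()
    r = rng.uniform()
    t = np.arange(0.0, duration_s, 1.0 / rate_hz)
    return 15.0 * (np.sin(2 * np.pi / 5 * t) + np.sin(2 * np.pi / 2 * t * r))
```

The two sine terms each have amplitude 15 degrees, so the track stays within ±30 degrees; the random factor varies the second component's rate from trial to trial.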
  • 36. Psychophysical Experiment: Virtual Setup
     - Head-movement training (no sound): 5 repetitions in which the subject's task was to move a square (representing the head) into another box; this also centers the head.
     - Training (all using Empirical KEMAR):
       - 10 trials in which the subject was shown, via a plot, the path of the sound before it played.
       - 10 trials with the same track as before, but no visual cue.
       - 10 trials in which the path was shown via a plot, but the path was random from trial to trial.
       - 10 trials with random tracks and no visualization.
  • 37. Psychophysical Experiment: Setup cont.
     - Experiment (headphones):
       - 10 trials using Empirical KEMAR HRTFs, 10 using Minimum-Phase KEMAR HRTFs, and 10 using Individual HRTFs; repeated 3 times.
     - Loudspeaker training: same as headphones, but reduced to 5 trials.
     - Loudspeaker experiment: 30 trials, repeated only once.
     - Subjects were instructed to press a button as soon as they heard the sound; this started the head tracking.
  • 40. Individual Response to Complexity of Tracks
  • 41. Overall Coherence in Performance
  • 42. Overall Latency in Tracking
  • 43. RMS/RMS Error of Tracking
  • 45. Deeper Look into Individual HRTF Case
  • 46. Psychophysical Experiment: Discussion
     - Coherence
       - The coherence (correlation) measure for the empirical and minimum-phase interpolation cases is not statistically different from that over loudspeakers.
       - Coherence with individual HRTFs was surprisingly worse.
       - Coherence also stays strong as the complexity of the track varies.
     - Latency
       - Individual HRTFs show more variability in latency; subjects might be able to track changes more quickly using their own HRTFs.
       - Loudspeaker latency is negative, which suggests subjects are predicting the path. This could be because the sound always moves to the right first, combined with the delay in pressing the button.
  • 47. Psychophysical Experiment: Discussion Cont.
     - RMS
       - No significant difference in total RMS error, or in RMS undershoot error, between the Empirical and Minimum-Phase HRTFs and the loudspeakers.
       - Subjects generally undershoot the path of the sound; this could be a motor issue (e.g. minimizing effort) as well as a perceptual one.
  • 48. Overall Conclusions
     - Coherence of acoustic recordings may not be the best measure for validation.
       - Reverberation or panning techniques.
     - If perception is the only thing that matters, then we have to conclude that the algorithm works.
  • 49. Future Work
     - Look at different methods for presenting dynamic sound over loudspeakers; try different room environments.
     - Closer look at differences between headphones, particularly open-canal tube-phones, to see if subjects can distinguish between real and virtual sources.
     - Various psychophysical experiments involving dynamic sound (speech, masking): sound localization, source separation.
  • 50. Acknowledgements
     - Committee and others: Dave Freedman, Steven Colburn, Jake Scarpaci, Barb Shinn-Cunningham, Nathaniel Durlach
     - My subjects, the Binaural Gang, and all in attendance
     - Todd Jennings, Le Wang, Tim Streeter, Varun Parmar, Akshay Navaladi, Antje Ihlefeld
  • 53. Methods: Real Presentation Continued
     - Input stimulus was a 17th-order MLS sequence sampled at 50 kHz, corresponding to a duration of ~2.6 s.
     - Waveforms were recorded on KEMAR (Knowles Electronic Manikin for Acoustic Research).
     - Speaker presentation order:

       Speaker location (±deg)   Virtual source positions created (deg)
       10                        -5, 0, 5
       15                        -10, 0, 10
       30                        -20, -10, 0, 10, 20
       45                        -40, -30, -10, 0, 10, 30, 40
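A 17th-order MLS can be generated with a linear-feedback shift register. The sketch below uses the primitive polynomial x^17 + x^3 + 1 (one valid choice; the taps used in the thesis are not stated on the slide) and shows why the stimulus lasts ~2.6 s at 50 kHz: the full period is 2^17 - 1 = 131071 samples:

```python
import numpy as np

def mls(order=17, taps=(17, 3)):
    """Maximum-length sequence from a Fibonacci LFSR.

    x^17 + x^3 + 1 is a primitive polynomial, so the output has the
    full period of 2**order - 1 samples before repeating."""
    n = 2**order - 1
    state = [1] * order              # any nonzero seed works
    out = []
    for _ in range(n):
        fb = state[taps[0] - 1] ^ state[taps[1] - 1]
        out.append(state[-1])        # output the last stage
        state = [fb] + state[:-1]    # shift, insert feedback bit
    return 2 * np.array(out) - 1     # map {0, 1} -> {-1, +1}
```

`scipy.signal.max_len_seq` provides the same functionality off the shelf. A full-period MLS is nearly balanced: over one period it contains exactly one more +1 than -1.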
  • 54. Results: Real Presentation
     - HRTFs measured when sound was presented over loudspeakers using the linear and nonlinear interpolation functions (figure panels: Linear, Nonlinear).
  • 55. Results: Correlation Coefficients at all Spatial Locations for Interpolated Sound over Loudspeakers
     - Correlation between a virtual point source and a real source:

       Speaker   Virtual    Linear function       Non-linear function
       location  position   Left      Right       Left      Right
       45        -40        0.98799   0.9758      0.98655   0.97769
       45        -30        0.97427   0.96611     0.97534   0.96777
       45        -10        0.96842   0.94612     0.96858   0.9466
       45         0         0.95736   0.91602     0.95693   0.91709
       45         10        0.96374   0.95282     0.96384   0.95276
       45         30        0.97532   0.97095     0.97644   0.97084
       45         40        0.98397   0.98194     0.98268   0.98177
       30        -20        0.98372   0.97316     0.98385   0.97357
       30        -10        0.98054   0.9564      0.98054   0.95649
       30         0         0.97184   0.93755     0.97171   0.93774
       30         10        0.97151   0.96414     0.97147   0.96448
       30         20        0.97844   0.97768     0.97883   0.97762
       15        -10        0.993     0.97775     0.99301   0.97787
       15         0         0.97821   0.95517     0.97817   0.95503
       15         10        0.98406   0.98576     0.98412   0.98572
       10        -5         0.99326   0.97585     0.99328   0.97601
       10         0         0.98927   0.96086     0.98924   0.96077
       10         5         0.99319   0.98977     0.99312   0.98977

     - Very strong correlation, generally, at all spatial locations.
     - Weaker correlation as the speakers become more spatially separated.
     - Weakest correlation when the created sound is furthest from both speakers (0 degrees).
  • 56. Spatial Separation of Loudspeakers
     - Correlation coefficients for a virtually created sound source at -10 degrees, at various spatial separations of the loudspeakers.
     - Correlation declines as the loudspeakers become more spatially separated.
  • 57. Example of Pseudo-Anechoic HRTFs
     - Correlation coefficients are slightly better when reverberations are removed from the impulse responses (left, right ears):
       - Linear, reverberant: 0.98054, 0.9564
       - Linear, pseudo-anechoic: 0.98545, 0.96019
       - Nonlinear, reverberant: 0.98054, 0.95649
       - Nonlinear, pseudo-anechoic: 0.9855, 0.96007
  • 58. Correlation Coefficients at all Spatial Locations for Interpolated Sound over Loudspeakers (Pseudo-Anechoic)
     - Table 3. Correlation coefficients for pseudo-anechoic HRTFs:

       Speaker   Virtual    Linear function       Non-linear function
       location  position   Left      Right       Left      Right
       45        -40        0.96567   0.99168     0.96416   0.98421
       45        -30        0.96223   0.95356     0.96138   0.95815
       45        -10        0.96348   0.93433     0.96299   0.93902
       45         0         0.95471   0.89491     0.95436   0.89968
       45         10        0.95856   0.93652     0.95913   0.93953
       45         30        0.97678   0.945       0.97825   0.94013
       45         40        0.99563   0.9814      0.99      0.98018
       30        -20        0.98762   0.97555     0.98767   0.97663
       30        -10        0.98545   0.96019     0.9855    0.96007
       30         0         0.97281   0.93616     0.97284   0.93623
       30         10        0.97927   0.96945     0.97912   0.96968
       30         20        0.97904   0.98188     0.97846   0.98183
       15        -10        0.99608   0.98114     0.99592   0.98167
       15         0         0.97891   0.95475     0.9788    0.95461
       15         10        0.9928    0.98922     0.99287   0.9892
       10        -5         0.99738   0.98141     0.99736   0.98162
       10         0         0.99329   0.96323     0.99333   0.9632
       10         5         0.99731   0.9946      0.99736   0.99462

     - Correlations are generally better when reverberant energy is removed from the impulse responses.