VALIDATION OF A REAL-TIME VIRTUAL AUDITORY SYSTEM FOR DYNAMIC SOUND STIMULI AND ITS APPLICATION TO SOUND LOCALIZATION

Brett Rinehold
Outline
- Motivation
- Introduction
- Background
- Loudspeaker Presentation
- HRTF Interpolation
- Acoustic Waveform Comparison
  - Static Sound Presentation
  - Dynamic Sound Presentation
  - Static Sound with a Dynamic Head Presentation
- Psychophysical Experiment
- Discussion
Motivation
- To validate a real-time system that updates head-related impulse responses.
- Goal is to show that the acoustic waveforms measured on KEMAR match between real and virtual presentations.
- Applications:
  - Explore the effects of dynamic sound presentation on sound localization.
Introduction: What is Real/Virtual Audio?
- Real audio consists of presenting sounds over loudspeakers.
- Virtual audio consists of presenting acoustic waveforms over headphones.
  - Advantages: cost-effective; portable; does not depend on room effects.
  - Disadvantages: can sound unrealistic.
Introduction: Sound Localization
- Interaural Time Difference (ITD): difference between sound arrival times at the two ears.
  - Predominant cue at low frequencies (< 2 kHz).
- Interaural Level Difference (ILD): difference between sound levels at the two ears.
  - Predominant cue at higher frequencies (> ~2 kHz) due to head shadowing.
- Both cues are encoded in the Head-Related Transfer Function (HRTF):
  - ILD in the magnitude
  - ITD in the phase
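Both cues can be read off a measured impulse-response pair. A minimal sketch (a hypothetical helper, not code from the talk) that estimates the ITD from the cross-correlation peak and a broadband ILD from the energy ratio:

```python
import numpy as np

def binaural_cues(hrir_left, hrir_right, fs):
    """Estimate ITD (seconds, positive when the left ear leads) and a
    broadband ILD (dB) from a left/right head-related impulse response
    pair sampled at fs Hz."""
    # ITD: lag of the cross-correlation peak between the two ear signals
    xcorr = np.correlate(hrir_left, hrir_right, mode="full")
    lag = np.argmax(xcorr) - (len(hrir_right) - 1)  # left position re. right
    itd = -lag / fs  # left arrives earlier => right is delayed => positive ITD
    # ILD: ratio of total energies at the two ears, in dB
    ild = 10.0 * np.log10(np.sum(hrir_left**2) / np.sum(hrir_right**2))
    return itd, ild
```

A frequency-resolved version would apply the same idea per band; this broadband form is just the simplest illustration.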
      
Background of the RTVAS System
- Developed by Jacob Scarpaci (2006).
- Uses a real-time kernel in Linux to update HRTF filters.
- Key to the system: the HRTF convolved with the input signal corresponds to the difference between where the sound should be and where the subject's head is pointing.
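The "difference" selection just described reduces to a relative-angle computation on each update. A sketch under that reading (hypothetical helper; the actual system is Scarpaci's real-time kernel module):

```python
def rendering_azimuth(source_az_deg, head_az_deg):
    """Azimuth the renderer should use: where the sound should be, minus
    where the head currently points, wrapped into [-180, 180) degrees.
    The HRTF filter for this relative angle is what gets convolved with
    the input signal."""
    return (source_az_deg - head_az_deg + 180.0) % 360.0 - 180.0
```

For example, a source at 30 degrees with the head turned to 10 degrees should be rendered at 20 degrees; the wrapping handles sources behind the listener consistently.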
Project Motivation/Aims
- Goal is to validate that the Real-Time Virtual Auditory System, developed by Jacob Scarpaci (2006), correctly updates HRTFs in accordance with head location relative to sound location.
- Approach to validation:
  - Compare acoustic waveforms measured on KEMAR when sound is presented over headphones to those presented over loudspeakers (a mathematical, signals approach).
  - Perform a behavioral task in which subjects track a dynamic sound played over headphones or loudspeakers (a perceptual approach).
Methods: Real Presentation - Panning
- Loudspeaker setup to create a virtual speaker (shown as a dashed outline in the figure) by interpolating between two speakers located symmetrically about 0 degrees azimuth.
- Nonlinear panning law (Leakey, 1959), for a target azimuth theta between speakers at +/- theta_pos:
    CH1 = 1/2 - sin(theta) / (2 sin(theta_pos))
    CH2 = 1/2 + sin(theta) / (2 sin(theta_pos))
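The Leakey (1959) law above can be sketched directly; which channel feeds which physical speaker is an assumption here, since the talk does not label them:

```python
import numpy as np

def panning_gains(theta_deg, theta_pos_deg):
    """Channel gains for the nonlinear (Leakey, 1959) panning law: place a
    phantom source at theta between two speakers at +/- theta_pos degrees."""
    s = np.sin(np.radians(theta_deg)) / (2.0 * np.sin(np.radians(theta_pos_deg)))
    return 0.5 - s, 0.5 + s  # (CH1, CH2); the two gains always sum to 1
```

At theta = 0 both channels get 0.5; at theta = theta_pos all of the signal goes to one speaker, as the law requires.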
HRTF Measurement
- Empirical KEMAR:
  - 17th-order MLS used to measure the HRTF at every degree from -90 to 90 degrees.
- All measurements were windowed to 226 coefficients using a modified Hanning window to remove reverberations.
- Minimum-phase plus linear-phase interpolation:
  - Interpolated from empirical measurements taken every 5 degrees.
  - The magnitude function was derived as a linearly weighted average of the log-magnitude functions from the empirical measurements.
  - The minimum-phase function was derived from the magnitude function.
  - A linear-phase component was added, corresponding to the ITD calculated for that position.
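The interpolation steps above can be sketched with the standard real-cepstrum construction of a minimum-phase response. This is a simplification of whatever the thesis actually implemented: the magnitude spectra are assumed to be full-length FFT bins, and the ITD is applied as an integer-sample delay rather than a fractional one.

```python
import numpy as np

def interp_hrtf_minphase(mag_a, mag_b, w, itd_samples):
    """Interpolate two HRTF magnitude spectra with weight w, rebuild a
    minimum-phase impulse response via the real cepstrum, then append a
    linear-phase component for the target ITD."""
    n = len(mag_a)  # even FFT length assumed
    # linearly weighted average of the log-magnitude functions
    log_mag = (1.0 - w) * np.log(mag_a) + w * np.log(mag_b)
    # real cepstrum of the interpolated magnitude
    cep = np.fft.ifft(log_mag).real
    # fold the cepstrum to obtain the minimum-phase spectrum
    fold = np.zeros(n)
    fold[0] = cep[0]
    fold[1:n // 2] = 2.0 * cep[1:n // 2]
    fold[n // 2] = cep[n // 2]
    h_min = np.fft.ifft(np.exp(np.fft.fft(fold))).real
    # linear-phase component: delay by the ITD for this position
    return np.roll(h_min, itd_samples)
```

With a flat magnitude the result is simply an impulse delayed by the ITD, which is a quick sanity check on the construction.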


      
Acoustic Waveform Comparison: Static Sound/Static Head Methods
- Presented either a speech waveform or a noise waveform at three static locations: 5, 23, and -23 degrees.
- During the free-field presentation, the positions were created from the loudspeakers using the panning technique outlined previously.
- Used 4 different KEMAR HRTF sets in the virtual presentation: Empirical, Min-Phase Interpolated, Empirical + Headphone TF, Min-Phase + Headphone TF.
- Recorded sounds on KEMAR with microphones located at the position corresponding to the human eardrum.
Static Sound/Static Head: Analysis
- Correlated the waveforms recorded over loudspeakers with the waveforms recorded over headphones for a given set of HRTFs.
  - Correlated the time, magnitude, and phase functions.
  - Allowed a maximum delay of 4 ms in time to account for transmission delays.
- Broke the signals into third-octave bands with the following center frequencies (Hz):
  200, 250, 315, 400, 500, 630, 800, 1000, 1250, 1600, 2000, 2500, 3150, 4000, 5000, 6300, 8000, 10000
  - Correlated time, magnitude, and phase within each band and calculated the delay (lag) that must be imposed on one signal to achieve maximum correlation.
  - Looked at differences in binaural cues within each band.
Across Time/Frequency Correlations of Static Noise (figure)
Acoustic Waveform Comparisons: Static Sound/Static Head Results (figures)
Difference in ITDs from Free-Field and Headphones for Static Noise (figure)
Difference in ILDs from Free-Field and Headphones for Static Noise (figure)
Dynamic Sound/Static Head: Methods
- Presented a speech or noise waveform over loudspeakers (panning) or headphones (convolution algorithm).
- Sound was moved from 0 to 30 degrees.
- Used the same 4 HRTF sets.
Across Time/Frequency Correlation of Dynamic Noise (figure)
Acoustic Waveform Comparison: Dynamic Sound/Static Head Noise Results (figures)
Difference in ITDs from Free-Field and Headphones for Dynamic Noise (figure)
Difference in ILDs from Free-Field and Headphones for Dynamic Noise (figure)
Static Sound/Dynamic Head: Methods
- A speech or noise waveform was presented over loudspeakers or headphones at a fixed position, 30 degrees.
- 4 HRTF sets were used.
- KEMAR was moved from the 30-degree to the 0-degree position while the sound was presented.
- Head position was monitored using an Intersense IS-900 VWT head tracker.
Static Sound/Dynamic Head: Analysis
- Similar data analysis was performed as in the previous two cases.
- Only tracks that followed the same trajectory were correlated.
  - Acceptance criterion: less than a 1 or 1.5 degree difference between the tracks.
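Assuming the criterion above is a pointwise maximum deviation between the two head trajectories (the slides do not spell out the metric), it reduces to a one-line check:

```python
import numpy as np

def tracks_match(track_a, track_b, tol_deg=1.5):
    """Accept two head trajectories for waveform correlation only if they
    never differ by more than tol_deg degrees at any sample (assumed
    interpretation of the 1 / 1.5 degree acceptance criterion)."""
    a = np.asarray(track_a, dtype=float)
    b = np.asarray(track_b, dtype=float)
    return bool(np.max(np.abs(a - b)) < tol_deg)
```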
Across Time/Frequency Correlation for Dynamic Head/Static Noise (figure)
Acoustic Waveform Comparison: Static Sound/Dynamic Head Noise Results (figures)
Difference in ITDs from Free-Field and Headphones for Static Noise/Dynamic Head (figure)
Difference in ILDs from Free-Field and Headphones for Static Noise/Dynamic Head (figure)
Waveform Comparison Discussion
- Interaural cues match up very well across the different conditions, as well as between loudspeakers and headphones.
  - This follows from the high correlations in the magnitude and phase functions.
- Differences (in correlation) between waveforms may not matter perceptually if the listener receives the same binaural cues.
- The output algorithm in the RTVAS appears to present correctly oriented directional sounds and to adjust correctly to head movement.
Psychophysical Experiment: Details
- 6 normal-hearing subjects (4 male, 2 female).
- Sound was presented over headphones or loudspeakers.
- Task was to track, using their head, a moving sound source.
- HRTFs tested: Empirical KEMAR, Minimum-Phase KEMAR, Individual (interpolated using minimum phase).
Psychophysical Experiment: Details cont.
- Sound details:
  - White noise
  - Frequency content: 200 Hz to 10 kHz
  - Presented at 65 dB SPL
  - 5 seconds in duration
- Track details:
  - 15 (sin((2*pi/5) t) + sin((2*pi/2) t * rand))
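The track expression above can be sketched as follows, assuming rand is a single uniform draw per trial (so the second component's rate varies from trial to trial), the output is azimuth in degrees, and a 100 Hz trajectory rate (the tracker's actual rate is not stated):

```python
import numpy as np

def make_track(duration_s=5.0, fs=100.0, rng=None):
    """Target azimuth trajectory 15*(sin((2*pi/5)*t) + sin((2*pi/2)*t*r)),
    in degrees, with r drawn once per trial."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(0.0, duration_s, 1.0 / fs)
    r = rng.random()  # assumed: one uniform draw in [0, 1) per trial
    az = 15.0 * (np.sin(2.0 * np.pi / 5.0 * t) + np.sin(2.0 * np.pi / 2.0 * t * r))
    return t, az
```

Both sine components start at zero, so every trial begins with the source straight ahead, consistent with the centering step in the training procedure.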
      
Psychophysical Experiment: Virtual Setup
- Head-movement training - subjects just moved their head (no sound):
  - 5 repetitions in which the subjects' task was to put a square (representing the head) inside another box.
  - Also centers the head.
- Training - all using Empirical KEMAR:
  - 10 trials in which the subject was shown, via a plot, the path of the sound before it played.
  - 10 trials in which the same track as before was presented but no visual cue was available.
  - 10 trials in which the subject was shown the path via a plot, but the path was random from trial to trial.
  - 10 trials with random tracks and no visualization.
Psychophysical Experiment: Setup cont.
- Experiment (headphones):
  - 10 trials using Empirical KEMAR HRTFs
  - 10 trials using Minimum-Phase KEMAR HRTFs
  - 10 trials using Individual HRTFs
  - Repeated 3 times
- Loudspeaker training:
  - Same as headphones, but trials were reduced to 5.
- Loudspeaker experiment:
  - 30 trials, repeated only once.
  - Subjects were instructed to press a button as soon as they heard the sound; this started the head tracking.
Individual Tracking Results
Individual RMS/RMS Error
Individual Response to Complexity of Tracks
Overall Coherence in Performance
Overall Latency in Tracking
RMS/RMS Error of Tracking
Complexity of Track Analysis
Deeper Look into Individual HRTF Case
Psychophysical Experiment: Discussion
- Coherence:
  - The coherence (correlation) measure in the empirical and minimum-phase interpolation cases is not statistically different from that over loudspeakers.
  - Coherence with individual HRTFs was, surprisingly, worse.
  - Coherence also stays strong as the complexity of the track varies.
- Latency:
  - Individual HRTFs show more variability in latency.
    - Subjects might be able to track changes more quickly using their own HRTFs.
  - Loudspeaker latency is negative, which means that subjects are anticipating the path.
    - This could be because the sound always goes to the right first, as well as a result of the delay in pressing the button.
Psychophysical Experiment: Discussion Cont.
- RMS:
  - No significant difference in total RMS error, or in RMS undershoot error, between the Empirical and Minimum-Phase HRTFs and loudspeakers.
  - Subjects generally undershoot the path of the sound.
    - Could be a motor effect (i.e., laziness) as well as perception.
Overall Conclusions
- Coherence of the acoustic recordings may not be the best measure for validation.
  - Reverberation or the panning technique may limit it.
- If perception is the only thing that matters, then we have to conclude that the algorithm works.
Future Work
- Look at different methods for presenting dynamic sound over loudspeakers.
- Try different room environments.
- Take a closer look at differences between headphones.
  - Particularly open-canal tube-phones, to see if subjects could distinguish between real and virtual sources.
- Various psychophysical experiments that involve dynamic sound (speech, masking):
  - Sound localization
  - Source separation
Acknowledgements
- Committee: Steven Colburn, Barb Shinn-Cunningham, Nathaniel Durlach
- Binaural Gang: Todd Jennings, Le Wang, Tim Streeter, Varun Parmar, Akshay Navaladi, Antje Ihlefeld
- Other: Dave Freedman, Jake Scarpaci
- My Subjects
- All in Attendance
THANK YOU
Backup Slides
Methods: Real Presentation Continued
- Input stimulus was a 17th-order MLS sequence sampled at 50 kHz.
  - Corresponds to a duration of ~2.6 s.
- Waveforms were recorded on KEMAR (Knowles Electronic Manikin for Acoustic Research).

Table: Speaker Presentation
Source Speaker   Created Position (degrees)
10               -5, 0, 5
15               -10, 0, 10
30               -20, -10, 0, 10, 20
45               -40, -30, -10, 0, 10, 30, 40
Results: Real Presentation
- HRTFs measured when sound was presented over loudspeakers using the linear and nonlinear interpolation functions (figure: linear vs. nonlinear panels).
Results: Correlation Coefficients at all Spatial Locations for Interpolated Sound over Loudspeakers

Correlation between a virtual point source and a real source:

Table: Correlation Coefficients
                         Linear Function        Non-linear Function
Speaker    Virtual
Location   Position      Left       Right       Left       Right
45         -40           0.98799    0.9758      0.98655    0.97769
           -30           0.97427    0.96611     0.97534    0.96777
           -10           0.96842    0.94612     0.96858    0.9466
             0           0.95736    0.91602     0.95693    0.91709
            10           0.96374    0.95282     0.96384    0.95276
            30           0.97532    0.97095     0.97644    0.97084
            40           0.98397    0.98194     0.98268    0.98177
30         -20           0.98372    0.97316     0.98385    0.97357
           -10           0.98054    0.9564      0.98054    0.95649
             0           0.97184    0.93755     0.97171    0.93774
            10           0.97151    0.96414     0.97147    0.96448
            20           0.97844    0.97768     0.97883    0.97762
15         -10           0.993      0.97775     0.99301    0.97787
             0           0.97821    0.95517     0.97817    0.95503
            10           0.98406    0.98576     0.98412    0.98572
10          -5           0.99326    0.97585     0.99328    0.97601
             0           0.98927    0.96086     0.98924    0.96077
             5           0.99319    0.98977     0.99312    0.98977

- Very strong correlation, generally, at all spatial locations.
- Weaker correlation as the speakers become more spatially separated.
- Weakest correlation when the created sound is furthest from both speakers (0 degrees).
Spatial Separation of Loudspeakers
- Correlation coefficients for a virtually created sound source at -10 degrees, at various spatial separations of the loudspeakers (figure).
- Correlation declines as the loudspeakers become more spatially separated.
Example of Pseudo-Anechoic HRTFs
- Correlation coefficients are slightly better when reverberations are removed from the impulse responses:
  - Linear, reverberant: 0.98054, 0.9564 (left, right ears)
  - Linear, pseudo-anechoic: 0.98545, 0.96019 (left, right ears)
  - Nonlinear, reverberant: 0.98054, 0.95649 (left, right ears)
  - Nonlinear, pseudo-anechoic: 0.9855, 0.96007 (left, right ears)
Correlation Coefficients at all Spatial Locations for Interpolated Sound over Loudspeakers (Pseudo-Anechoic)

Table 3. Correlation Coefficients for Pseudo-Anechoic HRTFs
                         Linear Function        Non-linear Function
Speaker    Virtual
Location   Position      Left       Right       Left       Right
45         -40           0.96567    0.99168     0.96416    0.98421
           -30           0.96223    0.95356     0.96138    0.95815
           -10           0.96348    0.93433     0.96299    0.93902
             0           0.95471    0.89491     0.95436    0.89968
            10           0.95856    0.93652     0.95913    0.93953
            30           0.97678    0.945       0.97825    0.94013
            40           0.99563    0.9814      0.99       0.98018
30         -20           0.98762    0.97555     0.98767    0.97663
           -10           0.98545    0.96019     0.9855     0.96007
             0           0.97281    0.93616     0.97284    0.93623
            10           0.97927    0.96945     0.97912    0.96968
            20           0.97904    0.98188     0.97846    0.98183
15         -10           0.99608    0.98114     0.99592    0.98167
             0           0.97891    0.95475     0.9788     0.95461
            10           0.9928     0.98922     0.99287    0.9892
10          -5           0.99738    0.98141     0.99736    0.98162
             0           0.99329    0.96323     0.99333    0.9632
             5           0.99731    0.9946      0.99736    0.99462

- Correlations are generally better when the reverberant energy is removed from the impulse responses.
HRTF Window Function
HRTF Magnitude Comparison
Headphone Transfer Function

 
Kyryl Truskovskyi: Training and Serving Open-Sourced Foundational Models (UA)
Kyryl Truskovskyi: Training and Serving Open-Sourced Foundational Models (UA)Kyryl Truskovskyi: Training and Serving Open-Sourced Foundational Models (UA)
Kyryl Truskovskyi: Training and Serving Open-Sourced Foundational Models (UA)
 
Andrii Rodionov: What can go wrong in a distributed system – experience from ...
Andrii Rodionov: What can go wrong in a distributed system – experience from ...Andrii Rodionov: What can go wrong in a distributed system – experience from ...
Andrii Rodionov: What can go wrong in a distributed system – experience from ...
 
Entrepreneurial ecosystem- Wider context
Entrepreneurial ecosystem- Wider contextEntrepreneurial ecosystem- Wider context
Entrepreneurial ecosystem- Wider context
 
Intermediate Accounting, Volume 2, 13th Canadian Edition by Donald E. Kieso t...
Intermediate Accounting, Volume 2, 13th Canadian Edition by Donald E. Kieso t...Intermediate Accounting, Volume 2, 13th Canadian Edition by Donald E. Kieso t...
Intermediate Accounting, Volume 2, 13th Canadian Edition by Donald E. Kieso t...
 
Customizable Contents Restoration Training
Customizable Contents Restoration TrainingCustomizable Contents Restoration Training
Customizable Contents Restoration Training
 
NAB Show Exhibitor List 2024 - Exhibitors Data
NAB Show Exhibitor List 2024 - Exhibitors DataNAB Show Exhibitor List 2024 - Exhibitors Data
NAB Show Exhibitor List 2024 - Exhibitors Data
 
Simplify Your Funding: Quick and Easy Business Loans
Simplify Your Funding: Quick and Easy Business LoansSimplify Your Funding: Quick and Easy Business Loans
Simplify Your Funding: Quick and Easy Business Loans
 
How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...
How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...
How To Simplify Your Scheduling with AI Calendarfly The Hassle-Free Online Bo...
 
WAM Corporate Presentation April 12 2024.pdf
WAM Corporate Presentation April 12 2024.pdfWAM Corporate Presentation April 12 2024.pdf
WAM Corporate Presentation April 12 2024.pdf
 
WSMM Media and Entertainment Feb_March_Final.pdf
WSMM Media and Entertainment Feb_March_Final.pdfWSMM Media and Entertainment Feb_March_Final.pdf
WSMM Media and Entertainment Feb_March_Final.pdf
 
Exploring Elite Translation Services in Your Vicinity
Exploring Elite Translation Services in Your VicinityExploring Elite Translation Services in Your Vicinity
Exploring Elite Translation Services in Your Vicinity
 
WSMM Technology February.March Newsletter_vF.pdf
WSMM Technology February.March Newsletter_vF.pdfWSMM Technology February.March Newsletter_vF.pdf
WSMM Technology February.March Newsletter_vF.pdf
 
Rakhi sets symbolizing the bond of love.pptx
Rakhi sets symbolizing the bond of love.pptxRakhi sets symbolizing the bond of love.pptx
Rakhi sets symbolizing the bond of love.pptx
 
Vladyslav Fliahin: Applications of Gen AI in CV (UA)
Vladyslav Fliahin: Applications of Gen AI in CV (UA)Vladyslav Fliahin: Applications of Gen AI in CV (UA)
Vladyslav Fliahin: Applications of Gen AI in CV (UA)
 
5-Step Framework to Convert Any Business into a Wealth Generation Machine.pdf
5-Step Framework to Convert Any Business into a Wealth Generation Machine.pdf5-Step Framework to Convert Any Business into a Wealth Generation Machine.pdf
5-Step Framework to Convert Any Business into a Wealth Generation Machine.pdf
 

Thesis Defense Presentation

  • 1. VALIDATION OF A REAL-TIME VIRTUAL AUDITORY SYSTEM FOR DYNAMIC SOUND STIMULI AND ITS APPLICATION TO SOUND LOCALIZATION Brett Rinehold
  • 2. Outline
     - Motivation
     - Introduction
     - Background
     - Loudspeaker Presentation
     - HRTF Interpolation
     - Acoustic Waveform Comparison
       - Static Sound Presentation
       - Dynamic Sound Presentation
       - Static Sound with a Dynamic Head Presentation
     - Psychophysical Experiment
     - Discussion
  • 3. Motivation
     - To validate a real-time system that updates head-related impulse responses.
     - Goal: show that the acoustic waveforms measured on KEMAR match between real and virtual presentations.
     - Applications: explore the effects of presenting dynamic sound on sound localization.
  • 4. Introduction: What is Real/Virtual Audio?
     - Real audio: sounds presented over loudspeakers.
     - Virtual audio: acoustic waveforms presented over headphones.
       - Advantages: cost-effective; portable; does not depend on room effects.
       - Disadvantages: unrealistic.
  • 5. Introduction: Sound Localization
     - Interaural Time Difference (ITD): difference between sound arrival times at the two ears.
       - Predominant cue at low frequencies (< 2 kHz).
     - Interaural Level Difference (ILD): difference between sound levels at the two ears.
       - Predominant cue above ~2 kHz, due to head shadowing.
     - Both cues are encoded in the Head-Related Transfer Function (HRTF): ILD in its magnitude, ITD in its phase.
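The two cues above can be estimated directly from a measured left/right impulse-response pair. The following is an illustrative sketch (not the thesis code), using a broadband energy ratio for ILD and the cross-correlation peak for ITD:

```python
import numpy as np

def itd_ild_from_hrir(h_left, h_right, fs):
    """Estimate binaural cues from a head-related impulse response pair."""
    # ILD: broadband level difference in dB (left re right)
    ild_db = 10 * np.log10(np.sum(h_left**2) / np.sum(h_right**2))
    # ITD: lag of the cross-correlation peak between the two ears.
    # With numpy's convention, a positive lag here means the
    # left-ear signal lags the right (source toward the right).
    xcorr = np.correlate(h_left, h_right, mode="full")
    lag = int(np.argmax(np.abs(xcorr))) - (len(h_right) - 1)
    itd_s = lag / fs
    return itd_s, ild_db
```

For example, a right-ear response that is a 10-sample-delayed, half-amplitude copy of the left yields a negative ITD of 10/fs and an ILD of about +6 dB.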
  • 6. Background of RTVAS System
     - Developed by Jacob Scarpaci (2006).
     - Uses a real-time kernel in Linux to update HRTF filters.
     - Key to the system: the HRTF convolved with the input signal corresponds to the difference between where the sound should be and the subject's current head position.
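The relative-angle idea at the heart of the update loop can be sketched as follows. This is a minimal illustration, not the RTVAS implementation; the clamping to ±90 degrees and the bank layout are assumptions of the sketch (the thesis HRTFs were measured from -90 to 90 degrees at 1-degree spacing):

```python
def select_hrtf(target_az_deg, head_az_deg, hrtf_bank, step_deg=1):
    """Pick the HRIR for the sound's position *relative to the head*."""
    # relative angle, wrapped to (-180, 180]
    rel = (target_az_deg - head_az_deg + 180) % 360 - 180
    # clamp to the measured range and quantize to the bank resolution
    rel = max(-90, min(90, rel))
    idx = int(round((rel + 90) / step_deg))
    return hrtf_bank[idx]
```

For instance, with a 181-entry bank (one filter per degree), a target at 30 degrees and the head at 10 degrees selects the filter for +20 degrees relative azimuth.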
  • 7. Project Motivation/Aims
     - Goal: validate that the Real-Time Virtual Auditory System, developed by Jacob Scarpaci (2006), correctly updates HRTFs in accordance with head location relative to sound location.
     - Approach to validation:
       - Compare acoustic waveforms measured on KEMAR when sound is presented over headphones versus over loudspeakers (a mathematical, signals approach).
       - Perform a behavioral task in which subjects track a dynamic sound played over headphones or loudspeakers (a perceptual approach).
  • 8. Methods: Real Presentation - Panning
     - Loudspeaker setup creates a virtual speaker (shown as a dashed outline in the slide figure) by nonlinear interpolation (Leakey, 1959) between two speakers located symmetrically about 0 degrees azimuth:
       CH1 = 1/2 - sin(θ) / (2 sin(θ_pos))
       CH2 = 1/2 + sin(θ) / (2 sin(θ_pos))
     where θ is the desired virtual source angle and ±θ_pos are the loudspeaker positions.
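The sine-law panning gains above can be computed as below. This is a sketch of the formula only; which channel corresponds to which physical side is an assumption here:

```python
import numpy as np

def panning_gains(theta_deg, theta_pos_deg):
    """Channel gains for a phantom source at theta between two
    loudspeakers at +/- theta_pos (sine-law panning, after Leakey 1959)."""
    s = np.sin(np.radians(theta_deg)) / (2 * np.sin(np.radians(theta_pos_deg)))
    ch1 = 0.5 - s
    ch2 = 0.5 + s
    return ch1, ch2
```

Note the gains always sum to 1: a centered source (θ = 0) gets 0.5 in each channel, and a source at θ = ±θ_pos sends all signal to one speaker.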
  • 9. HRTF Measurement
     - Empirical KEMAR
       - 17th-order MLS used to measure the HRTF at every degree from -90 to 90 degrees.
       - All measurements were windowed to 226 coefficients using a modified Hanning window to remove reverberations.
     - Minimum-phase plus linear-phase interpolation
       - Interpolated from empirical measurements taken every 5 degrees.
       - The magnitude function was derived as a linearly weighted average of the log-magnitude functions of the empirical measurements.
       - The minimum-phase function was derived from the magnitude function.
       - A linear-phase component was added, corresponding to the ITD calculated for that position.
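One standard way to realize the minimum-phase-plus-linear-phase interpolation described above is the real-cepstrum method. The sketch below is illustrative, not the thesis code, and handles only integer-sample ITD delays:

```python
import numpy as np

def min_phase_from_magnitude(mag):
    """Minimum-phase impulse response for a full-length (symmetric)
    FFT magnitude, via folding of the real cepstrum."""
    n = len(mag)
    cep = np.fft.ifft(np.log(np.maximum(mag, 1e-12))).real
    w = np.zeros(n)          # causal-folding weights
    w[0] = 1.0
    w[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        w[n // 2] = 1.0
    return np.fft.ifft(np.exp(np.fft.fft(cep * w))).real

def interpolate_hrtf(mag_a, mag_b, frac, itd_samples):
    """Linearly weight the log-magnitudes of two measured positions,
    rebuild a minimum-phase response, then add the linear-phase
    component as a pure delay (integer samples in this sketch)."""
    log_mag = (1 - frac) * np.log(mag_a) + frac * np.log(mag_b)
    h_min = min_phase_from_magnitude(np.exp(log_mag))
    return np.roll(h_min, int(round(itd_samples)))
```

A flat magnitude yields a unit impulse, and adding an ITD of 3 samples simply shifts that impulse by 3, which makes the two pieces of the construction easy to verify separately.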
  • 10. Acoustic Waveform Comparison: Static Sound/Static Head Methods
     - Presented either a speech or a noise waveform at three static locations: 5, 23, and -23 degrees.
     - In the free-field presentation, the positions were created from the speakers using the panning technique outlined previously.
     - Used 4 different KEMAR HRTF sets in the virtual presentation: Empirical, Min-Phase Interp., Empirical Headphone TF, Min-Phase Headphone TF.
     - Recorded sounds on KEMAR with microphones located at the position corresponding to the human eardrum.
  • 11. Static Sound/Static Head: Analysis
     - Correlated the waveforms recorded over loudspeakers with the waveforms recorded over headphones for a given set of HRTFs.
     - Correlated time, magnitude, and phase functions.
       - Allowed a maximum delay of 4 ms in time to account for transmission delays.
     - Broke the signals into third-octave bands with the following center frequencies (Hz): 200, 250, 315, 400, 500, 630, 800, 1000, 1250, 1600, 2000, 2500, 3150, 4000, 5000, 6300, 8000, 10000.
       - Correlated time, magnitude, and phase within each band, and calculated the delay (lag) that must be imposed on one signal to achieve maximum correlation.
     - Examined differences in binaural cues within each band.
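The band-by-band correlation with a bounded lag search can be sketched as below. This is a simplified illustration of the analysis (brick-wall FFT bands and circular shifts stand in for whatever filtering and alignment the thesis actually used):

```python
import numpy as np

def third_octave_band(fc):
    # band edges one-sixth octave either side of the center frequency
    return fc / 2**(1 / 6), fc * 2**(1 / 6)

def bandpass_fft(x, fs, f_lo, f_hi):
    # crude brick-wall band-pass: zero FFT bins outside the band
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1 / fs)
    X[(f < f_lo) | (f > f_hi)] = 0
    return np.fft.irfft(X, len(x))

def best_lag_corr(a, b, fs, max_lag_ms=4.0):
    # correlation between a and circularly shifted b, searching lags
    # within +/- max_lag_ms (the 4 ms window used in the analysis)
    max_lag = int(fs * max_lag_ms / 1000)
    best_r, best_lag = -1.0, 0
    for lag in range(-max_lag, max_lag + 1):
        r = np.corrcoef(a, np.roll(b, lag))[0, 1]
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, best_lag
```

Running `best_lag_corr` on each pair of band-passed recordings gives both the per-band correlation coefficient and the delay at which it peaks.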
  • 13. Acoustic Waveform Comparisons: Static Sound/Static Head Results Cont.
  • 14. Acoustic Waveform Comparisons: Static Sound/Static Head Results Cont.
  • 15. Acoustic Waveform Comparisons: Static Sound/Static Head Results Cont.
  • 16. Difference in ITDs from Free-Field and Headphones for Static Noise
  • 17. Difference in ILDs from Free-Field and Headphones for Static Noise
  • 18. Dynamic Sound/Static Head: Methods
     - Presented a speech or a noise waveform either over loudspeakers (using panning) or over headphones (using the convolution algorithm).
     - The sound was presented moving from 0 to 30 degrees.
     - Used the same 4 HRTF sets.
  • 20. Acoustic Waveform Comparison: Dynamic Sound/Static Head Noise Results Cont.
  • 21. Acoustic Waveform Comparison: Dynamic Sound/Static Head Noise Results Cont.
  • 22. Acoustic Waveform Comparison: Dynamic Sound/Static Head Noise Results Cont.
  • 23. Difference in ITDs from Free-Field and Headphones for Dynamic Noise
  • 24. Difference in ILDs from Free-Field and Headphones for Dynamic Noise
  • 25. Static Sound/Dynamic Head: Methods
     - A speech or noise waveform was presented over loudspeakers or headphones at a fixed position, 30 degrees.
     - 4 HRTF sets were used.
     - KEMAR was moved from the 30-degree to the 0-degree position while the sound was presented.
     - Head position was monitored using an Intersense® IS900 VWT head tracker.
  • 26. Static Sound/Dynamic Head: Analysis
     - The same data analysis was performed as in the previous two cases.
     - Only tracks that followed the same trajectory were correlated.
       - The acceptance criterion was a difference of less than 1 (or 1.5) degrees between the tracks.
  • 27. Across Time/Frequency Correlation for Dynamic Head/Static Noise
  • 28. Acoustic Waveform Comparison: Static Sound/Dynamic Head Noise Results Cont.
  • 29. Acoustic Waveform Comparison: Static Sound/Dynamic Head Noise Results Cont.
  • 30. Acoustic Waveform Comparison: Static Sound/Dynamic Head Noise Results Cont.
  • 31. Difference in ITDs from Free-Field and Headphones for Static Noise/Dynamic Head
  • 32. Difference in ILDs from Free-Field and Headphones for Static Noise/Dynamic Head.
  • 33. Waveform Comparison Discussion
     - Interaural cues match very well across the different conditions, as well as between loudspeakers and headphones.
       - This follows from the high correlations in the magnitude and phase functions.
     - Differences (in waveform correlation) may not matter perceptually if the listener receives the same binaural cues.
     - The output algorithm of the RTVAS appears to present correctly oriented directional sound and to adjust correctly to head movement.
  • 34. Psychophysical Experiment: Details
     - 6 normal-hearing subjects (4 male, 2 female).
     - Sound was presented over headphones or loudspeakers.
     - Task: track a moving sound source with the head.
     - HRTFs tested: Empirical KEMAR, Minimum-Phase KEMAR, and Individual (interpolated using minimum phase).
  • 35. Psychophysical Experiment: Details cont.
     - Sound details
       - White noise, frequency content 200 Hz to 10 kHz.
       - Presented at 65 dB SPL.
       - 5 seconds in duration.
     - Track details
       - Trajectory: 15·(sin((2π/5)t) + sin((2π/2)·t·rand))
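The trajectory formula above can be generated as follows. The slide writes "rand" without further detail, so treating it as a per-trial uniform random scalar, as well as the 100 Hz track sampling rate, are assumptions of this sketch:

```python
import numpy as np

def make_track(duration_s=5.0, rate_hz=100, rng=None):
    """Azimuth trajectory (degrees): 15*(sin((2*pi/5)*t) + sin((2*pi/2)*t*r)),
    where r is assumed here to be a per-trial uniform random scalar."""
    rng = rng if rng is not None else np.random.default_rng()
    r = rng.uniform()
    t = np.arange(0.0, duration_s, 1.0 / rate_hz)
    return 15.0 * (np.sin(2 * np.pi / 5 * t) + np.sin(2 * np.pi / 2 * t * r))
```

The two sine terms each have amplitude 15 degrees, so the track stays within ±30 degrees; the random factor varies the second component's rate from trial to trial.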
  • 36. Psychophysical Experiment: Virtual Setup
     - Head-movement training (no sound): 5 repetitions in which the subject's task was to move a square (representing the head) into another box; this also centers the head.
     - Training (all using Empirical KEMAR):
       - 10 trials in which the subject was shown, via a plot, the path of the sound before it played.
       - 10 trials with the same track as before, but no visual cue.
       - 10 trials in which the path was shown via a plot, but the path was random from trial to trial.
       - 10 trials with random tracks and no visualization.
  • 37. Psychophysical Experiment: Setup cont.
     - Experiment (headphones):
       - 10 trials using Empirical KEMAR HRTFs, 10 using Minimum-Phase KEMAR HRTFs, and 10 using Individual HRTFs; repeated 3 times.
     - Loudspeaker training: same as headphones, but reduced to 5 trials.
     - Loudspeaker experiment: 30 trials, repeated only once.
     - Subjects were instructed to press a button as soon as they heard the sound; this started the head tracking.
  • 40. Individual Response to Complexity of Tracks
  • 41. Overall Coherence in Performance
  • 42. Overall Latency in Tracking
  • 43. RMS/RMS Error of Tracking
  • 45. Deeper Look into Individual HRTF Case
  • 46. Psychophysical Experiment: Discussion
     - Coherence
       - The coherence (correlation) measure for the empirical and minimum-phase interpolation cases is not statistically different from that over loudspeakers.
       - Coherence with individual HRTFs was surprisingly worse.
       - Coherence also stays strong as the complexity of the track varies.
     - Latency
       - Individual HRTFs show more variability in latency; subjects might be able to track changes more quickly using their own HRTFs.
       - Loudspeaker latency is negative, which suggests subjects are predicting the path. This could be because the sound always moves to the right first, combined with the delay in pressing the button.
  • 47. Psychophysical Experiment: Discussion Cont.
     - RMS
       - No significant difference in total RMS error, or in RMS undershoot error, between the Empirical and Minimum-Phase HRTFs and the loudspeakers.
       - Subjects generally undershoot the path of the sound; this could be a motor issue (e.g. minimizing effort) as well as a perceptual one.
  • 48. Overall Conclusions
     - Coherence of acoustic recordings may not be the best measure for validation.
       - Reverberation or panning techniques.
     - If perception is the only thing that matters, then we have to conclude that the algorithm works.
  • 49. Future Work
     - Look at different methods for presenting dynamic sound over loudspeakers; try different room environments.
     - Closer look at differences between headphones, particularly open-canal tube-phones, to see if subjects can distinguish between real and virtual sources.
     - Various psychophysical experiments involving dynamic sound (speech, masking): sound localization, source separation.
  • 50. Acknowledgements
     - Committee and others: Dave Freedman, Steven Colburn, Jake Scarpaci, Barb Shinn-Cunningham, Nathaniel Durlach
     - My subjects, the Binaural Gang, and all in attendance
     - Todd Jennings, Le Wang, Tim Streeter, Varun Parmar, Akshay Navaladi, Antje Ihlefeld
  • 53. Methods: Real Presentation Continued
     - Input stimulus was a 17th-order MLS sequence sampled at 50 kHz, corresponding to a duration of ~2.6 s.
     - Waveforms were recorded on KEMAR (Knowles Electronic Manikin for Acoustic Research).
     - Speaker presentation order:

       Speaker location (±deg)   Virtual source positions created (deg)
       10                        -5, 0, 5
       15                        -10, 0, 10
       30                        -20, -10, 0, 10, 20
       45                        -40, -30, -10, 0, 10, 30, 40
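A 17th-order MLS can be generated with a linear-feedback shift register. The sketch below uses the primitive polynomial x^17 + x^3 + 1 (one valid choice; the taps used in the thesis are not stated on the slide) and shows why the stimulus lasts ~2.6 s at 50 kHz: the full period is 2^17 - 1 = 131071 samples:

```python
import numpy as np

def mls(order=17, taps=(17, 3)):
    """Maximum-length sequence from a Fibonacci LFSR.

    x^17 + x^3 + 1 is a primitive polynomial, so the output has the
    full period of 2**order - 1 samples before repeating."""
    n = 2**order - 1
    state = [1] * order              # any nonzero seed works
    out = []
    for _ in range(n):
        fb = state[taps[0] - 1] ^ state[taps[1] - 1]
        out.append(state[-1])        # output the last stage
        state = [fb] + state[:-1]    # shift, insert feedback bit
    return 2 * np.array(out) - 1     # map {0, 1} -> {-1, +1}
```

`scipy.signal.max_len_seq` provides the same functionality off the shelf. A full-period MLS is nearly balanced: over one period it contains exactly one more +1 than -1.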
  • 54. Results: Real Presentation
     - HRTFs measured when sound was presented over loudspeakers using the linear and nonlinear interpolation functions (figure panels: Linear, Nonlinear).
  • 55. Results: Correlation Coefficients at all Spatial Locations for Interpolated Sound over Loudspeakers
     - Correlation between a virtual point source and a real source:

       Speaker   Virtual    Linear function       Non-linear function
       location  position   Left      Right       Left      Right
       45        -40        0.98799   0.9758      0.98655   0.97769
       45        -30        0.97427   0.96611     0.97534   0.96777
       45        -10        0.96842   0.94612     0.96858   0.9466
       45         0         0.95736   0.91602     0.95693   0.91709
       45         10        0.96374   0.95282     0.96384   0.95276
       45         30        0.97532   0.97095     0.97644   0.97084
       45         40        0.98397   0.98194     0.98268   0.98177
       30        -20        0.98372   0.97316     0.98385   0.97357
       30        -10        0.98054   0.9564      0.98054   0.95649
       30         0         0.97184   0.93755     0.97171   0.93774
       30         10        0.97151   0.96414     0.97147   0.96448
       30         20        0.97844   0.97768     0.97883   0.97762
       15        -10        0.993     0.97775     0.99301   0.97787
       15         0         0.97821   0.95517     0.97817   0.95503
       15         10        0.98406   0.98576     0.98412   0.98572
       10        -5         0.99326   0.97585     0.99328   0.97601
       10         0         0.98927   0.96086     0.98924   0.96077
       10         5         0.99319   0.98977     0.99312   0.98977

     - Very strong correlation, generally, at all spatial locations.
     - Weaker correlation as the speakers become more spatially separated.
     - Weakest correlation when the created sound is furthest from both speakers (0 degrees).
  • 56. Spatial Separation of Loudspeakers
     - Correlation coefficients for a virtually created sound source at -10 degrees, at various spatial separations of the loudspeakers.
     - Correlation declines as the loudspeakers become more spatially separated.
  • 57. Example of Pseudo-Anechoic HRTFs
     - Correlation coefficients are slightly better when reverberations are removed from the impulse responses (left, right ears):
       - Linear, reverberant: 0.98054, 0.9564
       - Linear, pseudo-anechoic: 0.98545, 0.96019
       - Nonlinear, reverberant: 0.98054, 0.95649
       - Nonlinear, pseudo-anechoic: 0.9855, 0.96007
  • 58. Correlation Coefficients at all Spatial Locations for Interpolated Sound over Loudspeakers (Pseudo-Anechoic)
     - Table 3. Correlation coefficients for pseudo-anechoic HRTFs:

       Speaker   Virtual    Linear function       Non-linear function
       location  position   Left      Right       Left      Right
       45        -40        0.96567   0.99168     0.96416   0.98421
       45        -30        0.96223   0.95356     0.96138   0.95815
       45        -10        0.96348   0.93433     0.96299   0.93902
       45         0         0.95471   0.89491     0.95436   0.89968
       45         10        0.95856   0.93652     0.95913   0.93953
       45         30        0.97678   0.945       0.97825   0.94013
       45         40        0.99563   0.9814      0.99      0.98018
       30        -20        0.98762   0.97555     0.98767   0.97663
       30        -10        0.98545   0.96019     0.9855    0.96007
       30         0         0.97281   0.93616     0.97284   0.93623
       30         10        0.97927   0.96945     0.97912   0.96968
       30         20        0.97904   0.98188     0.97846   0.98183
       15        -10        0.99608   0.98114     0.99592   0.98167
       15         0         0.97891   0.95475     0.9788    0.95461
       15         10        0.9928    0.98922     0.99287   0.9892
       10        -5         0.99738   0.98141     0.99736   0.98162
       10         0         0.99329   0.96323     0.99333   0.9632
       10         5         0.99731   0.9946      0.99736   0.99462

     - Correlations are generally better when reverberant energy is removed from the impulse responses.