SOUND SOURCE LOCALIZATION WITH MICROPHONE ARRAYS
Ramin Anushiravani
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
ABSTRACT
In this paper I discuss three basic and important methods for finding the direction of arrival (DOA) of sound sources in a far-field environment. The first two approaches are based on beamforming techniques: the Delay and Sum beamformer and the Minimum Variance Distortionless Response (MVDR) beamformer. The third approach is a subspace method that uses the well-known Multiple Signal Classification (MUSIC) algorithm. I demonstrate the accuracy of each algorithm by localizing sound sources in an office environment using a uniform linear array (ULA).
Index Terms— direction of arrival, Beamforming,
subspace method, uniform linear array
1. INTRODUCTION
Sound source localization has many applications in speech enhancement, such as speech denoising and dereverberation [1-3], by forming a beam toward the speaker and thereby reducing noise and reverberation from other directions. This is especially useful for the hearing aid industry [4]. Sound source localization can also be used for surveillance purposes, such as finding the direction of gunshots in public places by scanning the environment for sudden sound activities and classifying the sounds against an available database [5]. More recently, sound source localization techniques have been used to reconstruct spatial audio for entertainment and for improving the teleconferencing experience [6]. To motivate the background behind DOA estimation, I first discuss how living creatures localize sound sources and then present a simple time delay model for localizing sound sources. In section III, I discuss the general signal model for beamforming techniques. In section IV, I talk about uniform linear arrays and specifically two of their characteristics, spatial aliasing and the beampattern. In section V, I discuss three algorithms for sound source localization: delay and sum, MVDR and MUSIC. In section VI, I discuss the experiment setup, and finally in section VII I evaluate the results for the algorithms mentioned in section V.
2. A MODEL FOR SOUND LOCALIZATION
At a crowded party, we can focus on a particular speaker, effectively forming a beam toward them and thereby enhancing the quality of their speech, but how?
Human beings use a variety of information from the sound source and the environment for localization. For example, if a sound source is closer to the right ear, the sound reaches it earlier than the left ear and is also larger in amplitude due to the head shadow effect. The brain uses these time delay and level difference cues between the two ears to localize the sound, as shown in figure 1. However, if a sound is coming from directly behind or in front of the head, both ears receive similar time delays and levels. That is when spectral information shaped by the pinna, shoulders, hair and even one's clothing helps the listener work out where the sound is coming from. In addition to humans, animals, insects and even parasites have some form of source localization system. An interesting case is Ormia, a small parasitoid fly that preys on crickets. Ormia's localization system is quite extraordinary, and many scientists have tried to model microphones on its hearing and sound localization mechanism [7]. The distance between Ormia's ears is on the order of micrometers, so the time and level differences between the two ears are by themselves too small to help the parasite localize sound. Ormia's ears, however, are coupled in a complex manner, which increases the localization resolution to almost 20 times what its physical size would allow [8].
2.1. Time Delay Model
As discussed earlier, time delay is an important cue for localizing sound. We can build an exaggerated, simplified model for sound localization based on time delay for human ears, as shown in figure 2.
The assumption in a time delay model is that the listener does not have a head (there are no level difference or spectral cues), only two ears separated by roughly the size of the head (about 22 cm). Since in most practical scenarios the sound source is in the far field, we can assume that the sound waves arriving at the microphones are parallel.

Fig. 1. Time delay and level difference cues for sound source localization

Fig. 2. A time delay model for source localization

One of the microphones (ears) can then be fixed as a reference, and the time delay at the other microphone can be calculated using geometry. The time delay of arrival can be expressed in terms of the angle of arrival as shown in equation 1.
$$\tau = \frac{d\sin(\theta)}{c} \qquad (1)$$
In the frequency domain this time delay is represented as $e^{-j\omega\tau}$, where ω is the frequency of the source signal. Speech signals are broadband, so this narrowband assumption will not hold exactly in practice.
Given a reference signal and a delayed signal, one can localize a narrowband signal by simply undoing the delay in the delayed signal, as shown in equation 2.
$$\arg\max_{\theta}\left\{\sum_{n=0}^{N}\sum_{m=0}^{N}\big\|\,\mathrm{delayed}[n]\;\mathrm{ref}[n+m]\,\big\|\right\} \qquad (2a)$$
$$=\arg\max_{\theta}\left(\sum_{n=0}^{N}\big\|\,C_{\mathrm{ref,delayed}}[m]\,\big\|\right) \qquad (2b)$$
Where N is the number of samples, m undoes the delay in the delayed signal, and Cxy is the cross correlation between the two signals. Once we have solved for m, we can translate this sample difference into a time difference in seconds and use equation 1 to find the DOA. The basic idea is to use cross correlation and find the lag at which the two signals match. The block diagram for this simple time delay model is shown in figure 3.
Fig. 3. Block diagram for aligning two narrowband signals
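To make the time delay model concrete, the sketch below estimates the DOA of a narrowband source from a two-channel recording using the cross correlation of equation 2 and the geometry of equation 1. It is a minimal sketch, not the exact processing used in the appendix code; the two-channel recording sig, the spacing d and the sampling rate fs are assumed to be given.

% Minimal sketch: cross-correlation based DOA for two microphones.
% Assumes a two-channel recording 'sig' (columns = mics) sampled at fs,
% with the two microphones d meters apart.
d  = 0.22;                                 % spacing in meters (head-sized)
c  = 345;                                  % speed of sound in m/s
fs = 16000;                                % sampling rate in Hz
ref     = sig(:,1);                        % reference microphone
delayed = sig(:,2);                        % delayed microphone
maxlag    = ceil(d/c*fs);                  % delays beyond d/c are not physical
[C, lags] = xcorr(delayed, ref, maxlag);   % cross correlation over feasible lags
[~, idx]  = max(abs(C));                   % lag where the two signals match best
tau       = lags(idx)/fs;                  % sample delay -> seconds
theta     = asind(max(min(c*tau/d,1),-1)); % invert equation 1 (clamped), degrees

For a broadband signal such as speech, this estimate is typically computed per frequency band or with a weighted (generalized) cross correlation rather than on the raw waveform.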
3. SIGNAL MODEL
Beamforming is a powerful tool in array signal processing. Beamforming can be thought of as spatial filtering: the outputs of a sensor array are combined so that the SNR of the desired signal is increased, or so that the beampattern of the array is narrower and more accurate. Different filters can be derived from the signal model below, which is discussed in more detail in [9]. A simple beamforming system is shown in figure 4.
The signal recorded at each sensor can be represented
as,
$$y_n(t) = g_n(t) * s(t) + v_n(t) \qquad (3)$$
Where yn is the signal recorded at sensor n, gn is the spatial response corresponding to the location of the source, s is the clean source signal and vn is a zero-mean Gaussian noise signal. In the frequency domain, equation 3 can be represented as,
$$Y_n(k) = G_n(k)S(k) + V_n(k) \qquad (4a)$$
$$= \mathbf{d}(k)X_1(k) + \mathbf{V}(k) \qquad (4b)$$
Where d contains the relative time delays between the channels in the frequency domain, and X1 is the signal recorded at the reference microphone. The output of the beamformer can then be written as,
$$Z(k) = \mathbf{W}^H \mathbf{Y}(k) \qquad (5)$$
Where W holds the spatial filters (beamformer weights) and Z is the output of the beamformer. Substituting equation 4b into equation 5,
$$\mathbf{W}^H[\mathbf{d}(k)X_1(k) + \mathbf{V}(k)] \qquad (6a)$$
$$= X_{1,f}(k) + V_{rn}(k) \qquad (6b)$$
Where,
$$X_{1,f}(k) = \mathbf{W}^H(k)\mathbf{d}(k)X_1(k), \qquad V_{rn}(k) = \mathbf{W}^H(k)\mathbf{V}(k)$$
Fig. 4. Beamformer
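As a small illustration of equation 5, the sketch below applies a set of beamformer weights to the per-bin spectra of the microphones. It is only a schematic, assuming one STFT frame per microphone stacked in a matrix Y (M rows, one column per frequency bin) and weights W of the same size; how the weights are designed is the subject of section V.

% Minimal sketch of Z(k) = W^H Y(k), applied bin by bin.
% Assumes Y (M x numBins) holds one STFT frame per microphone and
% W (M x numBins) holds the beamformer weights for each frequency bin.
numBins = size(Y, 2);
Z = zeros(1, numBins);
for k = 1:numBins
    Z(k) = W(:,k)' * Y(:,k);   % ' is the conjugate (Hermitian) transpose
end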
4. UNIFORM LINEAR ARRAY
There is a variety of microphone arrays that can be used depending on the application, e.g. circular arrays, spherical arrays, biologically inspired arrays, ad-hoc arrays, etc. Uniform linear arrays, such as the one shown in figure 5, are usually used for comparing source localization algorithms due to their simple geometry and the fact that they are cheap and commercially available.

Fig. 5. Uniform Linear Array

We can represent the signal at each microphone as,
$$y_m(t) = \sum_{i=1}^{D} s_i(t)\, e^{j(m-1)\mu_i} + v_m(t) \qquad (7a)$$
$$\mathbf{y}(t) = \mathbf{A}\mathbf{s}(t) + \mathbf{v}(t) \qquad (7b)$$
Where M is the number of microphones, D is the number of sources, $\mu_i = -\frac{2\pi}{\lambda}\, l \sin(\theta_i)$ is called the spatial frequency, l is the distance between two adjacent microphones, λ is the wavelength of the source signal, θi is the angle of arrival and A is the matrix of steering vectors, the spatial responses specific to the array.
$$\mathbf{A} = \begin{bmatrix} \mathbf{a}(\mu_1) & \dots & \mathbf{a}(\mu_i) & \dots & \mathbf{a}(\mu_D) \end{bmatrix} \qquad (8a)$$
$$= \begin{bmatrix}
1 & 1 & \dots & 1 \\
e^{j\mu_1} & e^{j\mu_2} & \dots & e^{j\mu_D} \\
\vdots & \vdots & \ddots & \vdots \\
e^{j(M-1)\mu_1} & e^{j(M-1)\mu_2} & \dots & e^{j(M-1)\mu_D}
\end{bmatrix} \qquad (8b)$$
Expanding one of the terms in matrix (8b) in discrete form, we have $e^{-2\pi j k F_s l \sin(\theta)/(cN)}$, where k is the digital frequency (DFT bin index), Fs is the sampling rate, c is the speed of sound, and N is the number of DFT points.
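The sketch below builds this discrete steering term for a ULA over a grid of look angles and frequency bins, in the same spirit as the beampattern code in the appendix. The number of microphones, spacing and DFT size here are example values, not the experimental settings.

% Minimal sketch: ULA steering term, one entry per (bin, angle, mic).
M     = 4;                  % number of microphones (example value)
l     = 0.02;               % spacing between adjacent microphones in meters
c     = 345;                % speed of sound in m/s
Fs    = 16000;              % sampling rate in Hz
N     = 1024;               % number of DFT points
k     = (1:N/2).';          % DFT bin indices (positive frequencies)
theta = (-90:1:90)*pi/180;  % candidate look angles in radians
SV = zeros(N/2, numel(theta), M);
for m = 1:M
    % e^(-2*pi*j*k*Fs*(m-1)*l*sin(theta)/(c*N)), the term expanded from (8b)
    SV(:,:,m) = exp(-2i*pi*k*Fs*(m-1)*l*sin(theta)/(c*N));
end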
4.1. Beampattern
We can visualize the steering vectors of a ULA in polar plots over all angles, for any number of elements, at a specific frequency and for an arbitrary input. This visualization of the steering vectors is called the beampattern. For example, in the two microphone case from section II the steering vectors are,
$$\begin{bmatrix} 1 & 1 & \dots & 1 \\ e^{-j\omega\tau_1} & e^{-j\omega\tau_2} & \dots & e^{-j\omega\tau_n} \end{bmatrix} \qquad (9)$$
Where n depends on the desired scanning resolution, i.e. the number of candidate angles. In figure 6 we can see the steering vectors for different numbers of microphones M and spacings l at 1 kHz and 4 kHz. Note that since the ULA is symmetric front to back and the time delay model was used to derive the steering vectors, the back and front of the ULA beampattern are symmetric, and the array is unable to distinguish sounds arriving from the back from sounds arriving from the front. To resolve this cone of confusion one can simply use a directional microphone facing the front of the array, though directional microphones are usually more expensive and lower in quality than omnidirectional microphones.
As the frequency increases, the main lobes in the beampattern get narrower; however, grating lobes appear, lobes as large as the main lobe itself. This is called spatial aliasing and is discussed in more detail in the next section. Decreasing l helps avoid grating lobes, but it also widens the main lobe. As the number of elements in the array increases, there is an obvious advantage of a narrower beam and smaller grating lobes. More microphones, however, require more space, power and cost.
Fig. 6. Beampattern
4.1.1. Spatial Aliasing
As discussed briefly in the previous subsection, spatial aliasing is an artifact that steers copies of the main beam toward other angles. Spatial aliasing is analogous to aliasing in time. Temporal aliasing happens when the bandwidth of the signal exceeds half the sampling rate, which makes spectral content in the signal overlap. Spatial aliasing happens when the spacing between adjacent microphones is larger than half the shortest wavelength (λ = c/f) in the signal, which results in multiple main lobes in the beampattern [10]. Figure 7 shows spatial aliasing over all angles and frequencies for two microphones that are 22 cm apart.
Fig. 7. Spatial Aliasing
We can see that at low frequencies the microphone array has an omnidirectional response; there is no main lobe. Spatial aliasing occurs at about 1600 Hz and worsens as we go higher in frequency. Different designs and techniques can significantly reduce spatial aliasing, for example using smaller spacing between the microphones that handle the higher frequencies, or in general using frequency dependent spacing between elements rather than uniform spacing [11].
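As a quick design check, the half-wavelength rule quoted above can be turned into a one-line computation of the largest spacing that keeps a chosen upper frequency free of spatial aliasing. This is a rule-of-thumb sketch, not part of the original experiment; the target frequency is an example value.

% Rule-of-thumb sketch: largest alias-free spacing for a target upper frequency.
c    = 345;            % speed of sound in m/s
fmax = 4000;           % highest frequency of interest in Hz (example value)
lmax = c/(2*fmax);     % half the shortest wavelength, here about 4.3 cm
fprintf('Keep adjacent microphones within %.3f m to avoid aliasing up to %d Hz\n', lmax, fmax);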
5. SOUND SOURCE LOCALIZATION
In this section I discuss three methods for localizing a sound source. Two of them are based on beamforming techniques, the Delay and Sum beamformer and the MVDR beamformer; the third is MUSIC, which is a subspace algorithm.
5.1. Beamforming
Basic concepts of beamforming were discussed in section III. In this section we develop two types of filters for W: one is fixed and the other adapts to the signal.
5.1.1. Delay And Sum
The delay and sum beamformer is a fixed beamformer; it does not adapt itself to the signal. The basic idea behind delay and sum is to scan the environment with the microphone array beampattern at every angle and find where the output power of the signal is at its maximum [13]. Looking back at equation 5 and figure 4, the array output power is,
$$P(\mathbf{W}) = \frac{1}{K}\sum_{k=1}^{K} |Z(k)|^2 = \mathbf{W}^H \mathbf{R}_{yy} \mathbf{W} \qquad (10)$$
Where Ryy = Y(k)Y^H(k) and W = A(θ). Equation 10 can be rewritten as,
$$\arg\max_{\theta} P(\theta) = \arg\max_{\theta}\left\{\mathbf{A}(\theta)^H \mathbf{R}_{yy} \mathbf{A}(\theta)\right\} \qquad (11)$$
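A compact way to evaluate equation 11 per frequency bin is sketched below. It assumes a per-bin covariance estimate Ryy (M x M) and a steering matrix A with one column per candidate angle, built as in section IV, and mirrors, in simplified form, the delay and sum loop in the appendix code.

% Minimal sketch of the delay-and-sum spectrum for one frequency bin.
% Assumes Ryy (M x M) is the covariance of that bin across frames and
% A (M x numAngles) holds the steering vectors for the candidate angles.
numAngles = size(A, 2);
P_ds = zeros(1, numAngles);
for j = 1:numAngles
    a = A(:, j);
    P_ds(j) = real(a' * Ryy * a);   % output power in look direction j
end
[~, idx] = max(P_ds);               % idx points to the estimated DOA on the angle grid

Summing P_ds over the frequency bins gives the broadband spectrum plotted in the results section.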
5.1.2. MVDR
The Minimum Variance Distortionless Response (MVDR) beamformer, also known as the Capon beamformer [12], is a delay and sum beamformer with an additional constraint on the output power.
$$\mathbf{W}_{MVDR} = \arg\min_{\mathbf{W}}\; \mathbf{W}^H \mathbf{R}_{yy} \mathbf{W} \quad \text{s.t.} \quad \mathbf{W}^H \mathbf{A}(\theta) = 1 \qquad (12)$$
Conceptually, this constraint means that the gain in the look direction of the source must be unity (g(θi) = 1), while the power from every other angle is minimized accordingly. This is a constrained minimization problem that leads to the Lagrangian below,
$$J(\mathbf{W}, \lambda) = \mathbf{W}^H \mathbf{R}_{yy} \mathbf{W} + \lambda\,(\mathbf{W}^H \mathbf{A}(\theta) - 1)(\mathbf{A}(\theta)^H \mathbf{W} - 1) \qquad (13)$$
Taking the gradient of J with respect to λ and W, as derived in detail in [14], we can find W and therefore the output power of the beamformer as,
$$\mathbf{W}_{MVDR}(\theta) = \frac{\mathbf{R}_{yy}^{-1}\mathbf{A}(\theta)}{\mathbf{A}(\theta)^H \mathbf{R}_{yy}^{-1}\mathbf{A}(\theta)} \qquad (14)$$
$$P_{MVDR}(\theta) = \frac{1}{\mathbf{A}(\theta)^H \mathbf{R}_{yy}^{-1}\mathbf{A}(\theta)} \qquad (15)$$
This additional constraint on the delay and sum beamformer reduces the distortion in the output power while keeping the response at the look angle fixed [16]. Note that both delay and sum and MVDR require a good estimate of the covariance of the recorded signals. In the case of MVDR, this covariance must also be invertible, which requires at least as many observations as the number of sensors.
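The MVDR spectrum of equation 15 differs from the delay and sum sketch above only through the inverse of the covariance; a per-bin sketch is given below, again assuming Ryy and A as before. A pseudo-inverse is used, as in the appendix code, in case the covariance estimate is poorly conditioned.

% Minimal sketch of the MVDR (Capon) spectrum for one frequency bin.
% Assumes Ryy (M x M) and A (M x numAngles) as in the delay-and-sum sketch.
numAngles = size(A, 2);
Ryy_inv   = pinv(Ryy);              % pseudo-inverse guards against rank deficiency
P_mvdr    = zeros(1, numAngles);
for j = 1:numAngles
    a = A(:, j);
    P_mvdr(j) = 1 / real(a' * Ryy_inv * a);   % equation 15
end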
5.2. Subspace algorithm
5.2.1. MUSIC
Multiple Signal Classification (MUSIC) [14] is a subspace method for localizing sources. Looking back at equation 7b, we can define the following expressions,
$$\mathbf{R}_{yy} = \frac{1}{N}\sum_{n=1}^{N} \mathbf{y}(n)\mathbf{y}^H(n) = \frac{1}{N}\mathbf{Y}\mathbf{Y}^H \qquad (16a)$$
$$E\{\mathbf{R}_{yy}\} = \mathbf{A}(\theta)\mathbf{R}_{ss}\mathbf{A}^H(\theta) + \sigma_N^2\,\mathbf{I} \qquad (16b)$$
Where $\mathbf{R}_{ss} = \frac{1}{N}\mathbf{S}\mathbf{S}^H$ is the clean signal covariance, and we assume that we have access to N samples of the recorded signals at a time. We can then use the eigenvalue decomposition to split Ryy into signal and noise subspaces.
$$\mathbf{R}_{yy} = \begin{bmatrix}\mathbf{U}_s & \mathbf{U}_n\end{bmatrix}
\begin{bmatrix}
\lambda_1 & \dots & 0 \\
\vdots & \ddots & \vdots \\
0 & \dots & \lambda_M
\end{bmatrix}
\begin{bmatrix}
\mathbf{U}_s^H \\ \mathbf{U}_n^H
\end{bmatrix} \qquad (17)$$
Where Us is the signal subspace, Un is the noise subspace and λ1 > λ2 > · · · > λM. Intuitively, the signal subspace spans the same space as the steering vectors, while the noise subspace is orthogonal to the steering vector subspace, as shown below.
$$\mathrm{span}(\mathbf{U}_s) = \mathrm{span}(\mathbf{A}(\theta)) \qquad (18a)$$
$$\mathbf{U}_n \perp \mathbf{A}(\theta) \;\Rightarrow\; \mathbf{U}_n^H \mathbf{A}(\theta) = 0 \qquad (18b)$$
We can use the orthogonality between the noise subspace and the array steering vectors to find the direction of arrival by defining the output power of the MUSIC algorithm as one of the following,
$$P_{MUSIC}(\theta) = \frac{1}{\left\|\mathbf{U}_n^H \mathbf{A}(\theta)\right\|} = \frac{1}{\mathbf{A}(\theta)^H \mathbf{U}_n \mathbf{U}_n^H \mathbf{A}(\theta)} \qquad (19a)$$
$$P_{MUSIC}(\theta) = \frac{\mathbf{A}(\theta)^H\mathbf{A}(\theta)}{\mathbf{A}(\theta)^H \mathbf{U}_n \mathbf{U}_n^H \mathbf{A}(\theta)} \qquad (19b)$$
Equation (19a) is known as the MUSIC pseudo-spectrum, and (19b) is known as the MUSIC spatial spectrum. The peaks of either of these expressions point to the directions of the signal sources. One disadvantage of the MUSIC algorithm is that the number of sources to be detected must be determined in advance, and at least one extra sensor is needed for the noise subspace; that is, with M sensors we can localize up to M − 1 sources.
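A per-bin MUSIC sketch is shown below, following the same conventions as the earlier sketches (Ryy is the M x M bin covariance, A the steering matrix) and assuming the number of sources numSrc is known in advance, as the text requires. The appendix code does the same thing inside its frequency and block loops.

% Minimal sketch of the MUSIC pseudo-spectrum for one frequency bin.
% Assumes Ryy (M x M), A (M x numAngles) and a known number of sources numSrc.
numAngles  = size(A, 2);
[U, D]     = eig(Ryy);                 % eigenvalue decomposition
[~, order] = sort(diag(D), 'descend');
U          = U(:, order);              % eigenvectors, strongest first
Un         = U(:, numSrc+1:end);       % noise subspace (weakest eigenvectors)
P_music    = zeros(1, numAngles);
for j = 1:numAngles
    a = A(:, j);
    P_music(j) = 1 / real(a' * (Un*Un') * a);   % equation 19a
end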
6. EXPERIMENT SETUP
I used a PlayStation EYE as my uniform linear array. The PS EYE has 4 microphones inside, about 2 cm apart, with a sampling rate of 16 kHz. I ran two sets of experiments: (1) one source at about 15 degrees, recorded with two microphones. I used a plastic bag as my sound source; the spectrogram is provided in figure 9. The microphones marked with circles are the two sensors I used for this case. (2) Two sources located at 15 and −25 degrees, a loud fan and a speech signal, recorded with all four microphones of the PS EYE. The spectrograms for these cases can also be seen in figure 9. In each case, I used a ruler to approximately find the location of the sound source, where the direction of the rightmost microphone in the PS EYE is marked −90 degrees and that of the leftmost microphone +90 degrees.
Fig. 8. Playstation EYE
Fig. 9. Spectrogram for one and two sources scenarios
7. RESULTS
I evaluated all three algorithms from section V on the two cases discussed in section VI and plotted the power from each algorithm over all the angles defined in section VI. The results for cases 1 and 2 are shown in figure 10.
Fig. 10. Output power for one and two sources scenarios
For the one source scenario, all three algorithms were able to detect the DOA correctly. The Delay and Sum beamformer was not able to suppress the grating lobes and the distortion in the signal. MVDR improved on that by minimizing the variance of the distortion. The MUSIC localization looks almost like a delta response, with a single peak at the source location. For the two source scenario, delay and sum and MVDR are not able to resolve the two sources, while MUSIC localizes both of them. As can be seen, MUSIC gives the best results in both scenarios. MUSIC, however, is also very sensitive to the analysis frame; it needs many frames to form a reasonably well defined noise subspace. A quantitative way of evaluating these algorithms is the sharpness of the localization, for example the root mean square error (RMSE) [15].
$$RMSE = \sqrt{\frac{1}{K}\sum_{k=1}^{K}\left(\theta_{est,k} - \theta_{true,k}\right)^2} \qquad (20)$$
Where K is the number of blocks (groups of frames).
RMSE Delay and Sum MVDR MUSIC
1st scenario 0.7035 0.1012 0.0851
2nd scenario 0.4992 0.4990 0.1903
RMSE is only one metric for comparing sound source localization algorithms. One must also take noise and reverberation into account when localizing a sound source, e.g. is the algorithm robust enough to distinguish the direct sound from reflections? Which algorithm produces an output with higher SNR? And so on.
8. APPENDIX
In this section, I have included the Matlab code for visualizing the beampattern of a ULA, as well as the Matlab code for the sound source localization algorithms discussed in section V.
8.1. Matlab Code for Beampattern for ULA
%% Ramin Anushiravani
% 11/24/14
% Linear Mic array
close all; clear all; clc
dis = 0.02;
fs = 9000;%48000;
fftPoint = 1024;
numfft = 1:1:fftPoint/2;
f = numfft*fs/fftPoint; %hertz
res = 1;
theta = -pi:res*pi/180:pi;
c = 345;
numMic =10;
for i = 1: numMic
SV(:,:,i) =(exp(1i*2*pi.*f'*(i-1)*dis*sin(theta)/c));
%delay and sum
end
Out = abs(sum(SV,3)/numMic);
for i = 1:10:fftPoint/2
% figure(1);subplot(1,2,1);
polar(theta,Out(i,:));
title(['Frequency ' , ...
num2str(round(i*fs/fftPoint)), ' Hz']);
%subplot(1,2,2);plot(theta,20*log10(Out(i,:)));
%axis([-pi pi -20 1]);title('Beam Pattern');
pause
end
figure;imagesc(theta*180/pi,numfft*fs/fftPoint,Out);
xlabel('angle');ylabel('frequency-Hz');
title('Beam Pattern');axis xy
colormap hot
% %% Simulation signals.
% angle = pi/4;
% bin = 100;
% f1 = bin*fs/fftPoint;
%
% tdelay = dis*sin(angle)/c;
% L = fftPoint;
% t = (0:L-1)/fs;
%
% sig1 = sin(2*pi*f1*t);
% sig2 = fft(sig1).
%*exp(-1i*2*pi*([0:L/2 -L/2+1:-1])*tdelay*fs/L);
% fft is symmetric.
%
% sig3 = real(ifft(sig2));
% TT= [sig3; sig1];
% x = TT';
% audiowrite('sim.wav',TT',fs);
%
%
8.2. Sound Source Localization
%Ramin Anushiravani
% March 1st,14
clc;clear all; close all;
%% 11/24/14
numMic =2; % #mics
%% Theta
addpath('sounds551');
theta = -pi/2:pi/179:pi/2;
%if 0 to pi cos, if -pi/2 to pi/2 sin.
c = 345;
%% 2 chan
if numMic ==2
% d = 0.22*cos(theta) ;
% t = d/c; %time delay between mics
[sig fs]= audioread('mystery angle.wav');
end
%% 4 chan
if numMic ==4
% [sig1 fs] = audioread('cup-01.wav');
% [sig2 fs] = audioread('cup-02.wav');
% [sig3 fs] = audioread('cup-03.wav');
% [sig4 fs] = audioread('cup-04.wav');
[sig1 fs] = audioread('fan speaker-01.wav');
[sig2 fs] = audioread('fan speaker-02.wav');
[sig3 fs] = audioread('fan speaker-03.wav');
[sig4 fs] = audioread('fan speaker-04.wav');
Fs = 16000;
s4 = resample(sig4,Fs,fs);
s2 = resample(sig3,Fs,fs);
s3 = resample(sig2,Fs,fs);
s1 =resample(sig1,Fs,fs);
else
%% 2 chan
i = [1 11];
sig1a = sig(i(1):i(2)*fs,1);
sig2a = sig(i(1):i(2)*fs,2);
%get the signals first
%% 2 chan
s2 = sig2a;
%ch.2 is closer to the source.
%steering vectors was based on the ch.2 as ref.
s1 = sig1a;
end
%% ffts
fftPoint = 1024;
R = fftPoint;
L = R/4;
k = 1:1:fftPoint/2;
w = 2*pi.*(k-1)*fs/fftPoint;
%% FFT points
N = numMic;
numfft = 1:1:fftPoint/2;
f = numfft*fs/fftPoint;
%%
%% Steering Vectors
if numMic ==2
dis = 0.22;
% SV 4 chan
for i = 1: numMic
SV(:,:,i) =...
(exp(1i*2*pi.*f'*(i-1)*dis*sin(theta)/c));
end
SV1 = SV(:,:,1);
SV2 = SV(:,:,2);
%% SV 2 chan
for i = 1: fftPoint/2
for j = 1: length(theta)
SVt(i,j)= {[SV1(i,j); SV2(i,j)]};
%Steering vector for each freq and angle
end
end
else
%% distance between mics 4
dis = 0.02;
% SV 4 chan
for i = 1: numMic
SV(:,:,i) =...
(exp(1i*2*pi.*f'*(i-1)*dis*sin(theta)/c));
end
SV1 = SV(:,:,1);
SV2 = SV(:,:,2);
SV3 = SV(:,:,3);
SV4 = SV(:,:,4);
% SV 4
for i = 1: fftPoint/2
for j = 1: length(theta)
SVtt(i,j)= ...
{[SV1(i,j); SV2(i,j);SV3(i,j);SV4(i,j)]};
%Steering vector for each freq and angle
end
end
%
end
%% Beampattern
%% Steer 2
if numMic==2
for j = 1 : length(theta)
for i = 1 : fftPoint/2-1
steer (i,j) = abs(sum(SVt{i,j},1))/N;
%summing up steering vectors for all mics.
end
end
end
%% Steer 4
if numMic ==4
for j = 1 : length(theta)
for i = 1 : fftPoint/2-1
steer4 (i,j) = abs(sum(SVtt{i,j},1))/N;
%summing up steering vectors for all mics.
end
end
%% Plot beam pattern
% for i = 1 : fftPoint/2
% polar(theta,steer4(i,:)) ;
%title(['frequency
%' num2str(floor(i*(fs/2)/fftPoint)) ' Hz']);
%pause(0.01)
% end
end
%% STFT
if numMic ==2
[sig1, t1] = enframe(s1,hamming(R),L);
[sig2, t2] = enframe(s2,hamming(R),L);
for i = 1 : length(sig1(:,1))
Sig1(i,:) = fft(sig1(i,:),fftPoint);
Sig2(i,:) = fft(sig2(i,:),fftPoint);
end
Sig1 = Sig1(:,1:end/2-1);
Sig2 = Sig2(:,1:end/2-1);
end
%% 4 chan
if numMic==4
[sig1, t1] = enframe(s1,hamming(R),L);
[sig2, t2] = enframe(s2,hamming(R),L);
[sig3, t3] = enframe(s3,hamming(R),L);
[sig4, t4] = enframe(s4,hamming(R),L);
for i = 1 : length(sig1(:,1))
Sig1(i,:) = fft(sig1(i,:),fftPoint);
Sig2(i,:) = fft(sig2(i,:),fftPoint);
Sig3(i,:) = fft(sig3(i,:),fftPoint);
Sig4(i,:) = fft(sig4(i,:),fftPoint);
end
% take first half
Sig1 = Sig1(:,1:end/2-1);
Sig2 = Sig2(:,1:end/2-1);
Sig3 = Sig3(:,1:end/2-1);
Sig4 = Sig4(:,1:end/2-1);
end
%% making blocks out of frames 2 chan
if numMic==2
nframe =500;
n = [1 nframe];
for p = 1:floor(length(Sig1(:,1))/ nframe)
Sigb1(p,:) = {Sig1(n(1):n(2),:)};
Sigb2(p,:) = {Sig2(n(1):n(2),:)};
n = n + nframe;
end
for i = 1:length(Sigb1)
for k = 1: fftPoint/2-1
SIG(i,k) = {[Sigb1{i}(:,k),Sigb2{i}(:,k)]};
%we need to find the cov between each
%block for each signal at one frequency bin
%=> energy of the signal at the freq bin
end
end
% SIG is number of block times the
% number of frequency bins, each entry
% contains a cell, of all frames in
%each block for each signal in that
% specific freq bin.
end
%% making blocks out of frames 4 chan
if numMic==4
nframe =350;
n = [1 nframe];
for p = 1:floor(length(Sig1(:,1))/ nframe)
Sigb1(p,:) = {Sig1(n(1):n(2),:)};
Sigb2(p,:) = {Sig2(n(1):n(2),:)};
Sigb3(p,:) = {Sig3(n(1):n(2),:)};
Sigb4(p,:) = {Sig4(n(1):n(2),:)};
n = n + nframe;
end
for i = 1:length(Sigb1)
for k = 1: fftPoint/2-1
SIG(i,k) = ...
{[Sigb1{i}(:,k),Sigb2{i}(:,k),Sigb3{i}(:,k),Sigb4{i}(:,k)]};
%we need to find the cov between each block for each
%signal at one frequency bin =>
%energy of the signal at the freq bin
end
end
end
%% Rxx
for i =1 : length(SIG(:,1)) % Goes through frames
for j = 1: fftPoint/2-1 % goes through frequency bins
Rxx(i,j)= {(transpose(SIG{i,j})*conj(SIG{i,j}))};
% each cell represent the covariance
% for that frame in that freq bin (3*3)
end
end
%% delay and sum 2
if numMic==2
for k = 1: length(Rxx(:,1))
for i = 1: fftPoint/2-1
for j = 1: length(theta)
Power1(i,j,k) = ...
abs(SVt{i,j}'*(Rxx{k,i})*SVt{i,j});
end
end
end
%% capon 2
for k = 1: length(Rxx(:,1))
for i = 1: fftPoint/2-1
for j = 1: length(theta)
Power2(i,j,k) = ...
1/abs(SVt{i,j}'*pinv(Rxx{k,i})*SVt{i,j});
end
end
end
%% MUSIC 2
for k = 1: length(SIG(:,1))
for i = 1: fftPoint/2-1
[u, e] = eig(Rxx{k,i});
e_diag = diag(e);
[e_sort, e_idx] = sort(e_diag,'descend');
u_sort = u(:,e_idx);
noise_subspace = u_sort(:,2);
% noise subspace: the eigenvector beyond the single source,
% defined based on the dimensions of Rxx
for j = 1: length(theta)
Power3(i,j,k) = 1/abs(SVt{i,j}'...
*(noise_subspace*noise_subspace')*SVt{i,j});
end
end
end
end
%% delay and sum 4
if numMic==4
for k = 1: length(Rxx(:,1))
for i = 1: fftPoint/2-1
for j = 1: length(theta)
Power1(i,j,k) = ...
abs(SVtt{i,j}'*(Rxx{k,i})*SVtt{i,j});
end
end
end
%% capon 4
for k = 1: length(Rxx(:,1))
for i = 1: fftPoint/2-1
for j = 1: length(theta)
Power2(i,j,k) = 1/abs(SVtt{i,j}'...
*pinv(Rxx{k,i})*SVtt{i,j});
end
end
end
%% MUSIC 4
for k = 1: length(SIG(:,1))
for i = 1: fftPoint/2-1
[u, e] = eig(Rxx{k,i});
e_diag = diag(e);
[e_sort, e_idx] = sort(e_diag,'descend');
u_sort = u(:,e_idx);
noise_subspace = u_sort(:,3:4);
% noise subspace: the two weakest eigenvectors (two sources, four mics),
% defined based on the dimensions of Rxx
for j = 1: length(theta)
Power3(i,j,k) = (1/abs((SVtt{i,j}')...
*(noise_subspace*noise_subspace')*(SVtt{i,j})));
end
end
end
end
%% Power
for i = 1: length(SIG(:,1))
PowerSq1(i) = {squeeze(Power1(:,:,i))};
% power for each block
PowerSq2(i) = {squeeze(Power2(:,:,i))};
PowerSq3(i) = {squeeze(Power3(:,:,i))};
end
%
for i = 1: length(SIG(:,1))
sumPower1(i,:) = sum(PowerSq1{i},1);
% sum over all freq, max power at every angle
sumPower2(i,:) = sum(PowerSq2{i},1);
sumPower3(i,:) = sum(PowerSq3{i},1);
end
%% plot delay and sum and MVDR
figure(1);subplot(1,3,1);
plot((theta*180/pi),sumPower1)
;title('D and S'); xlabel('Angle'); ylabel('Power')
subplot(1,3,2);plot((theta*180/pi),sumPower2)
;title('MVDR'); xlabel('Angle'); ylabel('Power')
%% MUSIC
subplot(1,3,3); plot(((theta)*180/pi)+9,sumPower3)
;title('MUSIC'); xlabel('Angle'); ylabel('Power')
axis([-90 90 2 max(max(sumPower3))*1.2])
%% RMSE
if numMic ==2
x=mean(sumPower1,1);
[X ind] = max(x);
vec = [zeros(1,ind-1),...
X,zeros(1,size(sumPower1,2)-ind)];
er1= sqrt(norm(x-vec)/(norm(x)*numMic))
x=mean(sumPower2,1);
[X ind] = max(x);
vec = [zeros(1,ind-1),X,...
zeros(1,size(sumPower1,2)-ind)];
er2= sqrt(norm(x-vec)/(norm(x)*numMic))
x=mean(sumPower3,1);
[X ind] = max(x);
vec = [zeros(1,ind-1),X,...
zeros(1,size(sumPower1,2)-ind)];
er3= sqrt(norm(x-vec)/(norm(x)*numMic))
else
x=mean(sumPower1,1);
[X ind] = max(x);
vec = [zeros(1,ind-1),X,...
zeros(1,size(sumPower1,2)-ind)];
er1= sqrt(norm(x-vec)/(norm(x)*numMic))
x=mean(sumPower2,1);
[X ind] = max(x);
vec = [zeros(1,ind-1),X,...
zeros(1,size(sumPower1,2)-ind)];
er2= sqrt(norm(x-vec)/(norm(x)*numMic))
x=mean(sumPower3,1);
[X ind] = findpeaks(x,'NPeaks',4) ;
vec = [zeros(1,ind(2)-1),X(2)...
,zeros(1,ind(4)-ind(2)-1),X(4),...
zeros(1,size(sumPower1,2)-ind(4))];
er3= sqrt(norm(x-vec)/(norm(x)*numMic))
end
9. REFERENCES
[1] Farrell, K.; Mammone, R.; Flanagan, J. L., "Beamforming microphone arrays for speech enhancement," Proc. IEEE ICASSP 1992, vol. 1, pp. 285-288, March 1992.
[2] Cauchi, B., et al., "Joint dereverberation and noise reduction using beamforming and a single-channel speech enhancement scheme," REVERB Challenge, 2014.
[3] Habets, E. A. P.; Benesty, J., "A two-stage beamforming approach for noise reduction and dereverberation," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 5, pp. 945-958, May 2013.
[4] Van den Bogaert, T.; Doclo, S.; Wouters, J.; Moonen, M., "Speech enhancement with multichannel Wiener filter techniques in multimicrophone binaural hearing aids," The Journal of the Acoustical Society of America, 125, 360-371, 2009.
[5] Ramos, A. L. L., et al., "Delay-and-sum beamforming for direction of arrival estimation applied to gunshot acoustics," Proceedings of the SPIE.
[6] Zhao, S., et al., "3D binaural audio capture and reproduction using a miniature microphone array," Conference on Digital Audio Effects (DAFx).
[7] Miles, R.; Su, Q.; Cui, W.; Shetye, M., "A low-noise differential microphone inspired by the ears of the parasitoid fly Ormia ochracea," The Journal of the Acoustical Society of America.
[8] Kuntzman, M. L.; Hall, N. A., "Sound source localization inspired by the ears of the Ormia ochracea," Applied Physics Letters, 105, 033701, 2014.
[9] Benesty, J.; Dmochowski, J. P., "Microphone arrays: fundamental concepts," Springer.
[10] McCowan, I., "Microphone Array Tutorial."
[11] Greensted, A., "Delay Sum Beamforming," retrieved January 2012.
[12] Capon, J., "High-resolution frequency-wavenumber spectrum analysis," Proc. IEEE, 57(8), 1408-1418, 1969.
[13] Bhuiya, F.; Islam, M., "Analysis of direction of arrival techniques using uniform linear array," International Journal of Computer Theory and Engineering.
[14] Kawitkar, R., "Performance of different types of array structures based on the Multiple Signal Classification (MUSIC) algorithm," International Conference on MEMS, NANO and Smart Systems.
[15] Richter, I., "Spatial filtering and DoA estimation: MVDR beamformer and MUSIC algorithm," Sensor Array Signal Processing.
[16] Lorenz, R. G.; Boyd, S. P., "Robust minimum variance beamforming."

Contenu connexe

Tendances

Acoustic echo cancellation
Acoustic echo cancellationAcoustic echo cancellation
Acoustic echo cancellation
chintanajoshi
 

Tendances (20)

Radar 2009 a 11 waveforms and pulse compression
Radar 2009 a 11 waveforms and pulse compressionRadar 2009 a 11 waveforms and pulse compression
Radar 2009 a 11 waveforms and pulse compression
 
Noise in AM systems.ppt
Noise in AM systems.pptNoise in AM systems.ppt
Noise in AM systems.ppt
 
Noise
NoiseNoise
Noise
 
ISI & niquist Criterion.pptx
ISI & niquist Criterion.pptxISI & niquist Criterion.pptx
ISI & niquist Criterion.pptx
 
Antennas slideshare part 2
Antennas slideshare part 2Antennas slideshare part 2
Antennas slideshare part 2
 
NYQUIST CRITERION FOR ZERO ISI
NYQUIST CRITERION FOR ZERO ISINYQUIST CRITERION FOR ZERO ISI
NYQUIST CRITERION FOR ZERO ISI
 
fading channels
 fading channels fading channels
fading channels
 
Magic tee
Magic tee  Magic tee
Magic tee
 
Chapter 3- pulsed radar system and MTI
Chapter 3- pulsed radar system and MTIChapter 3- pulsed radar system and MTI
Chapter 3- pulsed radar system and MTI
 
Acoustic echo cancellation
Acoustic echo cancellationAcoustic echo cancellation
Acoustic echo cancellation
 
10. types of small scale fading
10. types of small scale fading10. types of small scale fading
10. types of small scale fading
 
Aliasing and Antialiasing filter
Aliasing and Antialiasing filterAliasing and Antialiasing filter
Aliasing and Antialiasing filter
 
Image Denoising Using Wavelet
Image Denoising Using WaveletImage Denoising Using Wavelet
Image Denoising Using Wavelet
 
Decimation and Interpolation
Decimation and InterpolationDecimation and Interpolation
Decimation and Interpolation
 
Matched filter
Matched filterMatched filter
Matched filter
 
Radar Systems- Unit-III : MTI and Pulse Doppler Radars
Radar Systems- Unit-III : MTI and Pulse Doppler RadarsRadar Systems- Unit-III : MTI and Pulse Doppler Radars
Radar Systems- Unit-III : MTI and Pulse Doppler Radars
 
Beamforming antennas (1)
Beamforming antennas (1)Beamforming antennas (1)
Beamforming antennas (1)
 
Overlap Add, Overlap Save(digital signal processing)
Overlap Add, Overlap Save(digital signal processing)Overlap Add, Overlap Save(digital signal processing)
Overlap Add, Overlap Save(digital signal processing)
 
Direction of arrival estimation using music algorithm
Direction of arrival estimation using music algorithmDirection of arrival estimation using music algorithm
Direction of arrival estimation using music algorithm
 
Noise
NoiseNoise
Noise
 

En vedette

Modals, revision 12, Prepared by LORETA VAINAUSKIENE, Kruonis Gymnasium, Lith...
Modals, revision 12, Prepared by LORETA VAINAUSKIENE, Kruonis Gymnasium, Lith...Modals, revision 12, Prepared by LORETA VAINAUSKIENE, Kruonis Gymnasium, Lith...
Modals, revision 12, Prepared by LORETA VAINAUSKIENE, Kruonis Gymnasium, Lith...
Loreta Vainauskiene
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
Lior Rokach
 

En vedette (20)

FPGA Based Acoustic Source Localization Project
FPGA Based Acoustic Source Localization ProjectFPGA Based Acoustic Source Localization Project
FPGA Based Acoustic Source Localization Project
 
Sound source localization
Sound source localizationSound source localization
Sound source localization
 
Kinect Microphone Array case study
Kinect Microphone Array case studyKinect Microphone Array case study
Kinect Microphone Array case study
 
recommender_systems
recommender_systemsrecommender_systems
recommender_systems
 
Food Preservation for Farming Communities in Nepal: A Low Cost Engineering So...
Food Preservation for Farming Communities in Nepal: A Low Cost Engineering So...Food Preservation for Farming Communities in Nepal: A Low Cost Engineering So...
Food Preservation for Farming Communities in Nepal: A Low Cost Engineering So...
 
Audioprocessing
AudioprocessingAudioprocessing
Audioprocessing
 
Immersive audio rendering for interactive complex virtual architectural envir...
Immersive audio rendering for interactive complex virtual architectural envir...Immersive audio rendering for interactive complex virtual architectural envir...
Immersive audio rendering for interactive complex virtual architectural envir...
 
Track 1 session 3 - st dev con 2016 - smart home and building
Track 1   session 3 - st dev con 2016 - smart home and buildingTrack 1   session 3 - st dev con 2016 - smart home and building
Track 1 session 3 - st dev con 2016 - smart home and building
 
Final research
Final research Final research
Final research
 
IDCC 1536 Salaires dans la CCN des distributeurs conseils hors domicile
IDCC 1536 Salaires dans la CCN des distributeurs conseils hors domicile IDCC 1536 Salaires dans la CCN des distributeurs conseils hors domicile
IDCC 1536 Salaires dans la CCN des distributeurs conseils hors domicile
 
Modals, revision 12, Prepared by LORETA VAINAUSKIENE, Kruonis Gymnasium, Lith...
Modals, revision 12, Prepared by LORETA VAINAUSKIENE, Kruonis Gymnasium, Lith...Modals, revision 12, Prepared by LORETA VAINAUSKIENE, Kruonis Gymnasium, Lith...
Modals, revision 12, Prepared by LORETA VAINAUSKIENE, Kruonis Gymnasium, Lith...
 
IDCC 2335 Avenant sur les salaires minima
IDCC 2335 Avenant sur les salaires minima IDCC 2335 Avenant sur les salaires minima
IDCC 2335 Avenant sur les salaires minima
 
Curriculum
CurriculumCurriculum
Curriculum
 
La reforme des_minima_sociaux_se_met_en_place
La reforme des_minima_sociaux_se_met_en_placeLa reforme des_minima_sociaux_se_met_en_place
La reforme des_minima_sociaux_se_met_en_place
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Localization of brain lesion by Prof Dr Bashir Ahmed Dar Sopore Kashmir
Localization of brain lesion by Prof Dr Bashir Ahmed Dar Sopore KashmirLocalization of brain lesion by Prof Dr Bashir Ahmed Dar Sopore Kashmir
Localization of brain lesion by Prof Dr Bashir Ahmed Dar Sopore Kashmir
 
Deep Learning for Recommender Systems - Budapest RecSys Meetup
Deep Learning for Recommender Systems  - Budapest RecSys MeetupDeep Learning for Recommender Systems  - Budapest RecSys Meetup
Deep Learning for Recommender Systems - Budapest RecSys Meetup
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Exposé sur le sommeil et EEG
Exposé sur le sommeil et EEGExposé sur le sommeil et EEG
Exposé sur le sommeil et EEG
 
[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systems[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systems
 

Similaire à Sound Source Localization with microphone arrays

Catalogue 2013 en
Catalogue 2013 enCatalogue 2013 en
Catalogue 2013 en
Guy Crt
 
Handling Ihnarmonic Series with Median-Adjustive Trajectories
Handling Ihnarmonic Series with Median-Adjustive TrajectoriesHandling Ihnarmonic Series with Median-Adjustive Trajectories
Handling Ihnarmonic Series with Median-Adjustive Trajectories
Matthieu Hodgkinson
 

Similaire à Sound Source Localization with microphone arrays (20)

Dsp final report
Dsp final reportDsp final report
Dsp final report
 
Direction of Arrival Estimation Based on MUSIC Algorithm Using Uniform and No...
Direction of Arrival Estimation Based on MUSIC Algorithm Using Uniform and No...Direction of Arrival Estimation Based on MUSIC Algorithm Using Uniform and No...
Direction of Arrival Estimation Based on MUSIC Algorithm Using Uniform and No...
 
Research: Applying Various DSP-Related Techniques for Robust Recognition of A...
Research: Applying Various DSP-Related Techniques for Robust Recognition of A...Research: Applying Various DSP-Related Techniques for Robust Recognition of A...
Research: Applying Various DSP-Related Techniques for Robust Recognition of A...
 
Catalogue 2013 en
Catalogue 2013 enCatalogue 2013 en
Catalogue 2013 en
 
Automatic mic adjustment using dc motor
Automatic mic adjustment using dc motorAutomatic mic adjustment using dc motor
Automatic mic adjustment using dc motor
 
Fading models text
Fading models textFading models text
Fading models text
 
Improvement of Fading Channel Modeling Performance for Wireless Channel
Improvement of Fading Channel Modeling Performance for Wireless Channel Improvement of Fading Channel Modeling Performance for Wireless Channel
Improvement of Fading Channel Modeling Performance for Wireless Channel
 
Stability of Target Resonance Modes: Ina Quadrature Polarization Context
Stability of Target Resonance Modes: Ina Quadrature Polarization ContextStability of Target Resonance Modes: Ina Quadrature Polarization Context
Stability of Target Resonance Modes: Ina Quadrature Polarization Context
 
A time domain clean approach for the identification of acoustic moving source...
A time domain clean approach for the identification of acoustic moving source...A time domain clean approach for the identification of acoustic moving source...
A time domain clean approach for the identification of acoustic moving source...
 
IR UWB TOA Estimation Techniques and Comparison
IR UWB TOA Estimation Techniques and ComparisonIR UWB TOA Estimation Techniques and Comparison
IR UWB TOA Estimation Techniques and Comparison
 
Performance Analysis of Adaptive DOA Estimation Algorithms For Mobile Applica...
Performance Analysis of Adaptive DOA Estimation Algorithms For Mobile Applica...Performance Analysis of Adaptive DOA Estimation Algorithms For Mobile Applica...
Performance Analysis of Adaptive DOA Estimation Algorithms For Mobile Applica...
 
An Overview of Array Signal Processing and Beam Forming TechniquesAn Overview...
An Overview of Array Signal Processing and Beam Forming TechniquesAn Overview...An Overview of Array Signal Processing and Beam Forming TechniquesAn Overview...
An Overview of Array Signal Processing and Beam Forming TechniquesAn Overview...
 
Sierpinski fractal circular antenna
Sierpinski fractal circular antennaSierpinski fractal circular antenna
Sierpinski fractal circular antenna
 
L shaped slot loaded semicircular patch antenna for wideband operation
L shaped slot loaded semicircular patch antenna for wideband operation L shaped slot loaded semicircular patch antenna for wideband operation
L shaped slot loaded semicircular patch antenna for wideband operation
 
An Ultrasound Image Despeckling Approach Based on Principle Component Analysis
An Ultrasound Image Despeckling Approach Based on Principle Component AnalysisAn Ultrasound Image Despeckling Approach Based on Principle Component Analysis
An Ultrasound Image Despeckling Approach Based on Principle Component Analysis
 
Directional omni-directional antennas
Directional omni-directional antennasDirectional omni-directional antennas
Directional omni-directional antennas
 
U0 vqmt qxodk=
U0 vqmt qxodk=U0 vqmt qxodk=
U0 vqmt qxodk=
 
F04924352
F04924352F04924352
F04924352
 
Error Rate Performance of Interleaved Coded OFDM For Undersea Acoustic Links
Error Rate Performance of Interleaved Coded OFDM For Undersea Acoustic LinksError Rate Performance of Interleaved Coded OFDM For Undersea Acoustic Links
Error Rate Performance of Interleaved Coded OFDM For Undersea Acoustic Links
 
Handling Ihnarmonic Series with Median-Adjustive Trajectories
Handling Ihnarmonic Series with Median-Adjustive TrajectoriesHandling Ihnarmonic Series with Median-Adjustive Trajectories
Handling Ihnarmonic Series with Median-Adjustive Trajectories
 

Plus de Ramin Anushiravani (7)

Techfest jan17
Techfest jan17Techfest jan17
Techfest jan17
 
3D audio
3D audio3D audio
3D audio
 
3D Audio playback for single channel audio using visual cues
3D Audio playback for single channel audio using visual cues3D Audio playback for single channel audio using visual cues
3D Audio playback for single channel audio using visual cues
 
A computer vision approach to speech enhancement
A computer vision approach to speech enhancementA computer vision approach to speech enhancement
A computer vision approach to speech enhancement
 
Poster cs543
Poster cs543Poster cs543
Poster cs543
 
3D Spatial Response
3D Spatial Response3D Spatial Response
3D Spatial Response
 
example based audio editing
example based audio editingexample based audio editing
example based audio editing
 

Dernier

Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Christo Ananth
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
ankushspencer015
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
rknatarajan
 

Dernier (20)

CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spain
 
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICSUNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 

Sound Source Localization with microphone arrays

  • 1. SOUND SOURCE LOCALIZATION WITH MICROPHONE ARRAYS Ramin Anushiravani Department of Electrical and Computer Engineering University of Illinois at Urbana-Champaign ABSTRACT In this paper I discuss three basic and important meth- ods for finding the direction of arrival (DOA) in a far field environment for sound sources. The first two ap- proaches are based on Beamforming techniques: Delay and Sum Beamformer and Minimum Variance Distor- tionless Response Beamformer (MVDR). The third ap- proach is a subspace method that uses the well-known algorithm, Multiple Signal Classification (MUSIC). I demonstrated the accuracy of each algorithm by local- izing sound sources in an office environment using a uniform linear array (ULA). Index Terms— direction of arrival, Beamforming, subspace method, uniform linear array 1. INTRODUCTION Sound source localization has many applications in speech enhancement such as speech denoising and dereverberation [1-3] by forming a beam toward the speaker and therefore reducing the noise reverberation from other directions. This is especially useful for hear- ing aids industry [4]. Sound source localization can also be used for surveillance purposes such as finding the direction of gun shots for in public places by scanning the environment for sudden sound activities and classi- fying sounds to one of the available database [5]. More recently, sound source localization techniques has been used to reconstruct spatial audio for the purpose of en- tertainment and improving teleconferencing experience [6]. In order to motivate the background behind the DOA, I first discuss how living creatures localize sound sources and then represent a simple time delay model for localizing sound sources. In section III, I discuss the general signal model for beamforming techniques. In section VI, I talk about uniform linear arrays and specif- ically two of their characteristics, spatial aliasing and beampattern. In section V; I discuss three algorithms for sound source localization, delay and sum, MVDR and MUSIC. In section VI, I discuss the experiment setup and finally in section VII I evaluate the results for the algorithms mentioned in section V. 2. A MODEL FOR SOUND LOCALIZATION In a crowded party, we can focus and form a beam toward the listener and therefore enhance the speech quality of the speaker, but how? Human beings use variety of information from the sound source and environment for localization. For example, if a sound is located closer to the right ear, it reaches it earlier than the left ear and it will also be big- ger in amplitude due to head shadow effect. Brain uses the time delay and level difference cues between the two to localize the sound as shown in figure 1. However, if a sound is coming from a back or front of the head, they both will have similar time delay and level difference. That’s when spectral information such as pinna shape, shoulders, hair and even one’s clothing will help the listener localize where the sound is coming from. In addition to human, animals, insects and even parasites also have some sort of source localization systems. An interesting case is Ormia, a small fly that eat off crickets. Ormia’s localization system is quite extraordinary and many scientists have been trying to model microphones and hearing of its sound localization systems [7]. The distance between Ormia’s ears is of micrometer and so the time and the level difference between the two ears is too small to help the parasite localize sound. 
Ormia’s ears, however, communicate with each other in a complex manner and that help increasing the resolu- tion of the localization by almost 20 times its physical constraint [8]. 2.1. Time Delay Model As discussed earlier, time delay is an important cue for localizing sound. We can build an exaggerated simpli- fied model for sound localization based of time delay for human ears as shown in figure 2. The assumption on a time delay model is that, the listener does not have a head (there is no level difference cues or spectral information), only two ears distant from each other by the size of the head (about 22 cm). Since in most practical scenario the sound source is in far field, we can assume that sound waves are parallel to each
  • 2. Fig. 1. Time delay and level difference cues for sound source localization Fig. 2. A time delay model for source localization other from microphones point of view. One of the mi- crophones (ears) can then be fixed as a reference, and the time delay for the other microphone is calculated using geometry. The time delay of arrival can be represented as the angle of arrival as shown in equation 1. τ = dsin(θ) c (1) In frequency domain time delay is represented as, e−jωτ , where ω is the freuency of the source signal. Speech signal are broadband signals, and so this nar- rowband assumption will not work in practice. Given a reference signal and a delayed signal, one can localize a narrowband signal by simply undoing the delay from the delayed signal as shown in equation 2. argmax{ θ n n=0 n m=0 ||delayed{n} ref{n + m}||} = (2a) argmax( θ n n=0 ||Cref,delayed[m]||) (2b) Where n is the number of samples, m undo the delay from the delayed signal and Cxy is the cross correlation matrix between the two signals. Once we solved for m, we can then translate this sample difference into time difference in second and use equation one to find the DOA. Basic is idea to use cross correlation and see when we have a match between the two. The block diagram for this simple time delay model is shown in figure 3. Fig. 3. Block diagram for aligning to narrowband signals 3. SIGNAL MODEL Beamforming is a powerful tool in array signal process- ing. Beamforming can be thinking of as spatial filtering for detecting and estimating the output of a sensor array such that the SNR of other the signal is increased, or the beampattern of the array is narrower and more accurate. Different filters can be derived using the signal model discussed below which is discussed in more details in [9]. A simple beamforming system is shown in figure 4. The signal recorded at each sensor can be represented as, yn(t) = gn(t) ∗ s(t) + vn(t) (3)
  • 3. Where y is the recorded signal at each sensor, g is the spatial response corresponding to the location of the source s, the clean source signal and v is zero mean Gaussian noise signal. In frequency domain equation 3 can be represented as, Yn(k) = Gn(k)S(k) + Vn(k) = (4a) d(k)X1(k) + V(k) (4b) Where d is the time delay between the two signal in frequency domain, and X1 is the recorded signal at the reference microphone. The output of the beamformer can be shown as following, Z(k) = WH Y(k) (5) Where W are spatial filters, beamformer weights, and Z is the output of the beamformer. Substituting equation 4.b in equation 5, WH [d(k)X1(k) + V(k)] = (6a) X1,f (k) + Vrn(k) (6b) Where, X1, f = WH (k)d(k)X1(k) Vrn(k) = WH (k)V(k) Fig. 4. Beamformer 4. UNIFORM LINEAR ARRAY There are variety of microphone arrays that can be used based on its application e.g. circular arrays, spherical arrays, biologically inspired arrays, ad-hoc arrays, etc. Uniform linear arrays are usually used for comparing Fig. 5. Uniform Linear Array source localization algorithms due to their simple geom- etry and the fact that they are cheap and commercially available as shown in figure 5. We can represent the signal at each microphone as, ym(t) = si(t) M i=1 ej(m−1)µi + vm(t) (7a) y = As(t) + v(t) (7b) Where, M is the number of microphones, µi = −2π λ lsin(θi) is called the spatial frequency, l is the distance between two adjacent microphone,λ is the wavelength of the source signal, θ is the angle of arrival and A is called the steering vectors, or spatial responses specific to the array. A = a(µ1) . . . a(µi) . . . a(µM) (8a) =   1 1 . . . 1 ejµ1 ejµ2 . . . ejµd ... ... ... ... ej(M−1)µ1 ej(M−1)µ2 . . . ej(M−1)µd   (8b) Expanding one of the terms in matrix (8b) we have, e −2πjk(Fs)lsin(θ) cN . Where k is the digital frequency, Fs is the sampling rate, c is the speed of sound, and N is the number of DFT points. 4.1. Beampattern We can visualize the steering vectors of ULA in polar plots over all angles, for any number elements, for a specific frequency and an arbitrary input. This visual- ization of steering vectors is called the beampattern. For example, in a two microphone case from section II the steering vectors are, 1 1 . . . 1 e−jωτ1 e−jωτ2 . . . e−jωτn (9) Where n depends on desired scanning resolutions, num- ber of angles. In figure 6 we can see the steering vector
  • 4. for different number of M’s and spacing l’s for 1kHz and 4kHz. Note that since ULA is symmetric from back and front and time delay model was used to derive the steering vectors, the back and front of the ULA beam- pattern are symmetric, and the array is unable to dis- tinguish back sounds from front sound. In order to fix this cone of confusion one can simply use direction microphone that faces the front of the array, though di- rectional microphone are usually more expensive and lower in quality with respect to omnidirectional micro- phones. As the frequency increases, the main lobes in the beam- pattern get narrower, however it create grading lobes and also lobes that are as big as the main lobe, which is called spatial aliasing and it is discussed in more de- tail in the next section. Decreasing l help with avoiding grading lobes, but it also widen the main lobe. As the number of elements increases in an array, there is an obvious advantage of narrower beam and smaller grad- ing lobes. More microphones, however, requires more space, power and cost. Fig. 6. Beampattern 4.1.1. Spatial Aliasing As discussed briefly in the previous subsection, spatial aliasing is an artifact that directs the main beam to an- other angle. Spatial aliasing is similar to aliasing. Alias- ing happens when the bandwidth of the signal is more than half the sampling rate and that results in overlap- ping between spectral information in the signal. Spatial aliasing happens when lowest wavelength (λ = c/f) of the signal is less than half the spacing between adjacent microphones which results in multiple main lobes in the beampattern [10]. Figure 7 shows spatial aliasing over all angles and frequencies for two microphones that are 22 cm apart. Fig. 7. Spatial Aliasing We can see that for lower frequencies, microphone array has an omnidirectional response, there is no main lobe. Spatial aliasing occurs at about 1600Hz and in- creases as we go higher in frequency. Different designs and techniques can significantly reduce the spatial alias- ing. For example, increasing the distance between the two microphones for higher frequencies, or in general using frequency dependent spacing between two ele- ments, rather than using uniform spacing [11]. 5. SOUND SOURCE LOCALIZATION In this section I will discuss three methods for localizing a sound source. Two of which are based on beamform- ing techniques: Delay and Sum beamformer and MVDR beamformer. And, finally MUSIC which is a subspace algorithm. 5.1. Beamforming Basic concepts of beamforming were discussed in sec- tion III. In this section we are going to develope two types of filters for W which one is fixed and the other is adaptive to follow the signal. 5.1.1. Delay And Sum Delay and sum beamformer is a fixed beamformer. That means, it does not adapt itself to the signal. The basic
5. SOUND SOURCE LOCALIZATION

In this section I discuss three methods for localizing a sound source. Two are based on beamforming techniques, the Delay and Sum beamformer and the MVDR beamformer; the third, MUSIC, is a subspace algorithm.

5.1. Beamforming

The basic concepts of beamforming were discussed in section III. In this section we develop two types of spatial filters W: one fixed, and one that adapts to the signal.

5.1.1. Delay And Sum

The delay and sum beamformer is a fixed beamformer; that is, it does not adapt itself to the signal. The basic idea behind delay and sum is to scan the environment with the microphone array beampattern at every angle and find the angle at which the output power is maximized [13]. Looking back at equation 5 and figure 4, the array output power is

P(W) = (1/K) \sum_{k=1}^{K} |Z(k)|^2 = W^H R_{yy} W   (10)

where R_{yy} = Y(k) Y^H(k) and W = A(\theta). Equation 10 then gives the estimate

\hat{\theta} = argmax_\theta { A(\theta)^H R_{yy} A(\theta) }   (11)

5.1.2. MVDR

The Minimum Variance Distortionless Response (MVDR) beamformer, also known as the Capon beamformer [12], is a delay and sum beamformer with an additional constraint on the output power:

W_{MVDR} = argmin_W W^H R_{yy} W   subject to   W^H A(\theta) = 1   (12)

Conceptually, the constraint fixes the gain in the look direction of the source to unity (g(\theta_i) = 1) while the power from every other angle is minimized. This constrained minimization leads to the Lagrangian

J(W, \lambda) = W^H R_{yy} W + \lambda (W^H A(\theta) - 1)(A(\theta)^H W - 1)   (13)

Setting the gradient of J with respect to W and \lambda to zero, as derived in detail in [14], gives the weights and the corresponding output power:

W_{MVDR}(\theta) = R_{yy}^{-1} A(\theta) / (A(\theta)^H R_{yy}^{-1} A(\theta))   (14)

P_{MVDR}(\theta) = 1 / (A(\theta)^H R_{yy}^{-1} A(\theta))   (15)

The distortionless constraint reduces the distortion in the output power while keeping the response at the look angle fixed [16]. Note that both delay and sum and MVDR require a good estimate of the covariance of the recorded signals. MVDR additionally requires this covariance to be invertible, which in practice calls for at least as many observations (snapshots) as sensors.

5.2. Subspace algorithm

5.2.1. MUSIC

Multiple Signal Classification (MUSIC) [14] is a subspace method for localizing sources. Looking back at equation 7b, we can define

R_{yy} = (1/N) \sum_{n=1}^{N} y(n) y^H(n) = (1/N) Y Y^H   (16a)

E{R_{yy}} = A(\theta) R_{ss} A^H(\theta) + \sigma_N^2 I   (16b)

where R_{ss} = (1/N) S S^H is the clean-signal covariance and we assume access to N snapshots of the recorded signals at a time. We can then use an eigenvalue decomposition to split R_{yy} into signal and noise subspaces:

R_{yy} = [U_s U_n] diag(\lambda_1, ..., \lambda_M) [U_s U_n]^H   (17)

where U_s is the signal subspace, U_n is the noise subspace, and \lambda_1 > \lambda_2 > ... > \lambda_M. The signal subspace spans the same space as the steering vectors of the true sources, and the noise subspace is orthogonal to it:

span(U_s) = span(A(\theta))   (18a)

U_n \perp A(\theta)  =>  U_n^H A(\theta) = 0   (18b)

This orthogonality between the noise subspace and the array steering vectors can be used to find the direction of arrival by defining the MUSIC output power as either

P_{MUSIC}(\theta) = 1 / ||U_n^H A(\theta)||^2 = 1 / (A(\theta)^H U_n U_n^H A(\theta))   (19a)

P_{MUSIC}(\theta) = (A(\theta)^H A(\theta)) / (A(\theta)^H U_n U_n^H A(\theta))   (19b)

Equation 19a is known as the MUSIC pseudo spectrum and equation 19b as the MUSIC spatial spectrum. The peaks of either expression point to the directions of the sources. One disadvantage of the MUSIC algorithm is that the number of sources must be known in advance, and at least one extra sensor is needed to form the noise subspace; that is, with M sensors we can localize at most M - 1 sources. A compact narrowband sketch of all three estimators is given below.
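To make the three estimators concrete, here is a minimal narrowband Matlab sketch that applies equations 11, 15 and 19a to a single frequency bin. It is only an illustration, not the code used for the experiments (the full broadband implementation is listed in the appendix, section 8.2); the function name doa_spectra, its signature, and the assumption that the number of sources numSrc is known in advance are all hypothetical.

function [Pds, Pmvdr, Pmusic] = doa_spectra(Y, A, numSrc)
% Y      : M x numSnapshots complex STFT snapshots at one frequency bin
% A      : M x numAngles steering matrix for the candidate angles (eq. 8)
% numSrc : assumed number of sources (MUSIC needs this in advance)
    [M, numSnap] = size(Y);
    Ryy = (Y*Y')/numSnap;                        % sample covariance, eq. 16a
    Pds = real(sum(conj(A).*(Ryy*A), 1));        % delay and sum, eq. 11
    Ri  = pinv(Ryy);
    Pmvdr = 1./real(sum(conj(A).*(Ri*A), 1));    % MVDR, eq. 15
    [U, D] = eig(Ryy);
    [~, idx] = sort(real(diag(D)), 'descend');
    Un = U(:, idx(numSrc+1:M));                  % noise subspace, eqs. 17-18
    Pmusic = 1./real(sum(abs(Un'*A).^2, 1));     % MUSIC pseudo spectrum, eq. 19a
end

Each output is a 1 x numAngles spectrum whose peaks indicate the candidate DOAs for that bin; summing such spectra over frequency bins and blocks, as the appendix code does, gives the broadband estimates shown in figure 10.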
6. EXPERIMENT SETUP

I used a PlayStation EYE as my uniform linear array. The PS EYE contains four microphones spaced about 2 cm apart, sampling at 16 kHz. I ran two sets of experiments. (1) One source at about 15 degrees recorded with two microphones, using a plastic bag as the sound source; its spectrogram is shown in figure 9, and the microphones marked with circles in figure 8 are the two sensors used in this case. (2) Two sources located at 15 and -25 degrees, a loud fan and a speech signal, recorded with all four microphones of the PS EYE; the spectrograms for this case are also shown in figure 9. In each case I used a ruler to measure the approximate location of the sound source, with the rightmost microphone of the PS EYE marked as -90 degrees and the leftmost as +90 degrees.

Fig. 8. PlayStation EYE

Fig. 9. Spectrograms for the one-source and two-source scenarios

7. RESULTS

I evaluated all three algorithms from section V on the two cases described in section VI and plotted the output power of each algorithm over all scan angles. The results for cases 1 and 2 are shown in figure 10.

Fig. 10. Output power for the one-source and two-source scenarios

In the one-source scenario, all three algorithms detect the DOA correctly. The delay and sum beamformer is not able to suppress the grating lobes and the distortion in its output; MVDR improves on this by minimizing the variance of the distortion; and the MUSIC output looks almost like a delta response, with a single sharp peak at the source location. In the two-source scenario, delay and sum and MVDR cannot resolve the two sources, whereas MUSIC localizes both. MUSIC gives the best results in both scenarios. It is, however, very sensitive to the analysis frame, and it needs many frames to form a reasonably well-defined noise subspace. A quantitative way of comparing these algorithms is the root mean square error (RMSE) of the estimated angles across blocks [15],

RMSE = sqrt( (1/K) \sum_{k=1}^{K} (\theta_{est,k} - \theta_{true,k})^2 )   (20)

where K is the number of blocks (groups of frames).

RMSE            Delay and Sum   MVDR     MUSIC
1st scenario    0.7035          0.1012   0.0851
2nd scenario    0.4992          0.4990   0.1903
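For reference, equation 20 reduces to a one-line computation once per-block estimates are available. The short Matlab sketch below uses hypothetical per-block estimates; only the true angle of 15 degrees comes from the ruler measurement in section VI. Note that the appendix code in section 8.2 instead reports a normalized peak-concentration measure computed from the averaged spatial spectra.

% Minimal sketch of equation 20; the per-block estimates are hypothetical.
thetaTrue = 15;                                % degrees, measured with a ruler
thetaEst  = [14 16 15 13 17];                  % hypothetical per-block DOA estimates
rmse = sqrt(mean((thetaEst - thetaTrue).^2))   % eq. 20, in degrees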
RMSE is only one metric for comparing sound source localization algorithms. One must also take noise and reverberation into account when localizing a sound source, e.g. is the algorithm robust enough to distinguish the direct sound from its reflections? Which algorithm produces an output with higher SNR? And so on.

8. APPENDIX

In this section I have included the Matlab code for visualizing the beampattern of a ULA, as well as the Matlab code for the sound source localization algorithms discussed in section V.

8.1. Matlab Code for the ULA Beampattern

%% Ramin Anushiravani
% 11/24/14
% Linear mic array beampattern
close all; clear all; clc

dis = 0.02;                       % spacing between adjacent microphones (m)
fs = 9000;                        % 48000;
fftPoint = 1024;
numfft = 1:1:fftPoint/2;
f = numfft*fs/fftPoint;           % hertz
res = 1;                          % angular resolution in degrees
theta = -pi:res*pi/180:pi;
c = 345;                          % speed of sound (m/s)
numMic = 10;

% steering vectors for every frequency bin and angle (delay and sum)
for i = 1:numMic
    SV(:,:,i) = exp(1i*2*pi.*f'*(i-1)*dis*sin(theta)/c);
end
Out = abs(sum(SV,3)/numMic);      % delay-and-sum beampattern

% polar beampattern, one frequency bin at a time
for i = 1:10:fftPoint/2
    % figure(1); subplot(1,2,1);
    polar(theta,Out(i,:));
    title(['Frequency ', ...
        num2str(round(i*fs/fftPoint)), ' Hz']);
    % subplot(1,2,2); plot(theta,20*log10(Out(i,:)));
    % axis([-pi pi -20 1]); title('Beam Pattern');
    pause
end

% beampattern over all angles and frequencies
figure; imagesc(theta*180/pi, numfft*fs/fftPoint, Out);
xlabel('angle'); ylabel('frequency-Hz');
title('Beam Pattern'); axis xy
colormap hot

% %% Simulation signals: two channels with a known delay
% angle = pi/4;
% bin = 100;
% f1 = bin*fs/fftPoint;
% tdelay = dis*sin(angle)/c;
% L = fftPoint;
% t = (0:L-1)/fs;
% sig1 = sin(2*pi*f1*t);
% sig2 = fft(sig1) ...
%     .*exp(-1i*2*pi*([0:L/2 -L/2+1:-1])*tdelay*fs/L);  % fft is symmetric
% sig3 = real(ifft(sig2));
% TT = [sig3; sig1];
% x = TT';
% audiowrite('sim.wav',TT',fs);
8.2. Matlab Code for Sound Source Localization

% Ramin Anushiravani
% March 1st, 14
clc; clear all; close all;
%% 11/24/14
numMic = 2;                       % number of microphones

%% Theta
addpath('sounds551');
theta = -pi/2:pi/179:pi/2;        % if 0 to pi use cos, if -pi/2 to pi/2 use sin
c = 345;                          % speed of sound (m/s)

%% 2 chan
if numMic == 2
    % d = 0.22*cos(theta);
    % t = d/c;                    % time delay between mics
    [sig fs] = audioread('mystery_angle.wav');
end

%% 4 chan
if numMic == 4
    % [sig1 fs] = audioread('cup-01.wav');
    % [sig2 fs] = audioread('cup-02.wav');
    % [sig3 fs] = audioread('cup-03.wav');
    % [sig4 fs] = audioread('cup-04.wav');
    [sig1 fs] = audioread('fan_speaker-01.wav');
    [sig2 fs] = audioread('fan_speaker-02.wav');
    [sig3 fs] = audioread('fan_speaker-03.wav');
    [sig4 fs] = audioread('fan_speaker-04.wav');
    Fs = 16000;
    s4 = resample(sig4,Fs,fs);
    s2 = resample(sig3,Fs,fs);
    s3 = resample(sig2,Fs,fs);
    s1 = resample(sig1,Fs,fs);
else
    %% 2 chan
    i = [1 11];
    sig1a = sig(i(1):i(2)*fs,1);
    sig2a = sig(i(1):i(2)*fs,2);  % get the signals first
    s2 = sig2a;                   % ch. 2 is closer to the source;
                                  % steering vectors use ch. 2 as reference
    s1 = sig1a;
end

%% ffts
fftPoint = 1024;
R = fftPoint;                     % frame length
L = R/4;                          % hop size
k = 1:1:fftPoint/2;
w = 2*pi.*(k-1)*fs/fftPoint;

%% FFT points
N = numMic;
numfft = 1:1:fftPoint/2;
f = numfft*fs/fftPoint;

%% Steering Vectors
if numMic == 2
    dis = 0.22;                   % spacing for the 2-channel setup (m)
    for i = 1:numMic
        SV(:,:,i) = exp(1i*2*pi.*f'*(i-1)*dis*sin(theta)/c);
    end
    SV1 = SV(:,:,1);
    SV2 = SV(:,:,2);
    %% SV 2 chan
    for i = 1:fftPoint/2
        for j = 1:length(theta)
            SVt(i,j) = {[SV1(i,j); SV2(i,j)]};
            % steering vector for each frequency and angle
        end
    end
else
    %% distance between mics, 4 chan
    dis = 0.02;                   % spacing for the 4-channel PS EYE (m)
    for i = 1:numMic
        SV(:,:,i) = exp(1i*2*pi.*f'*(i-1)*dis*sin(theta)/c);
    end
    SV1 = SV(:,:,1);
    SV2 = SV(:,:,2);
    SV3 = SV(:,:,3);
    SV4 = SV(:,:,4);
    %% SV 4 chan
    for i = 1:fftPoint/2
        for j = 1:length(theta)
            SVtt(i,j) = {[SV1(i,j); SV2(i,j); SV3(i,j); SV4(i,j)]};
            % steering vector for each frequency and angle
        end
    end
end

%% Beampattern
%% Steer 2
if numMic == 2
    for j = 1:length(theta)
        for i = 1:fftPoint/2-1
            steer(i,j) = abs(sum(SVt{i,j},1))/N;
            % summing steering vectors over all mics
        end
    end
end
%% Steer 4
if numMic == 4
    for j = 1:length(theta)
        for i = 1:fftPoint/2-1
            steer4(i,j) = abs(sum(SVtt{i,j},1))/N;
            % summing steering vectors over all mics
        end
    end
    %% Plot beam pattern
    % for i = 1:fftPoint/2
    %     polar(theta,steer4(i,:));
    %     title(['frequency ' num2str(floor(i*(fs/2)/fftPoint)) ' Hz']);
    %     pause(0.01)
    % end
end

%% STFT (enframe is from the VOICEBOX toolbox)
if numMic == 2
    [sig1, t1] = enframe(s1,hamming(R),L);
    [sig2, t2] = enframe(s2,hamming(R),L);
    for i = 1:length(sig1(:,1))
        Sig1(i,:) = fft(sig1(i,:),fftPoint);
        Sig2(i,:) = fft(sig2(i,:),fftPoint);
    end
    Sig1 = Sig1(:,1:end/2-1);
    Sig2 = Sig2(:,1:end/2-1);
end
%% 4 chan
if numMic == 4
    [sig1, t1] = enframe(s1,hamming(R),L);
    [sig2, t2] = enframe(s2,hamming(R),L);
    [sig3, t3] = enframe(s3,hamming(R),L);
    [sig4, t4] = enframe(s4,hamming(R),L);
    for i = 1:length(sig1(:,1))
        Sig1(i,:) = fft(sig1(i,:),fftPoint);
        Sig2(i,:) = fft(sig2(i,:),fftPoint);
        Sig3(i,:) = fft(sig3(i,:),fftPoint);
        Sig4(i,:) = fft(sig4(i,:),fftPoint);
    end
    % take first half
    Sig1 = Sig1(:,1:end/2-1);
    Sig2 = Sig2(:,1:end/2-1);
    Sig3 = Sig3(:,1:end/2-1);
    Sig4 = Sig4(:,1:end/2-1);
end

%% making blocks out of frames, 2 chan
if numMic == 2
    nframe = 500;
    n = [1 nframe];
    for p = 1:floor(length(Sig1(:,1))/nframe)
        Sigb1(p,:) = {Sig1(n(1):n(2),:)};
        Sigb2(p,:) = {Sig2(n(1):n(2),:)};
        n = n + nframe;
    end
    for i = 1:length(Sigb1)
        for k = 1:fftPoint/2-1
            SIG(i,k) = {[Sigb1{i}(:,k),Sigb2{i}(:,k)]};
            % we need the covariance between channels for each block
            % at one frequency bin => energy of the signal in that bin
        end
    end
    % SIG is (number of blocks) x (number of frequency bins); each cell
    % contains all frames of every channel in that block for that bin.
end

%% making blocks out of frames, 4 chan
if numMic == 4
    nframe = 350;
    n = [1 nframe];
    for p = 1:floor(length(Sig1(:,1))/nframe)
        Sigb1(p,:) = {Sig1(n(1):n(2),:)};
        Sigb2(p,:) = {Sig2(n(1):n(2),:)};
        Sigb3(p,:) = {Sig3(n(1):n(2),:)};
        Sigb4(p,:) = {Sig4(n(1):n(2),:)};
        n = n + nframe;
    end
    for i = 1:length(Sigb1)
        for k = 1:fftPoint/2-1
            SIG(i,k) = ...
                {[Sigb1{i}(:,k),Sigb2{i}(:,k),Sigb3{i}(:,k),Sigb4{i}(:,k)]};
            % we need the covariance between channels for each block
            % at one frequency bin => energy of the signal in that bin
        end
    end
end

%% Rxx
for i = 1:length(SIG(:,1))                % goes through blocks
    for j = 1:fftPoint/2-1                % goes through frequency bins
        Rxx(i,j) = {(transpose(SIG{i,j})*conj(SIG{i,j}))};
        % each cell is the spatial covariance for that block in that bin
    end
end

%% delay and sum 2
if numMic == 2
    for k = 1:length(Rxx(:,1))
        for i = 1:fftPoint/2-1
            for j = 1:length(theta)
                Power1(i,j,k) = ...
                    abs(SVt{i,j}'*(Rxx{k,i})*SVt{i,j});
            end
        end
    end
    %% capon 2
    for k = 1:length(Rxx(:,1))
        for i = 1:fftPoint/2-1
            for j = 1:length(theta)
                Power2(i,j,k) = ...
                    1/abs(SVt{i,j}'*pinv(Rxx{k,i})*SVt{i,j});
            end
        end
    end
    %% MUSIC 2
    for k = 1:length(SIG(:,1))
        for i = 1:fftPoint/2-1
            [u, e] = eig(Rxx{k,i});
            e_diag = diag(e);
            [e_sort, e_idx] = sort(e_diag,'descend');
            u_sort = u(:,e_idx);
            noise_subspace = u_sort(:,2);
            % noise subspace: eigenvector of the smallest eigenvalue
            % (one source, two sensors)
            for j = 1:length(theta)
                Power3(i,j,k) = 1/abs(SVt{i,j}' ...
                    *(noise_subspace*noise_subspace')*SVt{i,j});
            end
        end
    end
end
%% delay and sum 4
if numMic == 4
    for k = 1:length(Rxx(:,1))
        for i = 1:fftPoint/2-1
            for j = 1:length(theta)
                Power1(i,j,k) = ...
                    abs(SVtt{i,j}'*(Rxx{k,i})*SVtt{i,j});
            end
        end
    end
    %% capon 4
    for k = 1:length(Rxx(:,1))
        for i = 1:fftPoint/2-1
            for j = 1:length(theta)
                Power2(i,j,k) = 1/abs(SVtt{i,j}' ...
                    *pinv(Rxx{k,i})*SVtt{i,j});
            end
        end
    end
    %% MUSIC 4
    for k = 1:length(SIG(:,1))
        for i = 1:fftPoint/2-1
            [u, e] = eig(Rxx{k,i});
            e_diag = diag(e);
            [e_sort, e_idx] = sort(e_diag,'descend');
            u_sort = u(:,e_idx);
            noise_subspace = u_sort(:,3:4);
            % noise subspace: eigenvectors of the two smallest eigenvalues
            % (two sources, four sensors)
            for j = 1:length(theta)
                Power3(i,j,k) = (1/abs((SVtt{i,j}') ...
                    *(noise_subspace*noise_subspace')*(SVtt{i,j})));
            end
        end
    end
end

%% Power
for i = 1:length(SIG(:,1))
    PowerSq1(i) = {squeeze(Power1(:,:,i))};   % power for each block
    PowerSq2(i) = {squeeze(Power2(:,:,i))};
    PowerSq3(i) = {squeeze(Power3(:,:,i))};
end
for i = 1:length(SIG(:,1))
    sumPower1(i,:) = sum(PowerSq1{i},1);
    % sum over all frequencies: power at every angle
    sumPower2(i,:) = sum(PowerSq2{i},1);
    sumPower3(i,:) = sum(PowerSq3{i},1);
end

%% plot delay and sum and MVDR
figure(1); subplot(1,3,1);
plot((theta*180/pi),sumPower1); title('D and S');
xlabel('Angle'); ylabel('Power')
subplot(1,3,2); plot((theta*180/pi),sumPower2); title('MVDR');
xlabel('Angle'); ylabel('Power')
%% MUSIC
subplot(1,3,3);
plot(((theta)*180/pi)+9,sumPower3); title('MUSIC');
% note: a manual 9-degree offset is applied to the MUSIC angle axis
xlabel('Angle'); ylabel('Power')
axis([-90 90 2 max(max(sumPower3))*1.2])

%% RMSE-like measure: energy of the averaged spectrum away from its peak, normalized
if numMic == 2
    x = mean(sumPower1,1);
    [X ind] = max(x);
    vec = [zeros(1,ind-1), ...
        X, zeros(1,size(sumPower1,2)-ind)];
    er1 = sqrt(norm(x-vec)/(norm(x)*numMic))

    x = mean(sumPower2,1);
    [X ind] = max(x);
    vec = [zeros(1,ind-1), X, ...
        zeros(1,size(sumPower1,2)-ind)];
    er2 = sqrt(norm(x-vec)/(norm(x)*numMic))

    x = mean(sumPower3,1);
    [X ind] = max(x);
    vec = [zeros(1,ind-1), X, ...
        zeros(1,size(sumPower1,2)-ind)];
    er3 = sqrt(norm(x-vec)/(norm(x)*numMic))
else
    x = mean(sumPower1,1);
    [X ind] = max(x);
    vec = [zeros(1,ind-1), X, ...
        zeros(1,size(sumPower1,2)-ind)];
    er1 = sqrt(norm(x-vec)/(norm(x)*numMic))

    x = mean(sumPower2,1);
    [X ind] = max(x);
    vec = [zeros(1,ind-1), X, ...
        zeros(1,size(sumPower1,2)-ind)];
    er2 = sqrt(norm(x-vec)/(norm(x)*numMic))

    x = mean(sumPower3,1);
    [X ind] = findpeaks(x,'NPeaks',4);
    vec = [zeros(1,ind(2)-1), X(2), ...
        zeros(1,ind(4)-ind(2)-1), X(4), ...
        zeros(1,size(sumPower1,2)-ind(4))];
    er3 = sqrt(norm(x-vec)/(norm(x)*numMic))
end

9. REFERENCES

[1] Farrell, K.; Mammone, R.; Flanagan, J. L., "Beamforming microphone arrays for speech enhancement," Acoustics, Speech, and Signal Processing (ICASSP-92), 1992 IEEE International Conference on, vol. 1, pp. 285-288, Mar 1992.

[2] Cauchi, B., "Joint dereverberation and noise reduction using beamforming and a single-channel speech enhancement scheme," REVERB Challenge, 2014.

[3] Habets, E. A. P.; Benesty, J., "A Two-Stage Beamforming Approach for Noise Reduction and Dereverberation," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 5, pp. 945-958, May 2013.

[4] Van den Bogaert, T.; Doclo, S.; Wouters, J.; Moonen, M., "Speech enhancement with multichannel Wiener filter techniques in multimicrophone binaural hearing aids," The Journal of the Acoustical Society of America, 125, pp. 360-371, 2009.

[5] Ramos, A. L. L., "Delay-and-sum beamforming for direction of arrival estimation applied to gunshot acoustics," Proceedings of the SPIE.

[6] Zhao, S., "3D binaural audio capture and reproduction using a miniature microphone array," Conference on Digital Audio Effects (DAFx).

[7] Miles, R. N.; Su, Q.; Cui, W.; Shetye, M., "A low-noise differential microphone inspired by the ears of the parasitoid fly Ormia ochracea," Acoustical Society of America.

[8] Kuntzman, M. L.; Hall, N. A., "Sound source localization inspired by the ears of the Ormia ochracea," Applied Physics Letters, 105, 033701, 2014.

[9] Benesty, J.; Dmochowski, J. P., "Microphone Arrays: Fundamental Concepts," Springer.

[10] McCowan, I., "A Microphone Array Tutorial."

[11] Greensted, A., "Delay Sum Beamforming," retrieved January 2012.

[12] Capon, J., "High-resolution frequency-wavenumber spectrum analysis," Proc. IEEE, 57(8), pp. 1408-1418, 1969.
[13] Bhuiya, F.; Islam, M., "Analysis of Direction of Arrival Techniques Using Uniform Linear Array," International Journal of Computer Theory and Engineering.
[14] Kawitkar, R., "Performance of Different Types of Array Structures Based on Multiple Signal Classification (MUSIC) Algorithm," International Conference on MEMS, NANO, and Smart Systems.

[15] Richter, I., "Spatial Filtering and DoA Estimation: MVDR Beamformer and MUSIC Algorithm," Sensor Array Signal Processing.

[16] Lorenz, R. G.; Boyd, S. P., "Robust Minimum Variance Beamforming."