SOUND SOURCE LOCALIZATION WITH MICROPHONE ARRAYS
Ramin Anushiravani
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
ABSTRACT
In this paper I discuss three basic and important methods for finding the
direction of arrival (DOA) of sound sources in a far-field environment. The
first two approaches are based on beamforming techniques: the Delay and Sum
beamformer and the Minimum Variance Distortionless Response (MVDR) beamformer.
The third approach is a subspace method that uses the well-known Multiple
Signal Classification (MUSIC) algorithm. I demonstrate the accuracy of each
algorithm by localizing sound sources in an office environment using a
uniform linear array (ULA).
Index Terms— direction of arrival, Beamforming,
subspace method, uniform linear array
1. INTRODUCTION
Sound source localization has many applications in speech enhancement, such
as speech denoising and dereverberation [1-3]: forming a beam toward the
speaker reduces the noise and reverberation arriving from other directions.
This is especially useful for the hearing aid industry [4]. Sound source
localization can also be used for surveillance, such as finding the direction
of gunshots in public places by scanning the environment for sudden sound
activity and classifying the detected sounds against an available database
[5]. More recently, sound source localization techniques have been used to
reconstruct spatial audio for entertainment and for improving the
teleconferencing experience [6]. To motivate the background behind DOA
estimation, I first discuss how living creatures localize sound sources and
then present a simple time delay model for localizing sound sources. In
section III, I discuss the general signal model for beamforming techniques.
In section IV, I talk about uniform linear arrays and specifically two of
their characteristics, the beampattern and spatial aliasing. In section V, I
discuss three algorithms for sound source localization: delay and sum, MVDR
and MUSIC. In section VI, I describe the experiment setup, and finally in
section VII I evaluate the results of the algorithms discussed in section V.
2. A MODEL FOR SOUND LOCALIZATION
At a crowded party, we can focus on one speaker, in effect forming a beam
toward that speaker and enhancing the quality of the speech we hear, but how?

Human beings use a variety of information from the sound source and the
environment for localization. For example, if a sound source is located
closer to the right ear, the sound reaches the right ear earlier than the
left ear, and it is also larger in amplitude there because of the head shadow
effect. The brain uses the time delay and level difference cues between the
two ears to localize the sound, as shown in figure 1. However, sounds coming
from directly behind and directly in front of the head produce similar time
delays and level differences. That is when spectral cues, shaped by the
pinna, shoulders, hair and even one's clothing, help the listener localize
where the sound is coming from. In addition to humans, animals, insects and
even parasites have some sort of source localization system. An interesting
case is Ormia, a small parasitoid fly that feeds on crickets. Ormia's
localization system is quite extraordinary, and many scientists have been
trying to model microphones and hearing systems on it [7]. The distance
between Ormia's ears is on the micrometer scale, so the time and level
differences between the two ears are too small to help the parasite localize
sound. Ormia's ears, however, communicate with each other in a complex manner
that increases the resolution of the localization by almost 20 times its
physical constraint [8].
2.1. Time Delay Model
As discussed earlier, time delay is an important cue for localizing sound. We
can build a deliberately simplified model for sound localization based on the
time delay between human ears, as shown in figure 2.

The assumption in the time delay model is that the listener does not have a
head (there are no level difference cues or spectral information), only two
ears separated by the width of the head (about 22 cm). Since in most
practical scenarios the sound source is in the far field, we can assume that
the sound waves arriving at the microphones are parallel to each other. One
of the microphones (ears) can then be fixed as a reference, and the time
delay to the other microphone is calculated using geometry. The time delay of
arrival can be expressed in terms of the angle of arrival, as shown in
equation 1.

Fig. 1. Time delay and level difference cues for sound source localization
Fig. 2. A time delay model for source localization

$$\tau = \frac{d \sin(\theta)}{c} \tag{1}$$

where d is the distance between the two microphones, θ is the angle of
arrival and c is the speed of sound.
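In the frequency domain, the time delay is represented as $e^{-j\omega\tau}$,
where $\omega$ is the angular frequency of the source signal. This
representation assumes a narrowband signal. Speech signals are broadband, so
the narrowband assumption does not hold in practice for the signal as a
whole; the appendix code therefore processes each frequency bin separately.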
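As a small numeric illustration of equation 1 and of the phase term above, the following sketch converts an angle into a delay and back. The 30 degree angle and the 1 kHz frequency are arbitrary values chosen for the example; the 22 cm spacing comes from the text and c = 345 m/s is the value used in the appendix code.

```matlab
% Far-field time delay between two microphones (equation 1) and its inverse.
d = 0.22;                 % microphone spacing in meters (head width from the text)
c = 345;                  % speed of sound in m/s (value used in the appendix code)
theta = 30 * pi / 180;    % assumed direction of arrival, in radians

tau = d * sin(theta) / c;               % time delay in seconds
theta_hat = asin(c * tau / d);          % recover the angle from the delay

f = 1000;                               % assumed narrowband frequency in Hz
phase_shift = exp(-1i * 2*pi*f * tau);  % the delay as a frequency-domain phase term

fprintf('delay = %.4f ms, recovered angle = %.1f deg\n', ...
        1e3 * tau, theta_hat * 180 / pi);
```

The delay only changes the phase of a narrowband component, never its magnitude, which is why the phase term has unit modulus.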
Given a reference signal and a delayed signal, one can localize a narrowband
signal by simply undoing the delay of the delayed signal, as shown in
equation 2.

$$\arg\max_{\theta} \left| \sum_{n=0}^{N-1} \mathrm{delayed}[n]\, \mathrm{ref}[n+m] \right| \tag{2a}$$

$$= \arg\max_{\theta} \left| C_{\mathrm{ref,delayed}}[m] \right| \tag{2b}$$

where N is the number of samples, m is the lag (in samples) that undoes the
delay of the delayed signal for a candidate angle θ, and $C_{xy}$ is the
cross-correlation between the two signals. Once we have solved for m, we can
translate this sample difference into a time difference in seconds and use
equation 1 to find the DOA. The basic idea is to use cross-correlation and
see at which lag the two signals match. The block diagram for this simple
time delay model is shown in figure 3.
Fig. 3. Block diagram for aligning two narrowband signals
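The cross-correlation search described above can be sketched as follows. This is a minimal illustration on a synthetic tone burst, not the author's implementation: the sampling rate, the burst, and the 5-sample delay are assumptions made for the example, and the correlation is computed with an explicit loop so that only base MATLAB functions are needed.

```matlab
% Estimate the delay between two channels by cross-correlation, then map it to an angle.
fs = 16000;  d = 0.22;  c = 345;      % assumed sampling rate, spacing, speed of sound
true_lag = 5;                          % assumed delay of the second channel, in samples
n = (0:1023)';
ref = sin(2*pi*440*n/fs) .* exp(-((n - 400)/120).^2);   % synthetic windowed tone burst
delayed = [zeros(true_lag, 1); ref(1:end - true_lag)];  % ref shifted by true_lag samples

max_lag = floor(d / c * fs);           % largest physically possible lag for this spacing
lags = -max_lag:max_lag;
C = zeros(size(lags));
for idx = 1:numel(lags)
    m = lags(idx);
    shifted = circshift(delayed, -m);  % advance the delayed channel by m samples
    C(idx) = abs(sum(shifted .* ref)); % cross-correlation magnitude at lag m
end
[~, best] = max(C);
m_hat = lags(best);                    % lag that best aligns the two channels
tau_hat = m_hat / fs;                  % sample difference converted to seconds
theta_hat = asin(c * tau_hat / d) * 180 / pi;   % equation 1 solved for the angle

fprintf('estimated lag = %d samples, angle = %.1f deg\n', m_hat, theta_hat);
```

Restricting the search to lags that are physically possible for the given spacing keeps the argument of `asin` within [-1, 1].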
3. SIGNAL MODEL
Beamforming is a powerful tool in array signal processing. Beamforming can be
thought of as spatial filtering of the output of a sensor array, for
detection and estimation, such that the SNR of the signal is increased or the
beampattern of the array becomes narrower and more accurate. Different
filters can be derived using the signal model below, which is discussed in
more detail in [9]. A simple beamforming system is shown in figure 4. The
signal recorded at each sensor can be represented as

$$y_n(t) = g_n(t) * s(t) + v_n(t) \tag{3}$$

where $y_n$ is the recorded signal at sensor n, $g_n$ is the spatial response
corresponding to the location of the source, s is the clean source signal and
$v_n$ is a zero-mean Gaussian noise signal. In the frequency domain, equation
3 can be represented as

$$Y_n(k) = G_n(k) S(k) + V_n(k) \tag{4a}$$

or, stacking all sensors into vectors,

$$Y(k) = d(k) X_1(k) + V(k) \tag{4b}$$

where d(k) holds the time delays relative to the reference microphone,
expressed in the frequency domain, and $X_1$ is the recorded signal at the
reference microphone. The output of the beamformer can be written as

$$Z(k) = W^H Y(k) \tag{5}$$

where W contains the spatial filters (the beamformer weights) and Z is the
output of the beamformer. Substituting equation 4b into equation 5,

$$Z(k) = W^H(k) \left[ d(k) X_1(k) + V(k) \right] \tag{6a}$$

$$= X_{1,f}(k) + V_{rn}(k) \tag{6b}$$

where

$$X_{1,f}(k) = W^H(k)\, d(k)\, X_1(k), \qquad V_{rn}(k) = W^H(k)\, V(k)$$
Fig. 4. Beamformer
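To make equations 4b to 6b concrete, the sketch below evaluates them for a single frequency bin. All numbers are made up for illustration, and the delay-and-sum weights W = d/M are just one possible choice of filter; with that choice the signal term passes through unchanged because W^H d = 1.

```matlab
% One frequency bin of the signal model: equations 4b, 5 and 6.
M = 4;                                   % assumed number of sensors
phi = 0.6;                               % assumed phase shift between adjacent sensors (rad)
d = exp(-1i * phi * (0:M-1)');           % delay vector d(k), reference sensor first
X1 = 2 - 1i;                             % assumed spectrum value at the reference microphone
V = 0.1 * [0.3+0.1i; -0.2+0.4i; 0.1-0.3i; -0.4-0.2i];  % fixed stand-in for the noise V(k)

Y = d * X1 + V;                          % equation 4b: array snapshot
W = d / M;                               % delay-and-sum weights (one possible choice of W)
Z = W' * Y;                              % equation 5: beamformer output

X1f = W' * d * X1;                       % signal term of equation 6b
Vrn = W' * V;                            % noise term of equation 6b

fprintf('|Z - (X1f + Vrn)| = %.2e, |X1f - X1| = %.2e\n', ...
        abs(Z - (X1f + Vrn)), abs(X1f - X1));
```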
4. UNIFORM LINEAR ARRAY
There is a variety of microphone arrays to choose from depending on the
application, e.g. circular arrays, spherical arrays, biologically inspired
arrays, ad-hoc arrays, etc. Uniform linear arrays, shown in figure 5, are
usually used for comparing source localization algorithms because of their
simple geometry and because they are cheap and commercially available.

Fig. 5. Uniform Linear Array

We can represent the signal at each microphone as

$$y_m(t) = \sum_{i=1}^{d} s_i(t)\, e^{j(m-1)\mu_i} + v_m(t) \tag{7a}$$

$$y(t) = A s(t) + v(t) \tag{7b}$$

where M is the number of microphones (m = 1, ..., M), d is the number of
sources (not to be confused with the microphone spacing d of equation 1),
$\mu_i = -\frac{2\pi}{\lambda} l \sin(\theta_i)$ is called the spatial
frequency, l is the distance between two adjacent microphones, λ is the
wavelength of the source signal, $\theta_i$ is the angle of arrival of source
i, and A is the matrix of steering vectors, i.e. the spatial responses
specific to the array.

$$A = \begin{bmatrix} a(\mu_1) & \cdots & a(\mu_i) & \cdots & a(\mu_d) \end{bmatrix} \tag{8a}$$

$$= \begin{bmatrix} 1 & 1 & \cdots & 1 \\ e^{j\mu_1} & e^{j\mu_2} & \cdots & e^{j\mu_d} \\ \vdots & \vdots & \ddots & \vdots \\ e^{j(M-1)\mu_1} & e^{j(M-1)\mu_2} & \cdots & e^{j(M-1)\mu_d} \end{bmatrix} \tag{8b}$$

Expanding one of the terms in matrix (8b), we have
$e^{-\frac{2\pi j k F_s l \sin(\theta)}{cN}}$, where k is the digital
frequency (DFT bin index), $F_s$ is the sampling rate, c is the speed of
sound, and N is the number of DFT points.
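The steering matrix of equation 8 can be built in a couple of lines. The sketch below assumes the PS EYE-like geometry of section VI (4 microphones, 2 cm spacing), a 2 kHz source frequency and two arbitrary angles; it also checks the DFT-bin form of a single matrix entry, using 1024 DFT points and a 16 kHz sampling rate as in the appendix code.

```matlab
% Steering matrix A of equation 8 for an M-element ULA and a set of source angles.
M = 4;  l = 0.02;  c = 345;             % assumed geometry (PS EYE-like) and speed of sound
f = 2000;                                % assumed source frequency in Hz
theta = [15, -25] * pi / 180;            % assumed angles of arrival
lambda = c / f;                          % wavelength
mu = -2 * pi / lambda * l * sin(theta);  % spatial frequency of each source
A = exp(1i * (0:M-1)' * mu);             % M-by-(number of sources); column i is a(mu_i)

% The same entry written with the DFT bin index k, as in the expansion of (8b).
Fs = 16000;  N = 1024;
k = f * N / Fs;                          % bin index that corresponds to f (128 here)
entry = exp(-2i * pi * k * Fs * l * sin(theta(1)) / (c * N));

disp(size(A));                           % number of microphones by number of sources
```

The first row of A is all ones because the first microphone is the phase reference.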
4.1. Beampattern
We can visualize the steering vectors of a ULA in polar plots over all
angles, for any number of elements, for a specific frequency and an arbitrary
input. This visualization of the steering vectors is called the beampattern.
For example, in the two-microphone case from section II the steering vectors
are

$$\begin{bmatrix} 1 & 1 & \cdots & 1 \\ e^{-j\omega\tau_1} & e^{-j\omega\tau_2} & \cdots & e^{-j\omega\tau_n} \end{bmatrix} \tag{9}$$

where n is the number of scanned angles, which depends on the desired
scanning resolution. Figure 6 shows the beampattern for different numbers of
microphones M and spacings l at 1 kHz and 4 kHz. Note that since the ULA is
symmetric from back to front and the time delay model was used to derive the
steering vectors, the back and front halves of the ULA beampattern are
symmetric, and the array is unable to distinguish sounds coming from the back
from sounds coming from the front. To fix this cone of confusion one can
simply use directional microphones that face the front of the array, though
directional microphones are usually more expensive and lower in quality than
omnidirectional microphones.

As the frequency increases, the main lobe of the beampattern gets narrower.
However, grating lobes also appear, some of which are as large as the main
lobe. This is called spatial aliasing and is discussed in more detail in the
next section. Decreasing l helps avoid grating lobes, but it also widens the
main lobe. As the number of elements in an array increases, there is an
obvious advantage of a narrower beam and smaller grating lobes. More
microphones, however, require more space, power and cost.
Fig. 6. Beampattern
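The front-back symmetry described above can be checked numerically. The sketch below computes an unsteered (broadside) delay-and-sum beampattern at a single frequency, in the same spirit as the appendix code; the 4-microphone, 2 cm, 4 kHz configuration is an assumption chosen for the example.

```matlab
% Broadside delay-and-sum beampattern of a ULA at one frequency.
M = 4;  l = 0.02;  c = 345;  f = 4000;     % assumed geometry and frequency
theta = (-180:180) * pi / 180;             % scan the full circle in 1 degree steps
tau = l * sin(theta) / c;                  % delay between adjacent microphones
SV = exp(-1i * 2*pi*f * (0:M-1)' * tau);   % one steering vector per column
B = abs(sum(SV, 1)) / M;                   % normalized response to each angle

front = B(theta == 30 * pi / 180);         % response at 30 degrees (front)
back  = B(theta == 150 * pi / 180);        % response at its mirror image (back)
fprintf('B(0) = %.3f, B(30) = %.3f, B(150) = %.3f\n', B(181), front, back);
```

Because the delay depends on the angle only through sin(θ), the angles θ and 180° − θ always give the same response; `polar(theta, B)` draws the pattern in the style of figure 6.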
4.1.1. Spatial Aliasing
As discussed briefly in the previous subsection, spatial aliasing is an
artifact that produces copies of the main beam at other angles. Spatial
aliasing is analogous to aliasing in time sampling. Temporal aliasing happens
when the highest frequency in the signal is more than half the sampling rate,
which results in overlapping spectral content. Spatial aliasing happens when
the spacing between adjacent microphones is more than half the shortest
wavelength (λ = c/f) of the signal, which results in multiple main lobes in
the beampattern [10]. Figure 7 shows spatial aliasing over all angles and
frequencies for two microphones that are 22 cm apart.

Fig. 7. Spatial Aliasing

We can see that at lower frequencies the microphone array has an
omnidirectional response and there is no distinct main lobe. A lobe as large
as the main lobe appears at about 1600 Hz, where the 22 cm spacing equals one
full wavelength (345/0.22 ≈ 1570 Hz), and more of them appear as we go higher
in frequency. Different designs and techniques can significantly reduce
spatial aliasing, for example decreasing the distance between the two
microphones for higher frequencies or, in general, using frequency-dependent
spacing between elements rather than uniform spacing [11].
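The frequencies at which spatial aliasing sets in follow directly from the spacing. The sketch below evaluates them for the two spacings mentioned in this paper (22 cm and 2 cm), assuming c = 345 m/s as in the appendix code, and checks that the unsteered two-microphone response toward 90 degrees returns to full height once the spacing equals one wavelength.

```matlab
% Spatial aliasing limits for the two spacings used in the text.
c = 345;                         % speed of sound in m/s, as in the appendix code
l = [0.22, 0.02];                % spacings: two-microphone model and PS EYE
f_half = c ./ (2 * l);           % frequency at which the spacing equals half a wavelength
f_full = c ./ l;                 % frequency at which the spacing equals one full wavelength

% Unsteered two-microphone response toward 90 degrees at f_full for the 22 cm pair.
B_endfire = abs(1 + exp(-1i * 2*pi * f_full(1) * l(1) * sin(pi/2) / c)) / 2;

fprintf('l = %.2f m: half-wavelength limit %.0f Hz, full-wavelength limit %.0f Hz\n', ...
        [l; f_half; f_full]);
```

For the 2 cm spacing the half-wavelength limit (8625 Hz) lies above the 8 kHz Nyquist frequency of a 16 kHz recording, so under these assumptions the PS EYE recordings stay below that limit.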
5. SOUND SOURCE LOCALIZATION
In this section I discuss three methods for localizing a sound source. Two of
them are based on beamforming techniques: the Delay and Sum beamformer and
the MVDR beamformer. The third, MUSIC, is a subspace algorithm.
5.1. Beamforming
Basic concepts of beamforming were discussed in section III. In this section
we develop two types of filters for W: one is fixed and the other adapts to
the signal.
5.1.1. Delay And Sum
The Delay and Sum beamformer is a fixed beamformer, meaning that it does not
adapt itself to the signal. The basic idea behind delay and sum is to scan
the environment by steering the microphone array beampattern to every angle
and to find the angle at which the output power is at its maximum [13].
Looking back at equation 5 and figure 4, the array output power is

$$P(W) = \frac{1}{K} \sum_{k=1}^{K} |Z(k)|^2 = W^H R_{yy} W \tag{10}$$

where $R_{yy} = \frac{1}{K} \sum_{k=1}^{K} Y(k) Y^H(k)$ is the covariance of
the recorded signals averaged over K observations and W = A(θ). Equation 10
can be rewritten as

$$\arg\max_{\theta} P(\theta) = \arg\max_{\theta} \left\{ A(\theta)^H R_{yy} A(\theta) \right\} \tag{11}$$
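The scan of equations 10 and 11 can be sketched on synthetic data. The example below is an illustration under simplifying assumptions, not the experiment of section VI: a single noise-free narrowband source at 15 degrees, a 4-microphone array with 2 cm spacing, and a 1 degree scanning grid. The helper `sv` is a name introduced here for the steering vector A(θ).

```matlab
% Delay-and-sum DOA scan (equations 10 and 11) on synthetic narrowband data.
M = 4;  l = 0.02;  c = 345;  f = 4000;         % assumed array geometry and frequency
theta0 = 15 * pi / 180;                         % assumed true direction of arrival
sv = @(th) exp(-1i * 2*pi*f * (0:M-1)' * l * sin(th) / c);   % steering vector A(theta)

N = 200;                                        % number of observations (frames)
s = exp(1i * 2*pi * 0.05 * (0:N-1));            % synthetic unit-power source, 1-by-N
Y = sv(theta0) * s;                             % noise-free M-by-N array data
Ryy = (Y * Y') / N;                             % sample covariance of the recordings

scan = (-90:90) * pi / 180;                     % candidate angles
P = zeros(size(scan));
for j = 1:numel(scan)
    a = sv(scan(j));
    P(j) = real(a' * Ryy * a);                  % equation 10 with W = A(theta)
end
[Pmax, jmax] = max(P);
theta_hat = scan(jmax) * 180 / pi;              % equation 11

fprintf('estimated DOA = %.0f degrees\n', theta_hat);
```

With real recordings the peak is broader and noisier; plotting P against `scan` gives a curve comparable to the delay and sum panel of figure 10.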
5.1.2. MVDR
Minimum Variance Distortionless Response (MVDR), also known as the Capon
beamformer [12], is a delay and sum beamformer with an additional constraint
on the output power.

$$W_{MVDR} = \arg\min_{W} W^H R_{yy} W \quad \text{s.t.} \quad W^H A(\theta) = 1 \tag{12}$$

Conceptually, this constraint means that the beamformer must have unity gain
in the look direction of the source ($g(\theta_i) = 1$), while the power
received from every other angle is minimized. This is a constrained
minimization problem that leads to the Lagrangian below,

$$J(W, \lambda) = W^H R_{yy} W + \lambda \left( W^H A(\theta) - 1 \right) + \lambda^* \left( A(\theta)^H W - 1 \right) \tag{13}$$

Setting the gradient of J with respect to λ and W to zero, as derived in
detail in [14], we can find W and therefore the output power of the
beamformer as

$$W_{MVDR}(\theta) = \frac{R_{yy}^{-1} A(\theta)}{A(\theta)^H R_{yy}^{-1} A(\theta)} \tag{14}$$

$$P_{MVDR}(\theta) = \frac{1}{A(\theta)^H R_{yy}^{-1} A(\theta)} \tag{15}$$

This additional constraint on the delay and sum beamformer reduces the
distortion in the output while keeping the response in the look direction at
its maximum [16]. Note that both delay and sum and MVDR require a good
estimate of the covariance of the recorded signal. In the case of MVDR, this
covariance must be invertible, which requires at least as many observations
as the number of sensors.
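A minimal sketch of the MVDR scan follows, using the standard closed form W = R⁻¹A / (Aᴴ R⁻¹ A) and the corresponding power 1 / (Aᴴ R⁻¹ A). To keep the example deterministic it uses a model covariance (one unit-power source at 15 degrees plus white noise of variance 0.1) instead of recorded data; these values and the helper name `sv` are assumptions made for illustration.

```matlab
% MVDR (Capon) spectrum and weights on a model covariance matrix.
M = 4;  l = 0.02;  c = 345;  f = 4000;         % assumed array geometry and frequency
sv = @(th) exp(-1i * 2*pi*f * (0:M-1)' * l * sin(th) / c);   % steering vector A(theta)
a0 = sv(15 * pi / 180);                         % steering vector of the assumed source
Ryy = a0 * a0' + 0.1 * eye(M);                  % one unit-power source plus white noise

scan = (-90:90) * pi / 180;                     % candidate angles
P = zeros(size(scan));
for j = 1:numel(scan)
    a = sv(scan(j));
    P(j) = 1 / real(a' * (Ryy \ a));            % MVDR output power at this angle
end
[Pmax, jmax] = max(P);
theta_hat = scan(jmax) * 180 / pi;              % angle of the highest MVDR peak

a = sv(scan(jmax));
W = (Ryy \ a) / (a' * (Ryy \ a));               % MVDR weights for the look direction
gain = W' * a;                                  % distortionless constraint: equals 1

fprintf('estimated DOA = %.0f degrees, look-direction gain = %.3f\n', ...
        theta_hat, real(gain));
```

Solving with `Ryy \ a` avoids forming the inverse explicitly; the appendix code uses `pinv` instead, which also tolerates a rank-deficient covariance.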
5.2. Subspace algorithm
5.2.1. MUSIC
Multiple Signal Classification (MUSIC) [14] is a subspace method for
localizing sources. Looking back at equation 7b, we can define the following
expressions,

$$R_{yy} = \frac{1}{N} \sum_{n=1}^{N} y(n) y^H(n) = \frac{1}{N} Y Y^H \tag{16a}$$

$$E\{R_{yy}\} = A(\theta) R_{ss} A^H(\theta) + \sigma_N^2 I \tag{16b}$$

where $R_{ss} = \frac{1}{N} S S^H$ is the clean signal covariance,
$\sigma_N^2$ is the noise variance, and we assume that we have access to N
samples of the recorded signals at a time. We can then use eigenvalue
decomposition to decompose $R_{yy}$ into signal and noise subspaces.

$$R_{yy} = \begin{bmatrix} U_s & U_n \end{bmatrix} \begin{bmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_M \end{bmatrix} \begin{bmatrix} U_s^H \\ U_n^H \end{bmatrix} \tag{17}$$

where $U_s$ is the signal subspace, $U_n$ is the noise subspace and
$\lambda_1 > \lambda_2 > \cdots > \lambda_M$ are the eigenvalues. The signal
subspace spans the same subspace as the steering vectors, and the noise
subspace is perpendicular to the steering vector subspace, as shown below.

$$\mathrm{span}(U_s) = \mathrm{span}(A(\theta)) \tag{18a}$$

$$U_n \perp A(\theta) \;\Rightarrow\; U_n^H A(\theta) = 0 \tag{18b}$$

We can use the orthogonality between the noise subspace and the array
steering vectors to find the direction of arrival by defining the output
power of the MUSIC algorithm as one of the following,

$$P_{MUSIC}(\theta) = \frac{1}{\| U_n^H A(\theta) \|^2} = \frac{1}{A(\theta)^H U_n U_n^H A(\theta)} \tag{19a}$$

$$P_{MUSIC}(\theta) = \frac{A(\theta)^H A(\theta)}{A(\theta)^H U_n U_n^H A(\theta)} \tag{19b}$$

Equation (19a) is known as the MUSIC pseudo spectrum, and (19b) is known as
the MUSIC spatial spectrum. The peaks of either of these expressions point to
the directions of the signal sources. One disadvantage of the MUSIC algorithm
is that the number of sources to be detected must be determined in advance,
and at least one extra sensor is needed for the noise subspace. That is, with
M sensors we can localize up to M − 1 sources.
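The subspace steps above can be sketched as follows. The example builds the expected covariance of equation 16b for two uncorrelated unit-power sources at −25 and 15 degrees with a noise variance of 0.01 (all assumed values), splits off the noise subspace, and evaluates the pseudo spectrum of equation 19a on a 1 degree grid. It is an idealized illustration rather than the processing applied to the recordings.

```matlab
% MUSIC pseudo spectrum (equation 19a) on a model covariance with two sources.
M = 4;  l = 0.02;  c = 345;  f = 4000;           % assumed array geometry and frequency
sv = @(th) exp(-1i * 2*pi*f * (0:M-1)' * l * sin(th) / c);   % steering vector A(theta)
doa = [-25, 15] * pi / 180;                       % assumed true directions (two sources)
numSrc = numel(doa);                              % MUSIC needs this number in advance

A = [sv(doa(1)), sv(doa(2))];                     % M-by-2 steering matrix
Rss = eye(numSrc);                                % uncorrelated unit-power sources
Ryy = A * Rss * A' + 0.01 * eye(M);               % equation 16b with noise variance 0.01
Ryy = (Ryy + Ryy') / 2;                           % enforce exact Hermitian symmetry

[U, E] = eig(Ryy);
[~, order] = sort(real(diag(E)), 'descend');      % largest eigenvalues first
Un = U(:, order(numSrc+1:end));                   % noise subspace: M - numSrc smallest

scan = (-90:90) * pi / 180;                       % candidate angles
P = zeros(size(scan));
for j = 1:numel(scan)
    P(j) = 1 / norm(Un' * sv(scan(j)))^2;         % equation 19a
end
[~, best] = sort(P, 'descend');
theta_hat = sort(scan(best(1:numSrc))) * 180 / pi;   % the two highest peaks, in degrees

fprintf('estimated DOAs: %.0f and %.0f degrees\n', theta_hat(1), theta_hat(2));
```

With M = 4 microphones and two sources, the noise subspace has two columns, which matches the `3:4` column selection in the four-channel appendix code.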
6. EXPERIMENT SETUP
I used a PlayStation EYE as my uniform linear array. The PS EYE has 4
microphones, spaced about 2 cm apart, with a sampling rate of 16 kHz. I ran
two sets of experiments: (1) One source at about 15 degrees and two
microphones. I used a plastic bag as my sound source; the spectrogram is
provided in figure 9. The microphones marked with circles are the two sensors
I used for this case. (2) Two sources located at 15 and −25 degrees, a loud
fan and a speech signal, recorded with all four microphones of the PS EYE.
The spectrogram for this case can also be seen in figure 9. In each case, I
used a ruler to approximately find the location of the sound source, where
the rightmost microphone of the PS EYE is marked -90 degrees and the leftmost
microphone is marked +90 degrees.
Fig. 8. Playstation EYE
Fig. 9. Spectrogram for one and two sources scenarios
7. RESULTS
I evaluated all three algorithms from section V for the two cases discussed
in section VI and plotted the output power of each algorithm over all angles
defined in section VI. The results for cases 1 and 2 are shown in figure 10.

Fig. 10. Output power for one and two sources scenarios

For the one-source scenario, all three algorithms were able to detect the DOA
correctly. The Delay and Sum beamformer was not able to minimize the grating
lobes and distortion in the signal. MVDR improved on that by minimizing the
variance of the distortion. The MUSIC localization looks almost like a delta
function, with one peak at the source location. For the two-source scenario,
delay and sum and MVDR are not able to resolve the two sources. MUSIC,
however, is able to localize both sound sources. MUSIC gives the best results
for both scenarios. MUSIC, however, is also very sensitive to the frame of
analysis: it needs many frames to form a reasonably well-defined noise
subspace. A quantitative way of evaluating these algorithms is to measure the
localization accuracy, for example with the root mean square error (RMSE)
[15].

$$RMSE = \sqrt{\frac{1}{K} \sum_{k=1}^{K} \left( \theta_{est_k} - \theta_{true_k} \right)^2} \tag{20}$$

where K is the number of blocks (groups of frames).
RMSE           Delay and Sum   MVDR     MUSIC
1st scenario   0.7035          0.1012   0.0851
2nd scenario   0.4992          0.4990   0.1903
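Equation 20 can be evaluated in one line. The per-block estimates below are hypothetical numbers invented for the example; they are not the data behind the table above.

```matlab
% RMSE of per-block DOA estimates (equation 20).
theta_true = 15 * ones(1, 5);                  % assumed ground truth for K = 5 blocks (deg)
theta_est  = [15.2, 14.9, 15.1, 14.7, 15.0];   % hypothetical per-block estimates (deg)
K = numel(theta_est);                          % number of blocks
rmse = sqrt(sum((theta_est - theta_true).^2) / K);

fprintf('RMSE = %.4f degrees\n', rmse);
```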
RMSE is only one metric for comparing sound source localization algorithms.
One must also take noise and reverberation into account when localizing a
sound source. For example, is the algorithm robust enough to distinguish the
direct sound from the reflections? Which algorithm produces an output with a
higher SNR?
8. APPENDIX
In this section, I have included the Matlab code for vi-
sualizing the beampattern for ULA and also the Mat-
lab codes for sound source localization algorithms dis-
cussed in section V.
8.1. Matlab Code for Beampattern for ULA
%% Ramin Anushiravani
% 11/24/14
% Linear Mic array
close all; clear all; clc
dis = 0.02;
fs = 9000;%48000;
fftPoint = 1024;
numfft = 1:1:fftPoint/2;
f = numfft*fs/fftPoint; %hertz
res = 1;
theta = -pi:res*pi/180:pi;
c = 345;
numMic =10;
for i = 1: numMic
SV(:,:,i) =(exp(1i*2*pi.*f'*(i-1)*dis*sin(theta)/c));
%delay and sum
end
Out = abs(sum(SV,3)/numMic);
for i = 1:10:fftPoint/2
% figure(1);subplot(1,2,1);
polar(theta,Out(i,:));
title(['Frequency ' , ...
num2str(round(i*fs/fftPoint)), ' Hz']);
%subplot(1,2,2);plot(theta,20*log10(Out(i,:)));
%axis([-pi pi -20 1]);title('Beam Pattern');
pause
end
figure;imagesc(theta*180/pi,numfft*fs/fftPoint,Out);
xlabel('angle');ylabel('frequency-Hz');
title('Beam Pattern');axis xy
colormap hot
% %% Simulation signals.
% angle = pi/4;
% bin = 100;
% f1 = bin*fs/fftPoint;
%
% tdelay = dis*sin(angle)/c;
% L = fftPoint;
% t = (0:L-1)/fs;
%
% sig1 = sin(2*pi*f1*t);
% sig2 = fft(sig1).*exp(-1i*2*pi*([0:L/2 -L/2+1:-1])*tdelay*fs/L);
% fft is symmetric.
%
% sig3 = real(ifft(sig2));
% TT= [sig3; sig1];
% x = TT';
% audiowrite('sim.wav',TT',fs);
%
%
8.2. Sound Source Localization
%Ramin Anushiravani
% March 1st,14
clc;clear all; close all;
%% 11/24/14
numMic =2; % #mics
%% Theta
addpath('sounds551');
theta = -pi/2:pi/179:pi/2;
%if 0 to pi cos, if -pi/2 to pi/2 sin.
c = 345;
%% 2 chan
if numMic ==2
% d = 0.22*cos(theta) ;
% t = d/c; %time delay between mics
[sig fs]= audioread('mystery angle.wav');
end
%% 4 chan
if numMic ==4
% [sig1 fs] = audioread('cup-01.wav');
% [sig2 fs] = audioread('cup-02.wav');
% [sig3 fs] = audioread('cup-03.wav');
% [sig4 fs] = audioread('cup-04.wav');
[sig1 fs] = audioread('fan speaker-01.wav');
[sig2 fs] = audioread('fan speaker-02.wav');
[sig3 fs] = audioread('fan speaker-03.wav');
[sig4 fs] = audioread('fan speaker-04.wav');
Fs = 16000;
s4 = resample(sig4,Fs,fs);
s2 = resample(sig3,Fs,fs);
s3 = resample(sig2,Fs,fs);
s1 =resample(sig1,Fs,fs);
else
%% 2 chan
i = [1 11];
sig1a = sig(i(1):i(2)*fs,1);
sig2a = sig(i(1):i(2)*fs,2);
%get the signals first
%% 2 chan
s2 = sig2a;
%ch.2 is closer to the source.
%steering vectors was based on the ch.2 as ref.
s1 = sig1a;
end
%% ffts
fftPoint = 1024;
R = fftPoint;
L = R/4;
k = 1:1:fftPoint/2;
w = 2*pi.*(k-1)*fs/fftPoint;
%% FFT points
N = numMic;
numfft = 1:1:fftPoint/2;
f = numfft*fs/fftPoint;
%%
%% Steering Vectors
if numMic ==2
dis = 0.22;
% SV 4 chan
for i = 1: numMic
SV(:,:,i) =...
(exp(1i*2*pi.*f'*(i-1)*dis*sin(theta)/c));
end
SV1 = SV(:,:,1);
SV2 = SV(:,:,2);
%% SV 2 chan
for i = 1: fftPoint/2
for j = 1: length(theta)
SVt(i,j)= {[SV1(i,j); SV2(i,j)]};
%Steering vector for each freq and angle
end
end
else
%% distance between mics 4
dis = 0.02;
% SV 4 chan
for i = 1: numMic
SV(:,:,i) =...
(exp(1i*2*pi.*f'*(i-1)*dis*sin(theta)/c));
end
SV1 = SV(:,:,1);
SV2 = SV(:,:,2);
SV3 = SV(:,:,3);
SV4 = SV(:,:,4);
% SV 4
for i = 1: fftPoint/2
for j = 1: length(theta)
SVtt(i,j)= ...
{[SV1(i,j); SV2(i,j);SV3(i,j);SV4(i,j)]};
%Steering vector for each freq and angle
end
end
%
end
%% Beampattern
%% Steer 2
if numMic==2
for j = 1 : length(theta)
for i = 1 : fftPoint/2-1
steer (i,j) = abs(sum(SVt{i,j},1))/N;
%summing up steering vectors for all mics.
end
end
end
%% Steer 4
if numMic ==4
for j = 1 : length(theta)
for i = 1 : fftPoint/2-1
steer4 (i,j) = abs(sum(SVtt{i,j},1))/N;
%summing up steering vectors for all mics.
end
end
%% Plot beam pattern
% for i = 1 : fftPoint/2
% polar(theta,steer4(i,:)) ;
%title(['frequency
%' num2str(floor(i*(fs/2)/fftPoint)) ' Hz']);
%pause(0.01)
% end
end
%% STFT
if numMic ==2
[sig1, t1] = enframe(s1,hamming(R),L);
[sig2, t2] = enframe(s2,hamming(R),L);
for i = 1 : length(sig1(:,1))
Sig1(i,:) = fft(sig1(i,:),fftPoint);
Sig2(i,:) = fft(sig2(i,:),fftPoint);
end
Sig1 = Sig1(:,1:end/2-1);
Sig2 = Sig2(:,1:end/2-1);
end
%% 4 chan
if numMic==4
[sig1, t1] = enframe(s1,hamming(R),L);
[sig2, t2] = enframe(s2,hamming(R),L);
[sig3, t3] = enframe(s3,hamming(R),L);
[sig4, t4] = enframe(s4,hamming(R),L);
for i = 1 : length(sig1(:,1))
Sig1(i,:) = fft(sig1(i,:),fftPoint);
Sig2(i,:) = fft(sig2(i,:),fftPoint);
Sig3(i,:) = fft(sig3(i,:),fftPoint);
Sig4(i,:) = fft(sig4(i,:),fftPoint);
end
% take first half
Sig1 = Sig1(:,1:end/2-1);
Sig2 = Sig2(:,1:end/2-1);
Sig3 = Sig3(:,1:end/2-1);
Sig4 = Sig4(:,1:end/2-1);
end
%% making blocks out of frames 2 chan
if numMic==2
nframe =500;
n = [1 nframe];
for p = 1:floor(length(Sig1(:,1))/ nframe)
Sigb1(p,:) = {Sig1(n(1):n(2),:)};
Sigb2(p,:) = {Sig2(n(1):n(2),:)};
n = n + nframe;
end
for i = 1:length(Sigb1)
for k = 1: fftPoint/2-1
SIG(i,k) = {[Sigb1{i}(:,k),Sigb2{i}(:,k)]};
%we need to find the cov between each
%block for each signal at one frequency bin
%=> energy of the signal at the freq bin
end
end
% SIG is number of block times the
% number of frequency bins, each entry
% contains a cell, of all frames in
%each block for each signal in that
% specific freq bin.
end
%% making blocks out of frames 4 chan
if numMic==4
nframe =350;
n = [1 nframe];
for p = 1:floor(length(Sig1(:,1))/ nframe)
Sigb1(p,:) = {Sig1(n(1):n(2),:)};
Sigb2(p,:) = {Sig2(n(1):n(2),:)};
Sigb3(p,:) = {Sig3(n(1):n(2),:)};
Sigb4(p,:) = {Sig4(n(1):n(2),:)};
n = n + nframe;
end
for i = 1:length(Sigb1)
for k = 1: fftPoint/2-1
SIG(i,k) = ...
{[Sigb1{i}(:,k),Sigb2{i}(:,k),Sigb3{i}(:,k),Sigb4{i}(:,k)]};
%we need to find the cov between each block for each
%signal at one frequency bin =>
%energy of the signal at the freq bin
end
end
end
%% Rxx
for i =1 : length(SIG(:,1)) % Goes through frames
for j = 1: fftPoint/2-1 % goes through frequency bins
Rxx(i,j)= {(transpose(SIG{i,j})*conj(SIG{i,j}))};
% each cell holds the (unnormalized) covariance for that
% block in that freq bin (numMic x numMic)
end
end
%% delay and sum 2
if numMic==2
for k = 1: length(Rxx(:,1))
for i = 1: fftPoint/2-1
for j = 1: length(theta)
Power1(i,j,k) = ...
abs(SVt{i,j}'*(Rxx{k,i})*SVt{i,j});
end
end
end
%% capon 2
for k = 1: length(Rxx(:,1))
for i = 1: fftPoint/2-1
for j = 1: length(theta)
Power2(i,j,k) = ...
1/abs(SVt{i,j}'*pinv(Rxx{k,i})*SVt{i,j});
end
end
end
%% MUSIC 2
for k = 1: length(SIG(:,1))
for i = 1: fftPoint/2-1
[u e] = eigs(Rxx{k,i});
e_diag = diag(e);
[e_sort e_idx] = sort(e_diag,'descend');
u_sort = u(:,e_idx);
noise_subspace = u_sort(:,2);
% eigenvalues are sorted in descending order, so with one
% source the noise subspace is the last (2nd) eigenvector
for j = 1: length(theta)
Power3(i,j,k) = 1/abs(SVt{i,j}'...
*(noise_subspace*noise_subspace')*SVt{i,j});
end
end
end
end
%% delay and sum 4
if numMic==4
for k = 1: length(Rxx(:,1))
for i = 1: fftPoint/2-1
for j = 1: length(theta)
Power1(i,j,k) = ...
abs(SVtt{i,j}'*(Rxx{k,i})*SVtt{i,j});
end
end
end
%% capon 4
for k = 1: length(Rxx(:,1))
for i = 1: fftPoint/2-1
for j = 1: length(theta)
Power2(i,j,k) = 1/abs(SVtt{i,j}'...
*pinv(Rxx{k,i})*SVtt{i,j});
end
end
end
%% MUSIC 4
for k = 1: length(SIG(:,1))
for i = 1: fftPoint/2-1
[u e] = eig(Rxx{k,i});
e_diag = diag(e);
[e_sort e_idx] = sort(e_diag,'descend');
u_sort = u(:,e_idx);
noise_subspace = u_sort(:,3:4);
% Defined based on the dimensions of Rxx
for j = 1: length(theta)
Power3(i,j,k) = (1/abs((SVtt{i,j}')...
*(noise_subspace*noise_subspace')*(SVtt{i,j})));
end
end
end
end
%% Power
for i = 1: length(SIG(:,1))
PowerSq1(i) = {squeeze(Power1(:,:,i))};
% power for each block
PowerSq2(i) = {squeeze(Power2(:,:,i))};
PowerSq3(i) = {squeeze(Power3(:,:,i))};
end
%
for i = 1: length(SIG(:,1))
sumPower1(i,:) = sum(PowerSq1{i},1);
% sum over all freq, max power at every angle
sumPower2(i,:) = sum(PowerSq2{i},1);
sumPower3(i,:) = sum(PowerSq3{i},1);
end
%% plot delay and sum and MVDR
figure(1);subplot(1,3,1);
plot((theta*180/pi),sumPower1)
;title('D and S'); xlabel('Angle'); ylabel('Power')
subplot(1,3,2);plot((theta*180/pi),sumPower2)
;title('MVDR'); xlabel('Angle'); ylabel('Power')
%% MUSIC
subplot(1,3,3); plot(((theta)*180/pi)+9,sumPower3)
;title('MUSIC'); xlabel('Angle'); ylabel('Power')
axis([-90 90 2 max(max(sumPower3))*1.2])
%% RMSE
if numMic ==2
x=mean(sumPower1,1);
[X ind] = max(x);
vec = [zeros(1,ind-1),...
X,zeros(1,size(sumPower1,2)-ind)];
er1= sqrt(norm(x-vec)/(norm(x)*numMic))
x=mean(sumPower2,1);
[X ind] = max(x);
vec = [zeros(1,ind-1),X,...
zeros(1,size(sumPower1,2)-ind)];
er2= sqrt(norm(x-vec)/(norm(x)*numMic))
x=mean(sumPower3,1);
[X ind] = max(x);
vec = [zeros(1,ind-1),X,...
zeros(1,size(sumPower1,2)-ind)];
er3= sqrt(norm(x-vec)/(norm(x)*numMic))
else
x=mean(sumPower1,1);
[X ind] = max(x);
vec = [zeros(1,ind-1),X,...
zeros(1,size(sumPower1,2)-ind)];
er1= sqrt(norm(x-vec)/(norm(x)*numMic))
x=mean(sumPower2,1);
[X ind] = max(x);
vec = [zeros(1,ind-1),X,...
zeros(1,size(sumPower1,2)-ind)];
er2= sqrt(norm(x-vec)/(norm(x)*numMic))
x=mean(sumPower3,1);
[X ind] = findpeaks(x,'NPeaks',4) ;
vec = [zeros(1,ind(2)-1),X(2)...
,zeros(1,ind(4)-ind(2)-1),X(4),...
zeros(1,size(sumPower1,2)-ind(4))];
er3= sqrt(norm(x-vec)/(norm(x)*numMic))
end
9. REFERENCES
[1] Farrell, K.; Mammone, R.; Flanagan, J.L., "Beamforming microphone arrays
for speech enhancement," Acoustics, Speech, and Signal Processing, 1992.
ICASSP-92., 1992 IEEE International Conference on, vol.1, pp.285-288, 23-26
Mar 1992
[2] Cauchi, B, Joint Dereverberation and noise reduction using Beamforming
and single-channel speech enhancement scheme, Reverb Challenge 2014
[3] Habets, E.A.P.; Benesty, J., "A Two-Stage Beamforming Approach for Noise
Reduction and Dereverberation," Audio, Speech, and Language Processing, IEEE
Transactions on, vol.21, no.5, pp.945-958, May 2013
[4] Van den Bogaert, Tim and Doclo, Simon and Wouters, Jan and Moonen, Marc,
Speech enhancement with multichannel Wiener filter techniques in
multimicrophone binaural hearing aids, The Journal of the Acoustical Society
of America, 125, 360-371 (2009)
[5] Antnio L. L, R, Delay-and-sum beamforming for direction of arrival
estimation applied to gunshot acoustics, Proceedings of the SPIE
[6] Shengkui, Z, 3D Binaural Audio Capture and Reproduction Using a Miniature
Microphone Array, Conference on Digital Audio Effects
[7] Miles, Q. Su, W. Cui, and M. Shetye, R, A low-noise differential
microphone inspired by the ears of the parasitoid fly Ormia ochracea,
Acoustic Society
[8] Kuntzman, Michael L. and Hall, Neal A., Sound source localization
inspired by the ears of the Ormia ochracea, Applied Physics Letters, 105,
033701 (2014)
[9] Benesty, Jacek P. Dmochowski, Microphone Arrays: Fundamental Concepts,
Springer
[10] Iain, M, A Microphone Array Tutorial
[11] Greensted, A, "Delay Sum Beamforming". Retrieved January, 2012
[12] J. Capon. High-resolution frequency-wavenumber spectrum analysis. Proc.
IEEE, 57(8), 1408-1418 (1969).
[13] Bhuiya, F. Islam, M, Analysis of Direction of Arrival Techniques Using
Uniform Linear Array, International Journal of Computer Theory and
Engineering
[14] Kawitkar, R, Performance of Different Types of Array Structures Based on
Multiple Signal Classification (MUSIC) algorithm, International Conference on
MEMS NANO, and Smart Systems
[15] Richter, I, Spatial Filtering and DoA Estimation MVDR Beamformer and
MUSIC Algorithm, Sensor Array Signal Processing
[16] S P. Boyd, R, Robust Minimum Variance Beamforming