9. Acoustic feature extraction process
Microphone → STFT → |·|² → Mel FB → Log compression → DCT → Δ, ΔΔ → Decoder
Processing domain: STFT coefficients
• Can fully benefit from the use of microphone arrays
10. Acoustic feature extraction process
Microphone → STFT → |·|² → Mel FB → Log compression → DCT → Δ, ΔΔ → Decoder
Processing domain: Power spectra
• Easy to combine with noise suppressors
11. Acoustic feature extraction process
Microphone → STFT → |·|² → Mel FB → Log compression → DCT → Δ, ΔΔ → Decoder
Processing domain: Log mel-frequency features
• Efficient for reducing the acoustic mismatch between observations and training data
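The feature extraction chain on the preceding slides can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the front-end used in the talk: the sample rate, frame length, hop size, Hann window, triangular mel filters and the number of cepstral coefficients are all assumptions, and the function names are hypothetical. The 257 STFT bins and 24 mel channels match the dimensions quoted on the later notation slides. The function returns the intermediate representations so that the three candidate processing domains (STFT coefficients, power spectra, log mel-frequency features) are visible.

```python
import numpy as np

def stft(signal, frame_len=400, hop=160, n_fft=512):
    """Frame the signal, apply a Hann window, take the one-sided FFT."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, n=n_fft, axis=1)  # (n_frames, n_fft // 2 + 1)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters with edges equally spaced on the mel scale."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2), n_mels + 2))
    bin_freqs = np.arange(n_fft // 2 + 1) * sample_rate / n_fft
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct_matrix(n_out, n_in):
    """Unnormalized DCT-II basis, one row per cepstral coefficient."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_in)

def deltas(feats):
    """Central difference along time, with edge frames repeated."""
    padded = np.pad(feats, ((1, 1), (0, 0)), mode="edge")
    return 0.5 * (padded[2:] - padded[:-2])

def extract_features(signal, sample_rate=16000, n_fft=512, n_mels=24, n_ceps=13):
    spec = stft(signal, n_fft=n_fft)                 # STFT coefficients
    power = np.abs(spec) ** 2                        # |.|^2
    mel = power @ mel_filterbank(n_mels, n_fft, sample_rate).T  # Mel FB
    log_mel = np.log(mel + 1e-10)                    # log compression
    ceps = log_mel @ dct_matrix(n_ceps, n_mels).T    # DCT
    d1 = deltas(ceps)                                # delta
    d2 = deltas(d1)                                  # delta-delta
    return spec, power, log_mel, np.hstack([ceps, d1, d2])
```

Dereverberation can be inserted after any of the returned stages; the remaining stages are then applied to the enhanced representation before decoding.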
12. Notations
n : frame index
$y_n$ : corrupted vector
$x_n$ : clean vector
$\hat{x}_n$ : estimate of $x_n$
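14. Generative approach (using Bayes rule)
$$\hat{x}_n = \int x_n \, p_{X|Y,Y_{\mathrm{past}}}(x_n \mid y_n, \ldots, y_1) \, dx_n$$
$$p_{X|Y,Y_{\mathrm{past}}}(x_n \mid y_n, \ldots, y_1) \propto p_{Y|X,Y_{\mathrm{past}}}(y_n \mid x_n, y_{n-1}, \ldots, y_1) \times p_X(x_n)$$
Reverberation model: $p_{Y|X,Y_{\mathrm{past}}}(y_n \mid x_n, y_{n-1}, \ldots, y_1)$
Clean speech model: $p_X(x_n)$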
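As a toy illustration of the generative approach, the posterior mean can be evaluated numerically for a scalar case. Everything in this sketch is an assumption made for illustration: a zero-mean Gaussian clean speech model, a Gaussian reverberation model whose mean is the clean value plus a scaled copy of the previous observation, and the hypothetical parameter names g, var_x and var_e. It is not the model used later in the talk.

```python
import numpy as np

def gaussian(x, mean, var):
    """Real-valued Gaussian density."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def mmse_estimate(y_n, y_prev, g=0.5, var_x=1.0, var_e=0.1):
    """Posterior mean of x_n given y_n and y_{n-1}, by summation on a grid."""
    grid = np.linspace(-10.0, 10.0, 20001)                # candidate x_n values
    prior = gaussian(grid, 0.0, var_x)                    # clean speech model
    likelihood = gaussian(y_n, grid + g * y_prev, var_e)  # reverberation model
    weights = likelihood * prior                          # unnormalized posterior
    return np.sum(grid * weights) / np.sum(weights)       # grid step cancels
```

For this linear-Gaussian toy case the result agrees with the closed form var_x / (var_x + var_e) * (y_n - g * y_prev), which gives a quick sanity check of the Bayes-rule factorization.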
16. STFT domain (linear prediction)
• Clean speech model
• Reverberation model
• Posterior distribution
• Parameter estimation
Log mel-frequency feature domain (VTS)
• Clean speech model
• Reverberation model
• Posterior distribution
• Parameter estimation
18. Notations
n : frame index
$y_n$ : corrupted complex-valued spectrum (consisting of 257 bins)
$x_n$ : clean complex-valued spectrum
$\hat{x}_n$ : estimate of $x_n$
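19. Clean STFT coefficients: normally distributed
$$p_X(x_n; \Lambda_{X,n}) = \prod_j f_{\mathcal{CN}}(x_{n,j}; 0, \lambda^X_{n,j})$$
Clean PSD $\lambda^X_{n,j}$ (model: form; parameters):
• No model: no parametric form; parameters $\lambda^X_{n,1}, \ldots, \lambda^X_{n,J}$
• All-pole model: $\lambda^X_{n,j} = \sigma^X_n \big/ \left| 1 - \sum_p a^X_{n,p} e^{-i \omega_j p} \right|^2$; parameters $(a^X_{n,p})_{p=1,\ldots,P}$, $\sigma^X_n$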
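The all-pole form of the clean PSD can be evaluated directly. The sketch below assumes that the bin frequencies $\omega_j$ are spaced uniformly from 0 to π over 257 bins and that $\sigma^X_n$ acts as a plain gain in the numerator; the function and argument names are hypothetical.

```python
import numpy as np

def all_pole_psd(a, sigma, n_bins=257):
    """lambda_j = sigma / |1 - sum_p a_p exp(-i omega_j p)|^2 for one frame.

    a     : all-pole coefficients a_1..a_P of the frame
    sigma : gain term of the frame
    """
    a = np.asarray(a, dtype=float)
    omega = np.pi * np.arange(n_bins) / (n_bins - 1)  # assumed bin frequencies
    p = np.arange(1, len(a) + 1)                      # lags 1..P
    denom = 1.0 - np.exp(-1j * np.outer(omega, p)) @ a
    return sigma / np.abs(denom) ** 2

psd = all_pole_psd([0.5], sigma=2.0)
print(psd[0], psd[-1])  # PSD at omega = 0 and omega = pi
```

With P coefficients plus one gain per frame, the all-pole model describes the 257 per-bin variances with far fewer parameters than the "no model" option, at the cost of imposing a smooth spectral envelope.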
25. When model parameters are known
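$$p_{X|Y,Y_{\mathrm{past}}}(x_{n,j} \mid y_{n,j}, \ldots, y_{1,j}; \hat{\Lambda}_X, \hat{\Lambda}_R) = \delta\Big(x_{n,j} - y_{n,j} + \sum_p \hat{g}^*_{p,j} \, y_{n-p,j}\Big)$$
$$\hat{x}_{n,j} = y_{n,j} - \sum_p \hat{g}^*_{p,j} \, y_{n-p,j}$$
Inverse filtering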
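The inverse filtering step amounts to subtracting a linear prediction of the current frame from past observed frames, independently in each frequency bin. A minimal sketch follows, assuming the sum over p runs over consecutive lags starting at a hypothetical delay parameter (1 by default); the array layout and names are illustrative only.

```python
import numpy as np

def inverse_filter(Y, G, delay=1):
    """Estimate clean STFT coefficients by inverse filtering.

    Y : (n_frames, n_bins) complex observed STFT coefficients
    G : (n_taps, n_bins) complex prediction coefficients,
        G[k] holds the tap for lag p = delay + k
    """
    X_hat = Y.copy()
    for k in range(G.shape[0]):
        p = delay + k
        # x_hat[n] -= conj(g_p) * y[n - p], for every frame with a valid past
        X_hat[p:] -= np.conj(G[k]) * Y[:-p]
    return X_hat
```

Because the posterior is a delta function once the prediction coefficients are fixed, no further averaging is needed: the output of the filter is the estimate.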
27. ML for parameter estimation
$$L(\Lambda_X, \Lambda_R) = \sum_j \sum_n \log p_{Y|Y_{\mathrm{past}}}(y_{n,j} \mid y_{n-1,j}, \ldots, y_{1,j}; \Lambda_X, \Lambda_R)$$
28. ML for parameter estimation
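$$L(\Lambda_X, \Lambda_R) = \sum_j \sum_n \log p_{Y|Y_{\mathrm{past}}}(y_{n,j} \mid y_{n-1,j}, \ldots, y_{1,j}; \Lambda_X, \Lambda_R)$$
Each term is the integral, over the clean coefficient, of the product (×) of:
Reverberation model: $p_{Y|X,Y_{\mathrm{past}}}(y_{n,j} \mid x_{n,j}, y_{n-1,j}, \ldots, y_{1,j}; \Lambda_R) = \delta\Big(y_{n,j} - \sum_p g^*_{p,j} \, y_{n-p,j} - x_{n,j}\Big)$
Clean speech model: $p_X(x_n; \Lambda_{X,n}) = \prod_j f_{\mathcal{CN}}(x_{n,j}; 0, \lambda^X_{n,j})$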
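The step from this integral to the closed-form log-likelihood on the next slide can be written out as follows. The sketch assumes that $f_{\mathcal{CN}}(x; 0, \lambda)$ denotes the zero-mean circular complex Gaussian density with variance $\lambda$, which is the usual meaning of the notation but is not stated on the slide.

```latex
\begin{align*}
p_{Y|Y_{\mathrm{past}}}(y_{n,j} \mid y_{n-1,j}, \ldots, y_{1,j}; \Lambda_X, \Lambda_R)
 &= \int \delta\Big(y_{n,j} - \sum_p g^*_{p,j}\, y_{n-p,j} - x_{n,j}\Big)\,
    f_{\mathcal{CN}}(x_{n,j}; 0, \lambda^X_{n,j})\, dx_{n,j} \\
 &= f_{\mathcal{CN}}\Big(y_{n,j} - \sum_p g^*_{p,j}\, y_{n-p,j};\ 0,\ \lambda^X_{n,j}\Big) \\
 &= \frac{1}{\pi \lambda^X_{n,j}}
    \exp\Big(-\frac{\big|y_{n,j} - \sum_p g^*_{p,j}\, y_{n-p,j}\big|^2}{\lambda^X_{n,j}}\Big)
\end{align*}
```

Taking the logarithm gives $-\log \pi - \log \lambda^X_{n,j} - |y_{n,j} - \sum_p g^*_{p,j} y_{n-p,j}|^2 / \lambda^X_{n,j}$; the constant $-\log \pi$ does not depend on the parameters and can be dropped from the objective.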
29. ML for parameter estimation
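$$L(\Lambda_X, \Lambda_R) = \sum_j \sum_n \log p_{Y|Y_{\mathrm{past}}}(y_{n,j} \mid y_{n-1,j}, \ldots, y_{1,j}; \Lambda_X, \Lambda_R)$$
$$= \sum_j \sum_n \Big( -\log \lambda^X_{n,j} - \frac{\big| y_{n,j} - \sum_p g^*_{p,j} \, y_{n-p,j} \big|^2}{\lambda^X_{n,j}} \Big)$$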
30. ML for parameter estimation
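$$L(\Lambda_X, \Lambda_R) = \sum_j \sum_n \log p_{Y|Y_{\mathrm{past}}}(y_{n,j} \mid y_{n-1,j}, \ldots, y_{1,j}; \Lambda_X, \Lambda_R)$$
$$= \sum_j \sum_n \Big( -\log \lambda^X_{n,j} - \frac{\big| y_{n,j} - \sum_p g^*_{p,j} \, y_{n-p,j} \big|^2}{\lambda^X_{n,j}} \Big)$$
If $\hat{\lambda}^X_{n,j}$ is known:
$$\hat{\Lambda}_{R,j} = \operatorname*{argmin}_{\Lambda_{R,j}} \sum_n \frac{\big| y_{n,j} - \sum_p g^*_{p,j} \, y_{n-p,j} \big|^2}{\hat{\lambda}^X_{n,j}}$$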
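With the clean PSD fixed, the minimization over the prediction coefficients of one frequency bin is a weighted linear least-squares problem. The sketch below solves it with np.linalg.lstsq after scaling each frame by the inverse square root of its PSD. The function name, the delay parameter (first lag of the sum over p) and the requirement that the number of frames exceeds delay + n_taps are assumptions of this example.

```python
import numpy as np

def estimate_prediction_filter(y, lam, n_taps, delay=1):
    """Weighted least-squares estimate of g_p for one frequency bin.

    y      : (n_frames,) complex observed STFT coefficients of the bin
    lam    : (n_frames,) positive clean PSD estimates of the bin
    returns: (n_taps,) complex taps, g[k] for lag p = delay + k
    """
    n_frames = len(y)
    # Column k holds the delayed observations y[n - p], zero where n < p.
    Phi = np.zeros((n_frames, n_taps), dtype=complex)
    for k in range(n_taps):
        p = delay + k
        Phi[p:, k] = y[:n_frames - p]
    # Dividing each squared residual by lam[n] equals scaling row n by
    # 1 / sqrt(lam[n]) before an ordinary least-squares solve.
    scale = 1.0 / np.sqrt(lam)
    w, *_ = np.linalg.lstsq(Phi * scale[:, None], y * scale, rcond=None)
    # The model predicts with conj(g_p), so the solved weights are conj(g).
    return np.conj(w)
```

Frames with a small clean PSD receive a large weight, so the filter is fitted mainly where the observation is dominated by reverberation rather than by direct speech.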
34. Extensions
• Integration with source separation
• Integration with additive noise reduction
• Adaptive inverse filtering
– Using an RLS-like algorithm
• Application to music signals
– Using a clean source model accounting for strong
harmonic structures
• Exploiting prior knowledge on room properties
36. Notations
n : frame index
$y_n$ : corrupted log mel-frequency feature (consisting of 24 coefficients)
$x_n$ : clean log mel-frequency feature
$\hat{x}_n$ : estimate of $x_n$
48. Connected digit recognition
• 1024-component GMM for VTS
• Clean complex back-end defined in Aurora2
• Evaluation data set consisting of 4004
reverberant utterances
– Simulated data
– Impulse responses measured in a varechoic room
– Speaker-microphone distance = 3.5 m
– T60 = 0.2 to 0.6 s
50. Concluding remarks
• Dereverberation can be performed in
different domains
• The reverberation model must account for
the strong statistical dependencies
between consecutive observation frames