This lecture discusses properties of digital filters. It defines FIR and IIR filters and relates their power responses to pole-zero diagrams. A 2nd-order resonance's bandwidth is related to pole positions. Bandwidth expansion transforms filters by moving poles inward. Reversing coefficients conjugates a filter's frequency response. Expressions are derived for logarithmic frequency responses and their averages.
1. Methods and algorithms of speech recognition
course
Lecture 3
Nikolay V. Karpov
nkarpov(а)hse.ru
2. Derive a theoretical model of how sound
waves are affected by the vocal tract
Describe a model for lip radiation
Describe a model for the pulsating glottal
waveform during voiced speech
Assemble the components of a simple speech
synthesiser
3. We model the vocal tract as a tube that has p
segments.
Ug and Ul are the volume flow of air at the glottis and lips
respectively.
Vocal tract is of length L (typically 15-17 cm in adults)
Number of tube segments needed: $p = 2L/(cT) \approx 0.001\,f_{samp}$ (c is the speed of sound, T the sample period)
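As a quick check of the segment-count rule, the sketch below evaluates p = 2L/(cT) = 2·L·fsamp/c. The function name and the default c = 340 m/s are assumptions made for illustration; the slide itself only quotes the approximate result 0.001·fsamp.

```python
def num_tube_segments(tract_length_m, fsamp_hz, c=340.0):
    """Number of tube segments p = 2L/(cT) = 2*L*fsamp/c, rounded.

    c = 340 m/s is an assumed speed of sound, not a value from the slides.
    """
    return round(2.0 * tract_length_m * fsamp_hz / c)


if __name__ == "__main__":
    for fs in (8000, 16000):
        print(fs, num_tube_segments(0.17, fs))
```

With a 17 cm tract this gives 8 segments at 8 kHz and 16 segments at 16 kHz, in line with p ≈ 0.001·fsamp.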
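4. Mass × Acceleration = Force (for a slice of air of volume $V = A\,\delta x$, with u the volume flow, p the pressure, ρ the air density and A the tube area):
$$\rho V \cdot \frac{1}{A}\frac{\partial u}{\partial t} = -A\,\frac{\partial p}{\partial x}\,\delta x \;\Rightarrow\; \frac{\partial p}{\partial x} = -\frac{\rho}{A}\frac{\partial u}{\partial t}$$
Adiabatic Gas Law:
$$A\,\frac{\partial p}{\partial t} = -\rho c^2\,\frac{\partial u}{\partial x}$$
These equations are known as the wave equations.
Solution:
$$u(x,t) = u^+(t - x/c) - u^-(t + x/c)$$
$$p(x,t) = \frac{\rho c}{A}\left[u^+(t - x/c) + u^-(t + x/c)\right]$$
It is easily verified that this solution satisfies the wave equations for any differentiable functions $u^+$ and $u^-$.
The two functions $u^+$ and $u^-$ represent waves travelling in the +ve and –ve directions at velocity c. The actual values of the waves are determined by the boundary conditions at the ends of the tube section.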
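The claim that the travelling-wave solution satisfies both wave equations can be checked numerically. The sketch below assumes the equations in the form ∂p/∂x = −(ρ/A)·∂u/∂t and A·∂p/∂t = −ρc²·∂u/∂x, picks two arbitrary smooth wave shapes, and compares both sides using central differences. The constants, the wave shapes and all function names are illustrative assumptions, not values from the lecture.

```python
import math

# Assumed illustrative constants (not from the slides).
RHO = 1.2      # air density, kg/m^3
C = 340.0      # speed of sound, m/s
AREA = 3e-4    # tube cross-section, m^2


def u_plus(s):
    """Arbitrary differentiable forward-travelling wave shape."""
    return math.sin(2000.0 * s)


def u_minus(s):
    """Arbitrary differentiable backward-travelling wave shape."""
    return 0.5 * math.cos(1300.0 * s)


def flow(x, t):
    return u_plus(t - x / C) - u_minus(t + x / C)


def pressure(x, t):
    return RHO * C / AREA * (u_plus(t - x / C) + u_minus(t + x / C))


def d_dt(f, x, t, h=1e-7):
    """Central-difference estimate of the time derivative."""
    return (f(x, t + h) - f(x, t - h)) / (2.0 * h)


def d_dx(f, x, t, h=1e-5):
    """Central-difference estimate of the space derivative."""
    return (f(x + h, t) - f(x - h, t)) / (2.0 * h)


def relative_residuals(x, t):
    """Relative mismatch of the two wave equations at the point (x, t)."""
    lhs1, rhs1 = d_dx(pressure, x, t), -RHO / AREA * d_dt(flow, x, t)
    lhs2, rhs2 = AREA * d_dt(pressure, x, t), -RHO * C**2 * d_dx(flow, x, t)
    res1 = abs(lhs1 - rhs1) / (abs(lhs1) + abs(rhs1))
    res2 = abs(lhs2 - rhs2) / (abs(lhs2) + abs(rhs2))
    return res1, res2
```

Both residuals should be tiny (limited only by the finite-difference step), whatever smooth shapes are substituted for `u_plus` and `u_minus`.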
5. Acoustic signal is the superposition of two waves: U
in the forward direction and V in the reverse direction
Assumptions:
Sound waves are 1-dimensional: true for frequencies < 3-4 kHz
whose wavelengths are long compared to the tube width
No frictional or wall-vibration energy losses
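6. Time for sound to travel along one segment = L/(cp)
$$v(t) = x\!\left(t - \frac{L}{cp}\right), \qquad u(t) = w\!\left(t + \frac{L}{cp}\right)$$
Segment length is chosen to correspond to half a sample period: $L/p = 0.5\,cT$
If we take z-transforms, this time delay corresponds to multiplying by $z^{-1/2}$:
$$V(z) = z^{-1/2}X(z); \qquad U(z) = z^{1/2}W(z)$$
In matrix form:
$$\begin{pmatrix} U \\ V \end{pmatrix} = \begin{pmatrix} z^{1/2} & 0 \\ 0 & z^{-1/2} \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix} = z^{1/2}\begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix}$$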
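7. For a uniform lossless tube of length l:
$$p(x,t) = j\,\frac{\rho c}{A}\,\frac{\sin[\omega(l-x)/c]}{\cos[\omega l/c]}\,U_G(\omega)e^{j\omega t}$$
$$u(x,t) = \frac{\cos[\omega(l-x)/c]}{\cos[\omega l/c]}\,U_G(\omega)e^{j\omega t}$$
The transfer function is given by:
$$\frac{U(l,\omega)}{U(0,\omega)} = \frac{1}{\cos(\omega l/c)}$$
This function has poles located at every
$$\omega = \frac{(2n+1)\pi c}{2l}$$
These correspond to the frequencies at which the tube length is an odd multiple of a quarter wavelength.
$$l = \tfrac{1}{2}cT = \frac{c}{2F_s}; \qquad f = \frac{c}{4l}$$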
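Dividing the pole positions ω = (2n+1)πc/(2l) by 2π gives resonance frequencies f_n = (2n+1)c/(4l). The sketch below lists the first few of them; the function name and the default c = 340 m/s are assumptions for illustration.

```python
def uniform_tube_resonances(length_m, count=3, c=340.0):
    """Resonances f_n = (2n+1)*c/(4*l) of a uniform lossless tube, in Hz.

    c = 340 m/s is an assumed speed of sound.
    """
    return [(2 * n + 1) * c / (4.0 * length_m) for n in range(count)]


if __name__ == "__main__":
    print(uniform_tube_resonances(0.17))
```

For a 17 cm tube the first three resonances come out at about 500, 1500 and 2500 Hz.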
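8. Flow Continuity: $U - V = W - X$
Pressure Continuity: $\dfrac{\rho c}{A}(U + V) = \dfrac{\rho c}{B}(W + X)$
In matrix form:
$$\begin{pmatrix} 1 & -1 \\ B & B \end{pmatrix}\begin{pmatrix} U \\ V \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ A & A \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix}$$
Hence:
$$\begin{pmatrix} U \\ V \end{pmatrix} = \frac{1}{2B}\begin{pmatrix} B & 1 \\ -B & 1 \end{pmatrix}\begin{pmatrix} 1 & -1 \\ A & A \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix} = \frac{1}{2B}\begin{pmatrix} A+B & A-B \\ A-B & A+B \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix}$$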
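9. Define the reflection coefficient to be
$$r = \frac{B - A}{B + A}$$
$$\begin{pmatrix} U \\ V \end{pmatrix} = \frac{1}{2B}\begin{pmatrix} A+B & A-B \\ A-B & A+B \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix} = \frac{1}{1+r}\begin{pmatrix} 1 & -r \\ -r & 1 \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix}$$
Reflection coefficients always lie in the range ±1, because the areas A and B are positive.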
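The two forms of the junction matrix can be compared numerically. The sketch below assumes A is the area on the glottis side of the junction and B the area on the lip side, computes r = (B − A)/(B + A), and builds the matrix both from the areas and from r. Function names are illustrative.

```python
def reflection_coefficient(area_a, area_b):
    """r = (B - A) / (B + A) for a junction from area A to area B."""
    return (area_b - area_a) / (area_b + area_a)


def junction_matrix_from_areas(area_a, area_b):
    """(1/2B) * [[A+B, A-B], [A-B, A+B]]."""
    s = 1.0 / (2.0 * area_b)
    return [[s * (area_a + area_b), s * (area_a - area_b)],
            [s * (area_a - area_b), s * (area_a + area_b)]]


def junction_matrix_from_r(r):
    """(1/(1+r)) * [[1, -r], [-r, 1]]."""
    s = 1.0 / (1.0 + r)
    return [[s, -s * r], [-s * r, s]]


if __name__ == "__main__":
    r = reflection_coefficient(2.0, 5.0)
    print(r)
    print(junction_matrix_from_areas(2.0, 5.0))
    print(junction_matrix_from_r(r))
```

For any pair of positive areas the two matrices agree and |r| < 1.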
10. Assume Vl = 0: no sound is reflected back into the mouth.
Work backwards from the lips towards the glottis:
◦ Junction: use the reflection matrix
◦ Tube segment: use the delay matrix
A3 is large but not infinite: the assumption of a narrow tube breaks down at this point.
A0 is approximately zero: it is the area of the glottis opening.
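11. Multiplying out the matrices gives
$$\begin{pmatrix} U_g \\ V_g \end{pmatrix} = \frac{z}{\prod_{k=0}^{2}(1+r_k)}\begin{pmatrix} 1 + (r_0r_1 + r_1r_2)z^{-1} + r_0r_2z^{-2} \\ -r_0 - (r_1 + r_0r_1r_2)z^{-1} - r_2z^{-2} \end{pmatrix}U_l$$
We can ignore Vg: it gets absorbed in the lungs.
The vocal tract transfer function is given by the ratio of Ul to Ug:
$$\frac{U_l}{U_g} = \frac{\prod_{k=0}^{2}(1+r_k)\,z^{-1}}{1 + (r_0r_1 + r_1r_2)z^{-1} + r_0r_2z^{-2}} = \frac{Gz^{-1}}{1 - a_1z^{-1} - a_2z^{-2}}$$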
12.
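13. Combining a junction with the tube segment that follows it:
$$\frac{1}{1+r}\begin{pmatrix} 1 & -r \\ -r & 1 \end{pmatrix} \times z^{1/2}\begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix} = \frac{z^{1/2}}{1+r}\begin{pmatrix} 1 & -rz^{-1} \\ -r & z^{-1} \end{pmatrix}$$
Multiplying together all the matrices for a p-segment vocal tract gives:
$$\begin{pmatrix} U_g \\ V_g \end{pmatrix} = \frac{z^{p/2}}{\prod_{k=0}^{p}(1+r_k)}\left[\prod_{k=0}^{p-1}\begin{pmatrix} 1 & -r_kz^{-1} \\ -r_k & z^{-1} \end{pmatrix}\right]\begin{pmatrix} 1 \\ -r_p \end{pmatrix}U_l$$
This results in a transfer function of the form:
$$\frac{U_l}{U_g} = \frac{Gz^{-p/2}}{1 - a_1z^{-1} - a_2z^{-2} - \dots - a_pz^{-p}}$$
G is a gain term.
$z^{-p/2}$ is the acoustic time delay along the vocal tract.
The denominator represents a p-th order all-pole filter.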
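The matrix product for a p-segment tract can be multiplied out mechanically. The sketch below represents polynomials in z⁻¹ as coefficient lists and applies the matrices [[1, −r_k z⁻¹], [−r_k, z⁻¹]] from the lip end backwards, starting from the vector (1, −r_p). It returns the gain G = Π(1 + r_k) and the polynomials multiplying Ul in the expressions for Ug and Vg. All function names are illustrative, and the common factor z^(p/2) is left out.

```python
def poly_add(a, b):
    """Add two polynomials in z^-1 given as coefficient lists."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0.0) + (b[i] if i < len(b) else 0.0)
            for i in range(n)]


def poly_scale(a, s):
    """Multiply a polynomial by the scalar s."""
    return [s * c for c in a]


def poly_delay(a):
    """Multiply a polynomial by z^-1."""
    return [0.0] + list(a)


def tract_polynomials(r):
    """Return (G, forward, backward) for reflection coefficients r[0..p].

    forward  : polynomial D(z^-1) with Ug = z^(p/2) / G * D * Ul
    backward : polynomial with     Vg = z^(p/2) / G * backward * Ul
    """
    p = len(r) - 1
    u, v = [1.0], [-r[p]]
    for k in range(p - 1, -1, -1):
        # Apply [[1, -r_k z^-1], [-r_k, z^-1]] to the vector (u, v).
        u, v = (poly_add(u, poly_scale(poly_delay(v), -r[k])),
                poly_add(poly_scale(u, -r[k]), poly_delay(v)))
    gain = 1.0
    for rk in r:
        gain *= 1.0 + rk
    return gain, u, v


if __name__ == "__main__":
    print(tract_polynomials([0.9, -0.3, 0.5]))
```

For p = 2 the forward polynomial reproduces 1 + (r0r1 + r1r2)z⁻¹ + r0r2z⁻². If the denominator is written as 1 − a1z⁻¹ − … − apz⁻ᵖ, the coefficients a_i are the negatives of the entries of `forward` after the leading 1.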
14. R(z) is the transfer function between airflow at the lips and pressure at the microphone.
For a lip-opening area of A, acoustic theory predicts a 1st-order high-pass response with a corner frequency of:
$$\frac{c}{\sqrt{4\pi A}}\ \text{Hz} \approx 5\ \text{kHz}$$
For fsamp < 20 kHz, a good approximation is:
$$R(z) = \frac{S(z)}{U_l(z)} = 1 - z^{-1}$$
$$\left|R(e^{j\omega T})\right| = 2\left|\sin\frac{\omega T}{2}\right|$$
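The magnitude of the first-difference approximation R(z) = 1 − z⁻¹ can be checked against the closed form 2|sin(ωT/2)|. This is a minimal sketch; the function names and the choice of a 16 kHz sample rate in the demo are assumptions.

```python
import cmath
import math


def lip_radiation_gain(freq_hz, fsamp_hz):
    """|R| for R(z) = 1 - z^-1 evaluated at z = exp(j*2*pi*f/fsamp)."""
    omega_t = 2.0 * math.pi * freq_hz / fsamp_hz
    return abs(1.0 - cmath.exp(-1j * omega_t))


def lip_radiation_gain_closed_form(freq_hz, fsamp_hz):
    """Closed form 2*|sin(omega*T/2)| of the same magnitude."""
    return 2.0 * abs(math.sin(math.pi * freq_hz / fsamp_hz))


if __name__ == "__main__":
    for f in (100.0, 200.0, 1000.0, 4000.0):
        print(f, lip_radiation_gain(f, 16000.0),
              lip_radiation_gain_closed_form(f, 16000.0))
```

At low frequencies the gain roughly doubles when the frequency doubles, which is the +6 dB/octave slope expected of a first-order high-pass response.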
15. “LF Model” (Liljencrants & Fant)
$$u'_g(t) = \begin{cases} e^{at}\sin(bt) & 0 \le t < t_e \\ c - d\,e^{-ft} & t_e \le t < 1 \end{cases}$$
$u_g(0) = u_g(1) = 0$; $u_g(t)$ and $u'_g(t)$ continuous at $t_e$
Line Spectrum of ug (approx –12 dB/octave):
16. Larynx frequency ≈ 130 Hz
First vocal tract resonance (formant) ≈ 1 kHz
There is not necessarily any relation between the larynx frequency and the vocal tract resonances.
Resonances at a multiple of the larynx frequency will be louder (good for singers).
17.
18. This lecture reviews some well known facts about filters
and introduces some less known ones that will be
needed later on.
Derive the power response of first order FIR and IIR
filters and relate this to the geometry of the pole-
zero diagram.
Relate the bandwidth of a 2nd-order resonance to the
geometry of the pole-zero diagram.
Describe the bandwidth expansion transformation of
a filter.
Describe the effect of reversing the coefficients of a
filter.
Derive expressions for the log frequency response
and its average value
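19. $$y(n) = \sum_k h_k\,x(n-k)$$
A system that performs this transformation is called a linear digital filter.
y(n) – output, x(n) – input, h_k – impulse response
Transfer function:
$$H(z) = \sum_k h_k z^{-k}; \qquad H(z) = \frac{Y(z)}{X(z)}; \qquad X(z) = \sum_n x(n)z^{-n}$$
A digital filter is a finite-order system:
$$y(n) - \sum_{i=1}^{I} a_i\,y(n-i) = \sum_{l=0}^{L} b_l\,x(n-l)$$
$$H(z) = \frac{\sum_{l=0}^{L} b_l z^{-l}}{1 - \sum_{i=1}^{I} a_i z^{-i}} = b_0 z^{I-L}\,\frac{\prod_{l=1}^{L}(z - \beta_l)}{\prod_{i=1}^{I}(z - \alpha_i)}$$
where $\beta_l$ are the zeros and $\alpha_i$ the poles of the filter.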
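20. A linear time-invariant system can be characterized by a constant-coefficient difference equation:
$$y(n) = \sum_{k=1}^{N} a_k\,y(n-k) + \sum_{k=0}^{M} b_k\,x(n-k)$$
Such systems can be implemented as signal flow graphs.
Stable filter: all poles $\alpha_i$ lie inside the unit circle, $|\alpha_i| < 1;\ i = 1,\dots,I$
Minimum phase filter: all zeros $\beta_l$ also lie inside the unit circle, $|\beta_l| < 1;\ l = 1,\dots,L$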
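The difference equation can be run directly, sample by sample. The sketch below assumes the sign convention y(n) = Σ a_k·y(n−k) + Σ b_k·x(n−k), with the feedback terms added; the function name is illustrative. Note that library routines such as `scipy.signal.lfilter` use the opposite sign for the feedback coefficients.

```python
def lti_filter(b, a, x):
    """Apply y(n) = sum_{k=1..N} a[k-1]*y(n-k) + sum_{k=0..M} b[k]*x(n-k).

    b : feed-forward coefficients b_0..b_M
    a : feedback coefficients a_1..a_N (added, not subtracted)
    x : input samples; samples before n = 0 are taken as zero
    """
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc += sum(a[k - 1] * y[n - k]
                   for k in range(1, len(a) + 1) if n - k >= 0)
        y.append(acc)
    return y


if __name__ == "__main__":
    # Impulse response of y(n) = x(n) + 0.5*y(n-1)
    print(lti_filter([1.0], [0.5], [1.0, 0.0, 0.0, 0.0]))
```

For a single feedback coefficient a the impulse response is aⁿ, which decays only when |a| < 1, matching the stability condition on the pole.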
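21. $$y(n) = \sum_{k=0}^{M} b_k\,x(n-k); \qquad H(z) = 1 - az^{-1}; \qquad y(n) = x(n) - a\,x(n-1)$$
Filter has a single zero at $z = a = re^{j\theta}$
Frequency response of filter: $H(e^{j\omega}) = 1 - ae^{-j\omega}$
Power response of filter:
$$\left|H(e^{j\omega})\right|^2 = H(e^{j\omega})H^*(e^{j\omega}) = (1 - ae^{-j\omega})(1 - a^*e^{j\omega}) = 1 + r^2 - 2r\cos(\omega - \theta)$$
Example: $a = 0.6 + 0.4j$
$$\left|H(e^{j\omega})\right|^2 = 1.52 - 1.44\cos(\omega - 0.59)$$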
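The closed form 1 + r² − 2r·cos(ω − θ) can be compared with a direct evaluation of |1 − a·e^(−jω)|². This is a minimal sketch with illustrative function names.

```python
import cmath
import math


def power_response_direct(a, omega):
    """|H(e^jw)|^2 for H(z) = 1 - a z^-1, evaluated directly."""
    return abs(1.0 - a * cmath.exp(-1j * omega)) ** 2


def power_response_closed_form(a, omega):
    """1 + r^2 - 2 r cos(w - theta) with a = r e^(j theta)."""
    r, theta = abs(a), cmath.phase(a)
    return 1.0 + r * r - 2.0 * r * math.cos(omega - theta)


if __name__ == "__main__":
    a = 0.6 + 0.4j
    print(1.0 + abs(a) ** 2, 2.0 * abs(a), cmath.phase(a))
    for w in (0.0, 0.59, 1.5, 3.0):
        print(w, power_response_direct(a, w), power_response_closed_form(a, w))
```

For a = 0.6 + 0.4j the three printed constants are close to 1.52, 1.44 and 0.59, the numbers in the slide's example.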
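22. We can calculate the log response of the filter:
$$\log\left(H(e^{j\omega})\right) = \log\left(1 - ae^{-j\omega}\right)$$
If |a|<1 then $|ae^{-j\omega}| < 1$ and we can expand the log as a power series using
$$\log(1 - d) = -d - \frac{d^2}{2} - \frac{d^3}{3} - \dots; \qquad |d| < 1$$
$$\log\left(H(e^{j\omega})\right) = -\sum_{n=1}^{\infty}\frac{a^n}{n}e^{-jn\omega}$$
$$\log\left|H(e^{j\omega})\right|^2 = -2\sum_{n=1}^{\infty}\frac{r^n}{n}\cos\left(n(\omega - \theta)\right); \qquad a = re^{j\theta}$$
[Figure: first six terms in the summation for a = 0.6 + 0.4j]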
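The cosine series for the log power response of a zero inside the unit circle, −2·Σ (rⁿ/n)·cos(n(ω − θ)), can be truncated and compared with the exact value log|1 − a·e^(−jω)|². The sketch below does this for a chosen number of terms; function names are illustrative.

```python
import cmath
import math


def log_power_series(a, omega, terms):
    """Truncated series -2 * sum_{n=1..terms} r^n/n * cos(n (w - theta))."""
    r, theta = abs(a), cmath.phase(a)
    return -2.0 * sum(r ** n / n * math.cos(n * (omega - theta))
                      for n in range(1, terms + 1))


def log_power_exact(a, omega):
    """Exact log |1 - a e^(-jw)|^2 (natural log)."""
    return math.log(abs(1.0 - a * cmath.exp(-1j * omega)) ** 2)


if __name__ == "__main__":
    a = 0.6 + 0.4j
    for w in (0.0, 0.59, 2.0):
        print(w, log_power_series(a, w, 6), log_power_series(a, w, 200),
              log_power_exact(a, w))
```

With six terms the series is already a rough match; with many terms it converges to the exact value because rⁿ/n shrinks geometrically when r < 1.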
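23. If |a|>1, we can rearrange the formula in terms of $a^{-1}$:
$$\log\left(H(e^{j\omega})\right) = \log\left(-ae^{-j\omega}\left(1 - a^{-1}e^{j\omega}\right)\right) = \log\left(-ae^{-j\omega}\right) + \log\left(1 - a^{-1}e^{j\omega}\right)$$
Since $|a^{-1}| < 1$ we can expand the log as before to obtain
$$\log\left|H(e^{j\omega})\right|^2 = 2\log|a| - 2\sum_{n=1}^{\infty}\frac{r^{-n}}{n}\cos\left(n(\omega - \theta)\right); \qquad a = re^{j\theta}$$
The average of $\log\left(\left|H(e^{j\omega})\right|^2\right)$ over ω is $2\log|a|$ if |a|>1, because every cosine term averages to zero.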
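The average of the log power response over one period can be estimated by sampling ω uniformly on [0, 2π). For H(z) = 1 − az⁻¹ the cosine terms average to zero, so the result should be 0 when |a| < 1 and 2·log|a| when |a| > 1. This is a sketch with an illustrative function name and an arbitrary grid size.

```python
import cmath
import math


def mean_log_power(a, n_points=4096):
    """Average of log|1 - a e^(-jw)|^2 over n_points uniform frequencies."""
    total = 0.0
    for k in range(n_points):
        omega = 2.0 * math.pi * k / n_points
        total += math.log(abs(1.0 - a * cmath.exp(-1j * omega)) ** 2)
    return total / n_points


if __name__ == "__main__":
    inside = 0.6 + 0.4j    # |a| < 1
    outside = 1.5 + 1.0j   # |a| > 1
    print(mean_log_power(inside))
    print(mean_log_power(outside), 2.0 * math.log(abs(outside)))
```

Since the log response of a general filter is the sum of the contributions of its poles and zeros, the same averaging applies term by term.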
The log response of an arbitrary filter is just the sum of
the log responses of each pole or zero. For a stable filter,
all the poles must be within the unit circle. Hence
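24. $$H(z) = \frac{1}{1 - az^{-1}}; \qquad y(n) = x(n) + a\,y(n-1)$$
Filter has a single pole at $z = a = re^{j\theta}$
Power response of filter is given by
$$\left|H(e^{j\omega})\right|^2 = \frac{1}{1 + r^2 - 2r\cos(\omega - \theta)}$$
$$\left|H(e^{j\omega})\right|^2_{\text{Peak}} = (1 - r)^{-2}$$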
25. If the filter coefficients are real, any complex zeros or poles
will always occur in conjugate pairs.
The response of the filter is the product of the responses of
the individual poles. Conjugate pole/zero pairs ensure a
symmetric response.
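Example: Poles at $0.6 \pm 0.4j = 0.72e^{\pm 0.59j} = re^{\pm j\theta}$
$$H(z) = \frac{1}{1 - 2r\cos\theta\,z^{-1} + r^2z^{-2}} = \frac{1}{1 - 1.2z^{-1} + 0.52z^{-2}}$$
[Figure: $\left|H(e^{j\omega})\right|^2$]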
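Multiplying out (1 − az⁻¹)(1 − a*z⁻¹) for a conjugate pole pair gives the real coefficients 1, −2r·cosθ and r². The sketch below computes them and also checks that the pair's power response is the product of the two single-pole responses. Function names are illustrative.

```python
import cmath


def conjugate_pair_denominator(a):
    """Coefficients of (1 - a z^-1)(1 - conj(a) z^-1) in powers of z^-1."""
    return [1.0, -2.0 * a.real, abs(a) ** 2]


def pair_power_response(a, omega):
    """|H|^2 of the two-pole filter, evaluated from the real coefficients."""
    c0, c1, c2 = conjugate_pair_denominator(a)
    z_inv = cmath.exp(-1j * omega)
    return 1.0 / abs(c0 + c1 * z_inv + c2 * z_inv ** 2) ** 2


def single_pole_power_response(a, omega):
    """|H|^2 of 1 / (1 - a z^-1)."""
    return 1.0 / abs(1.0 - a * cmath.exp(-1j * omega)) ** 2


if __name__ == "__main__":
    a = 0.6 + 0.4j
    print(conjugate_pair_denominator(a))
    w = 0.59
    print(pair_power_response(a, w),
          single_pole_power_response(a, w)
          * single_pole_power_response(a.conjugate(), w))
```

For poles at 0.6 ± 0.4j the coefficients come out as approximately 1, −1.2 and 0.52.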
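26. $$H(z) = \frac{1}{(1 - az^{-1})(1 - a^*z^{-1})}$$
$$\left|H(z)\right| = \frac{1}{\left|1 - az^{-1}\right|\,\left|1 - a^*z^{-1}\right|}$$
But since |z|=1, we have
$$\left|1 - az^{-1}\right| = \left|z^{-1}\right|\,\left|z - a\right| = \left|z - a\right|$$
This is just the distance between z and a.
The magnitude response of the filter at a frequency ω is proportional to the product of the distances from the point $e^{j\omega}$ to all the zeros, divided by the product of the distances to all the poles. The constant of proportionality is $b_0$:
$$H(z) = b_0\,\frac{\prod_{l=1}^{L}\left(1 - \beta_l z^{-1}\right)}{\prod_{i=1}^{I}\left(1 - \alpha_i z^{-1}\right)}$$
where $\beta_l$ are the zeros and $\alpha_i$ the poles.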
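The geometric rule can be checked by computing the magnitude response in two ways: from distances in the z-plane and from the filter coefficients. The sketch assumes the denominator convention 1 − Σ a_i z⁻ⁱ; function names are illustrative.

```python
import cmath


def magnitude_from_distances(zeros, poles, omega, gain=1.0):
    """|gain| * prod |e^jw - zero| / prod |e^jw - pole|."""
    point = cmath.exp(1j * omega)
    num = 1.0
    for zero in zeros:
        num *= abs(point - zero)
    den = 1.0
    for pole in poles:
        den *= abs(point - pole)
    return abs(gain) * num / den


def magnitude_direct(b, a, omega):
    """|H(e^jw)| for H(z) = sum_l b[l] z^-l / (1 - sum_i a[i-1] z^-i)."""
    z = cmath.exp(1j * omega)
    num = sum(bl * z ** (-l) for l, bl in enumerate(b))
    den = 1.0 - sum(ai * z ** (-i) for i, ai in enumerate(a, start=1))
    return abs(num / den)


if __name__ == "__main__":
    poles = [0.6 + 0.4j, 0.6 - 0.4j]
    for w in (0.1, 0.59, 2.0):
        print(w, magnitude_from_distances([], poles, w),
              magnitude_direct([1.0], [1.2, -0.52], w))
```

The two columns agree; any extra power of z that appears when the numbers of poles and zeros differ has unit magnitude on the unit circle and so does not affect the result.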
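27. The bandwidth of a resonance peak is the width of the frequency range over which the magnitude response stays within a factor of √2 (3 dB) of its peak value.
For a pole at $re^{j\theta}$ the half-power points lie at $\omega = \theta \pm \delta$ with $\delta \approx 1 - r$ when r is close to 1.
For poles near the unit circle the bandwidth is therefore approximately 2(1–r) rad/sample = (1–r)/π in normalised Hz (cycles per sample).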
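The approximation 2(1 − r) can be compared with the exact half-power bandwidth of a single pole. Setting 1 + r² − 2r·cos δ equal to twice its minimum (1 − r)² gives cos δ = (4r − 1 − r²)/(2r), and the bandwidth is 2δ. The sketch below evaluates this; the function name is illustrative and the formula is valid only while the right-hand side stays within [−1, 1].

```python
import math


def half_power_bandwidth(r):
    """Exact 3 dB bandwidth (rad/sample) of a single pole at radius r.

    Solves 1 + r^2 - 2 r cos(delta) = 2 (1 - r)^2 for delta and returns
    2*delta. Needs r large enough that the cosine stays >= -1
    (roughly r > 0.172).
    """
    cos_delta = (4.0 * r - 1.0 - r * r) / (2.0 * r)
    return 2.0 * math.acos(cos_delta)


if __name__ == "__main__":
    for r in (0.8, 0.95, 0.99):
        print(r, half_power_bandwidth(r), 2.0 * (1.0 - r))
```

The closer the pole is to the unit circle, the better the agreement between the exact value and 2(1 − r).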
28. If we have a filter
$$H(z) = \frac{\sum_{l=0}^{L} b_l z^{-l}}{1 - \sum_{i=1}^{I} a_i z^{-i}}$$
we can form a new filter by multiplying coefficients $a_i$ and $b_i$ by $k^i$ for some k<1 (for example k = 0.95):
$$G(z) = H(z/k) = \frac{\sum_{l=0}^{L} b_l k^l z^{-l}}{1 - \sum_{i=1}^{I} a_i k^i z^{-i}}$$
If H(z) has a pole/zero at z0, then G(z) will have one at kz0.
All poles and zeros will be moved inwards by a factor k.
If the bandwidth of a pole of H(z) is b=2(1–r), then the bandwidth of the corresponding pole in G(z) will be expanded to:
$$2(1 - kr) = b + 2r(1 - k)$$
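Bandwidth expansion only needs the coefficients to be scaled by powers of k. The sketch below assumes the denominator convention 1 − Σ a_i z⁻ⁱ, scales both coefficient sets, and checks that a pole of the original filter at z0 moves to k·z0. Function names are illustrative.

```python
def expand_bandwidth(b, a, k):
    """Return coefficients of G(z) = H(z/k): b_l -> b_l k^l, a_i -> a_i k^i.

    b : numerator coefficients b_0..b_L
    a : denominator coefficients a_1..a_I of 1 - sum_i a_i z^-i
    """
    b_new = [bl * k ** l for l, bl in enumerate(b)]
    a_new = [ai * k ** i for i, ai in enumerate(a, start=1)]
    return b_new, a_new


def denominator(a, z):
    """Evaluate 1 - sum_i a_i z^-i at the complex point z."""
    return 1.0 - sum(ai * z ** (-i) for i, ai in enumerate(a, start=1))


if __name__ == "__main__":
    a = [1.2, -0.52]          # poles at 0.6 +/- 0.4j
    z0 = 0.6 + 0.4j
    k = 0.95
    _, a_new = expand_bandwidth([1.0], a, k)
    print(abs(denominator(a, z0)), abs(denominator(a_new, k * z0)))
    r = abs(z0)
    print(2.0 * (1.0 - r), 2.0 * (1.0 - k * r))
```

Both printed denominators are zero to rounding error, confirming that z0 is a pole of H and k·z0 a pole of G; the second line shows the approximate bandwidth before and after expansion.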
29. If we have a filter
$$H(z) = b_0 + b_1z^{-1} + \dots + b_pz^{-p}$$
we can form a new filter by conjugating the coefficients and putting them in reverse order:
$$G(z) = b_p^* + b_{p-1}^*z^{-1} + \dots + b_0^*z^{-p} = z^{-p}H^*\!\left(z^{*-1}\right)$$
If z0 is a zero of H(z) then $z_0^{*-1}$ is a zero of G(z). This is called a reflection in the unit circle.
The frequency response of G(z) is given by:
$$G(e^{j\omega}) = e^{-jp\omega}H^*(e^{j\omega})$$
Hence G(z) has the same magnitude response as H(z) but a different phase response:
$$\mathrm{Arg}\,G(e^{j\omega}) = -\mathrm{Arg}\,H(e^{j\omega}) - p\omega$$
$$\left|G(e^{j\omega})\right| = \left|H(e^{j\omega})\right|$$
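The effect of reversing and conjugating the coefficients can be verified on a small FIR filter. The sketch below builds G from H, then checks the relation G(e^(jω)) = e^(−jpω)·H*(e^(jω)) and the reflected zero position. Function names and the example coefficients are illustrative.

```python
import cmath


def freq_response(b, omega):
    """H(e^jw) for H(z) = sum_l b[l] z^-l."""
    z = cmath.exp(1j * omega)
    return sum(bl * z ** (-l) for l, bl in enumerate(b))


def reverse_conjugate(b):
    """Coefficients of G(z): conjugated and in reverse order."""
    return [complex(bl).conjugate() for bl in reversed(b)]


def evaluate(b, z):
    """Evaluate sum_l b[l] z^-l at an arbitrary complex point z."""
    return sum(bl * z ** (-l) for l, bl in enumerate(b))


if __name__ == "__main__":
    a = 0.6 + 0.4j
    h = [1.0, -a]                 # H(z) = 1 - a z^-1, zero at z = a
    g = reverse_conjugate(h)
    p = len(h) - 1
    for w in (0.3, 1.0, 2.5):
        lhs = freq_response(g, w)
        rhs = cmath.exp(-1j * p * w) * freq_response(h, w).conjugate()
        print(w, abs(lhs - rhs), abs(lhs), abs(freq_response(h, w)))
    print(abs(evaluate(g, 1.0 / a.conjugate())))
```

The first printed difference is zero to rounding error, the two magnitudes coincide, and the final line confirms that G has its zero at 1/a*, the reflection of a in the unit circle.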