Speech waves in tube and filters

Methods and algorithms of speech recognition
course

Lecture 3

Nikolay V. Karpov

nkarpov(а)hse.ru

 Derive a theoretical model of how sound
waves are affected by the vocal tract
 Describe a model for lip radiation
 Describe a model for the pulsating glottal
waveform during voiced speech
 Assemble the components of a simple speech
synthesiser

We model the vocal tract as a tube that has p
segments.

 Ug and Ul are the volume flow of air at the glottis and lips
respectively.
 Vocal tract is of length L (typically 15-17 cm in adults)

Number of tube segments needed: $p = 2L/(cT) \approx 0.001\, f_{samp}$, where $c$ is the speed of sound and $T = 1/f_{samp}$ is the sample period.
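The segment count above is a one-line calculation. The sketch below assumes a speed of sound of 340 m/s (a value not stated on the slide, chosen so that 2L/c ≈ 0.001 s for a 17 cm tract); the function name is illustrative only.

```python
def num_segments(tract_length_m, fs_hz, c=340.0):
    """Return p = 2L/(cT) = 2*L*fs/c, the number of half-sample tube segments.

    c = 340 m/s is an assumed speed of sound, not a value from the slides.
    """
    return 2.0 * tract_length_m * fs_hz / c


p = num_segments(0.17, 8000.0)  # 17 cm vocal tract sampled at 8 kHz
print(round(p))
```

For a 17 cm tract at 8 kHz this gives about 8 segments, in line with the rule of thumb p ≈ 0.001 fsamp.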

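 Mass × Acceleration = Force:
$$\frac{\rho}{A}\frac{\partial u}{\partial t} = -\frac{\partial p}{\partial x}$$
Adiabatic Gas Law:
$$A\frac{\partial p}{\partial t} = -\rho c^{2}\frac{\partial u}{\partial x}$$
Here $u(x,t)$ is the volume flow, $p(x,t)$ the pressure, $A$ the cross-sectional area of the tube, $\rho$ the air density and $c$ the speed of sound.
These equations are known as the wave equations.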

 Solution:
$$u(x,t) = u^{+}(t - x/c) - u^{-}(t + x/c)$$
$$p(x,t) = \frac{\rho c}{A}\left[u^{+}(t - x/c) + u^{-}(t + x/c)\right]$$
 It is easily verified that this solution satisfies the wave equations
for any differentiable functions u.
 The two functions u represent waves travelling in +ve and –ve
directions at velocity c. The actual values of the waves are
determined by the boundary conditions at the end of the tube
section
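The verification can be written out in two lines. The notation here is an assumption made for the sketch: $u^{+}$ and $u^{-}$ denote the forward and backward waves, so that $u = u^{+}(t - x/c) - u^{-}(t + x/c)$ and $p = \frac{\rho c}{A}[u^{+}(t - x/c) + u^{-}(t + x/c)]$, with $\rho$ the air density, and a prime denotes differentiation with respect to the function's argument.

```latex
% Momentum equation: (rho/A) du/dt = -dp/dx
\frac{\partial u}{\partial t} = u^{+\prime} - u^{-\prime},
\qquad
\frac{\partial p}{\partial x} = \frac{\rho c}{A}\cdot\frac{1}{c}\left(-u^{+\prime} + u^{-\prime}\right)
  = -\frac{\rho}{A}\frac{\partial u}{\partial t}

% Adiabatic gas law: A dp/dt = -rho c^2 du/dx
\frac{\partial u}{\partial x} = -\frac{1}{c}\left(u^{+\prime} + u^{-\prime}\right),
\qquad
A\frac{\partial p}{\partial t} = \rho c\left(u^{+\prime} + u^{-\prime}\right)
  = -\rho c^{2}\frac{\partial u}{\partial x}
```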

 Acoustic signal is the superposition of two waves: U
in the forward direction and V in the reverse direction

Assumptions:
 Sound waves are 1-dimensional: true for frequencies < 3-4 kHz
whose wavelengths are long compared to the tube width
 No frictional or wall-vibration energy losses

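Time for sound to travel along one segment = $L/(cp)$, so
$$v(t) = x\left(t - \frac{L}{cp}\right); \qquad w(t) = u\left(t - \frac{L}{cp}\right)$$
The segment length is chosen to correspond to half a sample period: $L/p = 0.5cT$.
If we take z-transforms, this time delay corresponds to multiplying by $z^{-1/2}$:
$$V(z) = z^{-1/2}X(z); \qquad U(z) = z^{1/2}W(z)$$
In matrix form:
$$\begin{pmatrix}U\\ V\end{pmatrix} = \begin{pmatrix}z^{1/2} & 0\\ 0 & z^{-1/2}\end{pmatrix}\begin{pmatrix}W\\ X\end{pmatrix} = z^{1/2}\begin{pmatrix}1 & 0\\ 0 & z^{-1}\end{pmatrix}\begin{pmatrix}W\\ X\end{pmatrix}$$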

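For a uniform lossless tube of length $l$, driven at the glottis ($x = 0$) by a flow $U_G(\Omega)e^{j\Omega t}$ and open at the lips ($x = l$):
$$p(x,t) = j\frac{\rho c}{A}\,\frac{\sin[\Omega(l - x)/c]}{\cos[\Omega l/c]}\,U_G(\Omega)e^{j\Omega t}$$
$$u(x,t) = \frac{\cos[\Omega(l - x)/c]}{\cos[\Omega l/c]}\,U_G(\Omega)e^{j\Omega t}$$
The transfer function is given by:
$$\frac{U(l,\Omega)}{U(0,\Omega)} = \frac{1}{\cos(\Omega l/c)}$$
This function has poles located at every
$$\Omega = \frac{(2n+1)\pi c}{2l}$$
These correspond to the frequencies at which the tube length is a quarter wavelength (or an odd multiple of it); the lowest is $c/(4l)$ Hz.
For a single segment of length $l = \tfrac{1}{2}cT = \frac{c}{2F_s}$, the lowest such frequency is $\frac{c}{4l} = \frac{F_s}{2}$.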
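As a quick numerical illustration of the pole positions, the sketch below lists the first few resonances of a uniform tube, taking them to lie at odd multiples of c/(4l) Hz. The speed of sound of 340 m/s and the 17 cm length are assumed example values, and the function name is hypothetical.

```python
def tube_resonances_hz(length_m, n_modes=3, c=340.0):
    """Resonances of a uniform lossless tube, closed at one end and open at the other.

    f_n = (2n + 1) * c / (4 * l) for n = 0, 1, 2, ...
    c = 340 m/s is an assumed speed of sound.
    """
    return [(2 * n + 1) * c / (4.0 * length_m) for n in range(n_modes)]


formants = tube_resonances_hz(0.17)  # 17 cm tract
print([round(f) for f in formants])
```

For a 17 cm tube the resonances come out near 500, 1500 and 2500 Hz, which is the familiar pattern for a neutral vowel.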

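 Flow Continuity: $U - V = W - X$
 Pressure Continuity: $\frac{\rho c}{A}(U + V) = \frac{\rho c}{B}(W + X)$
 In matrix form:
$$\begin{pmatrix}1 & -1\\ B & B\end{pmatrix}\begin{pmatrix}U\\ V\end{pmatrix} = \begin{pmatrix}1 & -1\\ A & A\end{pmatrix}\begin{pmatrix}W\\ X\end{pmatrix}$$
 Hence:
$$\begin{pmatrix}U\\ V\end{pmatrix} = \frac{1}{2B}\begin{pmatrix}B & 1\\ -B & 1\end{pmatrix}\begin{pmatrix}1 & -1\\ A & A\end{pmatrix}\begin{pmatrix}W\\ X\end{pmatrix} = \frac{1}{2B}\begin{pmatrix}A+B & A-B\\ A-B & A+B\end{pmatrix}\begin{pmatrix}W\\ X\end{pmatrix}$$

 Define the reflection coefficient to be $r = \frac{B - A}{B + A}$. Then:
$$\begin{pmatrix}U\\ V\end{pmatrix} = \frac{1}{2B}\begin{pmatrix}A+B & A-B\\ A-B & A+B\end{pmatrix}\begin{pmatrix}W\\ X\end{pmatrix} = \frac{1}{1+r}\begin{pmatrix}1 & -r\\ -r & 1\end{pmatrix}\begin{pmatrix}W\\ X\end{pmatrix}$$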
 Reflection coefficients always lie in the range ±1
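A minimal sketch of the junction relations, assuming A is the area on the glottis side (waves U, V) and B the area on the lip side (waves W, X), with r = (B − A)/(B + A). The function names and the example areas are illustrative, not taken from the slides.

```python
def reflection_coefficient(area_a, area_b):
    """r = (B - A) / (B + A) for a junction from area A to area B."""
    return (area_b - area_a) / (area_b + area_a)


def junction(w, x, r):
    """Map lip-side waves (W, X) to glottis-side waves (U, V).

    Implements [U, V]^T = 1/(1+r) * [[1, -r], [-r, 1]] [W, X]^T.
    """
    u = (w - r * x) / (1.0 + r)
    v = (-r * w + x) / (1.0 + r)
    return u, v


A, B = 1.0, 3.0                      # example areas (arbitrary units)
r = reflection_coefficient(A, B)     # 0.5
U, V = junction(1.0, 0.2, r)
# Both continuity conditions should hold for the result:
print(abs((U - V) - (1.0 - 0.2)) < 1e-12)          # flow continuity
print(abs((U + V) / A - (1.0 + 0.2) / B) < 1e-12)  # pressure continuity
```

Because the areas are positive, r stays strictly between −1 and +1, so the 1/(1 + r) factor is always finite.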

 Assume Vl = 0: no sound reflected back into mouth
 Work backwards from lips towards glottis:
◦ Junction: use the reflection matrix
◦ Tube segment: use the delay matrix

 A3 is large but not infinite: assumption of narrow tube
breaks down at this point
 A0 is approximately zero: area of glottis opening

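 Multiplying out the matrices gives
$$\begin{pmatrix}U_g\\ V_g\end{pmatrix} = \frac{z}{\prod_{k=0}^{2}(1+r_k)}\begin{pmatrix}1 + (r_0r_1 + r_1r_2)z^{-1} + r_0r_2z^{-2}\\ -r_0 - (r_1 + r_0r_1r_2)z^{-1} - r_2z^{-2}\end{pmatrix}U_l$$
 We can ignore Vg: it gets absorbed in the lungs.
 The vocal tract transfer function is given by the ratio of Ul to Ug:
$$\frac{U_l}{U_g} = \frac{z^{-1}\prod_{k=0}^{2}(1+r_k)}{1 + (r_0r_1 + r_1r_2)z^{-1} + r_0r_2z^{-2}} = \frac{Gz^{-1}}{1 + a_1z^{-1} + a_2z^{-2}}$$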

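Combining a junction with the tube segment that follows it:
$$\frac{1}{1+r}\begin{pmatrix}1 & -r\\ -r & 1\end{pmatrix}\times z^{1/2}\begin{pmatrix}1 & 0\\ 0 & z^{-1}\end{pmatrix} = \frac{z^{1/2}}{1+r}\begin{pmatrix}1 & -rz^{-1}\\ -r & z^{-1}\end{pmatrix}$$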

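 Multiplying together all the matrices for a p-segment vocal tract gives:
$$\begin{pmatrix}U_g\\ V_g\end{pmatrix} = \frac{z^{p/2}}{\prod_{k=0}^{p}(1+r_k)}\left[\prod_{k=0}^{p-1}\begin{pmatrix}1 & -r_kz^{-1}\\ -r_k & z^{-1}\end{pmatrix}\right]\begin{pmatrix}1\\ -r_p\end{pmatrix}U_l$$
 This results in a transfer function of the form:
$$\frac{U_l}{U_g} = \frac{Gz^{-p/2}}{1 + a_1z^{-1} + a_2z^{-2} + \cdots + a_pz^{-p}}$$
 G is a gain term
 $z^{-p/2}$ is the acoustic time delay along the vocal tract
 The denominator represents a p-th order all-pole filter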
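The matrix product can be carried out numerically by treating each entry as a polynomial in $z^{-1}$. The sketch below assumes the combined junction-and-delay matrix has the form $\frac{z^{1/2}}{1+r_k}\begin{pmatrix}1 & -r_kz^{-1}\\ -r_k & z^{-1}\end{pmatrix}$ and works backwards from the lips, starting from the vector $(1, -r_p)$. The function names and the example reflection coefficients are made up for illustration.

```python
def poly_add(a, b):
    """Add two polynomials given as coefficient lists in powers of z^-1."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


def vocal_tract_denominator(r):
    """r = [r0, ..., rp]. Return (G, D) where Ul/Ug = G z^(-p/2) / D(z).

    D is a list of coefficients of z^0, z^-1, ..., z^-p.
    """
    top, bot = [1.0], [-r[-1]]           # start at the lips: (1, -r_p)
    for rk in reversed(r[:-1]):          # junctions p-1, ..., 0
        delayed = [0.0] + bot            # z^-1 * bot
        new_top = poly_add(top, [-rk * v for v in delayed])
        new_bot = poly_add([-rk * v for v in top], delayed)
        top, bot = new_top, new_bot
    gain = 1.0
    for rk in r:
        gain *= 1.0 + rk
    return gain, top


G, D = vocal_tract_denominator([0.9, 0.3, 0.5])  # two-segment example
print(G, D)
```

For two segments the returned polynomial should match the closed form, with coefficients 1, r0·r1 + r1·r2 and r0·r2.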

 R(z) is the transfer function between airflow at the lips and pressure at
the microphone

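 For a lip-opening area of A, acoustic theory predicts a 1st-order high-pass response with a corner frequency of:
$$\frac{c}{\sqrt{4\pi A}}\ \text{Hz} \approx 5\ \text{kHz}$$
 For fsamp < 20 kHz, a good approximation is:
$$R(z) = \frac{S(z)}{U_l(z)} = 1 - z^{-1}$$
$$\left|R(e^{j\omega T})\right| = 2\sin\left(\frac{\omega T}{2}\right)$$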
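The magnitude of the first-difference approximation can be checked numerically. This is a small sketch assuming R(z) = 1 − z⁻¹ evaluated on the unit circle; the function name and the 8 kHz sample rate are illustrative.

```python
import cmath
import math


def lip_radiation_gain(freq_hz, fs_hz):
    """|R(e^{jwT})| for R(z) = 1 - z^-1, with wT = 2*pi*f/fs."""
    w_t = 2.0 * math.pi * freq_hz / fs_hz
    return abs(1.0 - cmath.exp(-1j * w_t))


fs = 8000.0
for f in (0.0, 500.0, 1000.0, 2000.0, 4000.0):
    direct = lip_radiation_gain(f, fs)
    closed_form = 2.0 * math.sin(math.pi * f / fs)  # 2 sin(wT/2)
    print(f, round(direct, 4), round(closed_form, 4))
```

The gain rises from zero at DC to 2 at half the sample rate, i.e. roughly +6 dB/octave at low frequencies.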

 “LF Model” (Liljencrants & Fant)
$$u'_g(t) = \begin{cases} e^{at}\sin(bt) & 0 \le t < t_e\\ c - d\,e^{-ft} & t_e \le t < 1\end{cases}$$
Constraints: $u_g(0) = u_g(1) = 0$; $u_g(t)$ and $u'_g(t)$ continuous at $t_e$.

Line Spectrum of ug (approx –12 dB/octave):

 Larynx Frequency ≈ 130 Hz
 First Vocal tract resonance (formant) ≈ 1 kHz

 There is not necessarily any relation between the larynx frequency and the vocal tract resonances.
 Resonances at a multiple of the larynx frequency will be louder (good for singers)

This lecture reviews some well-known facts about filters
and introduces some lesser-known ones that will be
needed later on.
 Derive the power response of first order FIR and IIR
filters and relate this to the geometry of the pole-
zero diagram.
 Relate the bandwidth of a 2nd-order resonance to the
geometry of the pole-zero diagram.
 Describe the bandwidth expansion transformation of
a filter.
 Describe the effect of reversing the coefficients of a
filter.
 Derive expressions for the log frequency response
and its average value

$$y(n) = \sum_k h_k\, x(n-k)$$
 A system that performs this transformation is called a linear digital filter.
 y(n): output, x(n): input, $h_k$: impulse response
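The convolution sum translates directly into a double loop. This is a minimal sketch that assumes a finite impulse response and takes x(n) = 0 for n < 0; the function name is illustrative.

```python
def fir_filter(h, x):
    """Compute y(n) = sum_k h[k] * x(n - k), with x(n) = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y


# Feeding in a unit impulse returns the impulse response itself.
print(fir_filter([1.0, -0.5], [1.0, 0.0, 0.0]))
```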

 Transfer function:
$$H(z) = \frac{Y(z)}{X(z)} = \sum_k h_k z^{-k}; \qquad X(z) = \sum_n x(n)z^{-n}$$

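A digital filter is a finite-order system:
$$\sum_{i=0}^{I} a_i\, y(n-i) = \sum_{l=0}^{L} b_l\, x(n-l)$$
$$H(z) = \frac{\sum_{l=0}^{L} b_l z^{-l}}{\sum_{i=0}^{I} a_i z^{-i}} = \frac{b_0}{a_0}\, z^{I-L}\, \frac{\prod_{l=1}^{L}(z - \beta_l)}{\prod_{i=1}^{I}(z - \alpha_i)}$$
where $\beta_l$ are the zeros and $\alpha_i$ the poles of the filter. With the usual normalisation $a_0 = 1$ the denominator is $1 + \sum_{i=1}^{I} a_i z^{-i}$.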

 A linear time-invariant system can be characterized
by a constant-coefficient difference equation:
$$y(n) + \sum_{k=1}^{N} a_k\, y(n-k) = \sum_{k=0}^{M} b_k\, x(n-k)$$

Such systems can be implemented
as signal flow graphs:

 Stable filter: all poles $\alpha_i$ satisfy $|\alpha_i| < 1;\ i = 1, \dots, I$
 Minimum phase filter: all zeros $\beta_l$ satisfy $|\beta_l| < 1;\ l = 1, \dots, L$

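General FIR filter:
$$y(n) = \sum_{k=0}^{M} b_k\, x(n-k)$$
First-order case: $H(z) = 1 - az^{-1}$, i.e. $y(n) = x(n) - a\,x(n-1)$

 Filter has a single zero at $z = a = re^{j\theta}$
Frequency response of filter: $H(e^{j\omega}) = 1 - ae^{-j\omega}$

Power response of filter:
$$|H(e^{j\omega})|^2 = H(e^{j\omega})H^*(e^{j\omega}) = (1 - ae^{-j\omega})(1 - a^*e^{j\omega}) = 1 + r^2 - 2r\cos(\omega - \theta)$$
 Example: $a = 0.6 + 0.4j$ gives
$$|H(e^{j\omega})|^2 = 1.52 - 1.44\cos(\omega - 0.59)$$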
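The closed form for the power response is easy to confirm numerically. The sketch below assumes a single zero at z = a, so that H(z) = 1 − a z⁻¹, and compares the direct evaluation with 1 + r² − 2r cos(ω − θ) for the example value a = 0.6 + 0.4j. The function names are illustrative.

```python
import cmath
import math


def power_response_direct(a, w):
    """|H(e^{jw})|^2 evaluated directly for H(z) = 1 - a z^-1."""
    return abs(1.0 - a * cmath.exp(-1j * w)) ** 2


def power_response_closed_form(a, w):
    """1 + r^2 - 2 r cos(w - theta), where a = r e^{j theta}."""
    r, theta = abs(a), cmath.phase(a)
    return 1.0 + r * r - 2.0 * r * math.cos(w - theta)


a = 0.6 + 0.4j
print(round(1.0 + abs(a) ** 2, 2), round(2.0 * abs(a), 2), round(cmath.phase(a), 2))
for w in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(w, power_response_direct(a, w), power_response_closed_form(a, w))
```

The first printed line gives the three constants of the example (1 + r², 2r and θ); the remaining lines show the two evaluations agreeing at each frequency.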

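 We can calculate the log response of the filter:
$$\log(H(e^{j\omega})) = \log(1 - ae^{-j\omega})$$
 If |a| < 1 then $|ae^{-j\omega}| < 1$ and we can expand the log as a power series using
$$\log(1 - d) = -d - \frac{d^2}{2} - \frac{d^3}{3} - \cdots; \qquad |d| < 1$$
$$\log(H(e^{j\omega})) = -\sum_{n=1}^{\infty} \frac{a^n}{n}\, e^{-jn\omega}$$
$$\log|H(e^{j\omega})|^2 = -2\sum_{n=1}^{\infty} \frac{r^n}{n}\cos(n(\omega - \theta)); \qquad a = re^{j\theta}$$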

First six terms in the
summation for:
a = 0.6 + 0.4j

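If |a| > 1, we can rearrange the formula in terms of $a^{-1}$:
$$\log(H(e^{j\omega})) = \log\left(-ae^{-j\omega}(1 - a^{-1}e^{j\omega})\right) = \log(-ae^{-j\omega}) + \log(1 - a^{-1}e^{j\omega})$$
Since $|a^{-1}| < 1$ we can expand the log as before to obtain
$$\log|H(e^{j\omega})|^2 = 2\log|a| - 2\sum_{n=1}^{\infty} \frac{r^{-n}}{n}\cos(n(\omega - \theta)); \qquad a = re^{j\theta}$$

The average of $\log(|H(e^{j\omega})|^2)$ over frequency is $2\log|a|$ if |a| > 1. If |a| < 1 the average is 0, because every cosine term in the series averages to zero.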
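The average of the log power response can be estimated by sampling the unit circle uniformly. The sketch below assumes a first-order filter with a single zero at z = a, so that H(z) = 1 − a z⁻¹; the function name and the second test value 1.5 + 1.0j are illustrative choices.

```python
import cmath
import math


def mean_log_power(a, n_points=4096):
    """Average of log|1 - a e^{-jw}|^2 over n_points equally spaced frequencies."""
    total = 0.0
    for i in range(n_points):
        w = 2.0 * math.pi * i / n_points
        total += math.log(abs(1.0 - a * cmath.exp(-1j * w)) ** 2)
    return total / n_points


inside = 0.6 + 0.4j    # |a| < 1: the average should be close to 0
outside = 1.5 + 1.0j   # |a| > 1: the average should be close to 2 log|a|
print(mean_log_power(inside))
print(mean_log_power(outside), 2.0 * math.log(abs(outside)))
```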

The log response of an arbitrary filter is just the sum of
the log responses of each pole or zero. For a stable filter,
all the poles must be within the unit circle. Hence

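$$H(z) = \frac{1}{1 - az^{-1}}; \qquad y(n) = x(n) + a\,y(n-1)$$
 Filter has a single pole at $z = a = re^{j\theta}$
 Power response of filter is given by
$$|H(e^{j\omega})|^2 = \frac{1}{1 + r^2 - 2r\cos(\omega - \theta)}$$
$$|H(e^{j\omega})|^2_{\text{Peak}} = (1 - r)^{-2}$$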

 If the filter coefficients are real, any complex zeros or poles
will always occur in conjugate pairs.
 The response of the filter is the product of the responses of
the individual poles. Conjugate pole/zero pairs ensure a
symmetric response.
Example: Poles at $0.6 \pm 0.4j = 0.72e^{\pm 0.59j} = re^{\pm j\theta}$
$$H(z) = \frac{1}{1 - 2r\cos\theta\, z^{-1} + r^2z^{-2}} = \frac{1}{1 - 1.2z^{-1} + 0.52z^{-2}}$$

[Plot: $|H(e^{j\omega})|^2$]
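The denominator of a conjugate pole pair follows directly from the pole's radius and angle. This is a small sketch for the example pole 0.6 + 0.4j, assuming the form 1 − 2r cos θ z⁻¹ + r² z⁻²; the function name is illustrative.

```python
import cmath
import math


def resonator_denominator(pole):
    """Coefficients [1, -2 r cos(theta), r^2] for poles at `pole` and its conjugate."""
    r, theta = abs(pole), cmath.phase(pole)
    return [1.0, -2.0 * r * math.cos(theta), r * r]


den = resonator_denominator(0.6 + 0.4j)
print([round(c, 4) for c in den])
```

Because the two poles are conjugates, all three coefficients are real even though each pole is complex.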

$$H(z) = \frac{1}{(1 - az^{-1})(1 - a^*z^{-1})}$$
$$|H(z)| = \frac{1}{|1 - az^{-1}| \times |1 - a^*z^{-1}|}$$

But since |z| = 1, we have
$$|1 - az^{-1}| = |z^{-1}| \times |z - a| = |z - a|$$
This is just the distance between z and a.

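The magnitude response of the filter at a frequency ω is
proportional to the product of the distances from the point $e^{j\omega}$
to all the zeros divided by the product of the distances to all
the poles. The constant of proportionality is $|b_0/a_0|$:
$$H(z) = \frac{b_0}{a_0}\, \frac{\prod_{l=1}^{L}(1 - \beta_l z^{-1})}{\prod_{i=1}^{I}(1 - \alpha_i z^{-1})}$$
where $\beta_l$ are the zeros and $\alpha_i$ the poles.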
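The geometric rule can be tried out directly: measure the distances from the point on the unit circle to each zero and pole. The sketch below uses the pole pair 0.6 ± 0.4j with no zeros and unit gain as an assumed example, and compares the result with a direct evaluation of 1/(1 − 1.2z⁻¹ + 0.52z⁻²). The function name and `gain` parameter are illustrative.

```python
import cmath


def magnitude_from_geometry(zeros, poles, w, gain=1.0):
    """|H(e^{jw})| as gain * prod(distance to zeros) / prod(distance to poles)."""
    z = cmath.exp(1j * w)
    num = 1.0
    for b in zeros:
        num *= abs(z - b)
    den = 1.0
    for a in poles:
        den *= abs(z - a)
    return gain * num / den


poles = [0.6 + 0.4j, 0.6 - 0.4j]
for w in (0.0, 0.59, 1.5, 3.0):
    z = cmath.exp(1j * w)
    direct = abs(1.0 / (1.0 - 1.2 / z + 0.52 / (z * z)))
    print(w, magnitude_from_geometry([], poles, w), direct)
```

The response is largest near ω ≈ 0.59, where the point on the unit circle passes closest to the upper pole.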

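 The bandwidth of a resonance peak is the width of the frequency
range over which the magnitude response stays within a factor of √2 (3 dB)
of its peak value.
 For poles near the unit circle this is approximately
2(1–r) rad/s = (1–r)/π Hz (normalised).

$$\omega_2 - \omega_1 \approx 2(1 - r)$$
where $\omega_1$ and $\omega_2$ are the half-power frequencies on either side of the peak.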
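The quality of the 2(1 − r) approximation can be checked against the exact half-power width of a single pole. This is a sketch under the assumption of a first-order response |H|² = 1/(1 + r² − 2r cos δ), where δ is the offset from the pole angle; setting the denominator equal to twice its minimum (1 − r)² gives the half-power offset. The function name is illustrative.

```python
import math


def half_power_bandwidth(r):
    """Exact 3 dB bandwidth (rad/sample) of a single pole at radius r, for r near 1.

    Solves 1 + r^2 - 2 r cos(delta) = 2 (1 - r)^2 and returns 2 * delta.
    """
    cos_delta = (1.0 + r * r - 2.0 * (1.0 - r) ** 2) / (2.0 * r)
    return 2.0 * math.acos(cos_delta)


for r in (0.9, 0.95, 0.99):
    print(r, half_power_bandwidth(r), 2.0 * (1.0 - r))
```

The two columns converge as r approaches 1, which is the regime in which the approximation is stated to hold.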

If we have a filter (normalised so that $a_0 = 1$)
$$H(z) = \frac{\sum_{l=0}^{L} b_l z^{-l}}{1 + \sum_{i=1}^{I} a_i z^{-i}}$$
we can form a new filter by multiplying coefficients $a_i$ and $b_i$ by $k^i$ for some k < 1:
$$G(z) = H(z/k) = \frac{\sum_{l=0}^{L} b_l k^l z^{-l}}{1 + \sum_{i=1}^{I} a_i k^i z^{-i}}$$
If H(z) has a pole/zero at z0, then G(z) will have one at kz0.
All poles and zeros will be moved inwards by a factor k.
If the bandwidth of a pole of H(z) is b = 2(1–r), then the bandwidth of
the corresponding pole in G(z) will be expanded to:
$$2(1 - kr) = b + 2r(1 - k)$$
[Plot: k = 0.95]
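Bandwidth expansion is a one-line operation on the coefficient list. The sketch below assumes coefficients are stored in order of increasing power of z⁻¹ and applies it to the example denominator 1 − 1.2z⁻¹ + 0.52z⁻² with k = 0.95; the function names are illustrative.

```python
import math


def expand_bandwidth(coeffs, k):
    """Multiply the coefficient of z^-i by k^i, giving the coefficients of H(z/k)."""
    return [c * k ** i for i, c in enumerate(coeffs)]


def pole_radius_2nd_order(den):
    """Radius of a complex-conjugate pole pair from den = [1, d1, d2]: r = sqrt(d2)."""
    return math.sqrt(den[2])


den = [1.0, -1.2, 0.52]
den_k = expand_bandwidth(den, 0.95)
print(den_k)
print(pole_radius_2nd_order(den), pole_radius_2nd_order(den_k))
```

The pole radius shrinks by exactly the factor k, which is what widens the resonance.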

 If we have a filter
$$H(z) = b_0 + b_1z^{-1} + \cdots + b_pz^{-p}$$
 We can form a new filter by conjugating the coefficients
and putting them in reverse order:
$$G(z) = b_p^* + b_{p-1}^*z^{-1} + \cdots + b_0^*z^{-p} = z^{-p}H^*(z^{*-1})$$
 If z0 is a zero of H(z) then $z_0^{*-1}$ is a zero of G(z). This is
called a reflection in the unit circle.
 The frequency response of G(z) is given by:
$$G(e^{j\omega}) = e^{-jp\omega}H^*(e^{j\omega})$$
 Hence G(z) has the same magnitude response as H(z) but a
different phase response:
$$|G(e^{j\omega})| = |H(e^{j\omega})|; \qquad \operatorname{Arg}G(e^{j\omega}) = -\operatorname{Arg}H(e^{j\omega}) - p\omega$$
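The effect of coefficient reversal can be confirmed on an arbitrary FIR filter. The sketch below assumes H(z) = b0 + b1 z⁻¹ + … + bp z⁻ᵖ and builds G by conjugating and reversing the list; the example coefficients and function names are made up for illustration.

```python
import cmath


def reverse_conjugate(b):
    """Return the coefficients of G(z): conjugated and in reverse order."""
    return [c.conjugate() for c in reversed(b)]


def freq_response(b, w):
    """Evaluate sum_k b[k] e^{-jwk}."""
    return sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(b))


b = [1.0, -1.2 + 0.3j, 0.5j]   # example coefficients, p = 2
g = reverse_conjugate(b)
p = len(b) - 1
for w in (0.0, 0.7, 2.0):
    H = freq_response(b, w)
    G = freq_response(g, w)
    # Same magnitude, and G equals e^{-jpw} times the conjugate of H
    print(abs(abs(G) - abs(H)) < 1e-12, abs(G - cmath.exp(-1j * p * w) * H.conjugate()) < 1e-12)
```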
