Methods and algorithms of speech recognition
                   course

                 Lecture 3

             Nikolay V. Karpov

              nkarpov(а)hse.ru
   Derive a theoretical model of how sound
    waves are affected by the vocal tract
   Describe a model for lip radiation
   Describe a model for the pulsating glottal
    waveform during voiced speech
   Assemble the components of a simple speech
    synthesiser
We model the vocal tract as a tube that has p
    segments.




   Ug and Ul are the volume flow of air at the glottis and lips
    respectively.
   Vocal tract is of length L (typically 15-17 cm in adults)

Number of tube segments needed = 2L/(cT) ≈ 0.001 × fsamp
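As a quick sanity check of the segment count, the following minimal sketch evaluates 2L/(cT) directly. The speed of sound c = 340 m/s and the function name `num_segments` are assumptions made for illustration; they are not given in the slides.

```python
def num_segments(tract_length_m, fsamp_hz, c=340.0):
    """Number of tube segments p = 2L/(cT) = 2 * L * fsamp / c.

    c is the speed of sound in m/s (assumed value, not from the slides).
    """
    return 2.0 * tract_length_m * fsamp_hz / c


# A 17 cm vocal tract sampled at 8 kHz.
p = num_segments(0.17, 8000.0)
print(round(p))
```

With L = 17 cm the factor 2L/c is 0.001 s, which is where the 0.001 × fsamp rule of thumb comes from.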
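   Mass × Acceleration = Force (for a slice of air of length δx and volume V = A δx):
     $$\rho V \cdot \frac{1}{A}\frac{\partial u}{\partial t} = -A\,\delta x\,\frac{\partial p}{\partial x} = -V\,\frac{\partial p}{\partial x} \quad\Rightarrow\quad \rho\,\frac{\partial u}{\partial t} = -A\,\frac{\partial p}{\partial x}$$
    Adiabatic Gas Law:
     $$A\,\frac{\partial p}{\partial t} = -\rho c^{2}\,\frac{\partial u}{\partial x}$$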
These equations are known as the wave equations

   Solution:
     $$u(x,t) = u^{+}(t - x/c) - u^{-}(t + x/c)$$
     $$p(x,t) = \frac{\rho c}{A}\left[u^{+}(t - x/c) + u^{-}(t + x/c)\right]$$
   It is easily verified that this solution satisfies the wave equations
    for any differentiable functions u.
   The two functions u represent waves travelling in +ve and –ve
    directions at velocity c. The actual values of the waves are
    determined by the boundary conditions at the end of the tube
    section
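The verification can be written out explicitly. Taking the wave equations as ρ ∂u/∂t = −A ∂p/∂x and A ∂p/∂t = −ρc² ∂u/∂x, and the solution as u = u⁺(t − x/c) − u⁻(t + x/c), p = (ρc/A)[u⁺(t − x/c) + u⁻(t + x/c)], with a prime denoting differentiation with respect to the argument, both sides of each equation reduce to the same expression:

```latex
\begin{aligned}
\rho\,\frac{\partial u}{\partial t}
  &= \rho\left[u^{+\prime}(t - x/c) - u^{-\prime}(t + x/c)\right], \\
-A\,\frac{\partial p}{\partial x}
  &= -\rho c\left[-\tfrac{1}{c}\,u^{+\prime}(t - x/c) + \tfrac{1}{c}\,u^{-\prime}(t + x/c)\right]
   = \rho\left[u^{+\prime}(t - x/c) - u^{-\prime}(t + x/c)\right], \\[4pt]
A\,\frac{\partial p}{\partial t}
  &= \rho c\left[u^{+\prime}(t - x/c) + u^{-\prime}(t + x/c)\right], \\
-\rho c^{2}\,\frac{\partial u}{\partial x}
  &= -\rho c^{2}\left[-\tfrac{1}{c}\,u^{+\prime}(t - x/c) - \tfrac{1}{c}\,u^{-\prime}(t + x/c)\right]
   = \rho c\left[u^{+\prime}(t - x/c) + u^{-\prime}(t + x/c)\right].
\end{aligned}
```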
   Acoustic signal is the superposition of two waves: U
     in the forward direction and V in the reverse direction




Assumptions:
 Sound waves are 1-dimensional: true for frequencies < 3-4 kHz
  whose wavelengths are long compared to the tube width
 No frictional or wall-vibration energy losses
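Time for sound to travel along one segment = L/(cp)
     $$v(t) = x\!\left(t - \frac{L}{cp}\right), \qquad u(t) = w\!\left(t + \frac{L}{cp}\right)$$
Segment length chosen to correspond to half a sample period: L/p = 0.5cT
If we take z-transforms, this time delay corresponds to multiplying by $z^{-1/2}$:
     $$V(z) = z^{-1/2}X(z), \qquad U(z) = z^{1/2}W(z)$$
 In matrix form
     $$\begin{pmatrix} U \\ V \end{pmatrix} = \begin{pmatrix} z^{1/2} & 0 \\ 0 & z^{-1/2} \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix} = z^{1/2}\begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix}$$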
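For a uniform lossless tube of length l driven at the glottis by $U_G(\omega)e^{j\omega t}$:
     $$p(x,t) = j\,\frac{\rho c}{A}\,\frac{\sin[\omega(l-x)/c]}{\cos[\omega l/c]}\,U_G(\omega)e^{j\omega t}$$
     $$u(x,t) = \frac{\cos[\omega(l-x)/c]}{\cos[\omega l/c]}\,U_G(\omega)e^{j\omega t}$$
 The transfer function is given by:
     $$\frac{U(l,\omega)}{U(0,\omega)} = \frac{1}{\cos(\omega l/c)}$$
 This function has poles located at every
     $$\omega = \frac{(2n+1)\pi c}{2l}$$
 These correspond to the frequencies at which the tube length is an odd multiple of a quarter wavelength.
 For a segment of length $l = \tfrac{1}{2}cT = \frac{c}{2F_s}$, the lowest such frequency is $f = \frac{c}{4l} = \frac{F_s}{2}$.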
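To make the pole formula concrete, here is a small sketch that lists the first few resonances f = (2n + 1)c/(4l) of a uniform tube. The value c = 340 m/s and the function name are illustrative assumptions.

```python
def uniform_tube_resonances(length_m, n_resonances=3, c=340.0):
    """Resonance frequencies f_n = (2n + 1) * c / (4 * l) in Hz, for n = 0, 1, 2, ...

    These are the poles of 1 / cos(omega * l / c); c is an assumed speed of sound.
    """
    return [(2 * n + 1) * c / (4.0 * length_m) for n in range(n_resonances)]


# A 17 cm tube, roughly the length of an adult vocal tract.
formants = uniform_tube_resonances(0.17)
print([round(f) for f in formants])
```

For a 17 cm tube the resonances fall near 500, 1500 and 2500 Hz, which is the familiar formant pattern of a neutral vowel.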
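   Flow Continuity:
     $$U - V = W - X$$
   Pressure Continuity:
     $$\frac{\rho c}{A}(U + V) = \frac{\rho c}{B}(W + X)$$
   In matrix form:
     $$\begin{pmatrix} 1 & -1 \\ B & B \end{pmatrix}\begin{pmatrix} U \\ V \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ A & A \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix}$$
   Hence:
     $$\begin{pmatrix} U \\ V \end{pmatrix} = \frac{1}{2B}\begin{pmatrix} B & 1 \\ -B & 1 \end{pmatrix}\begin{pmatrix} 1 & -1 \\ A & A \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix} = \frac{1}{2B}\begin{pmatrix} A+B & A-B \\ A-B & A+B \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix}$$
       Define the reflection coefficient to be $r = \frac{B-A}{B+A}$
     $$\begin{pmatrix} U \\ V \end{pmatrix} = \frac{1}{2B}\begin{pmatrix} A+B & A-B \\ A-B & A+B \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix} = \frac{1}{1+r}\begin{pmatrix} 1 & -r \\ -r & 1 \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix}$$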
       Reflection coefficients always lie in the range ±1
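The junction relations can be checked numerically. The sketch below computes r = (B − A)/(B + A) for two areas and confirms that the matrix (1/(1 + r))[[1, −r], [−r, 1]] reproduces both continuity conditions. The areas and wave amplitudes are arbitrary illustrative numbers, and the function names are assumptions.

```python
import numpy as np


def reflection_coefficient(area_a, area_b):
    """r = (B - A) / (B + A) for a junction from area A (glottis side) to B (lip side)."""
    return (area_b - area_a) / (area_b + area_a)


def junction_matrix(r):
    """Matrix mapping the lip-side waves (W, X) to the glottis-side waves (U, V)."""
    return np.array([[1.0, -r], [-r, 1.0]]) / (1.0 + r)


A, B = 1.0, 3.0                 # illustrative areas
r = reflection_coefficient(A, B)
W, X = 0.7, 0.2                 # illustrative forward/backward waves on the lip side
U, V = junction_matrix(r) @ np.array([W, X])

print("r =", r)
print("flow:     ", U - V, "vs", W - X)
print("pressure: ", (U + V) / A, "vs", (W + X) / B)
```

Because both areas are positive, |B − A| can never exceed B + A, which is why r stays within ±1.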
   Assume Vl = 0: no sound reflected back into mouth
   Work backwards from lips towards glottis:
     ◦ Junction: use the reflection matrix
     ◦ Tube segment: use the delay matrix

   A3 is large but not infinite: assumption of narrow tube
    breaks down at this point
   A0 is approximately zero: area of glottis opening
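   Multiplying out the matrices gives
     $$\begin{pmatrix} U_g \\ V_g \end{pmatrix} = \frac{z}{\prod_{k=0}^{2}(1+r_k)}\begin{pmatrix} 1 + (r_0r_1 + r_1r_2)z^{-1} + r_0r_2z^{-2} \\ -r_0 - (r_1 + r_0r_1r_2)z^{-1} - r_2z^{-2} \end{pmatrix}U_l$$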
   We can ignore Vg: it gets absorbed in the lungs.
   The vocal tract transfer function is given by the ratio of Ul to Ug
     $$\frac{U_l}{U_g} = \frac{z^{-1}\prod_{k=0}^{2}(1+r_k)}{1 + (r_0r_1 + r_1r_2)z^{-1} + r_0r_2z^{-2}} = \frac{Gz^{-1}}{1 - a_1z^{-1} - a_2z^{-2}}$$
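Junction matrix followed by the delay matrix of the next tube segment:
     $$\frac{1}{1+r}\begin{pmatrix} 1 & -r \\ -r & 1 \end{pmatrix} z^{1/2}\begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix} = \frac{z^{1/2}}{1+r}\begin{pmatrix} 1 & -rz^{-1} \\ -r & z^{-1} \end{pmatrix}$$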

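   Multiplying together all the matrices for a p-segment vocal tract gives:
     $$\begin{pmatrix} U_g \\ V_g \end{pmatrix} = \frac{z^{p/2}}{\prod_{k=0}^{p}(1+r_k)}\left[\prod_{k=0}^{p-1}\begin{pmatrix} 1 & -r_kz^{-1} \\ -r_k & z^{-1} \end{pmatrix}\right]\begin{pmatrix} 1 \\ -r_p \end{pmatrix}U_l$$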
   This results in a transfer function of the form:
     $$\frac{U_l}{U_g} = \frac{Gz^{-p/2}}{1 - a_1z^{-1} - a_2z^{-2} - \dots - a_pz^{-p}}$$
                        G is a gain term
                        $z^{-p/2}$ is the acoustic time delay along the vocal tract
                        The denominator represents a p-th order all-pole filter
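The matrix product can be carried out mechanically on polynomial coefficients. The following sketch works backwards from the lips, starting from the vector (1, −r_p) and applying the combined junction-and-delay matrix [[1, −r_k z⁻¹], [−r_k, z⁻¹]] for k = p−1 down to 0. Polynomials in z⁻¹ are stored as arrays in ascending powers; the function names and the example reflection coefficients are assumptions for illustration only.

```python
import numpy as np


def vocal_tract_denominator(r):
    """Denominator of Ul/Ug as coefficients of z^0, z^-1, ..., z^-p.

    r is the list of reflection coefficients r[0] .. r[p], glottis to lips.
    """
    top = np.array([1.0])        # polynomial multiplying Ul in the Ug row
    bottom = np.array([-r[-1]])  # polynomial multiplying Ul in the Vg row
    for rk in reversed(r[:-1]):
        delayed = np.concatenate(([0.0], bottom))  # z^-1 * bottom
        padded = np.concatenate((top, [0.0]))      # top, padded to the same length
        top, bottom = padded - rk * delayed, -rk * padded + delayed
    return top


def vocal_tract_gain(r):
    """Gain term G = product of (1 + r_k)."""
    return float(np.prod(1.0 + np.asarray(r, dtype=float)))


r = [0.9, 0.3, -0.4]             # illustrative two-segment tube
den = vocal_tract_denominator(r)
print("denominator:", den)
print("a_k:", -den[1:])          # in the 1 - a1 z^-1 - a2 z^-2 convention
print("G:", vocal_tract_gain(r))
```

For p = 2 the result can be compared with the closed form 1 + (r0r1 + r1r2)z⁻¹ + r0r2z⁻², which is a convenient check that the recursion is wired up correctly.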
   R(z) is the transfer function between airflow at the lips and pressure at
    the microphone



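   For a lip-opening area of A, acoustic theory predicts a 1st-order high-pass response with a corner frequency of:
     $$\frac{c}{\sqrt{4\pi A}}\ \text{Hz} \approx 5\ \text{kHz}$$
   For fsamp < 20 kHz, a good approximation is:
     $$R(z) = \frac{S(z)}{U_l(z)} = 1 - z^{-1}$$
     $$\left|R(e^{j\omega T})\right| = 2\sin\!\left(\frac{\omega T}{2}\right)$$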
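The magnitude of the first-difference approximation R(z) = 1 − z⁻¹ can be checked against the closed form 2 sin(ωT/2) with a few lines of NumPy. The sample rate of 16 kHz is an arbitrary assumption for the example.

```python
import numpy as np

fsamp = 16000.0                 # assumed sample rate in Hz
T = 1.0 / fsamp
freqs_hz = np.linspace(0.0, fsamp / 2.0, 9)
omega = 2.0 * np.pi * freqs_hz

# |R(e^{j omega T})| evaluated directly from R(z) = 1 - z^-1 ...
direct = np.abs(1.0 - np.exp(-1j * omega * T))
# ... and from the closed form 2 sin(omega T / 2).
closed_form = 2.0 * np.sin(omega * T / 2.0)

for f, d, c in zip(freqs_hz, direct, closed_form):
    print(f"{f:7.0f} Hz  direct={d:.4f}  closed form={c:.4f}")
```

The response is zero at DC and rises monotonically to 2 at half the sample rate, which is the high-pass (roughly +6 dB/octave at low frequencies) behaviour expected of lip radiation.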
    “LF Model” (Liljencrants & Fant)
     $$u'_g(t) = \begin{cases} e^{at}\sin(bt), & 0 \le t \le t_e \\ c - de^{-ft}, & t_e < t < 1 \end{cases}$$

     $$u_g(0) = u_g(1) = 0; \qquad u_g(t) \text{ and } u'_g(t) \text{ continuous at } t_e$$




                     Line Spectrum of      ug   (approx –12 dB/octave):
   Larynx Frequency ≈130
    Hz
   First Vocal tract
    resonance (formant) ≈1
    kHz


   There is not necessarily
    any relation between the
    larynx frequency and
    the vocal tract
    resonances.
   Resonances at a
    multiple of the larynx
    frequency will be louder
    (good for singers)
This lecture reviews some well known facts about filters
and introduces some less known ones that will be
needed later on.
 Derive the power response of first order FIR and IIR
  filters and relate this to the geometry of the pole-
  zero diagram.
 Relate the bandwidth of a 2nd-order resonance to the
  geometry of the pole-zero diagram.
 Describe the bandwidth expansion transformation of
  a filter.
 Describe the effect of reversing the coefficients of a
  filter.
 Derive expressions for the log frequency response
  and its average value
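     $$y(n) = \sum_{k} h_k\,x(n-k)$$
   A system that performs this transformation is called a linear digital filter
   y(n) – output, x(n) – input, h_k – impulse response

   Transfer function
     $$H(z) = \sum_{k} h_k z^{-k}; \qquad H(z) = \frac{Y(z)}{X(z)}; \qquad X(z) = \sum_{n} x(n)z^{-n}$$

     A digital filter is a finite-order system:
     $$\sum_{i=0}^{I} a_i\,y(n-i) = \sum_{l=0}^{L} b_l\,x(n-l), \qquad a_0 = 1$$
     $$H(z) = \frac{\sum_{l=0}^{L} b_l z^{-l}}{1 + \sum_{i=1}^{I} a_i z^{-i}} = b_0\,z^{I-L}\,\frac{\prod_{l=1}^{L}(z - \beta_l)}{\prod_{i=1}^{I}(z - \alpha_i)}$$
     where $\beta_l$ are the zeros and $\alpha_i$ are the poles of H(z).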
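   A linear time-invariant system can be characterized by a constant-coefficient difference equation
     $$y(n) = \sum_{k=1}^{N} a_k\,y(n-k) + \sum_{k=0}^{M} b_k\,x(n-k)$$
   With the feedback terms written on the right-hand side like this, the denominator of H(z) is $1 - \sum_{k=1}^{N} a_k z^{-k}$.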

    Such systems can be implemented
    as signal flow graphs:




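   Stable filter: all poles $\alpha_i$ lie inside the unit circle, $|\alpha_i| < 1$; i = 1…I
   Minimum phase filter: all zeros $\beta_l$ also lie inside the unit circle, $|\beta_l| < 1$; l = 1…L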
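   FIR filter:
     $$y(n) = \sum_{k=0}^{M} b_k\,x(n-k)$$
   First-order case:
     $$H(z) = 1 - az^{-1}, \qquad y(n) = x(n) - a\,x(n-1)$$
   Filter has a single zero at $z = a = re^{j\theta}$
    Frequency response of filter: $H(e^{j\omega}) = 1 - ae^{-j\omega}$
    Power response of filter:
     $$|H(e^{j\omega})|^2 = H(e^{j\omega})H^*(e^{j\omega}) = (1 - ae^{-j\omega})(1 - a^*e^{j\omega}) = 1 + r^2 - 2r\cos(\omega - \theta)$$
   Example: $a = 0.6 + 0.4j$
     $$|H(e^{j\omega})|^2 = 1.52 - 1.44\cos(\omega - 0.59)$$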
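The numbers in this example can be reproduced directly. The sketch below evaluates |1 − a e^{−jω}|² on a frequency grid for a = 0.6 + 0.4j and compares it with 1 + r² − 2r cos(ω − θ); the grid size is an arbitrary choice.

```python
import numpy as np

a = 0.6 + 0.4j
r, theta = abs(a), np.angle(a)          # a = r e^{j theta}
omega = np.linspace(-np.pi, np.pi, 201)

# Power response from the frequency response H(e^{j omega}) = 1 - a e^{-j omega} ...
direct = np.abs(1.0 - a * np.exp(-1j * omega)) ** 2
# ... and from the geometric closed form.
closed_form = 1.0 + r**2 - 2.0 * r * np.cos(omega - theta)

print(f"1 + r^2 = {1.0 + r**2:.2f}, 2r = {2.0 * r:.2f}, theta = {theta:.2f} rad")
print("max difference:", np.max(np.abs(direct - closed_form)))
```

The minimum of the power response, (1 − r)², occurs at ω = θ, where the point e^{jω} on the unit circle is closest to the zero.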
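   We can calculate the log response of the filter
     $$\log(H(e^{j\omega})) = \log(1 - ae^{-j\omega})$$
   If |a|<1 then $|ae^{-j\omega}| < 1$ and we can expand the log as a power series using
     $$\log(1 - d) = -d - \frac{d^2}{2} - \frac{d^3}{3} - \dots; \qquad |d| < 1$$
     $$\log(H(e^{j\omega})) = -\sum_{n=1}^{\infty}\frac{a^n}{n}e^{-j\omega n}$$
     $$\log|H(e^{j\omega})|^2 = -2\sum_{n=1}^{\infty}\frac{r^n}{n}\cos(n(\omega - \theta)); \qquad a = re^{j\theta}$$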


                         First six terms in the
                         summation for:
                         a = 0.6 + 0.4j
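The convergence of the series log|H(e^{jω})|² = −2 Σ (rⁿ/n) cos(n(ω − θ)) for a zero at a = 0.6 + 0.4j can be examined numerically. This is a minimal sketch; the helper name `log_power_series` and the choice of 6 and 60 terms are assumptions for illustration.

```python
import numpy as np

a = 0.6 + 0.4j
r, theta = abs(a), np.angle(a)
omega = np.linspace(-np.pi, np.pi, 400, endpoint=False)

# Exact log power response of H(z) = 1 - a z^-1.
exact = np.log(np.abs(1.0 - a * np.exp(-1j * omega)) ** 2)


def log_power_series(n_terms):
    """Partial sum -2 * sum_{n=1..N} (r^n / n) cos(n (omega - theta))."""
    n = np.arange(1, n_terms + 1)[:, None]
    return -2.0 * np.sum(r**n / n * np.cos(n * (omega - theta)), axis=0)


six_terms = log_power_series(6)
many_terms = log_power_series(60)

print("max error, 6 terms: ", np.max(np.abs(six_terms - exact)))
print("max error, 60 terms:", np.max(np.abs(many_terms - exact)))
print("average of exact log response:", np.mean(exact))
```

Every term of the series is a cosine with zero mean over a full period, so for |a| < 1 the average of the log power response is zero.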
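If |a|>1, we can rearrange the formula in terms of $a^{-1}$
     $$\log(H(e^{j\omega})) = \log\!\left(-ae^{-j\omega}(1 - a^{-1}e^{j\omega})\right) = \log(-ae^{-j\omega}) + \log(1 - a^{-1}e^{j\omega})$$
      Since $|a^{-1}| < 1$ we can expand the log as before to obtain
     $$\log|H(e^{j\omega})|^2 = 2\log|a| - 2\sum_{n=1}^{\infty}\frac{r^{-n}}{n}\cos(n(\omega - \theta)); \qquad a = re^{j\theta}$$

   The average of $\log|H(e^{j\omega})|^2$ is 2log|a| if |a|>1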
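The average value for a zero outside the unit circle can be confirmed numerically by averaging log|1 − a e^{−jω}|² over one period. The value |a| = 1.5 and the grid size are arbitrary assumptions for the example.

```python
import numpy as np

a = 1.5 * np.exp(0.59j)         # illustrative zero with |a| > 1
omega = np.linspace(-np.pi, np.pi, 512, endpoint=False)

# Log power response of H(z) = 1 - a z^-1 on a uniform frequency grid.
log_power = np.log(np.abs(1.0 - a * np.exp(-1j * omega)) ** 2)
average = np.mean(log_power)

print("numerical average:", average)
print("2 log|a|:         ", 2.0 * np.log(abs(a)))
```

The cosine terms again average to zero, leaving only the constant 2 log|a|.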



      The log response of an arbitrary filter is just the sum of
      the log responses of each pole or zero. For a stable filter,
      all the poles must be within the unit circle. Hence
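   First-order IIR filter:
     $$H(z) = \frac{1}{1 - az^{-1}}, \qquad y(n) = x(n) + a\,y(n-1)$$
     Filter has a single pole at $z = a = re^{j\theta}$
     Power response of filter is given by
     $$|H(e^{j\omega})|^2 = \frac{1}{1 + r^2 - 2r\cos(\omega - \theta)}$$
     [Figure: $|H(e^{j\omega})|^2$ against ω; Peak = $(1 - r)^{-2}$]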
   If the filter coefficients are real, any complex zeros or poles
    will always occur in conjugate pairs.
   The response of the filter is the product of the responses of
    the individual poles. Conjugate pole/zero pairs ensure a
    symmetric response.
    Example: Poles at $0.6 \pm 0.4j = 0.72e^{\pm 0.59j} = re^{\pm j\theta}$
     $$H(z) = \frac{1}{1 - 2r\cos\theta\,z^{-1} + r^2z^{-2}} = \frac{1}{1 - 1.2z^{-1} + 0.52z^{-2}}$$
    [Figure: $|H(e^{j\omega})|^2$]
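The denominator coefficients in this example follow from multiplying out the conjugate pole pair. As a small sketch, `numpy.poly` builds the polynomial from its roots, and the result can be compared with 1 − 2r cos θ z⁻¹ + r² z⁻².

```python
import numpy as np

pole = 0.6 + 0.4j
r, theta = abs(pole), np.angle(pole)

# Coefficients of (1 - a z^-1)(1 - a* z^-1) in powers of z^0, z^-1, z^-2.
denominator = np.real(np.poly([pole, np.conj(pole)]))
closed_form = np.array([1.0, -2.0 * r * np.cos(theta), r**2])

print("from the poles:  ", denominator)
print("from r and theta:", closed_form)
```

Because the poles are a conjugate pair, the imaginary parts cancel and the coefficients are real.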
     $$H(z) = \frac{1}{(1 - az^{-1})(1 - a^*z^{-1})}$$
     $$|H(z)| = \frac{1}{|1 - az^{-1}|\;|1 - a^*z^{-1}|}$$

But since |z|=1, we have
     $$|1 - az^{-1}| = |z^{-1}|\,|z - a| = |z - a|$$
This is just the distance between z and a.

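The magnitude response of the filter at a frequency ω is proportional to the product of the distances from the point $e^{j\omega}$ to all the zeros divided by the product of the distances to all the poles. The constant of proportionality is $|b_0|$, the magnitude of the leading numerator coefficient:
     $$H(z) = b_0\,\frac{\prod_{l=1}^{L}(1 - \beta_l z^{-1})}{\prod_{i=1}^{I}(1 - \alpha_i z^{-1})}$$
where $\beta_l$ are the zeros and $\alpha_i$ are the poles.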
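The geometric rule can be tested on a small example: compute the magnitude once from distances in the z-plane and once from the factored transfer function. The zeros, poles and gain below are hypothetical values chosen only to exercise the formula.

```python
import numpy as np

# Hypothetical filter: zeros, poles and leading gain chosen for illustration.
zeros = np.array([0.5, -0.3 + 0.4j, -0.3 - 0.4j])
poles = np.array([0.6 + 0.4j, 0.6 - 0.4j])
b0 = 2.0


def magnitude_from_distances(omega):
    """|b0| * product of distances to zeros / product of distances to poles."""
    z = np.exp(1j * omega)
    return abs(b0) * np.prod(np.abs(z - zeros)) / np.prod(np.abs(z - poles))


def magnitude_direct(omega):
    """|H(e^{j omega})| from H(z) = b0 * prod(1 - zero/z) / prod(1 - pole/z)."""
    z_inv = np.exp(-1j * omega)
    numerator = b0 * np.prod(1.0 - zeros * z_inv)
    denominator = np.prod(1.0 - poles * z_inv)
    return abs(numerator / denominator)


for w in (0.1, 1.0, 2.5):
    print(w, magnitude_from_distances(w), magnitude_direct(w))
```

The two agree because each factor |1 − αz⁻¹| equals |z − α| on the unit circle, regardless of how many poles and zeros there are.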
   The bandwidth of a resonance peak is the width of the frequency range over which the magnitude response stays within a factor of √2 (3 dB) of its peak value.
   For poles near the unit circle this is approximately 2(1–r) rad/s = (1–r)/π Hz (normalised).
     $$\omega_2 - \omega_1 \approx 2(1 - r)$$
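The quality of the approximation can be checked for a single pole at radius r. The peak power is 1/(1 − r)², so the half-power points satisfy 1 + r² − 2r cos Δ = 2(1 − r)², which gives cos Δ = (4r − 1 − r²)/(2r) and an exact bandwidth of 2Δ. The sketch below compares this with 2(1 − r); the function name is an assumption.

```python
import numpy as np


def half_power_bandwidth(r):
    """Exact -3 dB width (rad/sample) of the peak of |1 / (1 - r e^{j theta} z^-1)|.

    Valid for poles close enough to the unit circle that the half-power
    points exist, i.e. (4r - 1 - r^2) / (2r) >= -1.
    """
    cos_delta = (4.0 * r - 1.0 - r**2) / (2.0 * r)
    return 2.0 * np.arccos(cos_delta)


for r in (0.9, 0.95, 0.99):
    exact = half_power_bandwidth(r)
    approx = 2.0 * (1.0 - r)
    print(f"r={r}: exact={exact:.5f}  2(1-r)={approx:.5f}")
```

The approximation improves as the pole approaches the unit circle, in line with the "poles near the unit circle" condition above.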
If we have a filter
     $$H(z) = \frac{\sum_{l=0}^{L} b_l z^{-l}}{1 - \sum_{i=1}^{I} a_i z^{-i}}$$
we can form a new filter by multiplying coefficients $a_i$ and $b_i$ by $k^i$ for some k < 1:
     $$G(z) = H(z/k) = \frac{\sum_{l=0}^{L} b_l k^l z^{-l}}{1 - \sum_{i=1}^{I} a_i k^i z^{-i}}$$
If H(z) has a pole/zero at z0, then G(z) will have one at kz0.
All poles and zeros will be moved inwards by a factor k.
If the bandwidth of a pole of H(z) is b = 2(1–r), then the bandwidth of the corresponding pole in G(z) will be expanded to:
     $$2(1 - kr) = b + 2r(1 - k)$$
[Figure: example with k = 0.95]
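Bandwidth expansion amounts to one line of code: scale the coefficient of z⁻ⁱ by kⁱ. The sketch below applies this to the denominator 1 − 1.2z⁻¹ + 0.52z⁻² (poles at 0.6 ± 0.4j) with k = 0.95 and checks that the poles move inwards by the factor k. The function name is an assumption.

```python
import numpy as np


def expand_bandwidth(coeffs, k):
    """Scale the coefficient of z^-i by k**i, giving the coefficients of H(z/k)."""
    coeffs = np.asarray(coeffs, dtype=float)
    return coeffs * k ** np.arange(len(coeffs))


k = 0.95
den = np.array([1.0, -1.2, 0.52])       # poles at 0.6 +/- 0.4j
new_den = expand_bandwidth(den, k)

old_poles = np.sort_complex(np.roots(den))
new_poles = np.sort_complex(np.roots(new_den))

r = abs(old_poles[0])
print("old poles:", old_poles)
print("new poles:", new_poles)
print("bandwidth estimate before:", 2.0 * (1.0 - r))
print("bandwidth estimate after: ", 2.0 * (1.0 - k * r))
```

The same scaling applied to the numerator coefficients moves the zeros inwards in exactly the same way.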
   If we have a filter
     $$H(z) = b_0 + b_1z^{-1} + \dots + b_pz^{-p}$$
   We can form a new filter by conjugating the coefficients and putting them in reverse order:
     $$G(z) = b_p^* + b_{p-1}^*z^{-1} + \dots + b_0^*z^{-p} = z^{-p}H^*(z^{*-1})$$
   If z0 is a zero of H(z) then $z_0^{*-1}$ is a zero of G(z). This is called a reflection in the unit circle.
   The frequency response of G(z) is given by:
     $$G(e^{j\omega}) = e^{-jp\omega}H^*(e^{j\omega})$$
   Hence G(z) has the same magnitude response as H(z) but a different phase response
     $$\operatorname{Arg}G(e^{j\omega}) = -\operatorname{Arg}H(e^{j\omega}) - p\omega$$
     $$|G(e^{j\omega})| = |H(e^{j\omega})|$$
Advance Mobile Application Development class 07
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 

Speech waves in tube and filters

  • 1. Methods and algorithms of speech recognition course. Lecture 3. Nikolay V. Karpov, nkarpov(а)hse.ru
  • 2.
    ◦ Derive a theoretical model of how sound waves are affected by the vocal tract
    ◦ Describe a model for lip radiation
    ◦ Describe a model for the pulsating glottal waveform during voiced speech
    ◦ Assemble the components of a simple speech synthesiser
  • 3. We model the vocal tract as a tube that has $p$ segments.
    ◦ $U_g$ and $U_l$ are the volume flow of air at the glottis and lips respectively.
    ◦ The vocal tract is of length $L$ (typically 15-17 cm in adults).
    ◦ Number of tube segments needed $= 2L/(cT) \approx 0.001\,f_{samp}$, where $c$ is the speed of sound and $T = 1/f_{samp}$ is the sample period.
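The segment count on this slide can be checked with a few lines of Python. This is a minimal sketch: the function name `num_segments` and the value c = 340 m/s are assumptions chosen so that a 17 cm tract reproduces the 0.001 f_samp rule of thumb.

```python
def num_segments(length_m, fs_hz, c=340.0):
    """Number of tube segments p = 2L/(cT), with T = 1/fs.

    c = 340 m/s is an assumed speed of sound, not a value from the slides.
    """
    return 2.0 * length_m * fs_hz / c


# A 17 cm vocal tract sampled at 10 kHz needs about 10 segments.
print(round(num_segments(0.17, 10_000)))
```

Doubling the sample rate doubles the number of segments, because each segment must stay half a sample period long.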
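  • 4. Mass × Acceleration = Force, for a slice of air of thickness $\delta x$ and volume $V = A\,\delta x$ (with $\rho$ the air density, $u$ the volume flow and $p$ the pressure):
    $$\rho V \cdot \frac{1}{A}\frac{\partial u}{\partial t} = -A\,\delta x\,\frac{\partial p}{\partial x} = -V\frac{\partial p}{\partial x} \quad\Rightarrow\quad \rho\frac{\partial u}{\partial t} = -A\frac{\partial p}{\partial x}$$
    Adiabatic Gas Law:
    $$A\frac{\partial p}{\partial t} = -\rho c^2\frac{\partial u}{\partial x}$$
    These equations are known as the wave equations.
    ◦ Solution:
    $$u(x,t) = u^+(t - x/c) - u^-(t + x/c)$$
    $$p(x,t) = \frac{\rho c}{A}\left[u^+(t - x/c) + u^-(t + x/c)\right]$$
    ◦ It is easily verified that this solution satisfies the wave equations for any differentiable functions $u^+$ and $u^-$.
    ◦ The two functions $u^+$ and $u^-$ represent waves travelling in the positive and negative directions at velocity $c$. The actual values of the waves are determined by the boundary conditions at the ends of the tube section.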
  • 5. The acoustic signal is the superposition of two waves: $U$ in the forward direction and $V$ in the reverse direction. Assumptions:
    ◦ Sound waves are 1-dimensional: true for frequencies below 3-4 kHz, whose wavelengths are long compared to the tube width
    ◦ No frictional or wall-vibration energy losses
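  • 6. Time for sound to travel along a segment $= L/(cp)$:
    $$v(t) = x\!\left(t - \frac{L}{cp}\right), \qquad w(t) = u\!\left(t - \frac{L}{cp}\right)$$
    The segment length is chosen to correspond to half a sample period: $0.5\,cT$. If we take z-transforms, this time delay corresponds to multiplying by $z^{-1/2}$:
    $$V(z) = z^{-1/2}X(z); \qquad U(z) = z^{1/2}W(z)$$
    In matrix form:
    $$\begin{pmatrix} U \\ V \end{pmatrix} = \begin{pmatrix} z^{1/2} & 0 \\ 0 & z^{-1/2} \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix} = z^{1/2}\begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix}$$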
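  • 7. For a uniform lossless tube of length $l$ driven by $U_G(\Omega)e^{j\Omega t}$ at the glottis:
    $$p(x,t) = j\frac{\rho c}{A}\,\frac{\sin[\Omega(l-x)/c]}{\cos[\Omega l/c]}\,U_G(\Omega)e^{j\Omega t}$$
    $$u(x,t) = \frac{\cos[\Omega(l-x)/c]}{\cos[\Omega l/c]}\,U_G(\Omega)e^{j\Omega t}$$
    The transfer function is given by:
    $$\frac{U(l,\Omega)}{U(0,\Omega)} = \frac{1}{\cos(\Omega l/c)}$$
    This function has poles located at every $\Omega = \frac{(2n+1)\pi c}{2l}$. These correspond to the frequencies at which the tube is an odd number of quarter wavelengths long. For $l = \tfrac{1}{2}cT = \frac{c}{2F_s}$, the lowest pole frequency is $\frac{c}{4l} = \frac{F_s}{2}$.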
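  • 8. Junction between a tube of area $A$ (waves $U$, $V$) and a tube of area $B$ (waves $W$, $X$):
    ◦ Flow Continuity: $U - V = W - X$
    ◦ Pressure Continuity: $\frac{\rho c}{A}(U + V) = \frac{\rho c}{B}(W + X)$
    ◦ In matrix form:
    $$\begin{pmatrix} 1 & -1 \\ B & B \end{pmatrix}\begin{pmatrix} U \\ V \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ A & A \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix}$$
    ◦ Hence:
    $$\begin{pmatrix} U \\ V \end{pmatrix} = \frac{1}{2B}\begin{pmatrix} B & 1 \\ -B & 1 \end{pmatrix}\begin{pmatrix} 1 & -1 \\ A & A \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix} = \frac{1}{2B}\begin{pmatrix} A+B & A-B \\ A-B & A+B \end{pmatrix}\begin{pmatrix} W \\ X \end{pmatrix}$$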
  • 9. B A  Define the reflection coefficient to be r B A U 1 A B A B W 1 1 r W V 2B A B A B X 1 r r 1 X  Reflection coefficients always lie in the range ±1
  • 10. Assume $V_l = 0$: no sound is reflected back into the mouth.
    ◦ Work backwards from the lips towards the glottis. At each junction use the reflection matrix; along each tube segment use the delay matrix.
    ◦ $A_3$ is large but not infinite: the assumption of a narrow tube breaks down at this point.
    ◦ $A_0$ is approximately zero: the area of the glottis opening.
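  • 11. Multiplying out the matrices gives:
    $$\begin{pmatrix} U_g \\ V_g \end{pmatrix} = \frac{z}{\prod_{k=0}^{2}(1+r_k)}\begin{pmatrix} 1 + (r_0r_1 + r_1r_2)z^{-1} + r_0r_2z^{-2} \\ -r_0 - (r_1 + r_0r_1r_2)z^{-1} - r_2z^{-2} \end{pmatrix} U_l$$
    ◦ We can ignore $V_g$: it gets absorbed in the lungs.
    ◦ The vocal tract transfer function is given by the ratio of $U_l$ to $U_g$:
    $$\frac{U_l}{U_g} = \frac{z^{-1}\prod_{k=0}^{2}(1+r_k)}{1 + (r_0r_1 + r_1r_2)z^{-1} + r_0r_2z^{-2}} = \frac{Gz^{-1}}{1 - a_1z^{-1} - a_2z^{-2}}$$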
  • 12.
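  • 13. The product of a junction matrix and a delay matrix is:
    $$\frac{1}{1+r}\begin{pmatrix} 1 & -r \\ -r & 1 \end{pmatrix} z^{1/2}\begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix} = \frac{z^{1/2}}{1+r}\begin{pmatrix} 1 & -rz^{-1} \\ -r & z^{-1} \end{pmatrix}$$
    ◦ Multiplying together all the matrices for a $p$-segment vocal tract gives:
    $$\begin{pmatrix} U_g \\ V_g \end{pmatrix} = \frac{z^{p/2}}{\prod_{k=0}^{p}(1+r_k)}\left\{\prod_{k=0}^{p-1}\begin{pmatrix} 1 & -r_kz^{-1} \\ -r_k & z^{-1} \end{pmatrix}\right\}\begin{pmatrix} 1 \\ -r_p \end{pmatrix} U_l$$
    ◦ This results in a transfer function of the form:
    $$\frac{U_l}{U_g} = \frac{Gz^{-p/2}}{1 - a_1z^{-1} - a_2z^{-2} - \dots - a_pz^{-p}}$$
    ◦ $G$ is a gain term.
    ◦ $z^{-p/2}$ is the acoustic time delay along the vocal tract.
    ◦ The denominator represents a $p$-th order all-pole filter.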
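The matrix product for a p-segment tract can be multiplied out mechanically. The sketch below does so, working backwards from the lips and storing each polynomial in z^-1 as a list of coefficients. It assumes the junction-times-delay matrix [[1, -r z^-1], [-r, z^-1]] and the termination vector (1, -r_p); the helper names and the example reflection coefficients are hypothetical.

```python
def poly_add(a, b):
    """Add two polynomials in z^-1 given as coefficient lists."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


def poly_scale(a, s):
    """Multiply a polynomial by the scalar s."""
    return [s * x for x in a]


def poly_delay(a):
    """Multiply a polynomial by z^-1."""
    return [0.0] + a


def tract_denominator(r):
    """Denominator of U_l/U_g for reflection coefficients r[0..p].

    Returns coefficients of 1, z^-1, ..., z^-p (top row of the product).
    """
    p = len(r) - 1
    top, bot = [1.0], [-r[p]]            # termination vector (1, -r_p)
    for k in range(p - 1, -1, -1):       # lips towards glottis
        new_top = poly_add(top, poly_scale(poly_delay(bot), -r[k]))
        new_bot = poly_add(poly_scale(top, -r[k]), poly_delay(bot))
        top, bot = new_top, new_bot
    return top


def tract_gain(r):
    """Gain term G = product of (1 + r_k)."""
    g = 1.0
    for rk in r:
        g *= 1.0 + rk
    return g


r = [0.9, 0.3, -0.5]                     # illustrative values only
print(tract_denominator(r), tract_gain(r))
```

For three junctions the result should reproduce the two-segment denominator 1 + (r0 r1 + r1 r2) z^-1 + r0 r2 z^-2, which is a convenient sanity check before using longer tubes.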
  • 14. $R(z)$ is the transfer function between airflow at the lips and pressure at the microphone.
    ◦ For a lip-opening area of $A$, acoustic theory predicts a 1st-order high-pass response with a corner frequency, determined by $c$ and $A$, of approximately 5 kHz.
    ◦ For $f_{samp} < 20$ kHz, a good approximation is:
    $$R(z) = \frac{S(z)}{U_l(z)} = 1 - z^{-1}, \qquad \left|R(e^{j\omega T})\right| = 2\left|\sin\frac{\omega T}{2}\right|$$
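  • 15. “LF Model” (Liljencrants & Fant) for the derivative of the glottal flow, with time normalised to one pitch period:
    $$u'_g(t) = \begin{cases} e^{at}\sin(bt), & 0 \le t < t_e \\ c - d\,e^{-ft}, & t_e \le t < 1 \end{cases}$$
    subject to $u_g(0) = u_g(1) = 0$, with $u_g(t)$ and $u'_g(t)$ continuous at $t_e$.
    [Figure: line spectrum of $u_g$, falling at approximately -12 dB/octave]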
  • 16.
    ◦ Larynx frequency ≈ 130 Hz
    ◦ First vocal tract resonance (formant) ≈ 1 kHz
    ◦ There is not necessarily any relation between the larynx frequency and the vocal tract resonances.
    ◦ Resonances at a multiple of the larynx frequency will be louder (good for singers).
  • 17.
  • 18. This lecture reviews some well-known facts about filters and introduces some lesser-known ones that will be needed later on.
    ◦ Derive the power response of first-order FIR and IIR filters and relate this to the geometry of the pole-zero diagram.
    ◦ Relate the bandwidth of a 2nd-order resonance to the geometry of the pole-zero diagram.
    ◦ Describe the bandwidth expansion transformation of a filter.
    ◦ Describe the effect of reversing the coefficients of a filter.
    ◦ Derive expressions for the log frequency response and its average value.
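  • 19. $$y(n) = \sum_k h_k\,x(n-k)$$
    ◦ A system that performs this transformation is called a linear digital filter.
    ◦ $y(n)$ is the output, $x(n)$ the input and $h_k$ the impulse response.
    ◦ Transfer function:
    $$H(z) = \frac{Y(z)}{X(z)} = \sum_k h_k z^{-k}; \qquad X(z) = \sum_n x(n)z^{-n}$$
    A digital filter is a finite system (with $a_0 = 1$):
    $$\sum_{i=0}^{I} a_i\,y(n-i) = \sum_{l=0}^{L} b_l\,x(n-l)$$
    $$H(z) = \frac{\sum_{l=0}^{L} b_l z^{-l}}{1 + \sum_{i=1}^{I} a_i z^{-i}} = b_0\,z^{I-L}\,\frac{\prod_{l=1}^{L}(z - \beta_l)}{\prod_{i=1}^{I}(z - \alpha_i)}$$
    where $\beta_l$ are the zeros and $\alpha_i$ the poles of the filter.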
  • 20. A linear time-invariant system can be characterised by a constant-coefficient difference equation:
    $$y(n) = \sum_{k=1}^{N} a_k\,y(n-k) + \sum_{k=0}^{M} b_k\,x(n-k)$$
    Such systems can be implemented as signal flow graphs.
    ◦ Stable filter: all poles satisfy $|\alpha_i| < 1$, $i = 1 \dots I$
    ◦ Minimum phase filter: all zeros satisfy $|\beta_l| < 1$, $l = 1 \dots L$
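  • 21. FIR filter: $y(n) = \sum_{k=0}^{M} b_k\,x(n-k)$. First-order case:
    $$H(z) = 1 - az^{-1}, \qquad y(n) = x(n) - a\,x(n-1)$$
    ◦ The filter has a single zero at $z = a = re^{j\theta}$.
    ◦ Frequency response of the filter: $H(e^{j\omega}) = 1 - ae^{-j\omega}$
    ◦ Power response of the filter:
    $$\left|H(e^{j\omega})\right|^2 = H(e^{j\omega})H^*(e^{j\omega}) = (1 - ae^{-j\omega})(1 - a^*e^{j\omega}) = 1 + r^2 - 2r\cos(\omega - \theta)$$
    ◦ Example: $a = 0.6 + 0.4j = 0.72e^{0.59j}$ gives $\left|H(e^{j\omega})\right|^2 = 1.52 - 1.44\cos(\omega - 0.59)$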
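The closed-form power response of the first-order FIR filter can be compared with a direct evaluation of |1 - a e^{-jω}|². This is a small sketch using only the Python standard library; the function names are illustrative.

```python
import cmath
import math


def fir_power_direct(a, w):
    """|H(e^{jw})|^2 evaluated directly from H(z) = 1 - a z^-1."""
    return abs(1 - a * cmath.exp(-1j * w)) ** 2


def fir_power_closed(a, w):
    """Closed form 1 + r^2 - 2 r cos(w - theta), with a = r e^{j theta}."""
    r, theta = abs(a), cmath.phase(a)
    return 1 + r * r - 2 * r * math.cos(w - theta)


a = 0.6 + 0.4j
for w in (0.0, 0.59, 1.5, math.pi):
    print(w, fir_power_direct(a, w), fir_power_closed(a, w))
```

The two columns should agree to rounding error, and the minimum occurs near ω = θ ≈ 0.59, where the point on the unit circle is closest to the zero.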
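  • 22. We can calculate the log response of the filter:
    $$\log\left(H(e^{j\omega})\right) = \log\left(1 - ae^{-j\omega}\right)$$
    ◦ If $|a| < 1$ then $\left|ae^{-j\omega}\right| < 1$ and we can expand the log as a power series using
    $$\log(1 - d) = -d - \frac{d^2}{2} - \frac{d^3}{3} - \dots; \qquad |d| < 1$$
    $$\log\left(H(e^{j\omega})\right) = -\sum_{n=1}^{\infty}\frac{a^n}{n}e^{-jn\omega}$$
    $$\log\left|H(e^{j\omega})\right|^2 = -2\sum_{n=1}^{\infty}\frac{r^n}{n}\cos(n(\omega - \theta)); \qquad a = re^{j\theta}$$
    [Figure: first six terms in the summation for $a = 0.6 + 0.4j$]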
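  • 23. If $|a| > 1$, we can rearrange the formula in terms of $a^{-1}$:
    $$\log\left(H(e^{j\omega})\right) = \log\left(-ae^{-j\omega}\left(1 - a^{-1}e^{j\omega}\right)\right) = \log\left(-ae^{-j\omega}\right) + \log\left(1 - a^{-1}e^{j\omega}\right)$$
    Since $\left|a^{-1}\right| < 1$ we can expand the log as before to obtain
    $$\log\left|H(e^{j\omega})\right|^2 = 2\log|a| - 2\sum_{n=1}^{\infty}\frac{r^{-n}}{n}\cos(n(\omega - \theta)); \qquad a = re^{j\theta}$$
    The average of $\log\left(\left|H(e^{j\omega})\right|^2\right)$ over $\omega$ is $2\log|a|$ if $|a| > 1$.
    The log response of an arbitrary filter is just the sum of the log responses of each pole or zero. For a stable filter, all the poles must be within the unit circle. Hence each pole of a stable filter contributes zero to the average log response.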
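The statement about the average of the log power response can be checked by averaging over a uniform grid of frequencies. This is a sketch under stated assumptions: the grid size, the function name and the test zero 2 + 1j are choices made here for illustration.

```python
import cmath
import math


def mean_log_power(a, n=4096):
    """Average of log|1 - a e^{-jw}|^2 over n equally spaced frequencies."""
    total = 0.0
    for k in range(n):
        w = 2 * math.pi * k / n
        total += math.log(abs(1 - a * cmath.exp(-1j * w)) ** 2)
    return total / n


# Zero inside the unit circle: the average should be close to 0.
print(mean_log_power(0.6 + 0.4j))
# Zero outside the unit circle: the average should be close to 2 log|a|.
print(mean_log_power(2 + 1j), 2 * math.log(abs(2 + 1j)))
```

A zero exactly on the unit circle is excluded here, since the log response is unbounded at the frequency of the zero.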
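  • 24. First-order IIR filter:
    $$H(z) = \frac{1}{1 - az^{-1}}, \qquad y(n) = x(n) + a\,y(n-1)$$
    ◦ The filter has a single pole at $z = a = re^{j\theta}$.
    ◦ The power response of the filter is given by
    $$\left|H(e^{j\omega})\right|^2 = \frac{1}{1 + r^2 - 2r\cos(\omega - \theta)}$$
    ◦ Peak value: $\left|H(e^{j\theta})\right|^2 = (1 - r)^{-2}$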
  • 25. If the filter coefficients are real, any complex zeros or poles will always occur in conjugate pairs.
    ◦ The response of the filter is the product of the responses of the individual poles. Conjugate pole/zero pairs ensure a symmetric response.
    ◦ Example: poles at $0.6 \pm 0.4j = 0.72e^{\pm 0.59j} = re^{\pm j\theta}$
    $$H(z) = \frac{1}{1 - 2r\cos\theta\,z^{-1} + r^2z^{-2}} = \frac{1}{1 - 1.2z^{-1} + 0.52z^{-2}}$$
    [Figure: $\left|H(e^{j\omega})\right|^2$]
  • 26. $$H(z) = \frac{1}{(1 - az^{-1})(1 - a^*z^{-1})}, \qquad |H(z)| = \frac{1}{\left|1 - az^{-1}\right|\left|1 - a^*z^{-1}\right|}$$
    But since $|z| = 1$, we have
    $$\left|1 - az^{-1}\right| = \left|z^{-1}\right||z - a| = |z - a|$$
    This is just the distance between $z$ and $a$. The magnitude response of the filter at a frequency $\omega$ is proportional to the product of the distances from the point $e^{j\omega}$ to all the zeros divided by the product of the distances to all the poles. The constant of proportionality is $b_0$:
    $$H(z) = b_0\,\frac{\prod_{l=1}^{L}(1 - \beta_l z^{-1})}{\prod_{i=1}^{I}(1 - \alpha_i z^{-1})}$$
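The distance interpretation also gives a quick way to estimate how wide a resonance is. For a single pole at radius r, the power response 1/(1 + r² - 2r cos Δ) falls to half its peak value 1/(1 - r)² at an offset Δ that can be solved for exactly. The sketch below (function name assumed) compares that exact width with the rule of thumb 2(1 - r) for poles near the unit circle.

```python
import math


def half_power_bandwidth(r):
    """Exact half-power width (radians, normalised) of a single pole at radius r.

    Solves 1 + r^2 - 2 r cos(delta) = 2 (1 - r)^2 for delta and returns 2*delta.
    Valid for r >= 3 - 2*sqrt(2) (about 0.172); below that the response never
    drops to half its peak power.
    """
    cos_delta = (4 * r - 1 - r * r) / (2 * r)
    return 2 * math.acos(cos_delta)


for r in (0.8, 0.95, 0.99):
    print(r, half_power_bandwidth(r), 2 * (1 - r))
```

The approximation improves as r approaches 1, which is the regime in which it is normally applied.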
  • 27. The bandwidth of a resonance peak is the width of the frequency range over which the magnitude response stays within a factor of √2 of its peak value.
    ◦ For poles near the unit circle this is approximately $2(1 - r)$ rad/s $= (1 - r)/\pi$ Hz (normalised).
  • 28. If we have a filter
    $$H(z) = \frac{\sum_{l=0}^{L} b_l z^{-l}}{1 - \sum_{i=1}^{I} a_i z^{-i}}$$
    we can form a new filter by multiplying the coefficients $a_i$ and $b_i$ by $k^i$ for some $k < 1$:
    $$G(z) = H(z/k) = \frac{\sum_{l=0}^{L} b_l k^l z^{-l}}{1 - \sum_{i=1}^{I} a_i k^i z^{-i}}$$
    If $H(z)$ has a pole or zero at $z_0$, then $G(z)$ will have one at $kz_0$. All poles and zeros will be moved inwards by a factor $k$. If the bandwidth of a pole of $H(z)$ is $b = 2(1 - r)$, then the bandwidth of the corresponding pole in $G(z)$ will be expanded to:
    $$2(1 - kr) = b + 2r(1 - k)$$
    Example: $k = 0.95$.
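A short check of the bandwidth expansion transformation, using the denominator 1 - 1.2 z^-1 + 0.52 z^-2 from the conjugate-pole example and k = 0.95. The coefficient list holds the full denominator polynomial in powers of z^-1 (so the signs are already included), and the helper names are assumptions made for this sketch.

```python
import cmath


def expand_bandwidth(coeffs, k):
    """Scale the coefficient of z^-i by k**i, which maps H(z) to H(z/k)."""
    return [c * k ** i for i, c in enumerate(coeffs)]


def quadratic_roots(c):
    """Roots of c[0] z^2 + c[1] z + c[2], i.e. the poles of 1/(c0 + c1 z^-1 + c2 z^-2)."""
    disc = cmath.sqrt(c[1] ** 2 - 4 * c[0] * c[2])
    return ((-c[1] + disc) / (2 * c[0]), (-c[1] - disc) / (2 * c[0]))


den = [1.0, -1.2, 0.52]
k = 0.95
print(quadratic_roots(den))                       # poles of H(z)
print(quadratic_roots(expand_bandwidth(den, k)))  # poles of G(z)
```

The second pair of poles should equal the first pair multiplied by k, so the pole radius drops from r to kr while the pole angle is unchanged.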
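  • 29. If we have a filter $H(z) = b_0 + b_1z^{-1} + \dots + b_pz^{-p}$:
    ◦ We can form a new filter by conjugating the coefficients and putting them in reverse order:
    $$G(z) = b_p^* + b_{p-1}^*z^{-1} + \dots + b_0^*z^{-p} = z^{-p}H^*\!\left(z^{*-1}\right)$$
    ◦ If $z_0$ is a zero of $H(z)$ then $z_0^{*-1}$ is a zero of $G(z)$. This is called a reflection in the unit circle.
    ◦ The frequency response of $G(z)$ is given by:
    $$G(e^{j\omega}) = e^{-jp\omega}H^*(e^{j\omega})$$
    ◦ Hence $G(z)$ has the same magnitude response as $H(z)$ but a different phase response:
    $$\left|G(e^{j\omega})\right| = \left|H(e^{j\omega})\right|, \qquad \operatorname{Arg} G(e^{j\omega}) = -\operatorname{Arg} H(e^{j\omega}) - p\omega$$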
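The effect of reversing and conjugating the coefficients can be verified numerically. The sketch below uses an arbitrary example FIR filter (its coefficients are an assumption chosen for illustration) and checks both the magnitude equality and the relation G(e^{jω}) = e^{-jpω} H*(e^{jω}).

```python
import cmath


def freq_response(b, w):
    """Evaluate b[0] + b[1] e^{-jw} + ... + b[p] e^{-jpw}."""
    return sum(bl * cmath.exp(-1j * w * l) for l, bl in enumerate(b))


def reverse_conjugate(b):
    """Conjugate the coefficients and put them in reverse order."""
    return [complex(bl).conjugate() for bl in reversed(b)]


h = [1.0, -(0.6 + 0.4j), 0.25j]   # illustrative coefficients only
g = reverse_conjugate(h)
p = len(h) - 1
for w in (0.0, 0.7, 2.0):
    H, G = freq_response(h, w), freq_response(g, w)
    print(abs(H), abs(G), abs(G - cmath.exp(-1j * p * w) * H.conjugate()))
```

The first two printed values should match at every frequency and the third should be zero to rounding error, so only the phase of the filter changes.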