Speech enhancement for distant talking speech recognition

The impact of dereverberation on ASR performance, examined on the REVERB and AMI tasks. Slides from my talk at Imperial College London in February 2014.

Published in: Technology

  1. 24 Feb 2014, Takuya Yoshioka (NTT CS Labs, Cambridge University)
     Thanks to: T. Nakatani, K. Kinoshita, M. Delcroix (NTT); M. Gales, X. Chen (Cambridge)
  2. Speech Enhancement for ASR
     • Effectiveness is measured by WER
       – use of a sensible ASR system is essential
     • Huge computational resources are available
     • Offline processing is allowed
     • The AM can also do part of the job
  3. Typical ASR System
     Signal → Speech Enhancement → Front-End → Recognition Engine → Sentence
     (the recognition engine uses the AM, the LM, and the pronunciation dictionary)
  4. Different Approaches for Different Situations
     • 1ch vs. Mch (M > 1)
     • background noise;
     • reverberant noise; or
     • interfering talkers
  6. 1ch Dereverberation (Offline)
     • Reverberation is usually modelled with an FIR filter:
       $x[t] = \sum_{\tau=0}^{T} h[\tau]\, s[t-\tau]$
     • Given $(x[t])_{t=1,\dots,N}$, recover $(s[t])_{t=1,\dots,N}$
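As a concrete illustration of the FIR model on slide 6, the following Python sketch reverberates a clean signal with a short impulse response. The function name `reverberate` and the 3-tap response are hypothetical; samples before the start of the signal are assumed to be zero, and the output is truncated to the input length N.

```python
def reverberate(s, h):
    """FIR reverberation model: x[t] = sum_{tau=0}^{T} h[tau] * s[t - tau].

    s: clean samples; h: room impulse response taps h[0..T] (hypothetical).
    Samples before the start of s are treated as zero; the output keeps len(s).
    """
    x = [0.0] * len(s)
    for t in range(len(s)):
        for tau, h_tau in enumerate(h):
            if t - tau < 0:
                break  # no clean samples exist before t = 0
            x[t] += h_tau * s[t - tau]
    return x


# Toy example: feeding in a unit impulse returns the impulse response itself.
s = [1.0, 0.0, 0.0, 0.0, 0.0]
h = [1.0, 0.5, 0.25]  # made-up 3-tap "room" response (T = 2)
print(reverberate(s, h))  # [1.0, 0.5, 0.25, 0.0, 0.0]
```

Dereverberation is the inverse problem: recovering `s` when only `x` is observed and `h` is unknown.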
  7. Approaches
     • Time domain: subspace, Trinicon, long-term LP
       – accurate
       – can account for phase distortion
     • Power spectral domain: WF, NMF
       – robust against speaker movement
     • Feature domain: front-end VTS, direct CMLLR
       – can leverage the AM
  8. $x[t]$ → Analysis → $x_k(t)$ → Dereverb (one per sub-band) → $s_k(t)$ → Synthesis → $s[t]$
     • Assume in each sub-band $k$:
       $x_k(t) = \sum_{\tau=0}^{T} h_k(\tau)^{*}\, s_k(t-\tau)$
  9. Inverse Filtering (in Each Sub-band)
     $s_k(t) = \sum_{\tau=0}^{U} g_k(\tau)^{*}\, x_k(t-\tau)$
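The sub-band inverse filter on slide 9 can be illustrated with plain Python complex arithmetic. The helper `subband_filter` and all numbers are hypothetical: a 2-tap reverberation $h_k = (1, 0.5)$ has an exact inverse with infinitely many taps $(1, -0.5, 0.25, \dots)$, so an inverse filter truncated to $U = 3$ can only approximate it.

```python
def subband_filter(x_k, g_k):
    """Apply s_k(t) = sum_{tau=0}^{U} conj(g_k(tau)) * x_k(t - tau) in one sub-band.

    x_k: complex sub-band samples; g_k: filter taps g_k(0..U).
    Samples before t = 0 are treated as zero.
    """
    out = []
    for t in range(len(x_k)):
        acc = 0j
        for tau, g in enumerate(g_k):
            if t - tau < 0:
                break
            acc += g.conjugate() * x_k[t - tau]
        out.append(acc)
    return out


# Hypothetical toy case: reverberation h = [1, 0.5] applied to a unit impulse.
x = [1 + 0j, 0.5 + 0j, 0j, 0j, 0j, 0j]
# Truncated inverse of h (U = 3); the exact inverse would need infinitely many taps.
g = [1 + 0j, -0.5 + 0j, 0.25 + 0j, -0.125 + 0j]
print([round(v.real, 4) for v in subband_filter(x, g)])
# [1.0, 0.0, 0.0, 0.0, -0.0625, 0.0]
```

For this toy $h_k$ the residual at lag $U + 1$ equals $-(-0.5)^{U+1}$: it shrinks as $U$ grows, but no finite $U$ removes it.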
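  10. Long-Term Linear Prediction
      $x_k(t) = \sum_{\tau=\Delta}^{U} a_k(\tau)^{*}\, x_k(t-\tau) + e_k(t)$, where the prediction error $e_k(t)$ plays the role of $s_k(t)$
      $s_k(t) = x_k(t) - \sum_{\tau=\Delta}^{U} a_k(\tau)^{*}\, x_k(t-\tau)$
      • We don't minimise $e_k(t)$!
        – instead, the LP coefficients are estimated by maximum likelihood under a clean speech model (slides 12 and 13)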
  11. Why LP? (LP vs. FIR)
      • LP: $x_k(t) = \sum_{\tau=\Delta}^{U} a_k(\tau)^{*}\, x_k(t-\tau) + s_k(t)$
      • FIR: $x_k(t) = \sum_{\tau=0}^{T} h_k(\tau)^{*}\, s_k(t-\tau)$
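The LP vs. FIR comparison can be seen in a toy example: under the LP model, a single coefficient produces an infinitely long decaying tail, yet the dereverberation filter of slide 10 removes it exactly with finitely many taps. The helpers `lp_reverberate` and `lp_dereverb` and all numbers below are hypothetical, and samples before t = 0 are treated as zero.

```python
def lp_reverberate(s_k, a, delay):
    """Toy LP model: x_k(t) = sum_{tau=delay}^{U} conj(a_k(tau)) x_k(t - tau) + s_k(t).

    a lists the taps a_k(delay), ..., a_k(U); samples before t = 0 are zero.
    """
    x_k = []
    for t, s_t in enumerate(s_k):
        acc = complex(s_t)
        for j, a_tau in enumerate(a):
            if t - (delay + j) >= 0:
                acc += a_tau.conjugate() * x_k[t - (delay + j)]
        x_k.append(acc)
    return x_k


def lp_dereverb(x_k, a, delay):
    """Slide-10 filter: s_k(t) = x_k(t) - sum_{tau=delay}^{U} conj(a_k(tau)) x_k(t - tau)."""
    s_k = []
    for t, x_t in enumerate(x_k):
        acc = complex(x_t)
        for j, a_tau in enumerate(a):
            if t - (delay + j) >= 0:
                acc -= a_tau.conjugate() * x_k[t - (delay + j)]
        s_k.append(acc)
    return s_k


s = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # unit impulse as the "clean" signal
a = [0.5 + 0j]                          # one hypothetical tap at tau = Delta = 2
x = lp_reverberate(s, a, delay=2)
print([v.real for v in x])  # [1.0, 0.0, 0.5, 0.0, 0.25, 0.0, 0.125]
print([round(v.real, 6) for v in lp_dereverb(x, a, delay=2)])  # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Seen as an inverse filter in the sense of slide 9, the LP solution is an FIR filter with $g_k(0) = 1$, zeros up to lag $\Delta - 1$, and $-a_k(\tau)$ for $\tau = \Delta, \dots, U$.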
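  12. • Clean speech model: $s_k(t) \sim \mathcal{N}(0, \lambda_{k,t})$, with $x_k(t) = \sum_{\tau=\Delta}^{U} a_k(\tau)^{*}\, x_k(t-\tau) + s_k(t)$
      • Hence $x_k(t) \mid (x_k(t'))_{t'=t-U,\dots,t-1} \sim \mathcal{N}\bigl(\sum_{\tau=\Delta}^{U} a_k(\tau)^{*}\, x_k(t-\tau),\ \lambda_{k,t}\bigr)$
      • Log-likelihood: $\log p\bigl((x_k(t))_{t=1,\dots,N}\bigr) = \sum_{t=1}^{N} \log f_{\mathrm{Normal}}\bigl(x_k(t);\ \sum_{\tau=\Delta}^{U} a_k(\tau)^{*}\, x_k(t-\tau),\ \lambda_{k,t}\bigr)$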
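The log-likelihood on slide 12 can be evaluated directly. This sketch assumes a circularly-symmetric complex Gaussian for $f_{\mathrm{Normal}}$, i.e. $\log f(y; \mu, \lambda) = -\log(\pi\lambda) - |y - \mu|^2/\lambda$, a common choice for complex sub-band coefficients but an assumption here; the function name `lp_log_likelihood` and the toy data are hypothetical.

```python
import math


def lp_log_likelihood(x_k, a, delay, lam):
    """sum_t log N(x_k(t); sum_{tau=delay}^{U} conj(a_k(tau)) x_k(t - tau), lam[t]).

    Assumes a circular complex Gaussian: log f = -log(pi * lam) - |y - mu|^2 / lam.
    a lists the taps a_k(delay), ..., a_k(U); samples before t = 0 are zero.
    """
    total = 0.0
    for t, x_t in enumerate(x_k):
        mean = 0j  # LP prediction of x_k(t) from delayed samples
        for j, a_tau in enumerate(a):
            if t - (delay + j) >= 0:
                mean += a_tau.conjugate() * x_k[t - (delay + j)]
        total += -math.log(math.pi * lam[t]) - abs(x_t - mean) ** 2 / lam[t]
    return total


# Hypothetical toy data generated with a single tap a_k(2) = 0.5 from a unit impulse.
x_toy = [1.0, 0.0, 0.5, 0.0, 0.25]
lam = [1.0] * len(x_toy)
print(lp_log_likelihood(x_toy, [0.5 + 0j], 2, lam))  # correct coefficient
print(lp_log_likelihood(x_toy, [0j], 2, lam))        # no prediction
```

With the correct coefficient the residual is the impulse alone, giving $-5\log\pi - 1$; without prediction the residual energy is $1.3125$, giving the lower value $-5\log\pi - 1.3125$.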
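  13. Interleaved Estimation of:
      – LP coefficients $A = (a_k(\tau))_{\tau=\Delta,\dots,U}$ + speech variances $\Lambda = (\lambda_{k,t})_{t=1,\dots,N}$
      – clean speech samples
      Procedure: initialise A → calculate $s_k(t)$ → estimate speech variances Λ → estimate LP coefficients A → convergent? If not, return to "calculate $s_k(t)$".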
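The interleaved loop on slide 13 can be sketched in NumPy for a single sub-band. This is a WPE-style sketch under stated assumptions, not the author's implementation: the name `wpe_one_band`, the default delay and order, the variance floor `eps`, and the convergence tolerance `tol` are hypothetical. For fixed $s_k(t)$, setting $\lambda_{k,t} = |s_k(t)|^2$ maximises the slide-12 likelihood; for fixed Λ, the LP coefficients solve a weighted least-squares problem.

```python
import numpy as np


def wpe_one_band(x, delay=3, order=10, iterations=3, eps=1e-10, tol=1e-6):
    """Interleaved estimation of LP taps and speech variances in one sub-band.

    x: complex 1-D array holding x_k(t), t = 0..N-1, for one frequency bin.
    delay, order: prediction delay Delta and last lag U (hypothetical defaults).
    Returns (s, a): dereverberated samples s_k(t) and taps a_k(Delta..U).
    """
    x = np.asarray(x, dtype=complex)
    n = len(x)
    taps = order - delay + 1
    # Row t holds [x(t - Delta), ..., x(t - U)], with zeros before the signal starts.
    x_tilde = np.zeros((n, taps), dtype=complex)
    for j in range(taps):
        lag = delay + j
        x_tilde[lag:, j] = x[: n - lag]

    a = np.zeros(taps, dtype=complex)  # initialise A = 0, so s_k(t) = x_k(t)
    s = x.copy()
    for _ in range(iterations):
        # Speech variances: lambda_{k,t} = |s_k(t)|^2, floored to avoid division by zero.
        lam = np.maximum(np.abs(s) ** 2, eps)
        # LP taps by weighted least squares: R a = r with
        # R = sum_t x~(t) x~(t)^H / lambda_t and r = sum_t x~(t) x(t)^* / lambda_t.
        weighted = x_tilde.T / lam
        a_new = np.linalg.solve(weighted @ x_tilde.conj(), weighted @ x.conj())
        # Clean speech: s_k(t) = x_k(t) - sum_tau a_k(tau)^* x_k(t - tau).
        s = x - x_tilde @ a_new.conj()
        converged = np.linalg.norm(a_new - a) <= tol * max(np.linalg.norm(a_new), eps)
        a = a_new
        if converged:
            break
    return s, a


# Hypothetical toy data: x_k(t) = 0.6 x_k(t-3) + s_k(t) with bursty "speech" power.
rng = np.random.default_rng(0)
n = 4000
scale = np.where((np.arange(n) // 50) % 2 == 0, 1.0, 0.1)
s_true = scale * (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
x = np.zeros(n, dtype=complex)
for t in range(n):
    x[t] = s_true[t] + (0.6 * x[t - 3] if t >= 3 else 0.0)
s_hat, a_hat = wpe_one_band(x, delay=3, order=3, iterations=10)
print(np.round(a_hat, 3))  # compare with the generating value 0.6
```

The bursty variance in the toy data mimics speech: frames where the reverberant tail dominates get large weights once the estimate improves, which is what drives the coefficients toward the generating value.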
  14. Eval on REVERB Challenge Data Sets
      | System                                | %WER |
      |---------------------------------------|------|
      | DNN AM + RNN LM + AM adapt            | 20.0 |
      | Dereverb + DNN AM + RNN LM + AM adapt | 16.5 |
      • prompts from 5K WSJ
      • trained on multi-condition data
      • tested on real recordings from the dev set
      • small amount of background noise
  15. Eval on AMI Corpus (Meeting Transcription)
      | System                       | Dev %WER | Eval %WER |
      |------------------------------|----------|-----------|
      | DNN AM + 3gram LM            | 43.5     | 42.6      |
      | Dereverb + DNN AM + 3gram LM | 42.0     | 41.1      |
      • 4 participants in each meeting
      • table-top microphone used
      • single-speaker segments used
      • severe reverberation and background noise
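The AMI gains are easier to compare across tasks as relative WER reductions, computed as `100 * (baseline - system) / baseline`. A minimal Python sketch; the helper name is hypothetical and the WERs are copied from the AMI table.

```python
def relative_wer_reduction(baseline, system):
    """Relative WER reduction in percent: 100 * (baseline - system) / baseline."""
    return 100.0 * (baseline - system) / baseline


ami = {"Dev": (43.5, 42.0), "Eval": (42.6, 41.1)}  # (baseline %WER, dereverb %WER)
for split, (base, derev) in ami.items():
    rel = relative_wer_reduction(base, derev)
    print(f"{split}: {base - derev:.1f} absolute, {rel:.1f}% relative")
# Dev: 1.5 absolute, 3.4% relative
# Eval: 1.5 absolute, 3.5% relative
```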
  16. 1ch Algorithm Summary
      • very robust against modelling errors
      • keys in development:
        – modelling the reverberation with LP
        – using a reasonable clean speech pdf
  17. Multi-Channel Extension
      Dereverb → BF (beamformer) → to recogniser
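  19. • LP → MIMO LP:
        $\mathbf{x}_k(t) = \sum_{\tau=\Delta}^{U} \mathbf{A}_k(\tau)^{H}\, \mathbf{x}_k(t-\tau) + \mathbf{e}_k(t)$, with $\mathbf{e}_k(t) = \mathbf{h}_k s_k(t)$
      • single speech model → vector speech model:
        $s_k(t) \sim \mathcal{N}(0, \lambda_{k,t}) \iff \mathbf{h}_k s_k(t) \sim \mathcal{N}(0, \lambda_{k,t}\, \mathbf{h}_k \mathbf{h}_k^{H}) \approx \mathcal{N}(0, \lambda_{k,t}\, \mathbf{I})$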
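  20. Interleaved Estimation of:
      – LP matrices $A = (\mathbf{A}_k(\tau))_{\tau=\Delta,\dots,U}$ + speech variances $\Lambda = (\lambda_{k,t})_{t=1,\dots,N}$
      – clean speech samples
      Procedure: initialise A → calculate $s_k(t)$ → estimate speech variances Λ → estimate LP matrices A → convergent? If not, return to "calculate $s_k(t)$".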
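The multi-channel loop can be sketched in NumPy for one sub-band. Two things change relative to a single-channel weighted least-squares loop: the delayed observations of all M microphones are stacked into one vector, and under the $\lambda_{k,t}\mathbf{I}$ approximation the maximum-likelihood variance update is the average of $|s|^2$ over the channels. This is a WPE-style sketch under those assumptions, not the author's implementation; the name `mimo_wpe_one_band`, the defaults, `eps`, and `tol` are hypothetical.

```python
import numpy as np


def mimo_wpe_one_band(x, delay=3, order=10, iterations=3, eps=1e-10, tol=1e-6):
    """Multi-channel interleaved estimation for one sub-band (WPE-style sketch).

    x: complex array of shape (N, M) holding x_k(t) for N frames and M mics.
    Returns (s, A): the dereverberated (N, M) signal and the stacked LP matrices
    A_k(Delta..U) as one array of shape (M * (order - delay + 1), M).
    """
    x = np.asarray(x, dtype=complex)
    n, m = x.shape
    taps = order - delay + 1
    # Row t stacks [x(t - Delta)^T, ..., x(t - U)^T], zero-padded at the start.
    x_tilde = np.zeros((n, m * taps), dtype=complex)
    for j in range(taps):
        lag = delay + j
        x_tilde[lag:, j * m:(j + 1) * m] = x[: n - lag]

    a_mat = np.zeros((m * taps, m), dtype=complex)  # initialise A = 0
    s = x.copy()
    for _ in range(iterations):
        # Vector speech model N(0, lambda I): lambda_{k,t} = mean over mics of |s|^2.
        lam = np.maximum(np.mean(np.abs(s) ** 2, axis=1), eps)
        weighted = x_tilde.T / lam
        r_mat = weighted @ x_tilde.conj()        # sum_t x~ x~^H / lambda_t
        p_mat = weighted @ x.conj()              # sum_t x~ x^H / lambda_t
        a_new = np.linalg.solve(r_mat, p_mat)    # one column per output mic
        s = x - x_tilde @ a_new.conj()           # s(t) = x(t) - A^H x~(t)
        converged = np.linalg.norm(a_new - a_mat) <= tol * max(np.linalg.norm(a_new), eps)
        a_mat = a_new
        if converged:
            break
    return s, a_mat


# Hypothetical 2-mic toy data: x(t) = 0.5 x(t-3) + h s(t) with bursty "speech".
rng = np.random.default_rng(1)
n = 3000
scale = np.where((np.arange(n) // 40) % 2 == 0, 1.0, 0.05)
src = scale * (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
h = np.array([1.0, 0.7 - 0.2j])
x = np.zeros((n, 2), dtype=complex)
for t in range(n):
    x[t] = src[t] * h + (0.5 * x[t - 3] if t >= 3 else 0.0)
s_hat, a_hat = mimo_wpe_one_band(x, delay=3, order=5, iterations=5)
print(s_hat.shape, a_hat.shape)  # (3000, 2) (6, 2)
```

Because every output channel shares the same weights $1/\lambda_{k,t}$, one linear solve with M right-hand sides gives all columns of the stacked LP matrix at once.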
  21. Eval on REVERB Challenge Data Sets
      | #Mics | System                                | %WER |
      |-------|---------------------------------------|------|
      | 1     | Baseline (DNN AM + RNN LM + AM adapt) | 20.0 |
      | 1     | Dereverb + Baseline                   | 16.5 |
      | 2     | Dereverb + Baseline                   | 14.8 |
      | 2     | Dereverb + MVDR + Baseline            | 13.6 |
      | 8     | Dereverb + Baseline                   | 14.0 |
      | 8     | Dereverb + MVDR + Baseline            | 11.3 |
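Expressed as relative WER reductions against the 1-mic baseline (20.0 %WER), the multi-microphone results show how the gains grow with the number of microphones and with MVDR beamforming. A minimal Python sketch; the names `BASELINE_WER`, `RESULTS`, and `relative_reduction` are hypothetical, and the list simply copies the REVERB table.

```python
BASELINE_WER = 20.0  # 1 mic: DNN AM + RNN LM + AM adapt (%WER)
RESULTS = [  # (#mics, system, %WER)
    (1, "Dereverb + Baseline", 16.5),
    (2, "Dereverb + Baseline", 14.8),
    (2, "Dereverb + MVDR + Baseline", 13.6),
    (8, "Dereverb + Baseline", 14.0),
    (8, "Dereverb + MVDR + Baseline", 11.3),
]


def relative_reduction(baseline, wer):
    """Relative WER reduction (%) with respect to the baseline."""
    return 100.0 * (baseline - wer) / baseline


for mics, system, wer in RESULTS:
    print(f"{mics} mic(s), {system}: {relative_reduction(BASELINE_WER, wer):.1f}% relative")
# 1 mic(s), Dereverb + Baseline: 17.5% relative
# 2 mic(s), Dereverb + Baseline: 26.0% relative
# 2 mic(s), Dereverb + MVDR + Baseline: 32.0% relative
# 8 mic(s), Dereverb + Baseline: 30.0% relative
# 8 mic(s), Dereverb + MVDR + Baseline: 43.5% relative
```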
  22. Long-Term LP Summary
      • very robust against modelling errors
      • can cover both 1ch and Mch set-ups
      • keys in development:
        – modelling the reverberation with LP
        – using a reasonable clean speech pdf
  23. Extensions Explored
      • dereverberation + BSS
      • adaptive long-term LP
      • NMF-based dereverberation
        – works in the power spectrum domain
      • FE-VTS dereverberation
  24. Dereverberation+BSS
      Dereverb → BSS
  25. [Figure: SIR (dB) at T60 = 0.3 s and T60 = 0.5 s for three conditions: w/o separation, separation, and dereverberation+separation]
  26. Conclusion
      • Dereverberation based on long-term LP
        – represents reverberation with LP
        – consistent framework covering both 1ch and Mch set-ups
        – provides gains over well-optimised DNN AMs in realistic conditions
        – extensions in several directions described
