Asr

"And of knowledge, you (mankind) have been given only a little." Surah Al-Isra‘, verse 85.

A utomatic S peech R ecognition ,[object Object],[object Object],[object Object]

OUTLINE ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Multilayer Structure of speech production: ,[object Object],[I] [would] [like] [to] [book] [a] [flight] [from] [Rome] [to] [London][tomorrow][morning] [book]  [b/uh/k] Pragmatic Layer Semantic Layer Syntactic Layer Prosodic/Phonetic Layer Acoustic Layer

What is S peech R ecognition ? ,[object Object],[object Object],[object Object]

Capabilities of ASR including: ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Uses and Applications ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

A Timeline & History of Voice Recognition Software Dragon released discrete word dictation-level speech recognition software. It was the first time dictation speech & voice recognition technology was available to consumers . 1995 SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded. 1984 Dragon Systems was founded. 1982 DARPA established the Speech Understanding Research (SUR) program. A $3 million per year of government funds for 5 years. It was the largest speech recognition project ever. 1971 HMM approach to speech & voice recognition was invented by Lenny Baum of Princeton University Early 1970's AT&T's Bell Labs produced the first electronic speech synthesizer called the Voder. 1936

… timeline…continue Scansoft, Inc. is presently the world leader in the technology of Speech Recognition in the commercial market. ScanSoft Ships Dragon NaturallySpeaking 7 Medical, Lowers Healthcare Costs through Highly Accurate Speech Recognition. 2003 Lernout & Hauspie acquired Dragon Systems for approximately $460 million. 2000 Microsoft invested $45 million to allow Microsoft to use speech & voice recognition technology in their systems. 1998 Dragon introduced "Naturally Speaking", the first "continuous speech" dictation software available 1997

The Structure of ASR System: Functional Scheme of an ASR System Speech samples X Y S W * Database Signal Interface Feature Extraction Recognition Databases Training HMM

Speech Database: ,[object Object],[object Object],[object Object]

Transcription of speech: ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Segmentation and labeling example

Many databases are distributed by the Linguistic Data Consortium www.ldc.upenn.edu

Speech Signal Analysis Feature Extraction for ASR: - The aim is to extract the voice features to distinguish different phonemes of a language.

MFCC extraction: ,[object Object],[object Object],Pre-emphasis DFT Mel filter banks Log(|| 2 ) IDFT Speech signal x(n) WINDOW x ’ (n) x t (n) X t (k) Y t (m) MFCC y t (m) (k)

Spectral Analysis: ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Speech waveform of a phoneme “e” ,[object Object],After pre-emphasis and Hamming windowing Power spectrum MFCC

Training and Recognition : ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Deterministic vs. Stochastic framework: ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Implementing HMM to speech Modeling Training and Recognition ,[object Object],[object Object],[object Object],Training HMM Feature Extraction Recognition W * Y Y S Speech Samples 

Implementation of HMM: ,[object Object],[object Object],P(w t =yes t-1 =il)=0.2 P(w t =il|w t-1 =yes)=1 P(w t =il|w t-1 =no)=1 P(w t =no t-1 =il)=0.2 P(s t t-1 ) s (0) Silence Start S (1) S (2) S (3) S (4) S (5) S (6) S (7) S (8) S (9) S (10) S (11) S (12) Phoneme ‘ YE ’ Phoneme ‘ S ’ w= YES w= NO Phoneme ‘ N ’ Phoneme ‘ O ’ P(Y t =s (9) ) Y 0.6

The search Algorithm: ,[object Object],s (0) s (7) s (0) s (1) s (8) s (7) s (0) s (1) s (2) Time=1 Time=2 Time=3 0.1 0.4 0.1 0.025 0.021 0.051 0.041 0.045 0.036 0.032

Conclusions: ,[object Object],[object Object],[object Object],[object Object],[object Object]

Asr

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (16)

En vedette

En vedette (10)

Similaire à Asr

Similaire à Asr (20)

Plus de kkkseld

Plus de kkkseld (12)

Dernier

Dernier (20)

Asr