Thank you for your attention!
About Me:
https://about.me/luis-beltran
Editor's notes
Practicing pronunciation and getting timely feedback are essential to improving language skills. Assessment is conventionally driven by experienced teachers, which normally takes a lot of time and great effort, and this makes high-quality assessment expensive for students. In this session we will see how we can make use of artificial intelligence to create a technological solution to support students in learning another language in a practical way and with objective feedback in real time.
Microsoft's Azure Speech service provides speech-to-text, text-to-speech, and speech translation capabilities to developers. At Build 2020, Microsoft announced several new preview capabilities, including Pronunciation Assessment, which assesses speech pronunciation and gives speakers feedback on the accuracy and fluency of spoken audio. Microsoft uses this service for its PowerPoint Presenter Coach feature.
For language learners, practicing pronunciation and getting timely feedback are essential to improving language skills. Assessment is conventionally driven by experienced teachers, which typically takes a lot of time and great effort, making high-quality assessment expensive for students.
How can we solve this problem?
Pronunciation Assessment, a novel AI-powered speech capability, is able to make language assessment more engaging and accessible to students of all backgrounds.
Pronunciation Assessment, a feature of Speech in Azure Cognitive Services, provides subjective and objective feedback to language learners with computer-aided technology.
With Pronunciation Assessment, language learners can practice, get instant feedback, and improve their pronunciation. Online learning solution providers and educators can use it to assess the pronunciation of multiple speakers in real time.
Pronunciation Assessment provides various evaluation results in different granularities, from individual phonemes to full text input.
At the phoneme level, it provides an accuracy score for each phoneme, helping students better understand the pronunciation details of their speech.
At the word level, it automatically detects errors and provides an accuracy score at the same time, giving more detailed information about omissions, repetitions, insertions, and mispronunciations in the given speech.
At the full-text level, it offers additional fluency and completeness scores: fluency indicates how closely the speech matches a native speaker's use of silent pauses between words, and completeness indicates how many words of the reference text input were actually spoken.
An overall score aggregated from Accuracy, Fluency, and Completeness is then provided to indicate the overall pronunciation quality of the given speech. With these scores, students can easily identify the weaknesses in their speech and improve with concrete goals in mind.
You can get pronunciation assessment scores for:
Full text
Words
Groups of syllables
Phonemes in SAPI or IPA format
Pronunciation assessment can provide syllable-level assessment results. Grouping in syllables is more legible and aligned with speaking habits, as a word is typically pronounced syllable by syllable rather than phoneme by phoneme.
For the en-US locale, the phoneme name is provided together with the score, to help identify which phonemes were pronounced accurately or inaccurately. For other locales, you can only get the phoneme score.
With spoken phonemes, you can get confidence scores indicating how likely the spoken phonemes matched the expected phonemes.
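As a rough sketch of how such a request might be described, the snippet below builds a JSON configuration asking for phoneme-level granularity, the IPA alphabet, and five candidate spoken phonemes. The helper name `build_assessment_config` and the reference text "hello" are assumptions made for illustration. The JSON keys follow the shape accepted by the Speech SDK's JSON-based configuration as far as I know, so check them against the current documentation before relying on them.

```python
import json


def build_assessment_config(reference_text, n_best_phonemes=5):
    """Return a JSON string describing a pronunciation assessment request.

    Hypothetical helper: it only assembles the settings, it does not call Azure.
    """
    if n_best_phonemes < 0:
        raise ValueError("n_best_phonemes must not be negative")
    config = {
        "referenceText": reference_text,       # text the learner is expected to read
        "gradingSystem": "HundredMark",        # scores on a 0-100 scale
        "granularity": "Phoneme",              # most detailed level of results
        "phonemeAlphabet": "IPA",              # phoneme names in IPA rather than SAPI
        "nBestPhonemeCount": n_best_phonemes,  # candidate spoken phonemes per expected phoneme
    }
    return json.dumps(config, ensure_ascii=False)


config_json = build_assessment_config("hello")
print(config_json)
# With the Speech SDK installed, a string like this could be passed as
# speechsdk.PronunciationAssessmentConfig(json_string=config_json).
```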
When speech is recognized, you can request the pronunciation assessment results as SDK objects or a JSON string.
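If you take the JSON string route, a few lines of standard-library code are enough to pull out the full-text scores and the words flagged with an error. This is a minimal sketch: the `summarize` helper is hypothetical, and every value in `SAMPLE_RESULT` is invented for illustration. Only the general field layout is meant to resemble a real response.

```python
import json

# Invented sample in the general shape of a pronunciation assessment result.
SAMPLE_RESULT = """
{
  "NBest": [{
    "PronunciationAssessment": {
      "AccuracyScore": 80.0, "FluencyScore": 90.0,
      "CompletenessScore": 50.0, "PronScore": 70.0
    },
    "Words": [
      {"Word": "good",
       "PronunciationAssessment": {"AccuracyScore": 95.0, "ErrorType": "None"}},
      {"Word": "morning",
       "PronunciationAssessment": {"AccuracyScore": 0.0, "ErrorType": "Omission"}}
    ]
  }]
}
"""


def summarize(result_json):
    """Return (full-text scores, list of (word, error type) for flagged words)."""
    best = json.loads(result_json)["NBest"][0]
    overall = best["PronunciationAssessment"]
    flagged = [
        (word["Word"], word["PronunciationAssessment"]["ErrorType"])
        for word in best["Words"]
        if word["PronunciationAssessment"].get("ErrorType", "None") != "None"
    ]
    return overall, flagged


overall, flagged = summarize(SAMPLE_RESULT)
print("Full-text scores:", overall)
print("Words to practice:", flagged)
```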
The phoneme alphabet is IPA.
The syllables are returned alongside phonemes for the same word.
You can use the Offset and Duration values to align syllables with their corresponding phonemes. For example, the starting offset (11700000) of the second syllable ("loʊ") aligns with the third phoneme ("l").
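The alignment described above can be sketched by assigning each phoneme to the syllable whose time span contains the phoneme's starting offset. Only the offset 11700000, the syllable "loʊ", and the phoneme "l" come from the example in these notes. The word "hello", the other symbols, and all remaining offsets and durations are assumed values chosen to make the sketch self-contained. `group_phonemes_by_syllable` is a hypothetical helper name.

```python
# Offsets and durations are expressed in 100-nanosecond ticks.
WORD = {
    "Word": "hello",  # assumed word for illustration
    "Syllables": [
        {"Syllable": "hɛ", "Offset": 7500000, "Duration": 4100000},
        {"Syllable": "loʊ", "Offset": 11700000, "Duration": 9600000},
    ],
    "Phonemes": [
        {"Phoneme": "h", "Offset": 7500000, "Duration": 3500000},
        {"Phoneme": "ɛ", "Offset": 11100000, "Duration": 500000},
        {"Phoneme": "l", "Offset": 11700000, "Duration": 1100000},
        {"Phoneme": "oʊ", "Offset": 12900000, "Duration": 8400000},
    ],
}


def group_phonemes_by_syllable(word):
    """Map each syllable to the phonemes that start inside its time span."""
    groups = {}
    for syllable in word["Syllables"]:
        start = syllable["Offset"]
        end = start + syllable["Duration"]
        groups[syllable["Syllable"]] = [
            phoneme["Phoneme"]
            for phoneme in word["Phonemes"]
            if start <= phoneme["Offset"] < end
        ]
    return groups


print(group_phonemes_by_syllable(WORD))
```

With this sample data, the second syllable starts at the same offset as the third phoneme, so "l" is the first phoneme grouped under "loʊ".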
There are five NBestPhonemes corresponding to the number of spoken phonemes requested.
Within Phonemes, the most likely spoken phoneme was "ə" instead of the expected phoneme "ɛ". The expected phoneme "ɛ" only received a confidence score of 47. Other potential matches received confidence scores of 52, 17, and 2.
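To make this comparison concrete, here is a small sketch that finds the most likely spoken phoneme in an NBestPhonemes list and checks it against the expected one. The scores 47, 52, 17, and 2 and the symbols "ə" and "ɛ" come from the example above. The score of 100 for "ə" and the symbols attached to 52, 17, and 2 are placeholders, since the notes do not state them. `compare_expected_to_spoken` is a hypothetical helper.

```python
PHONEME = {
    "Phoneme": "ɛ",  # expected phoneme
    "PronunciationAssessment": {
        "AccuracyScore": 47.0,
        "NBestPhonemes": [
            {"Phoneme": "ə", "Score": 100.0},  # score is a placeholder
            {"Phoneme": "l", "Score": 52.0},   # symbol is a placeholder
            {"Phoneme": "ɛ", "Score": 47.0},
            {"Phoneme": "h", "Score": 17.0},   # symbol is a placeholder
            {"Phoneme": "æ", "Score": 2.0},    # symbol is a placeholder
        ],
    },
}


def compare_expected_to_spoken(phoneme):
    """Report the top spoken candidate and how the expected phoneme scored."""
    expected = phoneme["Phoneme"]
    candidates = phoneme["PronunciationAssessment"]["NBestPhonemes"]
    top = max(candidates, key=lambda candidate: candidate["Score"])
    expected_score = next(
        (c["Score"] for c in candidates if c["Phoneme"] == expected), None
    )
    return {
        "expected": expected,
        "spoken": top["Phoneme"],
        "expected_score": expected_score,
        "matched": top["Phoneme"] == expected,
    }


print(compare_expected_to_spoken(PHONEME))
```

A learner-facing app could use the `matched` flag to decide whether to show a hint such as "you said ə, try ɛ".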