QA Fest 2016. Roman Gorin. Introduction to Speech Recognition Systems Through a Tester's Eyes

Kyiv 2016
Ukraine's first testing festival
Introduction to Speech Recognition Software Testing
Roman Gorin

About me
• Senior Technical Leader, Testing @ Delphi LLC (http://udelphi.com)
• 12+ years in Speech Recognition Testing
• 6+ years as QA Team Lead
• Main product: Nuance Dragon Medical (http://www.nuance.com/for-healthcare/dragon-medical)
• https://telegram.me/DJ_ZX
• Facebook: rgorin.zx

Where used
• Nuance Dragon family
  • Dragon Pro
  • Dragon Medical
  • Dragon for Mac
  • Dragon Anywhere
  • etc.
• Windows Speech Recognition
• Google Voice Search

Where used
Personal assistants
• Siri
• Cortana
• Google Now
• Facebook M, etc
Car systems

Where used
Smart Home assistants
• Amazon Echo
• Google Home
• Zenbo
• Homer, etc.
• Automated call center software
and more

Where used: ViV AI (unreleased)

Basic Principles
• Capture audio
• Separate speech from other types of sounds (esp. noise)
• Compare the speech audio with known patterns to find a text<->audio match
• Apply the language-specific model to the candidate matches
• Perform actions (type text, execute a command) based on the collected data
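The "separate speech from other sounds" step can be illustrated with a very rough energy-based detector. This is only a sketch under stated assumptions: real engines use far more robust methods, and the function names, the 160-sample frame length, and the 0.1 threshold are all made up for illustration.

```python
import math

def frame_energies(samples, frame_len):
    """Split samples into non-overlapping frames and return the RMS energy of each."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return energies

def detect_speech(samples, frame_len=160, threshold=0.1):
    """Return one flag per frame: True where the RMS energy exceeds the threshold.

    The threshold is a hypothetical value; a tester would tune it per microphone.
    """
    return [energy > threshold for energy in frame_energies(samples, frame_len)]

# Synthetic input: a quiet frame, a loud 440 Hz tone frame, another quiet frame.
rate = 16000
quiet = [0.01] * 160
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(160)]
flags = detect_speech(quiet + tone + quiet)
print(flags)
```

A tester can feed such a detector recordings with different background noise levels to see where a naive threshold starts to misclassify frames.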

Generic structure of how SR works
Main speech recognition models (based on Wikipedia)
• Hidden Markov models
• Dynamic time warping (DTW)-based speech recognition
• Neural networks
• Deep feedforward and recurrent neural networks
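As an illustration of the DTW item above, here is a minimal sketch of the classic dynamic time warping distance between two one-dimensional feature sequences. It assumes plain lists of numbers and an absolute-difference cost; production recognizers compare multi-dimensional acoustic feature vectors, and the sample sequences are invented.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

template = [1.0, 2.0, 3.0, 2.0, 1.0]
slow = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 2.0, 2.0, 1.0, 1.0]  # same shape, half speed
other = [3.0, 1.0, 3.0, 1.0, 3.0]                          # different shape

print(dtw_distance(template, slow), dtw_distance(template, other))
```

The point for testing is that DTW tolerates differences in speaking speed: the slowed-down sequence matches the template exactly, while a differently shaped one does not.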

Testing areas
• Engine and Language Modelling (usually on recognition server side)
• UI
• Hardware
• Deployment
• Adaptation
• Recognition and Text Editing
• Language specific
etc

Testing areas: Hardware
• Mobile HW
• Internal mic (notebooks/tablets)
• Noise cancelling mic
• Sound card and drivers compatibility
• System Requirements compliance
• HW Dependency
• Driver dependency (Windows: WASAPI, DirectSound, ASIO, Kernel Streaming; Linux: ALSA, PulseAudio; Mac: Core Audio)

Testing areas: Hardware
• Mics and recorders (samples from nuance.com store)
• Special bundled HW for professional use
  • Nuance PowerMic
  • Philips SpeechMike

Testing areas: Deployment
• Platform
• Client OS (Desktop/Mobile)
• Server OS for Client app
• Server OS for Cloud/Remote app
• Azure Cloud
• Amazon Cloud
• Proprietary cloud hosts for server recognition (e.g. recognition servers for Siri, etc.)
• Support for virtualization platforms: VDI and app virtualization (standalone recognition over remote access)
• Citrix XenApp and XenDesktop/Thin and Thick clients
• VMWare Workstation and Horizon
• Oracle VirtualBox
• Microsoft Remote Desktop/Terminal Services

Testing areas: Adaptation
• Predefined language patterns
• Statistical models
A statistical language model is a probability distribution over sequences of words. Given such a sequence, say of length m, it assigns a probability P(w_1, …, w_m) to the whole sequence. Having a way to estimate the relative likelihood of different phrases is useful in many natural language processing applications.
Language modeling is used in speech recognition, machine translation, part-of-speech tagging, parsing, handwriting recognition, information retrieval and other applications.
• “Part of speech” detection
• Sound specific patterns
• Person-specific
• How the person pronounces words and sounds
• How the person constructs sentences
• Pronunciation speed
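To make the statistical language model idea concrete, here is a minimal sketch of a bigram model that approximates P(w_1, …, w_m) as a product of conditional word probabilities. The tiny corpus, the function names, and the add-one smoothing are assumptions chosen for illustration, not a description of any particular engine.

```python
from collections import Counter

def train_bigram(sentences):
    """Count single words and adjacent word pairs; "<s>" marks the sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def sentence_probability(sentence, unigrams, bigrams):
    """Approximate P(w_1, ..., w_m) as a product of add-one smoothed bigram probabilities."""
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.lower().split()
    probability = 1.0
    for prev, cur in zip(words, words[1:]):
        probability *= (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
    return probability

# Hypothetical training corpus
corpus = ["recognize speech", "recognize speech now", "wreck a nice beach"]
unigrams, bigrams = train_bigram(corpus)

likely = sentence_probability("recognize speech", unigrams, bigrams)
unlikely = sentence_probability("speech recognize", unigrams, bigrams)
print(likely > unlikely)
```

The model ranks a word order it has seen above one it has not, which is how a recognizer can choose between acoustically similar candidates. Adaptation tests check that this ranking shifts as the user's own texts are added.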

Testing areas: Recognition and Command Control
• Initial recognition tests
• Switch the app into “listening mode”
• Basic commands (“what can I do”)
• Extended commands (app-type specific)
• Non-strict commands (pseudo-AI)
• Search commands
• 3rd party Apps specific commands/3rd party SW compatibility
• Dictating into app default text controls (if supported)
• Dictating into 3rd party supported and unsupported apps
• Transcribing prerecorded audio
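For transcription tests on prerecorded audio, one common way to compare the recognized text with the expected text is word error rate (WER): the word-level edit distance divided by the number of reference words. The slides do not name a metric, so treat this as an assumed approach; the function name and sample phrases are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] = edit distance between the processed reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                           # deletion
                           cur[j - 1] + 1,                        # insertion
                           prev[j - 1] + (ref_word != hyp_word))) # substitution
        prev = cur
    return prev[-1] / len(ref)

expected = "open the patient record"
recognized = "open a patient record"
print(word_error_rate(expected, recognized))
```

Tracking this number per build, microphone, and accent gives a regression signal that is more useful than a pass/fail comparison of whole transcripts.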

Testing areas: Recognition and Text Editing
(sample from PCWorld/Nuance)

Testing areas: Languages and Accents
• Different accents (UK English, US English, Australian English, etc)
• Issues with speaking
• Language-specific sounds
• Homophones (French)
• Umlauts (German)
• etc
• Language-specific syntax (using commas, periods, exclamation marks, etc.)
• Words with similar or close pronunciation (e.g. in French: voux, voi, vu, etc.)
• Logographic characters (Chinese, Japanese, etc.)

Testing areas: Other stuff
• Audio codecs
• Traffic consumption (for cloud or remote access apps)
• Memory and CPU consumption
• Response time and cancelling recognition
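For traffic consumption, a back-of-the-envelope estimate for uncompressed PCM audio is sample rate × bytes per sample × channels. The sketch below assumes 16 kHz, 16-bit mono audio purely as an example; actual products may use other formats or compressing codecs, which is exactly what the codec tests need to account for.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels=1):
    """Raw PCM data rate in bytes per second (no codec, no protocol overhead)."""
    return sample_rate_hz * bits_per_sample // 8 * channels

def session_megabytes(bytes_per_second, seconds):
    """Total upload size in megabytes (10**6 bytes) for a dictation session."""
    return bytes_per_second * seconds / 1_000_000

# Assumed example format: 16 kHz, 16-bit, mono
rate = pcm_bytes_per_second(16000, 16)
minute = session_megabytes(rate, 60)
print(rate, minute)
```

Comparing measured network traffic against such a baseline shows how much the chosen codec actually saves on cloud or remote-access deployments.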

Enterprise Recognition (based on Nuance.com info)

Enterprise Recognition (based on Nuance.com info)
• Supports major EHR platforms, including Epic®, Cerner®, eClinicalWorks, athenahealth®, MEDITECH®, and more. © Nuance.com

Links
• https://msdn.microsoft.com/en-us/library/hh378337(v=office.14).aspx
• http://www.explainthatstuff.com/voicerecognition.html
• http://scienceline.org/2014/08/ever-wondered-how-does-speech-to-text-software-work/
• http://www.nuance.com/for-healthcare/capture-anywhere/360-mobile-solutions/powermicmobile/index.htm
• http://www.nuance.com/for-individuals/by-product/dragon-accessories
• https://en.wikipedia.org/wiki/List_of_speech_recognition_software
• https://en.wikipedia.org/wiki/Dragon_NaturallySpeaking
• https://en.wikipedia.org/wiki/Speech_recognition
• https://en.wikipedia.org/wiki/Language_model
• http://www.pcmag.com/article2/0,2817,2464719,00.asp
• http://www.pcworld.com/article/2055599/control-your-pc-with-these-5-speech-recognition-programs.html
• http://www.oxygen.lcs.mit.edu/Speech.html
• http://copia.com.au/medical-speech-recognition/
