In this talk I will cover the basic principles of how speech recognition software works, where it is used and which technologies are involved, and the key points of testing such products (both standalone and cloud-recognition ones, including voice assistants). I will also describe how these products are used at the enterprise level and which testing aspects need to be taken into account.
QA Fest 2016. Roman Gorin. Introduction to speech recognition systems through the eyes of a tester
Kyiv 2016
The first testing festival in Ukraine
Introduction to Speech Recognition Software Testing
Roman Gorin
About me
• Senior Technical Leader – Testing @ Delphi LLC (http://udelphi.com)
• 12+ years in Speech Recognition Testing
• 6+ years as QA Team Lead
• Main product: Nuance Dragon Medical (http://www.nuance.com/for-healthcare/dragon-medical)
• https://telegram.me/DJ_ZX
• Facebook: rgorin.zx
Where it is used
• Nuance Dragon family
  • Dragon Pro
  • Dragon Medical
  • Dragon for Mac
  • Dragon Anywhere
  • etc.
• Windows Speech Recognition
• Google Voice Search
Basic Principles
• Capture audio
• Separate speech from other types of sounds (esp. noise)
• Compare speech audio with known patterns (text <-> audio match)
• Analyze the result using the language-specific model
• Perform actions (type text, execute command) based on collected data
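The "separate speech from other types of sounds" step can be illustrated with a very simple energy-based voice activity detector. This is only a sketch under stated assumptions: real engines use far more robust methods, and the function names, frame length, threshold, and synthetic test signal are all invented here for illustration.

```python
import math

def frame_energy(samples):
    """Root-mean-square energy of one frame of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def detect_speech_frames(samples, frame_len=160, threshold=0.1):
    """Return one boolean per full frame: True if the frame is loud enough.

    frame_len and threshold are illustrative values, not taken from any product.
    """
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(frame_energy(frame) > threshold)
    return flags

# Synthetic signal: a quiet hum, then a louder tone standing in for speech.
SAMPLE_RATE = 16000
hum = [0.01 * math.sin(2 * math.pi * 50 * n / SAMPLE_RATE) for n in range(320)]
tone = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(320)]
flags = detect_speech_frames(hum + tone)
print(flags)
```

From a testing point of view, a detector like this shows why noisy environments and low-gain microphones matter: a fixed threshold that works in a quiet room can misclassify frames elsewhere.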
General structure of how speech recognition (SR) works
Main speech recognition models (based on Wikipedia)
• Hidden Markov models
• Dynamic time warping (DTW)-based speech recognition
• Neural networks
• Deep feedforward and recurrent neural networks
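Of the models listed, dynamic time warping is the easiest to show in a few lines. The sketch below computes a DTW distance between two one-dimensional sequences; it assumes plain numeric sequences rather than real acoustic feature vectors, and the function name and sample data are made up for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats
                                 cost[i][j - 1],      # b[j-1] repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

template = [1, 2, 3, 2, 1]
slow = [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]   # same shape, spoken twice as slowly
other = [3, 3, 3, 3, 3]                 # a different shape
print(dtw_distance(template, slow))
print(dtw_distance(template, other))
```

The slowed-down sequence matches the template with zero distance, which is the property that makes DTW tolerant of differences in speaking speed.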
Testing areas
• Engine and Language Modelling (usually on recognition server side)
• UI
• Hardware
• Deployment
• Adaptation
• Recognition and Text Editing
• Language-specific aspects
• etc.
Testing areas: Hardware
• Mobile HW
• Internal mic (notebooks/tablets)
• Noise-cancelling mic
• Sound card and driver compatibility
• System requirements compliance
• HW dependency
• Driver dependency (Windows: WASAPI, DirectSound, ASIO, Kernel Streaming; Linux: ALSA, PulseAudio; Mac: Core Audio)
Testing areas: Hardware (continued)
• Mics and recorders (samples from the nuance.com store)
• Special bundled HW for professional use: Nuance PowerMic, Philips SpeechMike
Testing areas: Deployment
• Platform
  • Client OS (desktop/mobile)
  • Server OS for client app
  • Server OS for cloud/remote app
    • Azure Cloud
    • Amazon Cloud
    • Proprietary cloud hosts for server recognition (e.g. recognition servers for Siri)
• Support for virtualization platforms: VDI and app virtualization (standalone recognition over remote access)
  • Citrix XenApp and XenDesktop, thin and thick clients
  • VMware Workstation and Horizon
  • Oracle VirtualBox
  • Microsoft Remote Desktop/Terminal Services
Testing areas: Adaptation
• Predefined language patterns
• Statistical models
A statistical language model is a probability distribution over sequences of words. Given such a sequence, say of length m, it assigns a probability P(w_1, ..., w_m) to the whole sequence. Having a way to estimate the relative likelihood of different phrases is useful in many natural language processing applications.
Language modeling is used in speech recognition, machine translation, part-of-speech tagging, parsing, handwriting recognition, information retrieval and other applications.
• "Part of speech" detection
• Sound-specific patterns
• Person-specific
  • How a person pronounces words and sounds
  • How a person constructs sentences
  • Pronunciation speed
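To make the statistical language model idea concrete, here is a minimal bigram model that estimates P(w_1, ..., w_m) as a product of smoothed bigram probabilities. It is a sketch only: the tiny corpus, the add-one smoothing, and all function names are assumptions chosen for illustration, and production engines use far larger models.

```python
from collections import Counter

def train_bigram_model(sentences):
    """Count unigrams and bigrams, adding <s> and </s> sentence markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def sentence_probability(sentence, unigrams, bigrams):
    """P(w_1..w_m) as a product of add-one smoothed bigram probabilities."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return prob

# Toy training corpus (hypothetical).
corpus = [
    "recognize speech",
    "recognize speech with a microphone",
    "a nice beach",
]
unigrams, bigrams = train_bigram_model(corpus)
p_speech = sentence_probability("recognize speech", unigrams, bigrams)
p_beach = sentence_probability("wreck a nice beach", unigrams, bigrams)
print(p_speech > p_beach)
```

The two test phrases sound similar, so the acoustic match alone may not separate them; the language model prefers the phrase it has seen more evidence for. This is also why adaptation to a user's vocabulary changes recognition results and needs its own tests.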
Testing areas: Recognition and command control
• Initial recognition tests
• Switching the app into "listening mode"
• Basic commands ("what can I do")
• Extended commands (app-type specific)
• Non-strict commands (pseudo-AI)
• Search commands
• 3rd-party app-specific commands / 3rd-party SW compatibility
• Dictating into the app's default text controls (if supported)
• Dictating into supported and unsupported 3rd-party apps
• Transcribing prerecorded audio
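When transcribing prerecorded audio, the recognized text can be compared with a known reference transcript. One common way to score this, which the slides do not name and which is offered here only as an illustration, is word error rate (WER): the word-level edit distance divided by the number of reference words. The function name and sample phrases below are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)

expected = "the patient has no known allergies"
recognized = "the patient has known allergy"
wer = word_error_rate(expected, recognized)
print(round(wer, 3))
```

Running the same prerecorded audio set against each build gives a repeatable regression check, since the input does not depend on how a tester speaks on a given day.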
Testing areas: Languages and Accents
• Different accents (UK English, US English, Australian English, etc.)
• Issues with speaking
• Language-specific sounds
  • Homophones (French)
  • Umlauts (German)
  • etc.
• Language-specific syntax (use of commas, periods, exclamation marks, etc.)
• Words with similar or close pronunciation (e.g. in French: voux, voi, vu, etc.)
• Logographic scripts (Chinese, Japanese, etc.)
Testing areas: Other stuff
• Audio codecs
• Traffic consumption (for cloud or remote access apps)
• Memory and CPU consumption
• Response time and cancelling recognition
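For the traffic consumption item, a rough expectation can be calculated before measuring anything on the wire. The sketch below estimates audio payload size from the audio format; the 16 kHz / 16-bit mono format and the 24 kbit/s compressed bitrate are assumptions for illustration, not values from any specific product, and protocol overhead is ignored.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels=1):
    """Raw (uncompressed) PCM data rate in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels // 8

def payload_bytes(bytes_per_second, seconds):
    """Audio payload only; network and protocol overhead are not included."""
    return bytes_per_second * seconds

# Assumed formats (hypothetical): raw 16 kHz 16-bit mono vs a 24 kbit/s codec.
raw_rate = pcm_bytes_per_second(16000, 16)   # bytes per second
codec_rate = 24000 // 8                      # bytes per second

one_minute_raw = payload_bytes(raw_rate, 60)
one_minute_codec = payload_bytes(codec_rate, 60)
print(one_minute_raw, one_minute_codec)
```

Comparing such an estimate with the traffic actually observed during a dictation session helps spot cases where the client falls back to an uncompressed stream or keeps sending audio after recognition was cancelled.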