
QA Fest 2016. Roman Gorin. An Introduction to Speech Recognition Systems Through a Tester's Eyes


In this talk I will cover the basic principles behind speech recognition software, where and which technologies are used, and the key points in testing products of this type (both standalone mode and cloud recognition, including voice assistants). I will also describe how such products are used at the enterprise level and which testing aspects need to be taken into account.


  1. Kyiv 2016. The first testing festival in Ukraine. Introduction to Speech Recognition Software Testing. Roman Gorin
  2. About me
     • Senior Technical Leader – Testing @ Delphi LLC http://udelphi.com
     • 12+ years in Speech Recognition Testing
     • 6+ years as QA Team Lead
     • Main Product: Nuance Dragon Medical http://www.nuance.com/for-healthcare/dragon-medical
     • https://telegram.me/DJ_ZX
     • Facebook: rgorin.zx
  3. What it is
  4. Where used
     • Nuance Dragon Family
       • Dragon Pro
       • Dragon Medical
       • Dragon for Mac
       • Dragon Anywhere
       • Etc.
     • Windows Speech Recognition
     • Google Voice Search
  5. Where used
     Personal assistants
     • Siri
     • Cortana
     • Google Now
     • Facebook M, etc.
     Car systems
  6. Where used
     Smart Home assistants
     • Amazon Echo
     • Google Home
     • Zenbo
     • Homer, etc.
     • Automated Call Centers SW and more
  7. Where used: ViV AI (unreleased)
  8. Basic Principles
     • Capture audio
     • Separate speech from other types of sounds (esp. noise)
     • Compare the speech audio with known text <-> audio match patterns
     • Analyze the language-specific model
     • Perform actions (type text, execute a command) based on the collected data
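The "separate speech from other types of sounds" step on slide 8 can be illustrated with a very rough sketch. The slides do not say how any real engine does this; the example below assumes a simple energy threshold over fixed-size frames, and the names `frame_energies` and `detect_speech`, the frame length, and the threshold are all made up for illustration. Production recognizers use far more robust methods.

```python
import math

def frame_energies(samples, frame_len):
    """Split samples into non-overlapping frames and return each frame's RMS energy."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies

def detect_speech(samples, frame_len=160, threshold=0.1):
    """Return one boolean per frame: True where the RMS energy exceeds the threshold.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return [energy > threshold for energy in frame_energies(samples, frame_len)]

# Synthetic test signal: a quiet hum (standing in for background noise)
# followed by a louder tone (standing in for speech).
SAMPLE_RATE = 8000
quiet = [0.01 * math.sin(2 * math.pi * 50 * n / SAMPLE_RATE) for n in range(320)]
loud = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(320)]

flags = detect_speech(quiet + loud)
print(flags)
```

For a tester, synthetic signals like this are handy because the expected frame labels are known in advance, so the same input can be replayed against different microphones, drivers, or noise levels.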
  9. Generic structure of how SR works
     Main speech recognition models (based on Wikipedia)
     • Hidden Markov models
     • Dynamic time warping (DTW)-based speech recognition
     • Neural networks
     • Deep feedforward and recurrent neural networks
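Of the models listed on slide 9, dynamic time warping is the easiest to show in a few lines. The sketch below is the textbook DTW recurrence applied to plain lists of numbers; real recognizers compare sequences of acoustic feature vectors, and the function name `dtw_distance` and the toy sequences are assumptions made for this example only.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest alignment cost of a[:i] against b[:j].
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

template = [1, 2, 3, 2, 1]                  # reference pattern
slow = [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]       # same shape, spoken twice as slowly
other = [3, 3, 3, 3, 3]                     # different shape

print(dtw_distance(template, slow))   # 0.0: warping absorbs the speed difference
print(dtw_distance(template, other))  # 6.0
```

The point for testing is that DTW tolerates differences in speaking speed, which is why the slowed-down sequence still matches the template exactly.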
  10. Testing areas
      • Engine and Language Modelling (usually on the recognition server side)
      • UI
      • Hardware
      • Deployment
      • Adaptation
      • Recognition and Text Editing
      • Language-specific, etc.
  11. Testing areas: Hardware
      • Mobile HW
      • Internal mic (notebooks/tablets)
      • Noise-cancelling mic
      • Sound card and driver compatibility
      • System requirements compliance
      • HW dependency
      • Driver dependency (Windows: WASAPI, DirectSound, ASIO, Kernel Streaming; Linux: ALSA, PulseAudio; Mac: Core Audio)
  12. Testing areas: Hardware
      • Mics and recorders (samples from nuance.com store)
      • Special bundled HW for professional use
        • Nuance PowerMic
        • Philips SpeechMike
  13. Testing areas: Deployment
      • Platform
        • Client OS (Desktop/Mobile)
        • Server OS for Client app
        • Server OS for Cloud/Remote app
        • Azure Cloud
        • Amazon Cloud
        • Proprietary cloud hosts for server recognition (e.g. recognition servers for Siri)
      • Support for virtualization platforms: VDI and App Virtualization (standalone recognition over remote access)
        • Citrix XenApp and XenDesktop / thin and thick clients
        • VMware Workstation and Horizon
        • Oracle VirtualBox
        • Microsoft Remote Desktop/Terminal Services
  14. Testing areas: Adaptation
      • Predefined language patterns
      • Statistical models
        A statistical language model is a probability distribution over sequences of words. Given such a sequence, say of length m, it assigns a probability $P(w_1, \ldots, w_m)$ to the whole sequence. Having a way to estimate the relative likelihood of different phrases is useful in many natural language processing applications. Language modeling is used in speech recognition, machine translation, part-of-speech tagging, parsing, handwriting recognition, information retrieval and other applications.
      • “Part of speech” detection
      • Sound-specific patterns
      • Person-specific
        • How the person pronounces words and sounds
        • How the person constructs sentences
        • Pronunciation speed
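The statistical language model definition on slide 14 can be made concrete with a toy bigram model. This is a minimal sketch, assuming add-one smoothing and a three-sentence made-up corpus; the function names `train_bigram` and `sentence_probability` are hypothetical, and nothing here reflects how Dragon or any other product builds its models.

```python
from collections import Counter

def train_bigram(sentences):
    """Count context words and word pairs, with <s> and </s> as sentence boundaries."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs
    return contexts, bigrams

def sentence_probability(sentence, contexts, bigrams, vocab_size):
    """Approximate P(w_1, ..., w_m) as a product of add-one-smoothed bigram probabilities."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    probability = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        probability *= (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
    return probability

# Made-up training corpus, for illustration only.
corpus = ["recognize speech", "recognize speech now", "open the file"]
contexts, bigrams = train_bigram(corpus)
vocab = {word for s in corpus for word in s.lower().split()} | {"</s>"}

p_likely = sentence_probability("recognize speech", contexts, bigrams, len(vocab))
p_unlikely = sentence_probability("speech recognize", contexts, bigrams, len(vocab))
print(p_likely > p_unlikely)  # the word order seen in training scores higher
```

This is what adaptation testing exercises: after the model has seen a user's typical phrases, those phrases should score higher than acoustically similar alternatives.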
  15. Testing areas: Recognition and Commands control
      • Initial recognition tests
      • Turn the app into “listening mode”
      • Basic commands (“what can I do”)
      • Extended commands (app-type specific)
      • Non-strict commands (pseudo-AI)
      • Search commands
      • 3rd-party app-specific commands / 3rd-party SW compatibility
      • Dictating into the app's default text controls (if supported)
      • Dictating into supported and unsupported 3rd-party apps
      • Transcribing prerecorded audio
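For the recognition and transcription checks on slide 15, a test needs some way to score the recognized text against the expected text. The slides do not name a metric; the sketch below assumes word error rate (word-level edit distance divided by the reference length), which is one common choice. The function name `word_error_rate` and the sample phrases are made up for this example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance (substitutions, deletions, insertions) / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the processed reference words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)

expected = "open the patient record"
print(word_error_rate(expected, "open the patient record"))  # 0.0
print(word_error_rate(expected, "open a patient record"))    # 0.25 (one substitution)
print(word_error_rate(expected, "open patient record"))      # 0.25 (one deletion)
```

With prerecorded audio, the same recordings can be replayed after every build and the score compared against an agreed threshold, which turns "recognition quality" into a repeatable regression check.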
  16. Testing areas: Recognition and Text Editing (sample from PCWorld/Nuance)
  17. Testing areas: Languages and Accents
      • Different accents (UK English, US English, Australian English, etc.)
      • Issues with speaking
      • Language-specific sounds
        • Homophones (French)
        • Umlauts (German)
        • etc.
      • Language-specific syntax (use of commas, periods, exclamation marks, etc.)
      • Words with similar or close pronunciation (fr. voux, voi, vu, etc)
      • Hieroglyphs (Chinese, Japanese, etc.)
  18. Testing areas: Other stuff
      • Audio codecs
      • Traffic consumption (for cloud or remote access apps)
      • Memory and CPU consumption
      • Response time and cancelling recognition
  19. Enterprise Recognition (based on Nuance.com info)
  20. Enterprise Recognition (based on Nuance.com info)
      • Supports major EHR platforms, including Epic®, Cerner®, eClinicalWorks, athenahealth®, MEDITECH®, and more. © Nuance.com
  22. Links
      • https://msdn.microsoft.com/en-us/library/hh378337(v=office.14).aspx
      • http://www.explainthatstuff.com/voicerecognition.html
      • http://scienceline.org/2014/08/ever-wondered-how-does-speech-to-text-software-work/
      • http://www.nuance.com/for-healthcare/capture-anywhere/360-mobile-solutions/powermicmobile/index.htm
      • http://www.nuance.com/for-individuals/by-product/dragon-accessories
      • https://en.wikipedia.org/wiki/List_of_speech_recognition_software
      • https://en.wikipedia.org/wiki/Dragon_NaturallySpeaking
      • https://en.wikipedia.org/wiki/Speech_recognition
      • https://en.wikipedia.org/wiki/Language_model
      • http://www.pcmag.com/article2/0,2817,2464719,00.asp
      • http://www.pcworld.com/article/2055599/control-your-pc-with-these-5-speech-recognition-programs.html
      • http://www.oxygen.lcs.mit.edu/Speech.html
      • http://copia.com.au/medical-speech-recognition/
