Comparing voice-related APIs
Christian Rebernik
@crebernik7791
Voice First Footprint
In 2017 there will be 33 million devices
● The Voice 2017 Report - VoiceLabs analysis combined with research from CIRP, KPCB and InfoScout
Voice adoption
The ‘Voice First’ era has already started
● Alexa in 4% of US households (end 2016)
● Siri handles over 2bn commands a week
● 20% of Google searches on Android handsets are input by voice
[Image labels: Alexa, Google Home, Ding Dong]
Voice Devices
Creating an open ecosystem
● Amazon Echo: Skills and Alexa Voice Service
● Google Home: Google Assistant Actions
Speech Recognition API
Developing for Amazon Alexa
● Limited understanding
Amazon Echo is built for predefined options (e.g. no custom notes). The session ends after 8 sec.
● Predefined wake word defines the customer experience
Only 4 wake words are available, and one must be used in every conversation.
● No notifications and no presence
You can’t alert the user to an event. You cannot react to presence, e.g. to welcome the user home.
● No audio / no identification
Anybody can use Alexa (guests, etc.) and access all information.
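The short session window mentioned above is usually handled by keeping the session open and supplying a reprompt. The following is a minimal sketch of the response body a custom skill might return, assuming the Alexa Skills Kit JSON response layout (`version`, `response.outputSpeech`, `response.reprompt`, `response.shouldEndSession`). The helper name `build_alexa_response` and the example phrases are hypothetical and not taken from the slides.

```python
import json


def build_alexa_response(speech_text, reprompt_text=None):
    """Build a custom-skill response body as a plain dict (hypothetical helper).

    If reprompt_text is given, the session is kept open and the reprompt
    is attached; otherwise the session is closed right after the answer.
    """
    response = {
        "outputSpeech": {"type": "PlainText", "text": speech_text},
        # Close the session only when there is nothing left to ask.
        "shouldEndSession": reprompt_text is None,
    }
    if reprompt_text is not None:
        response["reprompt"] = {
            "outputSpeech": {"type": "PlainText", "text": reprompt_text}
        }
    return {"version": "1.0", "response": response}


if __name__ == "__main__":
    body = build_alexa_response(
        "Which room should I light up?",
        reprompt_text="You can say kitchen or living room.",
    )
    print(json.dumps(body, indent=2))
```

Keeping the question short and offering the predefined options in the reprompt fits the "predefined options" constraint noted above.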
Technology Stack
Components enabling Voice User Interfaces
● Applications: Implemented use cases leveraging the hardware and AI software.
● AI Software: Software that interprets speech, enables conversations and provides a natural voice.
● Hardware: Devices the consumer interacts with, like Amazon Echo or Google Home.
AI overview
120 companies in Speech Recognition
Ventures Scanner, Contact info@venturescanner.com
Speech Recognition API
Real-time speech-to-text APIs

|                      | Google (4)                            | IBM (3)                     | Microsoft (2)                           |
|----------------------|---------------------------------------|-----------------------------|-----------------------------------------|
| Status               | Beta                                  | Beta/Production             | Preview                                 |
| Language support (1) | 43 (89)                               | 8 (14)                      | 6 (7)                                   |
| Cost/min             | 0,024 € (0,006 / 15 sec)              | 0,02 €                      | 0,06 € (1000 calls of 15 sec for 4 $)   |
| Speaker detection    | No                                    | English (8 kHz)             | No                                      |
| Audio formats        | FLAC, Linear16, MULAW, AMR, AMR_WB    | FLAC, PCM, WAV, OGG, MULAW  | PCM single channel, Siren, SirenSR      |
| Noise friendly       | Yes                                   | Unknown                     | Unknown                                 |
| Word hints           | Yes                                   | No                          | No                                      |

1) Language support: languages supported (languages supported including dialects)
2) Microsoft: https://www.microsoft.com/cognitive-services/en-us/speech-api
3) IBM: http://www.ibm.com/watson/developercloud/speech-to-text.html
4) Google: https://cloud.google.com/speech/
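The Google cost figure above combines a per-increment price (0,006 per 15 sec) with a per-minute price (0,024). The sketch below shows how the two relate. It assumes that partial increments are rounded up to the next full 15 seconds, which the slides do not state; the function names are hypothetical.

```python
import math


def billed_increments(duration_sec, increment_sec=15):
    """Number of billing increments, assuming partial increments round up."""
    return math.ceil(duration_sec / increment_sec)


def clip_cost(duration_sec, price_per_increment, increment_sec=15):
    """Cost of one audio clip in the same currency unit as the price."""
    return billed_increments(duration_sec, increment_sec) * price_per_increment


if __name__ == "__main__":
    # One full minute is four 15-second increments: 4 * 0.006 = 0.024.
    print(round(clip_cost(60, 0.006), 3))
    # Under the round-up assumption a 20-second clip is billed as 2 increments.
    print(round(clip_cost(20, 0.006), 3))
```

Under this assumption many short utterances cost more per spoken minute than one long stream, which matters for command-style voice interfaces.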
Speech Recognition API
Best practices
● High audio capturing quality
Use lossless coding. Capture audio at 16,000 Hz or higher. Use the native sample rate.
● No additional noise
APIs include noise reduction. Duplicate noise reduction can reduce the quality. Echo and noise have a huge impact on speech recognition quality.
● User education
Educate the user to stay close to the microphone.
● One speaker per stream
For multi-speaker settings, try to separate the audio streams, as the current APIs are built for dictation.
● Provide context
Context matters a lot. Provide word hints to help the system detect words correctly.
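Two of the best practices above can be checked or applied in code before audio is sent to a recognizer. This is a minimal sketch using Python's standard `wave` module to verify the sample rate and channel count of a WAV file, plus a request configuration that carries word hints. The configuration field names (`encoding`, `sampleRateHertz`, `languageCode`, `speechContexts`, `phrases`) follow the Google Cloud Speech v1 REST API as an assumption and may differ from the beta version the slides describe. Treating more than one channel as a sign of mixed speakers is also an assumption, and both function names are hypothetical.

```python
import wave

MIN_SAMPLE_RATE_HZ = 16000  # "16,000 Hz or higher" from the best practices


def check_wav(path):
    """Return a list of human-readable problems found in a WAV file."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if channels != 1:
        # Assumption: several channels may mean several speakers in one stream.
        problems.append(f"{channels} channels found; consider one stream per speaker")
    return problems


def build_recognition_config(sample_rate_hz, language_code, hints):
    """Build a request config dict that passes word hints as phrases."""
    return {
        "encoding": "LINEAR16",  # uncompressed 16-bit PCM, a lossless choice
        "sampleRateHertz": sample_rate_hz,
        "languageCode": language_code,
        "speechContexts": [{"phrases": list(hints)}],
    }


if __name__ == "__main__":
    config = build_recognition_config(16000, "en-US", ["Alexa", "Echo"])
    print(config)
```

Passing the audio's native sample rate into the config, rather than resampling first, keeps the sketch in line with the "use native sample rate" advice.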
Problem
Real life - Voice is in the early days
● Speech-to-text quality
● Speaker recognition
● Language mixing
● Punctuation
Demo
Voice interaction in IoT
We are building a voice-first company and are looking for support
- Technical Research
- Deep Learning & NLP Scientist
- Software Engineers
Christian Rebernik
Contact: christian@6voices.com

The rise of voice platforms - Comparing voice related API's
