The rise of voice platforms - Comparing voice-related APIs

Comparing voice-related APIs
Christian Rebernik
@crebernik7791

Voice First Footprint
In 2017 there will be 33 million voice-first devices
● The Voice 2017 Report - VoiceLabs analysis combined with research from CIRP, KPCB and InfoScout

Voice adoption
The ‘Voice First’ era has already started
● Alexa in 4% of US households (end of 2016)
● Siri handles over 2 billion commands a week
● 20% of Google searches on Android handsets are made by voice
[Image: Alexa, Google Home, Ding Dong]

Voice Devices
Creating an open ecosystem
● Amazon Echo: Alexa Skills and Alexa Voice Service
● Google Home: Google Assistant Actions

Speech Recognition API
Developing for Amazon Alexa
● Limited understanding
Amazon Echo is built for predefined options (e.g. no custom notes).
A session ends after 8 seconds.
● Predefined wake word defines the customer experience
Only four wake words are available, and one must be spoken in every conversation.
● No notifications and no presence
You can't alert the user to an event, and you cannot react to presence (e.g. to say "welcome home").
● No audio / no identification
Anybody can use Alexa (guests, etc.) and access all information.
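Given the short listening window, the main control a custom skill has is whether to keep the session open and what to say if the user stays silent. A minimal sketch, assuming the Alexa Skills Kit custom-skill JSON response format (`version`, `outputSpeech`, `reprompt`, `shouldEndSession`); `build_response` is a hypothetical helper, not part of any SDK.

```python
import json
from typing import Optional


def build_response(speech: str, reprompt: Optional[str] = None) -> dict:
    """Build an Alexa custom-skill response body (hypothetical helper).

    With a reprompt the session stays open so the user can answer;
    without one the session ends after the speech is played.
    """
    response = {
        "outputSpeech": {"type": "PlainText", "text": speech},
        "shouldEndSession": reprompt is None,
    }
    if reprompt is not None:
        # Spoken if the user does not reply before the listening window closes.
        response["reprompt"] = {
            "outputSpeech": {"type": "PlainText", "text": reprompt}
        }
    return {"version": "1.0", "response": response}


if __name__ == "__main__":
    body = build_response(
        "Which room should I turn the lights on in?",
        reprompt="Please name a room, for example the kitchen.",
    )
    print(json.dumps(body, indent=2))
```

Keeping the decision in one flag makes it explicit that every question to the user needs a reprompt, while plain answers close the session.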

Technology Stack
Components enabling Voice User Interfaces
● Applications: implemented use cases leveraging the hardware and AI software
● AI Software: software that interprets speech, enables conversations and provides a natural voice
● Hardware: devices the consumer interacts with, like Amazon Echo or Google Home

AI overview
120 companies in speech recognition
Source: Venture Scanner (info@venturescanner.com)

Real-time speech-to-text APIs
Feature | Google (4) | IBM (3) | Microsoft (2)
Status | Beta | Beta/Production | Preview
Language support (1) | 43 (89) | 8 (14) | 6 (7)
Cost/min | 0.024 € (0.006 per 15 sec) | 0.02 € | 0.06 € (1000 calls of 15 sec for $4)
Speaker detection | No | English (8 kHz) | No
Audio formats | FLAC, LINEAR16, MULAW, AMR, AMR_WB | FLAC, PCM, WAV, OGG, MULAW | PCM single channel, Siren, SirenSR
Noise friendly | Yes | Unknown | Unknown
Word hints | Yes | No | No
1) Language support: number of languages supported (in parentheses: including dialects)
2) Microsoft: https://www.microsoft.com/cognitive-services/en-us/speech-api
3) IBM: http://www.ibm.com/watson/developercloud/speech-to-text.html
4) Google: https://cloud.google.com/speech/
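The Google cost per minute in the table follows from its per-15-second price (4 × 0.006 = 0.024). A minimal sketch of that arithmetic for a single request, assuming audio is billed in whole 15-second blocks rounded up (an assumption about the billing rule, not stated on the slide); `estimate_cost` is a hypothetical helper and the currency follows the table.

```python
import math
from decimal import Decimal

# Price per 15-second block, as listed in the table (Google column).
PRICE_PER_15_SEC = Decimal("0.006")
BLOCK_SECONDS = 15


def estimate_cost(duration_seconds: float) -> Decimal:
    """Estimated cost of one request (hypothetical helper).

    Assumes billing in whole 15-second blocks, rounded up.
    """
    blocks = math.ceil(duration_seconds / BLOCK_SECONDS)
    return PRICE_PER_15_SEC * blocks


if __name__ == "__main__":
    print(estimate_cost(60))  # 4 blocks -> 0.024
    print(estimate_cost(61))  # 5 blocks -> 0.030
```

`Decimal` avoids binary floating-point rounding in money amounts; under the rounding-up assumption, many short utterances cost more per minute than the headline rate suggests.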

Best practices
● High audio capture quality
Use lossless encoding. Capture audio at 16,000 Hz or higher, and send it at its native sample rate rather than resampling.
● No additional noise reduction
The APIs include noise reduction; applying it twice can reduce quality. Echo and noise have a huge impact on speech recognition quality.
● User education
Educate users to stay close to the microphone.
● One speaker per stream
For multi-speaker settings, try to separate the audio streams, as the current APIs are built for dictation.
● Provide context
Context matters a lot. Provide word hints to help the system detect words correctly.
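Several of these practices can be checked before audio is sent. A minimal sketch using Python's standard `wave` module: it reads a WAV file, flags a sample rate below 16,000 Hz, more than one channel, or a non-16-bit sample width, and then builds a request body shaped like Google Cloud Speech API v1 `speech:recognize` (`encoding`, `sampleRateHertz`, `languageCode`, `speechContexts` phrase hints, base64 `audio.content`). It passes the native sample rate through instead of resampling. The function names, the `en-US` language code and the phrase list are hypothetical, and the request is only built, not sent.

```python
import base64
import io
import wave
from typing import List, Tuple

MIN_SAMPLE_RATE = 16_000  # Hz: "capture audio at 16,000 Hz or higher"


def read_pcm(wav_file) -> Tuple[int, bytes, List[str]]:
    """Read raw PCM frames from a WAV file and list best-practice warnings."""
    warnings = []
    with wave.open(wav_file, "rb") as wav:
        rate = wav.getframerate()
        if rate < MIN_SAMPLE_RATE:
            warnings.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz")
        if wav.getnchannels() != 1:
            warnings.append("more than one channel: send one speaker per stream")
        if wav.getsampwidth() != 2:
            warnings.append("not 16-bit PCM, so LINEAR16 does not apply")
        frames = wav.readframes(wav.getnframes())
    return rate, frames, warnings


def build_recognize_request(rate: int, pcm: bytes, phrases: List[str]) -> dict:
    """Request body shaped like Google Cloud Speech v1 speech:recognize."""
    return {
        "config": {
            "encoding": "LINEAR16",  # lossless 16-bit PCM
            "sampleRateHertz": rate,  # native rate, no resampling
            "languageCode": "en-US",
            "speechContexts": [{"phrases": phrases}],  # word hints
        },
        "audio": {"content": base64.b64encode(pcm).decode("ascii")},
    }


if __name__ == "__main__":
    # Synthesize one second of 16 kHz mono silence in memory for the demo.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(16_000)
        out.writeframes(b"\x00\x00" * 16_000)
    buf.seek(0)

    rate, pcm, warnings = read_pcm(buf)
    request = build_recognize_request(rate, pcm, ["Rebernik", "6voices"])
    print(warnings)  # → []
    print(request["config"])
```

Sending only the raw frames keeps the payload consistent with the declared `LINEAR16` encoding, and the phrase list is where domain vocabulary (names, product terms) would go.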

Problem
Real life - Voice is in the early days
● Speech-to-text quality
● Speaker recognition
● Language mixing
● Punctuation

We are building a voice-first company
and are looking for support:
- Technical Research
- Deep Learning & NLP Scientist
- Software Engineers
Christian Rebernik
Contact: christian@6voices.com
