SlideShare une entreprise Scribd logo
1  sur  73
The State of 
Speech 
Recognition on 
Mobile
Watch the video with slide 
synchronization on InfoQ.com! 
http://www.infoq.com/presentations 
/state-speech-recognition-mobile 
InfoQ.com: News & Community Site 
• 750,000 unique visitors/month 
• Published in 4 languages (English, Chinese, Japanese and Brazilian 
Portuguese) 
• Post content from our QCon conferences 
• News 15-20 / week 
• Articles 3-4 / week 
• Presentations (videos) 12-15 / week 
• Interviews 2-3 / week 
• Books 1 / month
Presented at QCon New York 
www.qconnewyork.com 
Purpose of QCon 
- to empower software development by facilitating the spread of 
knowledge and innovation 
Strategy 
- practitioner-driven conference designed for YOU: influencers of 
change and innovation in your teams 
- speakers and topics driving the evolution and innovation 
- connecting and catalyzing the influencers and innovators 
Highlights 
- attended by more than 12,000 delegates since 2007 
- held in 9 cities worldwide
The future won't be like Star Trek. 
Scott Adams, creator of Dilbert
Why do I care about speech rec?
+ 
= Cape Bretoner
Here's a conversation between two Cape 
Bretoners 
P1: jeet? 
P2: naw, jew? 
P1: naw, t'rly t'eet bye.
And here's the translation 
P1: jeet? 
P1: Did you eat? 
P2: naw, jew? 
P2: No, did you? 
P1: naw, t'rly t'eet bye. 
P1: No, it's too early to eat buddy.
Regular Alphabet 
26 letters 
Cape Breton 
Alphabet 
12 letters!
Alright, enough about me
What is speech 
recognition?
Speech recognition is the 
process of translating the 
spoken word into text.
The process of speech rec 
includes...
Record and digitize the audio 
data
Perform end pointing 
(trimming)
Split data into phonemes
What is a phoneme? 
It is a perceptually distinct 
units of sound in a specified 
language that distinguish one 
word from another.
The English language has 44 
distinct sounds 
Source: English language phoneme chart
By comparison, the Rotokas 
speakers in Papua New Guinea 
have 11 phonemes. 
But the !Xóõ speakers who 
mostly live in Botswana have 
112 phonemes.
Apply the phonemes to the 
recognition model. This is a 
massive lexicon which takes 
into account all of the different 
ways words can be 
pronounced.
Analyze the results against the 
grammar
Return a confidence weighted 
result 
[ 
{ 
"confidence": 0.97335243225098, 
"transcript": "hello" 
}, 
{ 
"confidence": 0.19940405040800, 
"transcript": "hell low" 
}, 
{ 
"confidence": 0.19910827091000, 
"transcript": "how low" 
} 
]
Basically...
We want it to be like this 
0:02
but more often than not... 
0:25
Why is that? 
When two people talk 
comprehension rates are better 
than 97%
A really good english language 
speech recognition system is 
right 92% of the time
Where does that extra 5% in 
error rate come from? 
Vocabulary size and confusability 
Speaker dependence vs independence 
Isolated or continuous speech 
Initiated vs spontaneous speech 
Adverse conditions
Mobile Speech Recognition 
OS Application SDK 
Android Google Now Java API 
iOS Siri Many 3rd party Obj-C SDK's 
Windows Phone Cortana C# API
So how do we 
add speech rec 
to our app?
You may look at the W3C 
Speech API Specification
but only Chrome on the 
desktop has implemented that 
spec
But that's okay!
The spec looks like this: 
interface SpeechRecognition : EventTarget { 
// recognition parameters 
attribute SpeechGrammarList grammars; 
attribute DOMString lang; 
attribute boolean continuous; 
attribute boolean interimResults; 
attribute unsigned long maxAlternatives; 
attribute DOMString serviceURI; 
// methods to drive the speech interaction 
void start(); 
void stop(); 
void abort(); 
};
With additional event methods 
to control behaviour: 
attribute EventHandler onaudiostart; 
attribute EventHandler onsoundstart; 
attribute EventHandler onspeechstart; 
attribute EventHandler onspeechend; 
attribute EventHandler onsoundend; 
attribute EventHandler onaudioend; 
attribute EventHandler onresult; 
attribute EventHandler onnomatch; 
attribute EventHandler onerror; 
attribute EventHandler onstart; 
attribute EventHandler onend;
Let's recognize some speech 
var recognition = new SpeechRecognition(); 
recognition.onresult = function(event) { 
if (event.results.length > 0) { 
var test1 = document.getElementById("test1"); 
test1.innerHTML = event.results[0][0].transcript; 
} 
}; 
recognition.start(); 
Click to Speak 
Replace me...
So that's pretty 
cool...
...if taking dictation gets you 
going
But I want to do 
something more 
exciting with the 
result
Let's do something a little less 
trivial 
recognition.onresult = function(event) { 
var result = event.results[0][0].transcript; 
var music = document.getElementById("music"); 
switch(result) { 
case "jazz": 
music.src="jazz.mp3"; 
music.play(); 
break; 
case "rock": 
music.src="rock.mp3"; 
music.play(); 
break; 
case "stop": 
default: 
music.pause(); 
} 
}; 
Click to Speak
Which seems 
much cooler to 
me
Let's ask the web a question 
Click to Speak
Works pretty 
good... 
...but ugly!
Let's style our 
button with some 
CSS
+ 
= 
<a class="speechinput"> 
<img src="images/mic.png"> 
</a> 
#speechinput input { 
cursor:pointer; 
margin:auto; 
margin:15px; 
color:transparent; 
background-color:transparent; 
border:5px; 
width:15px; 
-webkit-transform: scale(3.0, 3.0); 
}
And we'll add some color using 
Speech 
Bubbles 
by Nicholas Gallagher 
Pure-CSS-Speech-Bubbles
Then pull it all 
together!
But wait, why am 
I using my eyes 
like a sucker?
We'll output the answer using 
SpeechSynthesis
The SpeechSynthesis spec 
looks like this: 
interface SpeechSynthesis { 
readonly attribute boolean pending; 
readonly attribute boolean speaking; 
readonly attribute boolean paused; 
void speak(SpeechSynthesisUtterance utterance); 
void cancel(); 
void pause(); 
void resume(); 
SpeechSynthesisVoiceList getVoices(); 
};
The SpeechSynthesisUtterance 
spec looks like this: 
interface SpeechSynthesisUtterance : EventTarget { 
attribute DOMString text; 
attribute DOMString lang; 
attribute DOMString voiceURI; 
attribute float volume; 
attribute float rate; 
attribute float pitch; 
};
With additional event methods 
to control behaviour: 
attribute EventHandler onstart; 
attribute EventHandler onend; 
attribute EventHandler onerror; 
attribute EventHandler onpause; 
attribute EventHandler onresume; 
attribute EventHandler onmark; 
attribute EventHandler onboundary;
Plugin repo's 
SpeechRecognitionPlugin - 
https://github.com/macdonst/SpeechRecognitionPlugin 
SpeechSynthesisPlugin - 
https://github.com/macdonst/SpeechSynthesisPlugin
Availability 
OS Recognition Synthesis 
Android ✓ ✓ 
iOS* Active development Native to iOS 7.0 
Windows Phone × × 
* Working with Julio César (@jcesarmobile) to get iOS done
Getting started 
cordova create speech com.example.speech speech 
cd speech 
cordova build android 
cordova local plugin add https://github.com/macdonst/SpeechRecognitionPlugin 
cordova local plugin add https://github.com/macdonst/SpeechSynthesisPlugin 
cordova install android
For more information on hybrid 
applications 
Check out 
Christophe 
presentation on 
Coenraets 
Creating Native-Like Mobile 
Apps with AngularJS, Ionic and 
Cordova 3:00pm today right 
here in Salon C.
But wait, one 
more thing...
Speech recognition and speech 
synthesis are not well 
supported in the emulator 
and sometimes developing on 
the device can be a bit of a 
pain.
That's why I coded 
speechshim.js 
https://github.com/macdonst/SpeechShim
Chrome + speechshim.js 
= 
W3C Web Speech API on your 
desktop
Watch the video with slide synchronization on 
InfoQ.com! 
http://www.infoq.com/presentations/state-speech- 
recognition-mobile

Contenu connexe

En vedette

Artificial intelligence Speech recognition system
Artificial intelligence Speech recognition systemArtificial intelligence Speech recognition system
Artificial intelligence Speech recognition systemREHMAT ULLAH
 
Speech recognition
Speech recognitionSpeech recognition
Speech recognitionCharu Joshi
 
Voice Recognition
Voice RecognitionVoice Recognition
Voice RecognitionAmrita More
 
Speech recognition challenges
Speech recognition challengesSpeech recognition challenges
Speech recognition challengesAlexandru Chica
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By MatlabAnkit Gujrati
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognitionRichie
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerLuminary Labs
 

En vedette (7)

Artificial intelligence Speech recognition system
Artificial intelligence Speech recognition systemArtificial intelligence Speech recognition system
Artificial intelligence Speech recognition system
 
Speech recognition
Speech recognitionSpeech recognition
Speech recognition
 
Voice Recognition
Voice RecognitionVoice Recognition
Voice Recognition
 
Speech recognition challenges
Speech recognition challengesSpeech recognition challenges
Speech recognition challenges
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By Matlab
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI Explainer
 

Plus de C4Media

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoC4Media
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileC4Media
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020C4Media
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsC4Media
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No KeeperC4Media
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like OwnersC4Media
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaC4Media
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideC4Media
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 

Plus de C4Media (20)

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live Video
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy Mobile
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java Applications
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No Keeper
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like Owners
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 

Dernier

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Dernier (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

The state of speech recognition on mobile

  • 1. The State of Speech Recognition on Mobile
  • 2. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /state-speech-recognition-mobile InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month
  • 3. Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 4.
  • 5.
  • 6.
  • 7.
  • 8. The future won't be like Star Trek. Scott Adams, creator of Dilbert
  • 9.
  • 10. Why do I care about speech rec?
  • 11.
  • 12. + = Cape Bretoner
  • 13. Here's a conversation between two Cape Bretoners P1: jeet? P2: naw, jew? P1: naw, t'rly t'eet bye.
  • 14. And here's the translation P1: jeet? P1: Did you eat? P2: naw, jew? P2: No, did you? P1: naw, t'rly t'eet bye. P1: No, it's too early to eat buddy.
  • 15. Regular Alphabet 26 letters Cape Breton Alphabet 12 letters!
  • 17. What is speech recognition?
  • 18. Speech recognition is the process of translating the spoken word into text.
  • 19. The process of speech rec includes...
  • 20. Record and digitize the audio data
  • 21. Perform end pointing (trimming)
  • 22. Split data into phonemes
  • 23. What is a phoneme? It is a perceptually distinct units of sound in a specified language that distinguish one word from another.
  • 24. The English language has 44 distinct sounds Source: English language phoneme chart
  • 25. By comparison, the Rotokas speakers in Papua New Guinea have 11 phonemes. But the !Xóõ speakers who mostly live in Botswana have 112 phonemes.
  • 26. Apply the phonemes to the recognition model. This is a massive lexicon which takes into account all of the different ways words can be pronounced.
  • 27. Analyze the results against the grammar
  • 28. Return a confidence weighted result [ { "confidence": 0.97335243225098, "transcript": "hello" }, { "confidence": 0.19940405040800, "transcript": "hell low" }, { "confidence": 0.19910827091000, "transcript": "how low" } ]
  • 30.
  • 31. We want it to be like this 0:02
  • 32. but more often than not... 0:25
  • 33. Why is that? When two people talk comprehension rates are better than 97%
  • 34. A really good english language speech recognition system is right 92% of the time
  • 35. Where does that extra 5% in error rate come from? Vocabulary size and confusability Speaker dependence vs independence Isolated or continuous speech Initiated vs spontaneous speech Adverse conditions
  • 36. Mobile Speech Recognition OS Application SDK Android Google Now Java API iOS Siri Many 3rd party Obj-C SDK's Windows Phone Cortana C# API
  • 37. So how do we add speech rec to our app?
  • 38. You may look at the W3C Speech API Specification
  • 39. but only Chrome on the desktop has implemented that spec
  • 41. The spec looks like this: interface SpeechRecognition : EventTarget { // recognition parameters attribute SpeechGrammarList grammars; attribute DOMString lang; attribute boolean continuous; attribute boolean interimResults; attribute unsigned long maxAlternatives; attribute DOMString serviceURI; // methods to drive the speech interaction void start(); void stop(); void abort(); };
  • 42. With additional event methods to control behaviour: attribute EventHandler onaudiostart; attribute EventHandler onsoundstart; attribute EventHandler onspeechstart; attribute EventHandler onspeechend; attribute EventHandler onsoundend; attribute EventHandler onaudioend; attribute EventHandler onresult; attribute EventHandler onnomatch; attribute EventHandler onerror; attribute EventHandler onstart; attribute EventHandler onend;
  • 43. Let's recognize some speech var recognition = new SpeechRecognition(); recognition.onresult = function(event) { if (event.results.length > 0) { var test1 = document.getElementById("test1"); test1.innerHTML = event.results[0][0].transcript; } }; recognition.start(); Click to Speak Replace me...
  • 44. So that's pretty cool...
  • 45. ...if taking dictation gets you going
  • 46. But I want to do something more exciting with the result
  • 47. Let's do something a little less trivial recognition.onresult = function(event) { var result = event.results[0][0].transcript; var music = document.getElementById("music"); switch(result) { case "jazz": music.src="jazz.mp3"; music.play(); break; case "rock": music.src="rock.mp3"; music.play(); break; case "stop": default: music.pause(); } }; Click to Speak
  • 48. Which seems much cooler to me
  • 49. Let's ask the web a question Click to Speak
  • 50. Works pretty good... ...but ugly!
  • 51. Let's style our button with some CSS
  • 52. + = <a class="speechinput"> <img src="images/mic.png"> </a> #speechinput input { cursor:pointer; margin:auto; margin:15px; color:transparent; background-color:transparent; border:5px; width:15px; -webkit-transform: scale(3.0, 3.0); }
  • 53. And we'll add some color using Speech Bubbles by Nicholas Gallagher Pure-CSS-Speech-Bubbles
  • 54. Then pull it all together!
  • 55.
  • 56. But wait, why am I using my eyes like a sucker?
  • 57. We'll output the answer using SpeechSynthesis
  • 58. The SpeechSynthesis spec looks like this: interface SpeechSynthesis { readonly attribute boolean pending; readonly attribute boolean speaking; readonly attribute boolean paused; void speak(SpeechSynthesisUtterance utterance); void cancel(); void pause(); void resume(); SpeechSynthesisVoiceList getVoices(); };
  • 59. The SpeechSynthesisUtterance spec looks like this: interface SpeechSynthesisUtterance : EventTarget { attribute DOMString text; attribute DOMString lang; attribute DOMString voiceURI; attribute float volume; attribute float rate; attribute float pitch; };
  • 60. With additional event methods to control behaviour: attribute EventHandler onstart; attribute EventHandler onend; attribute EventHandler onerror; attribute EventHandler onpause; attribute EventHandler onresume; attribute EventHandler onmark; attribute EventHandler onboundary;
  • 61.
  • 62. Plugin repo's SpeechRecognitionPlugin - https://github.com/macdonst/SpeechRecognitionPlugin SpeechSynthesisPlugin - https://github.com/macdonst/SpeechSynthesisPlugin
  • 63. Availability OS Recognition Synthesis Android ✓ ✓ iOS* Active development Native to iOS 7.0 Windows Phone × × * Working with Julio César (@jcesarmobile) to get iOS done
  • 64. Getting started cordova create speech com.example.speech speech cd speech cordova build android cordova local plugin add https://github.com/macdonst/SpeechRecognitionPlugin cordova local plugin add https://github.com/macdonst/SpeechSynthesisPlugin cordova install android
  • 65. For more information on hybrid applications Check out Christophe presentation on Coenraets Creating Native-Like Mobile Apps with AngularJS, Ionic and Cordova 3:00pm today right here in Salon C.
  • 66. But wait, one more thing...
  • 67. Speech recognition and speech synthesis are not well supported in the emulator and sometimes developing on the device can be a bit of a pain.
  • 68. That's why I coded speechshim.js https://github.com/macdonst/SpeechShim
  • 69. Chrome + speechshim.js = W3C Web Speech API on your desktop
  • 70.
  • 71.
  • 72.
  • 73. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/state-speech- recognition-mobile