AT&T 2012 DevLab Speech API Deep Dive
- 2. September 25, 2012
AT&T SPEECH API DEEP DIVE
Michael Owens (@mko on Twitter, mowens on GitHub)
Jay Lieske (jay.lieske@att.com, jayatyp on GitHub)
AT&T Developer Program
© 2012 AT&T Intellectual Property. All rights reserved. AT&T and the AT&T logo are trademarks of AT&T Intellectual Property.
- 3. WHAT IS THE AT&T SPEECH API?
- 4. How the AT&T Speech API Works
- 5. Powered by AT&T WATSON℠
• In development for 20+ years
• Optimized for different usage scenarios:
  • Web Search
  • Business Search
  • Question & Answer
  • Voicemail-to-Text
  • Short Message (SMS)
  • TV Search/Remote (U-Verse)
  • Generic Speech-to-Text
- 6. Simple Speech-to-Text
• One REST endpoint
• Accepts audio in WAV or AMR
• Structured JSON response
  • Text spoken by user
  • Metrics to evaluate recognition quality
• AT&T Native SDKs for Android and iOS handle audio capture and streaming
- 7. Apps in the Wild
AT&T Translator, Speak4it, U-Verse Easy Remote
- 8. GETTING STARTED WITH THE AT&T SPEECH API
- 9. Sign Up for API Access
• j.mp/ATTDevSignUp
• Free API access for DevLab attendees
• Detailed instructions in your Attendee Packet
• Sign up with code “APILAB12”
• AT&T staff is on hand to answer questions and help get you set up
- 10. Before You Code
• Get your API keys from the Developer Portal:
  • Client ID (“API Key” on the AT&T Developer Portal)
  • Client Secret (“Secret Key” on the AT&T Developer Portal)
• OAuth 2.0 client_credentials grant type
• OAuth 2.0 access_token
• Audio File Types:
  • AMR: narrowband, 12.2 kbit/s, 8 kHz sampling
  • WAV: 16-bit PCM, single channel, 8 kHz sampling
• Audio File Length:
  • Voicemail: 4 minutes or less
  • Other: 1 minute or less
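Before uploading, a server can sanity-check a WAV file against these requirements by reading its header. This is a minimal sketch assuming the canonical 44-byte header (a 16-byte "fmt " chunk followed immediately by the "data" chunk); checkWavHeader is a hypothetical helper, not part of the Speech API or Watson.js, and files with extra chunks would need a full RIFF parser.

```javascript
// Hypothetical helper (not part of the Speech API or Watson.js): checks a
// canonical 44-byte RIFF/WAVE header against the requirements listed above.
// Assumes the "fmt " chunk is followed directly by the "data" chunk.
function checkWavHeader(buf, maxSeconds = 60) {
  const isCanonical = buf.length >= 44 &&
    buf.toString('ascii', 0, 4) === 'RIFF' &&
    buf.toString('ascii', 8, 12) === 'WAVE' &&
    buf.toString('ascii', 12, 16) === 'fmt ' &&
    buf.toString('ascii', 36, 40) === 'data';
  if (!isCanonical) {
    return { ok: false, seconds: 0, problems: ['not a canonical 44-byte-header WAV file'] };
  }
  const problems = [];
  const audioFormat = buf.readUInt16LE(20);   // 1 means uncompressed PCM
  const channels = buf.readUInt16LE(22);
  const sampleRate = buf.readUInt32LE(24);
  const byteRate = buf.readUInt32LE(28);      // bytes of audio per second
  const bitsPerSample = buf.readUInt16LE(34);
  const dataBytes = buf.readUInt32LE(40);     // size of the audio payload
  const seconds = byteRate > 0 ? dataBytes / byteRate : 0;

  if (audioFormat !== 1) problems.push('audio format is not PCM');
  if (channels !== 1) problems.push('expected 1 channel, found ' + channels);
  if (sampleRate !== 8000) problems.push('expected 8000 Hz, found ' + sampleRate);
  if (bitsPerSample !== 16) problems.push('expected 16-bit samples, found ' + bitsPerSample);
  if (seconds > maxSeconds) problems.push('audio is ' + seconds + ' s, limit is ' + maxSeconds + ' s');
  return { ok: problems.length === 0, seconds, problems };
}
```

For voicemail-length audio, pass 240 as maxSeconds; the default of 60 matches the one-minute limit for other contexts.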
- 11. Step 1: Connect via OAuth
Request Method: POST
Request URL: https://api.att.com/oauth/token
Request Headers: Content-Type: application/x-www-form-urlencoded
Request Body: client_id=ATT_API_CLIENT_ID
&client_secret=ATT_API_CLIENT_SECRET
&grant_type=client_credentials
&scope=SPEECH
Response Body: {
"access_token": "xxyz123"
}
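A minimal Node.js sketch of this token request, using only the built-in https module and URLSearchParams. The function names buildTokenRequestBody and requestAccessToken are hypothetical; the URL, header, and form fields are the ones on this slide, and the Node-style callback(err, token) is an assumption of the sketch.

```javascript
const https = require('https');

// Hypothetical helper: form-encodes the fields shown on this slide.
function buildTokenRequestBody(clientId, clientSecret) {
  return new URLSearchParams({
    client_id: clientId,         // "API Key" from the Developer Portal
    client_secret: clientSecret, // "Secret Key" from the Developer Portal
    grant_type: 'client_credentials',
    scope: 'SPEECH'
  }).toString();
}

// Hypothetical helper: POSTs the form and passes access_token to callback(err, token).
function requestAccessToken(clientId, clientSecret, callback) {
  const body = buildTokenRequestBody(clientId, clientSecret);
  const req = https.request('https://api.att.com/oauth/token', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      'Content-Length': Buffer.byteLength(body)
    }
  }, (res) => {
    let raw = '';
    res.setEncoding('utf8');
    res.on('data', (chunk) => { raw += chunk; });
    res.on('end', () => {
      if (res.statusCode !== 200) {
        return callback(new Error('token request failed: HTTP ' + res.statusCode));
      }
      let parsed;
      try { parsed = JSON.parse(raw); } catch (err) { return callback(err); }
      callback(null, parsed.access_token);
    });
  });
  req.on('error', callback);
  req.end(body);
}
```

URLSearchParams percent-encodes reserved characters, so a secret containing "&" or "=" cannot corrupt the form body.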
- 12. Step 2: POST Audio to AT&T
(Non-Streaming HTTP Request)
Request Method: POST
Request URL: https://api.att.com/rest/1/SpeechToText
Request Headers: Accept: application/json
Authorization: Bearer xxyz123
Content-Type: audio/wav
Content-Length: 1534
X-SpeechContext: BusinessSearch
Request Body: AUDIO_BINARY_DATA
Note: The audio binary data goes directly in the POST body, not as a MIME attachment.
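A minimal sketch of the non-streaming request in Node.js, assuming the file fits in memory. buildSpeechRequestOptions and postAudioFile are hypothetical names, and choosing the Content-Type from the file extension is an assumption; the URL, the headers, and the rule that the raw audio bytes form the entire POST body come from the slide.

```javascript
const fs = require('fs');
const https = require('https');
const path = require('path');

// Assumption of this sketch: the audio format is inferred from the file extension.
const CONTENT_TYPES = { '.wav': 'audio/wav', '.amr': 'audio/amr' };

// Hypothetical helper: builds the non-streaming request options. The whole
// file is sent at once, so Content-Length is known up front.
function buildSpeechRequestOptions(filePath, audioLength, accessToken, context) {
  const contentType = CONTENT_TYPES[path.extname(filePath).toLowerCase()];
  if (!contentType) throw new Error('unsupported audio file: ' + filePath);
  return {
    method: 'POST',
    headers: {
      'Accept': 'application/json',
      'Authorization': 'Bearer ' + accessToken,
      'Content-Type': contentType,
      'Content-Length': audioLength,
      'X-SpeechContext': context   // e.g. "BusinessSearch" or "Generic"
    }
  };
}

// Hypothetical helper: reads the file and POSTs the raw bytes as the body.
function postAudioFile(filePath, accessToken, context, callback) {
  fs.readFile(filePath, (readErr, audio) => {
    if (readErr) return callback(readErr);
    let options;
    try {
      options = buildSpeechRequestOptions(filePath, audio.length, accessToken, context);
    } catch (err) {
      return callback(err);
    }
    const req = https.request('https://api.att.com/rest/1/SpeechToText', options, (res) => {
      let raw = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { raw += chunk; });
      res.on('end', () => {
        if (res.statusCode !== 200) {
          return callback(new Error('HTTP ' + res.statusCode + ': ' + raw));
        }
        let reply;
        try { reply = JSON.parse(raw); } catch (err) { return callback(err); }
        callback(null, reply);
      });
    });
    req.on('error', callback);
    req.end(audio); // raw audio bytes, not a MIME attachment
  });
}
```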
- 13. Step 2: POST Audio to AT&T
(Streaming HTTP Request)
Request Method: POST
Request URL: https://api.att.com/rest/1/SpeechToText
Request Headers: Accept: application/json
Authorization: Bearer xxyz123
Content-Type: audio/amr
Transfer-Encoding: chunked
X-SpeechContext: QuestionAndAnswer
Request Body: 200
  AUDIO_BINARY_DATA_CHUNK
  200
  AUDIO_BINARY_DATA_CHUNK
  0
Note: The numbers are chunk sizes in hexadecimal format. The recommended chunk size is 200 (hex), i.e. 512 bytes; a final zero-size chunk (0) ends the body.
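To make the chunked wire format concrete, the sketch below frames a Buffer the way HTTP/1.1 chunked transfer encoding does: each chunk is its size in hexadecimal, a CRLF, the bytes, and another CRLF, with a zero-size chunk marking the end. encodeChunked is a hypothetical, illustrative helper; in practice Node's http module applies this framing itself when a request body is written without a Content-Length header.

```javascript
// Hypothetical, illustrative helper: frames audio bytes the way HTTP/1.1
// chunked transfer encoding puts them on the wire.
function encodeChunked(audio, chunkSize = 0x200) { // 0x200 = 512 bytes
  const crlf = Buffer.from('\r\n', 'ascii');
  const parts = [];
  for (let offset = 0; offset < audio.length; offset += chunkSize) {
    const chunk = audio.subarray(offset, offset + chunkSize);
    // Chunk size in hexadecimal, then CRLF, the data, and another CRLF
    parts.push(Buffer.from(chunk.length.toString(16) + '\r\n', 'ascii'), chunk, crlf);
  }
  // A zero-size chunk (plus the blank line ending the trailers) closes the body
  parts.push(Buffer.from('0\r\n\r\n', 'ascii'));
  return Buffer.concat(parts);
}
```

With 1000 bytes of audio this yields one "200" chunk of 512 bytes followed by a "1e8" chunk holding the remaining 488 bytes.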
- 14. AT&T SPEECH API EXAMPLE APPLICATION
Download the Source:
https://github.com/attdevsupport/2012DevLabExamples
- 15. Transcription in Three Steps
1. Capture Audio Input
Capturing audio input differs from platform to platform. In our Basic Example, we use a small Adobe Flex app to access the mic via Flash, capture the audio in one of the two accepted formats, then save the newly created audio file to disk on the server. In our Speech Labs, we will look at the methods by which you can capture and stream audio directly to the Speech API.
2. POST Audio to AT&T
Once the audio input has been captured, we send the compatible audio file from our server to the Speech API using a simple POST. In our Basic Example, we use a small Node.js module called “Watson.js” (NPM: “watson-js”) to authenticate with the Speech API via OAuth and then POST the audio file. In our Speech Labs, we will do this on iOS, Android, and Web.
3. Use AT&T API Response
The AT&T API sends back an easy-to-parse JSON object with the interpreted text. In our Basic Example, we output this to the user’s screen, pretty-printed and syntax-highlighted, but you could do much more. In our Speech Labs, we will look at other ways to use this data, like searching for businesses on Foursquare.
- 16. Watson.js
Node.js API Wrapper for the AT&T Speech API
GitHub: http://github.com/mowens/watson-js/
NPM: https://npmjs.org/package/watson-js
- 17. Using Watson.js
1. Require API Wrapper
var WatsonClient = require('watson-js');
2. Set API Client Options
var options = {
  // Placeholders: substitute your own credentials and access token
  client_id: ATT_API_CLIENT_ID,
  client_secret: ATT_API_CLIENT_SECRET,
  access_token: ACCESS_TOKEN,
  scope: "SPEECH",
  context: "Generic",
  access_token_url: "https://api.att.com/oauth/token",
  api_domain: "api.att.com"
};
3. Instantiate New API Client
var Watson = new WatsonClient.Watson(options);
- 18. The Methods of Watson.js
Watson.getAccessToken(callback)
Requests a new OAuth access token using the client_credentials grant type and passes the returned access token to the supplied callback function.
Watson.speechToText(speechFile, accessToken, callback)
Pipes a speech file (given as an absolute file path) to the AT&T Speech API using the supplied access token. The API response is passed to the callback function as parsed JSON.
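A small sketch that chains the two methods for a single file and returns a Promise. transcribeFile is hypothetical. The (err, reply) callback shape is how the example app calls speechToText; using the same (err, token) shape for getAccessToken is an assumption of this sketch.

```javascript
// Hypothetical helper: fetches a token, then transcribes one file.
// `watson` is a client created with new WatsonClient.Watson(options).
// Assumption: getAccessToken calls back with (err, accessToken).
function transcribeFile(watson, speechFile) {
  return new Promise((resolve, reject) => {
    watson.getAccessToken((tokenErr, accessToken) => {
      if (tokenErr) return reject(tokenErr);
      watson.speechToText(speechFile, accessToken, (apiErr, reply) => {
        if (apiErr) return reject(apiErr);
        resolve(reply); // parsed JSON from the Speech API
      });
    });
  });
}
```

Usage, with the client from the previous slide: `transcribeFile(Watson, __dirname + '/public/audio/audio.wav').then(function(reply) { console.log(reply); });`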
- 19. AT&T SPEECH API EXAMPLE APP CODE WALKTHROUGH
Using the AT&T Speech API to convert generic audio to text in a web browser.
See example-basic in the examples repo.
- 20. Frameworks & Requirements
Server-side:
• Node.js: JavaScript platform for building fast, scalable network apps
• FS: Node.js File System module
• Express: Minimal web application framework for Node.js
• Optimist: Lightweight option parsing module for Node.js
• HBS: Express View Engine wrapper for Handlebars
• Watson.js: Simple API Wrapper for AT&T Speech API
Client-side:
• jQuery: The gold standard of client-side JavaScript libraries
• swfobject: JavaScript to make embedding Flash objects easier
• Bootstrap: Twitter’s CSS framework for quickly developing web apps
- 21. Capture Audio Input
recorder.swf:
Adobe Flex app that accesses the user’s microphone and emits events to JS
recorder.js:
JavaScript interface to receive events, update UI, and POST file to Node.js
Node.js upload script:
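// fs and path are Node.js core modules; app is the Express application
var fs = require('fs');
var path = require('path');
// Copy a file by reading it into memory and writing it to the destination
function cp(source, destination, callback) {
  fs.readFile(source, function(err, buf) {
    if (err) { return callback(err); }
    fs.writeFile(destination, buf, callback);
  });
}
// Save the uploaded recording; path.join supplies the separator that
// plain string concatenation with __dirname omits
app.post('/upload', function(req, res) {
  var upload = req.files.upload_file.filename;
  cp(upload.path, path.join(__dirname, upload.name), function(err) {
    if (err) { res.send({ error: err }); return; }
    res.send({ saved: 'saved' });
  });
});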
- 22. POST Audio to AT&T
AJAX Request via POST from client side to Node.js
// Receive an AJAX POST from client-side JavaScript
app.post('/speechToText', function(req, res) {
  // Pass the audio file and access token to AT&T Speech API.
  // `this` inside an Express handler is not the Watson client, so the
  // token comes from the options object the client was created with.
  Watson.speechToText(__dirname + '/public/audio/audio.wav',
    options.access_token, function(err, reply) {
      // Pass any errors associated with API call to client-side JS
      if (err) { res.send({ error: err }); return; }
      // Return the parsed JSON to client-side JavaScript
      res.send(reply);
    });
});
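On the browser side, the AJAX call only needs to POST to this route and check for the { error: ... } object the handler sends back. This sketch uses fetch instead of the jQuery the example app loads; requestTranscription is a hypothetical name, and fetchImpl is injectable only so the function can be exercised outside a browser.

```javascript
// Hypothetical client-side helper for the /speechToText route.
async function requestTranscription(fetchImpl = fetch) {
  const res = await fetchImpl('/speechToText', { method: 'POST' });
  const body = await res.json();
  // The route answers API failures with { error: ... } rather than an HTTP error status
  if (body.error) {
    throw new Error('Speech API error: ' + JSON.stringify(body.error));
  }
  return body; // the parsed Speech API reply
}
```

In a page, `requestTranscription().then(showReply)` would hand the reply to a rendering function of your own (showReply is likewise hypothetical).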
- 23. Use Speech API Response
Example API response (Content-Type: application/json):
{
  "Recognition": {
    "ResponseId": "74a964bf2fe",
    "NBest": [ {
      "WordScores": [1, 0.75, 1, 0.75],
      "Confidence": 0.75,
      "Grade": "accept",
      "ResultText": "This is a test.",
      "Words": ["This", "is", "a", "test."],
      "LanguageId": "en-us",
      "Hypothesis": "This is a test."
    } ]
  }
}
Response parameters and what they mean:
• Recognition: Body object for the AT&T Speech API response
• ResponseId: Unique identifier for a specific API call
• NBest: Array of hypothesis objects (possible transcriptions of the audio data)
• ResultText: Plain-text, cleaned-up representation of the Hypothesis. This should be used when displaying the text to users.
• Confidence: Confidence score for the overall Hypothesis, on a scale from 0 (not confident) to 1.0 (very confident)
• Grade: Recommended action to take with the current Hypothesis: accept, reject, or confirm
• Words: Array of the individual words. Confidence scores for each word are available in the WordScores array.
• WordScores: Array of confidence scores, one for each word in the ResultText parameter; corresponds to the Words array
• LanguageId: Language of the response. Supports English & Spanish in Generic; English only in other contexts.
• Hypothesis: The raw transcription of the audio that was interpreted
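A sketch of consuming this response: take the first NBest entry, pair Words with WordScores, and let Grade drive the next step. interpretReply is hypothetical; treating NBest[0] as the preferred hypothesis and mapping an empty NBest to "reject" are assumptions of this sketch.

```javascript
// Hypothetical helper: extracts the first hypothesis from a Speech API reply.
// Assumption: NBest[0] is the hypothesis the app should act on.
function interpretReply(reply) {
  const nbest = (reply && reply.Recognition && reply.Recognition.NBest) || [];
  if (nbest.length === 0) return { action: 'reject', text: null, confidence: 0, words: [] };
  const best = nbest[0];
  const scores = best.WordScores || [];
  // Words and WordScores line up index by index
  const words = (best.Words || []).map((word, i) => ({ word, score: scores[i] }));
  return {
    action: best.Grade,           // "accept", "reject", or "confirm"
    text: best.ResultText,        // cleaned-up text meant for display
    confidence: best.Confidence,  // 0 (not confident) to 1.0 (very confident)
    words
  };
}
```

An app could display text when action is "accept", ask the user to confirm it on "confirm", and prompt for a new recording on "reject".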
- 24. Up Next:
Michael Fitzpatrick
- 25. Up Next:
Jason Goecke
Adam Kalsey
- 26. ADVANCED EXAMPLES
What can you do with Speech-to-text?
You could…
• Make your mobile or web application accessible with voice commands
• Post tweets using voice commands in a simple Twitter app
• Add on-the-fly transcripts while recording in a podcasting app
• Add captioning to videos hosted on your website automatically
• Create real-time closed captions of a conference speaker’s presentation
• Search for nearby places to check in at on Foursquare
- 27. Speech Labs
We’re now going to break out into three clusters, each focusing on a
different technology stack. Work independently or with a partner!
Web (Flex + Node.js)
In the Web Speech Lab, Michael will be on hand to help get your Node.js app working with the AT&T Speech API. Code up your own Speech API app from scratch, or start from a boilerplate app that uses Foursquare to search for locations and lets you check in from your web browser!
iOS (Objective-C)
In the iOS Speech Lab, Brant will help you try out the AT&T Speech API on iOS and go into more depth about the AT&T Speech SDK for iOS. The mobile SDK allows you to quickly capture and stream audio from your iPhone or iPad app to the AT&T Speech API.
Android (Java)
In the Android Speech Lab, Jay will help you try out the AT&T Speech API on Android and go into more depth about the AT&T Speech SDK for Android. The mobile SDK allows you to quickly capture and stream audio from your Android phone or tablet app to the AT&T Speech API.
- 28. September 25, 2012
THANKS! ANY QUESTIONS?
Michael Owens (@mko on Twitter, mowens on GitHub)
Jay Lieske (jay.lieske@att.com, jayatyp on GitHub)