3. The evolution of interaction modes, decade by decade: character (70s), GUI (80s), web (90s), mobile (00s), VUI (present)
4.
5. WE BELIEVE VOICE REPRESENTS
THE NEXT MAJOR DISRUPTION IN COMPUTING
6. CONVERSATION IS THE MOST NATURAL WAY TO ENGAGE
WITH YOUR PRODUCTS
VOICE RELEASES THE FRICTION OF TRADITIONAL
TECHNOLOGY INTERACTION
USERS CAN NOW INTERACT WITH YOUR PRODUCT IN A
MORE INTIMATE WAY
13. THE ALEXA SERVICE
Lives in the cloud: Automated Speech Recognition (ASR), Natural Language Understanding (NLU), Always Learning
Supported by two powerful SDKs:
ALEXA SKILLS KIT – Create Great Content: ASK is how you connect to your consumer
ALEXA VOICE SERVICE – Unparalleled Distribution: AVS allows your content to be everywhere
20. Key Design Principles for
ALEXA SKILLS
Skills Should Provide High Value
A Skill Should Evolve Over Time
Users Can Speak to Your Skill Naturally and
Spontaneously
Alexa Should Understand Most Requests to
Your Skill
A Skill Should Respond in an Appropriate Way
21. High Utility vs. Low Utility
Doing – performs a task.
“Alexa, ask Scout to arm away mode.”
“Away mode armed. You have 45 seconds to leave the house.”
Searching – identifies specific info.
“Alexa, ask Vendor if there are Pearl Jam tickets available for this weekend.”
“There are a limited number of tickets, ranging from $49 to $279.”
Telling – provides a quick reference point.
“Alexa, tell me a cat fact.”
“It is well known that dogs are superior to cats.”
Browsing – gives info on a broad subject.
“Alexa, ask Amazon what’s on sale.”
“The following items are on sale right now...”
22. Example of Automatic Learning
ALEXA SKILL
Alexa, launch Travel Buddy
Hi, I’m Travel Buddy. I can easily tell you about your
daily commute. Let’s get you set up. Where are you
starting from?
Las Vegas
Okay, and where are you going?
Los Angeles
Great, now whenever you ask, I can tell you about the
commute from Las Vegas to Los Angeles. The current
drive time is four hours and forty-two minutes. There
is an accident on I-15 near Pasadena.
Alexa, launch Travel Buddy
Your commute is currently four hours and two minutes.
Skill logic flow:
User engages skill → Is their home/destination set up?
Yes → Give traffic information
No → Do we have their home location? No → Get home location
Do we have their destination location? No → Get destination location
→ Give traffic information
23. VOICE DESIGN TOP TIPS
AVOID FEATURE CREEP. KEEP IT SIMPLE
Don’t overwhelm your users with features out of the box. Voice is a new way for users to interact with your product. Keep it simple and grow from there.
AS NATURAL A CONVERSATION AS POSSIBLE
Make your utterances as natural as they can possibly be. Top tip: have real-world conversations with one another to create them.
CORE BUSINESS FUNCTIONALITY AS A MINIMUM
It’s important to do the fundamentals right. If you are a news company, your users will naturally expect you to at least provide the news. Add the extra features later.
UTILIZE THE BUILT-IN LIBRARY
There are hundreds of entities that Alexa can understand using the built-in library. You can handle these in your skill simply by including them in your interaction model and returning a useful response.
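As a sketch of the built-in library tip: a built-in intent is just another entry in your interaction model’s intents array, alongside your custom intents. AMAZON.HelpIntent and AMAZON.StopIntent are standard built-ins; the tubeinfo intent is the one used elsewhere in this deck.

```json
{
  "intents": [
    { "intent": "AMAZON.HelpIntent" },
    { "intent": "AMAZON.StopIntent" },
    {
      "intent": "tubeinfo",
      "slots": [
        { "name": "LINENAME", "type": "LINENAMES" }
      ]
    }
  ]
}
```

Your skill code then only needs to respond usefully when one of those built-in intent names arrives in the request.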
24. WHERE DO YOU START?
The Evolution of a Skill
CRAWL – What’s Your Core Functionality?
Traffic skill example: give an estimated time of arrival from home to work.
WALK – Expand Capabilities & Features
Traffic skill example: include accidents, construction, and closures on route.
RUN – Innovate for Customers
Traffic skill example: proactively alert the user to delays and provide alternate routes.
Evolve over time: analyze user feedback & optimize the skill.
26. UNDER THE HOOD OF THE ALEXA SKILLS KIT
A closer look at how the Alexa Skills Kit processes a request and returns an appropriate response
1. The user makes a request
2. The audio stream is sent up to Alexa
3. Alexa identifies the skill & recognizes the intent through ASR & NLU
4. Alexa sends the customer intent to your service
5. Your service processes the request
6. You pass back a textual or audio response, plus an optional graphical response
7. Alexa converts text-to-speech (TTS) & renders the graphical component, responding to the intent through voice & visuals
28. WHAT COMPONENTS MAKE UP A SKILL
Skills are made up of two components
Your voice interaction model – the skill configuration in the Amazon Developer Portal
and
Your skill code – hosted in AWS Lambda (our hosted service) or at your own HTTPS endpoint
30. INVOCATION NAMES
Invocation names are how we know to route traffic
to your particular skill.
Interactions can be either:
One Shot – open your skill and perform an action, such as ‘Alexa, ask National Rail for my commute’
Conversational – ‘Alexa, ask National Rail to set up my commute’ – ‘OK, what is your regular departure station?’ – ‘Birmingham New Street’
Open Only – ‘Alexa, open National Rail’
Your skill can support all of these; it’s not one or the other.
‘Alexa, ask National Rail for my commute’
‘Alexa, open Just Eat’
‘Alexa, tell Uber to get me a ride’
‘Alexa, launch Cat Facts’
‘Alexa, play RuneScape’
31. INTENTS AND SLOTS
You define interactions for your voice app through
intent schemas
Each intent consists of two fields. The intent field
gives the name of the intent. The slots field lists the
slots associated with that intent.
Slots can also include types such as LITERAL, NUMBER, DATE, etc.
Intent schemas are uploaded to your skill in the Amazon Developer Portal.
{
"intents": [
{
"intent": "tubeinfo",
"slots": [
{
"name": "LINENAME",
"type": "LINENAMES"
}
]
}
]
}
32. CUSTOM SLOTS
Custom Slots increase the accuracy of Alexa when
identifying an argument within an intent.
They are created as a line-separated list of values.
It is recommended to include as many slot values as possible.
There are some built-in slots, such as AMAZON.GB_CITY and AMAZON.GB_FIRST_NAME.
bakerloo
central
circle
district
hammersmith and city
jubilee
metropolitan
northern
piccadilly
victoria
waterloo and city
london overground
tfl rail
DLR
33. Intents for human-driven events such as: Cancel, Play, Pause, Repeat, Stop, or Help
Intents across multiple categories including: Books, Calendar, Cinema
Showtimes, General, Local Search, Music, Video, and Weather
Slots for Numbers, Dates, Times and List Types
AMAZON.DATE – converts words that indicate dates (“today”, “tomorrow”, or “July”) into a date format (such as “2015-11-25” for a specific day, or “2015-07-XX” for a month).
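Because an AMAZON.DATE slot can arrive as a full date or as a partial one like “2015-07-XX”, your service needs to check which form it got. A minimal sketch (the function name and the classification strategy are ours, not part of the SDK):

```javascript
// Sketch: interpreting an AMAZON.DATE slot value, which may be a full
// ISO date ("2015-11-25") or a month with an unknown day ("2015-07-XX").
function interpretDateSlot(value) {
  if (/^\d{4}-\d{2}-\d{2}$/.test(value)) {
    return { kind: 'day', date: value };           // a specific day
  }
  if (/^\d{4}-\d{2}-XX$/.test(value)) {
    return { kind: 'month', month: value.slice(0, 7) }; // only the month is known
  }
  return { kind: 'other', raw: value };            // e.g. a week or season value
}

console.log(interpretDateSlot('2015-11-25')); // { kind: 'day', date: '2015-11-25' }
console.log(interpretDateSlot('2015-07-XX')); // { kind: 'month', month: '2015-07' }
```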
34. SAMPLE UTTERANCES
The mappings between intents and the typical utterances that
invoke those intents are provided in a tab-separated text document
of sample utterances.
Each possible phrase is assigned to one of the defined intents.
tubeinfo are there any disruptions on the {LINENAME} line
tubeinfo {LINENAME} line
“What is…”
“Are there…”
“Tell me…”
“Give me…”
“Give…”
“Find…”
“Find me…”
35. PUTTING IT ALL TOGETHER
Utterance:
tubeinfo are there any delays on the {LINENAME} line
Intent:
{
  "intent": "tubeinfo",
  "slots": [
    {
      "name": "LINENAME",
      "type": "LINENAMES"
    }
  ]
}
Slots:
bakerloo
central
. . .
37. REQUEST TYPES
LaunchRequest
Occurs when the user launches the skill without specifying what they want
IntentRequest
Occurs when the user specifies an intent
SessionEndedRequest
Occurs when the user ends the session
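The three request types above can be handled with a simple dispatch on the type field. This is a hand-rolled sketch, not the alexa-sdk’s own routing, and the response strings are placeholders:

```javascript
// Sketch: dispatching on the request's type field.
function dispatch(request) {
  switch (request.type) {
    case 'LaunchRequest':
      return 'Welcome! What would you like to do?';
    case 'IntentRequest':
      return 'Handling intent: ' + request.intent.name;
    case 'SessionEndedRequest':
      return null; // you do not send a response to a SessionEndedRequest
    default:
      throw new Error('Unknown request type: ' + request.type);
  }
}

console.log(dispatch({ type: 'IntentRequest', intent: { name: 'tubeinfo' } }));
// → Handling intent: tubeinfo
```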
38. AN EXAMPLE REQUEST
If hosting your own service, you will need to handle
POST requests to your service over port 443 and
parse the JSON
With AWS Lambda, the event object that is passed
when invoking your function is equal to the request
JSON
Requests always include a type, requestId, and
timestamp
If the request is an IntentRequest, it will include the intent and
its slots
type maps directly to LaunchRequest,
IntentRequest, and SessionEndedRequest
"request": {
"type": "IntentRequest",
"requestId": "string",
"timestamp":"2016-05-13T13:19:25Z",
"intent": {
"name": "tubeinfo",
"slots": {
"LINENAME": {
"name": "LINENAME",
"value": "circle"
}
}
},
"locale": "en-GB"
}
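Given a request shaped like the JSON above (which, in AWS Lambda, is the event object itself), pulling out a slot value is a few lines. The helper name below is ours, for illustration:

```javascript
// Sketch: reading a slot value from the request JSON shown above.
function getSlotValue(request, slotName) {
  if (request.type !== 'IntentRequest') return null;
  const slot = request.intent.slots[slotName];
  return slot ? slot.value : null;
}

// The example request from the slide:
const request = {
  type: 'IntentRequest',
  requestId: 'string',
  timestamp: '2016-05-13T13:19:25Z',
  intent: {
    name: 'tubeinfo',
    slots: { LINENAME: { name: 'LINENAME', value: 'circle' } }
  },
  locale: 'en-GB'
};

console.log(getSlotValue(request, 'LINENAME')); // → circle
```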
39. AN EXAMPLE RESPONSE
Your app will need to build a response object that
includes the relevant keys and values.
The alexa-sdk for Node.js makes this super simple.
outputSpeech, card, and reprompt are the supported
response objects.
shouldEndSession is a boolean value that determines
whether the conversation is complete or not.
You can also store session data with the Alexa service;
it lives in the sessionAttributes object.
{
"version": "1.0",
"response": {
"outputSpeech": {
"type": "SSML",
"ssml": "<speak>There are
currently no delays on the circle
line.</speak>"
},
"shouldEndSession": true
},
"sessionAttributes": {}
}
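A helper that builds a response like the one above might look like this. It is a hand-rolled sketch of the raw JSON shape, not the alexa-sdk API, and the function name is ours:

```javascript
// Sketch: building the response JSON shown above by hand.
function buildResponse(ssmlText, shouldEndSession, sessionAttributes) {
  return {
    version: '1.0',
    response: {
      outputSpeech: {
        type: 'SSML',
        ssml: '<speak>' + ssmlText + '</speak>'
      },
      shouldEndSession: shouldEndSession
    },
    sessionAttributes: sessionAttributes || {}
  };
}

const res = buildResponse(
  'There are currently no delays on the circle line.', true);
console.log(JSON.stringify(res, null, 2));
```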
40. CHANGING ALEXA’S INFLECTION WITH SSML
• Alexa automatically handles normal punctuation, such as
pausing after a period, or speaking a sentence ending in a
question mark as a question.
• Speech Synthesis Markup Language (SSML) is a markup
language that provides a standard way to mark up text for the
generation of synthetic speech.
• Tags supported include: speak, p, s, break, say-as, phoneme,
w and audio.
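For example, a short response using a couple of the supported tags (the pause length and wording here are illustrative):

```xml
<speak>
  Good news. <break time="500ms"/>
  There are <say-as interpret-as="cardinal">0</say-as>
  delays on the circle line.
</speak>
```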
41. Existing Customer with
ACCOUNT LINKING
• Allow your customers to link their existing
accounts with you, to Alexa.
• Customers are prompted to log in to your
site using their normal credentials via the
webview URL you provide.
• You authenticate the customer and
generate an access token that uniquely
identifies the customer and link the
accounts.
44. VOICE ENABLE ALL THE THINGS WITH ALEXA
M a r k B a t e
Solutions Architect, Alexa Skills Kit
@markbate
markbate@amazon.com
Editor's notes
Let’s look at a history of user interfaces.
The evolution of user interfaces, every decade there is a new technology disruption.
Voice represents the latest interface.
ASR is at approx 95% accuracy today, we are striving for 99%
Voice is everywhere! It’s the most natural user interface; it’s how we’re taught as children to communicate. We weren’t born with keyboards or smartphones in our hands.
The idea is to put lots of voice enabled devices throughout the house
Mention how heavily invested we are in this area.
Talk about far field recognition. Investment in the first party device hardware and the newly released reference architecture for the microphone array.
Explain what skills are, and that we now have over 7,000 in the US
Thousands in the UK
Echo 2014
Hands-free in noisy rooms – 7 mics
Echo v. Dot
Tap, FireTV, Tablets
The Echo is the first and best-known endpoint of the Alexa Ecosystem. We released Echo in 2014 to allow customers to engage with Alexa and control their home via voice. Alexa and The Echo device was built to make life easier and more enjoyable.
The Echo and the Echo Dot are what we call far-field Alexa devices.
You interact with them in a completely hands-free way from anywhere in the room…even if that room is noisy.
They include a 7 microphone mic-array with advanced beam-forming and noise cancelling technology.
The difference between Echo and Echo Dot is simple: Echo has a powerful built-in speaker that provides room filling sound.
Echo Dot is smaller and contains a less powerful speaker and works great when connected to another audio system. Both include the same array microphone and are otherwise functionally identical
Alexa is also available other Amazon devices including Tap, our portable, battery powered speaker. Alexa is available on Fire TV via the push-to talk remote control that comes with it.
And just last week we announced Alexa on Amazon’s Fire Tablets.
Ideal for Smart Home and the CEDIA channel
Every room
Available in 6-packs and 12-packs
We will be enabling CEDIA partners to order them in bulk
Echo: White & Black
On Wednesday, we made a bunch of announcements. Hopefully you heard the news, but if you didn’t, here’s a recap:
First, we announced the general availability of the all-new Echo Dot, in two colors (White & Black).
The Echo Dot sells for $49 each. We’re also offering them in 6-packs and 12-packs. Buy 5 and get one free or buy 10 and get 2 free.
As you can see, we’ve engineered and priced it so that you can put them in every one of your customers’ rooms!
Next, we announced the original Echo is now available in White as well as black.
Those of you who deal in whole-home and multi-zone audio systems know that sound travels between rooms in the real world.
When a customer speaks to Alexa, when there are multiple Echo devices scattered throughout the home, you want Alexa to do the right thing and only respond in the room the customer is actually in. To enable this we’ve invented a new technology that was also announced on Weds: Echo Spatial Perception which goes by the nice acronym ESP. ESP gives Alexa the ability to perceive spatial relationships within the home.
Finally, we announced that our Alexa devices are available to customers in the UK and Germany, available for the first time outside the US.
Echo Spatial Perception
This shows how we are constantly innovating and enhancing the product.
I previously mentioned the Alexa ecosystem, so let’s spend a little time talking about what that means.
The Alexa Ecosystem is supported by two important frameworks that provide unparalleled distribution and ways to connect with your customer.
On one side we have ASK (Alexa Skills Kit) which empowers brands and developers to create rich voice experiences for their consumers;
On the other side is AVS (Alexa Voice Service), which ensures that the places that Alexa can go are endless
All you Have to Do Is ASK (What is the Alexa Skills Kit?)
The ASK is our SDK, read human….our way of making the voice experience via Alexa possible.
ASK gives you the ability to create new voice-driven capabilities (also known as skills, think Apps) for Alexa using the new Alexa Skills Kit (ASK).
You can connect existing services to Alexa in minutes with just a few lines of code.
You can also build entirely new voice-powered experiences in a matter of hours, even if you know nothing about speech recognition or natural language processing.
AVS: Serving a Platform Agnostic Voice Experience
Let’s start with AVS, it’s through the Alexa Voice Service that, hardware manufacturers and other participants in the new and exciting world of the Internet of Things (IoT) can incorporate an Alexa-driven voice experience into their devices.
Any device that has a speaker, a microphone, and an Internet connection can integrate Alexa with a few lines of code.
This enables a platform agnostic growth strategy that ends with your consumer having one, if not multiple seamless touch-points to a world by voice.
Just imagine what that means…
While right now there are two Amazon provided endpoints, picture everything from a car to a microwave to a pen, and more...all enabled to deliver an experience by voice
Talk briefly about how AVS enables us to get wider distribution. Skills and interactions are not limited to first party devices.
Tons of other integrations from hardware manufacturers
But the focus of today is the Alexa Skills Kit. And we’ll be focusing on custom skills.
Talk about the consumer experience
Let’s take a step back and look under the hood of ASK.
When the user makes a request of Alexa, all of THIS happens in just seconds...resulting in audio and/or visual feedback to the user.
So how does this work...
The customer speaks into a device
Ex. Alexa, what do I make for dinner?
Ex. Alexa, play a song from Bruno Mars
We send an audio file up to the Alexa Service sitting in the cloud.
Alexa identifies your skill and recognizes intent through Automated Speech Recognition (ASR) and Natural Language Understanding (NLU)
Alexa passes the intent and variables (ie. slots) to your service, where you process and return VUI and GUI
Graphical Experiences are delivered via the companion app, and on companion screens with the recent launch on FireTV
VUI, again, through the speaker (could be text or audio file)
Demos of the developer portal screens. Hidden slides that follow show what to enter or demonstrate.
Demos of the developer portal screens. Hidden slides that follow show what to enter or demonstrate.