Alexa, the voice service that powers Echo, provides a set of built-in abilities, or skills, that enable customers to interact with devices in a more intuitive way using voice. In this session we’ll provide best practices on how to create a compelling voice experience leveraging Alexa’s built-in skills. Scott Totman, VP of Mobile and Innovation at Capital One, will describe what they learned building their first voice experience, including how they mapped utterances to intents and optimized it for spoken language understanding.
Alexa Skills Kit
TODAY’S AGENDA
About Alexa
Capital One skill demo
Building the Capital One skill
Alexa skill best practices
4. What is Alexa?
Alexa is a cloud-based voice service
that can answer questions
play music
read the news
and more.
Echo is an
always-on
always-connected
hands-free device that connects to
Alexa.
8. Alexa is always learning.
Alexa gets smarter by
learning new skills.
Developers can create new
skills for Alexa.
9. Creating your own
ALEXA SKILLS
Alexa skills have two parts:
Configuration data in Amazon
Developer Portal
Hosted service responding to user
requests
10. Alexa Skills Kit architecture
Components:
• Amazon Alexa service
• Developer’s application service
• Amazon’s Developer Portal
Flow:
• Application, intents, sample data, and the developer service URL endpoint are configured through the portal
• User audio is streamed to the Alexa service
• User intents and arguments are sent to the developer service
• Text response and/or GUI card data is returned
• Audio responses are rendered on-device
• GUI cards are rendered in the Amazon Alexa app
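The request and response flow in this architecture can be sketched as a small handler. This is a minimal sketch, assuming the JSON envelope used for custom skills (`request.intent.name` on the way in, `outputSpeech` and `card` on the way out); the `GetBalance` intent name, the card title, and the hard-coded balance are hypothetical and not taken from the talk.

```python
import json

def handle_request(event):
    """Turn an Alexa-style request dict into a response dict."""
    request = event["request"]
    is_balance = (
        request["type"] == "IntentRequest"
        and request["intent"]["name"] == "GetBalance"  # hypothetical intent name
    )
    if is_balance:
        # Hypothetical data; a real service would call its own API here.
        text = "Your balance is 42 dollars."
        end_session = True
    else:
        text = "You can ask for your balance. What would you like?"
        end_session = False
    return {
        "version": "1.0",
        "response": {
            # Spoken on the device.
            "outputSpeech": {"type": "PlainText", "text": text},
            # Rendered as a card in the companion app.
            "card": {"type": "Simple", "title": "Balance", "content": text},
            "shouldEndSession": end_session,
        },
    }

example = {"request": {"type": "IntentRequest",
                       "intent": {"name": "GetBalance", "slots": {}}}}
print(json.dumps(handle_request(example), indent=2))
```

The same function covers a launch without an intent: it keeps the session open and ends the prompt with a question.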
11. Building an Alexa skill
HOSTED SERVICE
• You define interactions for your voice
app through intent schemas
• Each intent consists of two fields. The
intent field gives the name of the intent.
The slots field lists the slots associated
with that intent.
• Slots can also include types, such as
LITERAL, NUMBER, DATE, etc.
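As an illustration, an intent schema for the horoscope example might look like the following. This is a sketch, assuming the schema is a JSON document with an `intents` list; the `name` and `type` keys inside each slot are an assumption about the layout rather than something stated on the slide.

```python
import json

# Intent and slot names follow the horoscope example used in this deck.
intent_schema = {
    "intents": [
        {
            "intent": "GetHoroscope",          # the intent field: name of the intent
            "slots": [                         # the slots field: slots for that intent
                {"name": "Sign", "type": "LITERAL"},
                {"name": "Date", "type": "DATE"},
            ],
        }
    ]
}

print(json.dumps(intent_schema, indent=2))
```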
12. Building an Alexa skill
HOSTED SERVICE
• The mappings between intents and the
typical utterances that invoke those
intents are provided in a tab-separated
text document of sample utterances.
• Each possible phrase is assigned to one
of the defined intents.
• GetHoroscope what is the horoscope for {pisces|Sign}
• GetHoroscope what will the horoscope for {leo|Sign} be {next tuesday|Date}
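To make the file format concrete, here is a small sketch that reads one sample-utterance line and pulls out the intent name and the `{sample|Slot}` pairs. The function name `parse_utterance` is hypothetical; it is a reading aid for the format, not part of any Alexa tooling.

```python
import re

# Matches {sample value|SlotName} as used in the sample utterances.
SLOT_PATTERN = re.compile(r"\{([^|{}]+)\|([^|{}]+)\}")

def parse_utterance(line):
    """Split one sample-utterance line into (intent, phrase, slots)."""
    # The intent name is the first field; the rest of the line is the phrase.
    intent, phrase = line.strip().split(None, 1)
    slots = {name: sample for sample, name in SLOT_PATTERN.findall(phrase)}
    return intent, phrase, slots

line = "GetHoroscope\twhat will the horoscope for {leo|Sign} be {next tuesday|Date}"
print(parse_utterance(line))
```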
14. Capital One’s Alexa approach
June: A few
developers buy Echos
July: Full day tech offsite &
side of desk project kickoff
August: Rapid prototyping and
expanding Capital One skill
Goal: Pair Alexa with the
Capital One app and
allow users to get their
credit card balance
15. Consumer insights: Design thinking/test + learn
Customers like it!
• Hands-free convenience is valuable
• Interested in using Echo for informational
purposes
• Open to making payments/transactions
But…
• Concerns about local security
• Users don’t want financial information captured
by a third party (Amazon)
16. Prototyping: Prerequisites & new development
Leverage existing API model
built for Android/iPhone apps
Piggy-back off “glance”
services built for Apple Watch
Build new JS service as the
ASK orchestrator*
*Used Alexa app node library
(Thanks Matt Kruse!)
17. Capital One skills focus
Read-only information
• Default accounts (credit card, bank, loans)
• Account balances
• Bill due date
• Last payment
• Last transactions
• Interest rate
Transactional skills
• Pay bill(s)
• Transfer $
Experimenting
• App usage patterns
• OAuth
• Customer service/support
• Customer acquisition
• Alexa adoption
• Alexa evolution
Skill development segmented into three priority buckets
19. Alexa challenges discovered during prototyping
Numerical utterances, device latency, and security were our most significant challenges
20. Numerical utterances
Challenge:
• “Twenty-two” is hard to turn into 22 instead of 20
and 2
• “Three hundred and forty-four dollars”
• Needed to call out words like ‘hundred’, ‘and’
Solution:
• Programmatically create utterances (big list)!
• Optional words
• ASK support for CURRENCY data type
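The "programmatically create utterances" idea could look roughly like this. It is a sketch under assumptions: the `PayBill` intent, the `Amount` slot, and the "pay ..." phrasing are hypothetical, and the talk does not say how Capital One actually generated its list. The optional words handled here are "and" and "dollars".

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def number_to_words(n, use_and=True):
    """Spell out 0..999 in lower-case words, with an optional 'and'."""
    if not 0 <= n < 1000:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    if rest:
        words += (" and " if use_and else " ") + number_to_words(rest)
    return words

def amount_utterances(intent, max_amount):
    """Build tab-separated sample utterances for amounts 1..max_amount."""
    lines = []
    for n in range(1, max_amount + 1):
        # With and without 'and'; the set removes duplicates below 101.
        variants = {number_to_words(n, True), number_to_words(n, False)}
        unit = "dollar" if n == 1 else "dollars"
        for words in sorted(variants):
            for suffix in ("", " " + unit):  # the currency word is optional too
                lines.append(f"{intent}\tpay {{{words}|Amount}}{suffix}")
    return lines

for line in amount_utterances("PayBill", 344)[-4:]:
    print(line)
```

Generating the list this way keeps the optional words consistent across every amount, at the cost of a large utterance file.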
22. Latency
Challenge:
• Coding visually is great for websites, not for voice
• Pauses while the service looks up data are a much
bigger deal for voice
Solution:
• Keep APIs fast
• Leverage Alexa session data
• Keep explanations terse…but not rude
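One way to read "leverage Alexa session data" is to keep a value that was already looked up in the session attributes, so a follow-up question in the same session does not trigger a second slow call. The sketch below assumes the response envelope carries a `sessionAttributes` object; `get_balance`, `build_response`, and `slow_lookup` are hypothetical names.

```python
def get_balance(session_attributes, fetch_balance):
    """Return (balance, attributes), calling the slow API at most once per session."""
    attributes = dict(session_attributes or {})
    if "balance" not in attributes:
        attributes["balance"] = fetch_balance()  # the slow backend call
    return attributes["balance"], attributes

def build_response(text, attributes):
    """Wrap speech text and session attributes in a response envelope."""
    return {
        "version": "1.0",
        "sessionAttributes": attributes,  # sent back with the next request
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": False,
        },
    }

calls = []

def slow_lookup():
    """Stand-in for a real account API; records each call."""
    calls.append("lookup")
    return 120

balance, attrs = get_balance({}, slow_lookup)
balance_again, attrs = get_balance(attrs, slow_lookup)
print(balance, balance_again, len(calls))
```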
23. Security
Challenge:
• Account linking didn’t exist as an available solution
• Figure out how to connect an Echo with a
customer account
• No guarantee of privacy on Echo end
Solution:
• Make vulnerabilities dependent on compromised
account
• Pairing code for secure account linking
• 2nd factor authentication for moving money
24. Pairing process workflow
1. Open session
2. Device ID not recognized
3. Generate 6-digit PIN
4. Log in to C1 app, provide PIN
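The four steps above can be sketched as follows. This is an illustrative in-memory version, not Capital One's implementation: the `PairingStore` class and its method names are hypothetical, and a real service would also need PIN expiry, collision handling, and persistent storage.

```python
import secrets

class PairingStore:
    """Tracks which devices are paired and which PINs are awaiting confirmation."""

    def __init__(self):
        self.pending = {}  # PIN -> device ID awaiting confirmation
        self.paired = {}   # device ID -> customer ID

    def open_session(self, device_id):
        """Steps 1-3: return the customer if the device is known, else issue a PIN."""
        if device_id in self.paired:
            return {"customer_id": self.paired[device_id]}
        pin = f"{secrets.randbelow(10**6):06d}"  # 6 digits, zero-padded
        self.pending[pin] = device_id
        return {"pin": pin}

    def confirm_pin(self, customer_id, pin):
        """Step 4: called after the customer logs in to the app and enters the PIN."""
        device_id = self.pending.pop(pin, None)
        if device_id is None:
            return False
        self.paired[device_id] = customer_id
        return True

store = PairingStore()
first = store.open_session("echo-1")
print(store.confirm_pin("customer-7", first["pin"]))
print(store.open_session("echo-1"))
```

Because the PIN is only accepted from an authenticated app session, pairing a device requires access to the customer's account.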
25. Keeping things in context
Challenge:
• Context is hard with multiple accounts
• Helping a user with tasks and cross-
context:
• Switching context
• Keeping context
• Recognizing context
Solution:
• Map user workflow
• When in doubt, ask the user
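A rough sketch of "when in doubt, ask the user" for the multiple-account case: use the account the user named, otherwise keep the account already in the session, and only ask when neither is available. The function `resolve_account` and the account labels are hypothetical.

```python
def join_options(options):
    """Join options for speech: 'a or b', or 'a, b, or c'."""
    if len(options) <= 2:
        return " or ".join(options)
    return ", ".join(options[:-1]) + ", or " + options[-1]

def resolve_account(requested, session_account, accounts):
    """Return (account, question); exactly one of the two is None."""
    if not accounts:
        raise ValueError("the customer must have at least one account")
    if requested in accounts:
        return requested, None        # user named an account: switch context
    if session_account in accounts:
        return session_account, None  # nothing named: keep the current context
    if len(accounts) == 1:
        return accounts[0], None      # only one possibility
    return None, f"Which account: {join_options(accounts)}?"

accounts = ["credit card", "bank"]
print(resolve_account("bank", None, accounts))
print(resolve_account(None, "bank", accounts))
print(resolve_account(None, None, accounts))
```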
27. Capital One takeaways
Wish list
• Skill discoverability
• Handle vocal interruptions
better, with context
• Notification indicator
Works great
• Straightforward
• Majority of the effort is
on customer
experience, not
implementation
• ASK is evolving quickly
+ adding new
capabilities
29. Making it sound easy
A person can absorb and process a lot more
written information than audio information.
Instructions that make sense in an average
web page dialog are probably going to sound
intimidating in a spoken command.
Follow these best practices for better results.
30. 1. Make it clear the user needs to respond
Not so good
Trivia challenge: Trivia Challenge.
You can choose from the following
categories: 80’s Pop Songs, Potent
Potables, or European History.
31. 1. Make it clear the user needs to respond
Better
Trivia challenge: Trivia
Challenge. Here are your
categories: 80’s Pop Songs,
Potent Potables, or European
History. Which one do you want?
32. 1. Make it clear the user needs to respond
Best practice
If you expect the user to say
something, make sure you end
your prompt with a question.
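This guideline is simple enough to check automatically before a prompt ships. The helper below is a hypothetical sketch of such a check, applied to the two Trivia Challenge prompts from the previous slides.

```python
def prompt_problem(text, expects_reply):
    """Return a warning string if the prompt breaks the guideline, else None."""
    if expects_reply and not text.rstrip().endswith("?"):
        return "Prompt expects a reply but does not end with a question."
    return None

not_so_good = ("Trivia Challenge. You can choose from the following categories: "
               "80's Pop Songs, Potent Potables, or European History.")
better = ("Trivia Challenge. Here are your categories: 80's Pop Songs, "
          "Potent Potables, or European History. Which one do you want?")

print(prompt_problem(not_so_good, expects_reply=True))
print(prompt_problem(better, expects_reply=True))
```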
33. 2. Don’t assume the user knows what to do
Not so good
Car Fu: Car Fu.
34. 2. Don’t assume the user knows what to do
Better
Car Fu: Car Fu. You can ask to get a
ride or request a fare estimate. Which
will it be?
User: Get a ride.
Car Fu: Sending your request. A mobile
alert on your cell phone will let you
know when your car arrives.
35. 2. Don’t assume the user knows what to do
Best practice
When launching a skill or
finishing an interaction, always
suggest what the user can do
next.
36. 3. Present the options clearly
Not so good
Food Taxi: Would you like french
fries or a salad?
User: Yes
37. 3. Present the options clearly
Better
Food Taxi: Which side would you
like: French fries or a salad?
User: Salad.
38. 3. Present the options clearly
Best practice
Either/or questions must be
stated explicitly, lest they be
interpreted as yes/no
questions.
39. 4. Keep it brief
Not so good
Astrology Daily: There are 12
Zodiac signs that I can give you
a horoscope for. Please tell me
which one you’d like.
40. 4. Keep it brief
Better
Astrology Daily: Get the
Horoscope for which sign?
41. 4. Keep it brief
Best practice
Use fewer words than you
might on your website.
42. 5. Avoid verbose choices
Not so good
Dairy Shack: What flavor do you
want? For chocolate, say
Chocolate. For vanilla, say
Vanilla. Or for strawberry, say
Strawberry.
43. 5. Avoid verbose choices
Better
Dairy Shack: Which flavor
would you like? You can say
Chocolate, Vanilla, or
Strawberry.
44. 5. Avoid verbose choices
Best practice
Do not present more than
three choices and avoid
repetitive wording.
45. 6. Avoid crowding options
Not so good
Score Keeper: Score Keeper. You
can give a player points, add a new
player, ask for the score, start a new
game, clear all players, or stop if you’re
done. Now, what would you like?
User: What was that again?
46. 6. Avoid crowding options
Better
Score Keeper: Score Keeper. You can give a player points, ask for the score,
or say Help. What would you like?
User: Help.
Score Keeper: Here are some things you can say:
add John, give John 5 points, tell me the score, start a new game, or reset all
players.
You can also say stop if you’re done.
So, how can I help?
47. 6. Avoid crowding options
Best practice
Present the 2-3 choices that users
will pick 80% of the time and expose
the rest through ‘Help’.
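If usage counts are available, picking the top choices and moving the rest behind Help can be done mechanically. The sketch below assumes hypothetical usage numbers for the Score Keeper example; `build_menu` is an illustrative name and the wording mirrors the "Better" slide.

```python
def join_choices(choices):
    """Join choices for speech: 'a or b', or 'a, b, or c'."""
    if len(choices) <= 2:
        return " or ".join(choices)
    return ", ".join(choices[:-1]) + ", or " + choices[-1]

def build_menu(usage_counts, max_choices=2):
    """Return (prompt, help_text) with the most-used choices up front."""
    ranked = sorted(usage_counts, key=usage_counts.get, reverse=True)
    top, rest = ranked[:max_choices], ranked[max_choices:]
    spoken = top + (["say Help"] if rest else [])
    prompt = f"You can {join_choices(spoken)}. What would you like?"
    help_text = None
    if rest:
        help_text = (f"Here are some things you can say: {join_choices(rest)}. "
                     "So, how can I help?")
    return prompt, help_text

# Hypothetical usage counts, for illustration only.
usage = {
    "give a player points": 50,
    "ask for the score": 30,
    "add a new player": 10,
    "start a new game": 5,
    "clear all players": 5,
}
prompt, help_text = build_menu(usage)
print(prompt)
print(help_text)
```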
48. 7. Get one piece of information at a time and use it
Not so good
Joke Bank: Would you like to hear a
joke?
User: Yes.
Joke Bank: What’s black, white, and
red all over? An embarrassed skunk.
“One, Two, Five!”
“Three, sir! Three!”
49. 7. Get one piece of information at a time and use it
Better
Joke Bank: What’s black,
white, and red all over? An
embarrassed skunk.
50. 7. Get one piece of information at a time and use it
Best practice
Make smart assumptions
where possible.
Avoid asking non-essential
questions.
51. 8. Finally, make the user comfortable
Best practice
• Let users know they’re in the right place.
• Present usable chunks of information, not overload.
• Take care of technical and legal details when enabling the
skill, not in the audio.
• Don’t blame the user.
52. Best practices
1. Make it clear the user needs to respond
2. Don’t assume the user knows what to do
3. Present the options clearly
4. Keep it brief
5. Avoid verbose choices
6. Avoid crowding options
7. Get information and use it
8. Make users comfortable