“Multimodal interfaces allow users to
seamlessly integrate two or more of their senses
when interacting with a system, so they can
engage with that system in much the same
way they engage with the physical world.”
• natural language processors
smart displays give designers the ability
to use voice, sight, and touch to increase user engagement
the rapid growth of smart displays
Source: Voicebot.ai, 2019
[Chart: percentage of smart speakers in homes that have displays]
as of September 2019, there are
9.6 million installed smart displays in the U.S.
Google & Partners
Source: Voicebot.ai, 2019
Amazon
• Introduced smart displays in 2017
• 6 million installed base
• 59% of the market
• Down from 67% in Dec 2018
Google (& Partners)
• Entered market late summer 2018
• 4.25 million installed base
• 39% of the market
• Up from 33% in Dec 2018
• Facebook has 2% of the market; uses Alexa
…and that growth isn’t expected to slow down
Source: Strategy Analytics, Jan 8, 2019
[Chart: number of homes that will have smart displays]
769% increase over the
next four years
why such rapid adoption?
particularly since so few experiences
are optimized for smart displays?
• We type at ~40 words per minute.
• We speak at ~130 words per minute.
• We read at ~250 words per minute.
Therefore, the optimal experience for
interacting with any device may be:
VOICE IN | READ OUT
which is precisely what MMI devices offer
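The arithmetic behind "VOICE IN | READ OUT" can be sketched as follows. This is a rough illustration, not a measurement: the three rates come from the slide, while the "listen" rate (assumed equal to the speaking rate) and the 20-word request and 100-word response are assumptions made for the example.

```python
# Approximate rates in words per minute. "type", "speak", and "read" are the
# figures quoted on the slide; "listen" is an assumption (same as speaking).
RATES_WPM = {"type": 40, "speak": 130, "read": 250, "listen": 130}


def seconds_for(words: int, mode: str) -> float:
    """Seconds needed to move `words` words through the given mode."""
    return words * 60 / RATES_WPM[mode]


def round_trip(words_in: int, words_out: int, input_mode: str, output_mode: str) -> float:
    """Total seconds for one request plus one response."""
    return seconds_for(words_in, input_mode) + seconds_for(words_out, output_mode)


if __name__ == "__main__":
    # Hypothetical exchange: a 20-word request and a 100-word response.
    for input_mode, output_mode in [("type", "read"), ("speak", "listen"), ("speak", "read")]:
        total = round_trip(20, 100, input_mode, output_mode)
        print(f"{input_mode} in / {output_mode} out: {total:.1f} s")
```

Under these assumptions, speaking the request and reading the response is the fastest of the three combinations.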
how is this done using
a graphical user interface (GUI)?
Using a GUI,
you can see…
• categories of beverages
• options within each of the
• photos to help you identify
• which beverages you’ve
• the total beverages in your cart
• Explore all the options available to you
• For example, if you click the left nav, you will
see the options on the right change
• Visual Cues
• You can differentiate items by photo
• All the time in the world….
• You can study and peruse and compare to
your heart’s content
On one hand, the benefits
But on the other hand…
• It’s SLOW….
• To place an order and check out, you may
have to visit 5+ screens and fill out multiple
• Requires personal info to check out
• Email, phone, address, age, your high school
mascot, your first dog’s name…
• Did I mention S……L……O……W…?
• Remember: typing = ~40 wpm
“I know! Let’s make this
a voice app or chatbot!
It will be so much faster!”
“What are today’s featured coffees?”
“Okay Marti. We have…
Pumpkin Spiced Latte…
Salted Caramel Mocha Frappuccino…
Nitro Cold Brew…
Blonde Vanilla Latte…
Blonde Skinny Vanilla Latte…”
• You know precisely what you want
• You know precisely how to say it
• You are in an environment where your
eyes and hands are occupied doing
• You are in an environment that doesn’t
have a lot of background noise
On one hand,
VUIs are ideal IF…
But on the other hand…
• It’s TERRIBLE at presenting choices
• Any more than 3 options?
Users WILL forget.
• Gives the user NO time to consider
• Voice interfaces expect immediate responses
• Highly limited navigation options
• Voice interfaces have linear flows. You can’t
easily skip steps or jump between areas.
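One way to respect the "no more than 3 options" limit in a voice-only flow is to page the choices. The sketch below is a minimal illustration; the function name `spoken_pages` and the prompt wording are hypothetical, not taken from any assistant SDK.

```python
from typing import List

MAX_SPOKEN_OPTIONS = 3  # limit suggested on the slide


def spoken_pages(options: List[str], page_size: int = MAX_SPOKEN_OPTIONS) -> List[str]:
    """Split a long list of choices into short spoken prompts."""
    prompts = []
    for start in range(0, len(options), page_size):
        page = options[start:start + page_size]
        if len(page) == 1:
            listing = page[0]
        else:
            listing = ", ".join(page[:-1]) + " or " + page[-1]
        prompt = f"We have {listing}."
        # Only offer to continue when there really are more options left.
        if start + page_size < len(options):
            prompt += " Want to hear more?"
        prompts.append(prompt)
    return prompts


if __name__ == "__main__":
    coffees = [
        "Pumpkin Spiced Latte",
        "Salted Caramel Mocha Frappuccino",
        "Nitro Cold Brew",
        "Blonde Vanilla Latte",
        "Blonde Skinny Vanilla Latte",
    ]
    for prompt in spoken_pages(coffees):
        print(prompt)
```

Paging keeps each prompt short, but it also shows the cost: five coffees already need two turns, which is where a screen starts to help.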
“Voice-only interfaces will certainly have
a key role to play for simple interactions
such as turning on a lightbulb or listening
to music. But voice on its own is not
necessarily the best input and output
mechanism when it comes to more
Principal Analyst, Strategy Analytics
we are just at the beginning
of the multimodal era
Bank of America reported
6.3 million users of its
virtual assistant, Erica, in
the first quarter of 2019,
up from 4.8 million the
That’s an additional
1.5 million users
but we are letting down
smart display users
by not optimizing our content
for these devices.
Users want to love these
• The adoption of non-display smart speakers is still rising
slightly faster than smart displays
• Could be price. Smart speakers cost far less than smart displays
• Yet the device makers showcase the SAME THREE USE
CASES: Cooking, Smart Home control, and Video Chat.
• There is a potentially huge
competitive edge for services that can
take advantage of this gap.
So how can you do that?
Let’s look at common VUI problems
and see how MMIs solve them.
VUI problem #1: cognitive load
• The immediate and transient nature of voice interfaces requires the user to be fully alert
when the system responds
• They cannot control the speed of the information flow
• They cannot re-read to gain a better understanding
• They cannot scan multiple choices
• They cannot click away
• They cannot ignore the voice prompt without risking the cancellation of the entire interaction
• Therefore, they pay close attention
• This cognitive load requires that all VUI responses be kept short and limited in succession
• “Peaks of Attention”
peaks of attention: VUIs vs GUIs
BY DANIEL WESTERLUND
How to Go from Screens to Voice without Overwhelming the User
provide all the
advantages of a VUI
but can follow the GUI
conversation guidelines: grice’s maxims
• The maxim of QUANTITY
Give as much information as needed, but NO MORE.
• The maxim of QUALITY
Be truthful. Information must be supported by evidence.
• The maxim of RELATION
Be relevant, saying only things that are pertinent to the discussion.
• The maxim of MANNER
Be clear, brief, and orderly to avoid obscurity and ambiguity.
grice’s maxim example: “what time is it?”
“It’s 10:56 AM”
“It’s morning”
“It’s 10:56 and 46 seconds AM, Eastern Daylight Savings Time, on October 25, 2019”
VUI problem #2: request and response interruptions
• Voice interfaces respond to requests, but interruptions cause problems
• You cannot count on user input being an isolated utterance. Requests often come embedded within
• Real conversations are NOT scripted exchanges based on decision trees and flow charts
• In April 2018, Alexa was still experiencing a 50% failure rate
• Wake word not heard, “I didn’t understand the question”, etc.
• Note: Amazon says that rate has now been cut by 25%, but much is outside their control.
• You must always be asking, “How can my app work through interruptions?”
• An MM interface gives you the opportunity to provide quick visual cues to help the user recover and
achieve their task.
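A recovery path of the kind described above might look like the sketch below. Everything here is hypothetical (the `Response` shape, the `SUPPORTED` phrases, and the keyword matching); it only illustrates pairing a short spoken apology with on-screen suggestions when a request is not understood.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Response:
    speech: str                                            # what the assistant says
    suggestions: List[str] = field(default_factory=list)   # tappable hints on screen
    understood: bool = True


# Hypothetical phrases this toy app can handle.
SUPPORTED = {
    "featured coffees": "Here are today's featured coffees.",
    "my order": "Here is your current order.",
}


def respond(utterance: str) -> Response:
    """Answer a request, or fall back to visual cues if it was not understood."""
    text = utterance.lower()
    for phrase, speech in SUPPORTED.items():
        if phrase in text:
            return Response(speech=speech)
    # Interrupted or unclear input: keep the speech short and let the screen
    # show the user how to get back on track.
    return Response(
        speech="Sorry, I missed that. Try one of the options on screen.",
        suggestions=[f"Show {phrase}" for phrase in SUPPORTED],
        understood=False,
    )


if __name__ == "__main__":
    print(respond("What are today's featured coffees?"))
    print(respond("uh, hang on... what was it"))
```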
VUI problem #3: learnability
• Sadly, most users don’t know how to use voice commands and have unrealistic expectations about them.
• Ironically, the better AI/NLP systems become, the more users assume the device always has context.
• Without context, the device will return poor results – so the users keep playing music and setting timers.
• Although AI and personalization algorithms are making rapid improvements, users must still be reminded
how to express their intents fully, which does not come naturally.
• They will say: “Read my horoscope” vs "Ask Astrology Daily for horoscope for Leo”
• Reminder is in the response: “I have horoscopes for today from Astrology Daily. What is your zodiac sign?”
• Consequence of poor learnability: Only 3% of Alexa skills still have users after 2 weeks. (February 2018)
• An MM interface can dramatically increase learnability of your skill or action by providing context and
visual cues for initial inquiries and follow-up responses.
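The "reminder is in the response" pattern from the horoscope example can be sketched as a simple slot check. The function names and the reply for a complete request are assumptions made for illustration; the reminder text is the one quoted above.

```python
import re
from typing import Optional

ZODIAC_SIGNS = [
    "aries", "taurus", "gemini", "cancer", "leo", "virgo",
    "libra", "scorpio", "sagittarius", "capricorn", "aquarius", "pisces",
]


def find_sign(utterance: str) -> Optional[str]:
    """Return the zodiac sign mentioned in the utterance, if any."""
    words = re.findall(r"[a-z]+", utterance.lower())
    for sign in ZODIAC_SIGNS:
        if sign in words:
            return sign.capitalize()
    return None


def horoscope_reply(utterance: str) -> str:
    """Answer directly when the sign is known; otherwise remind and ask."""
    sign = find_sign(utterance)
    if sign is None:
        # The reminder names the source and asks for the missing detail.
        return "I have horoscopes for today from Astrology Daily. What is your zodiac sign?"
    return f"Here is today's horoscope for {sign} from Astrology Daily."


if __name__ == "__main__":
    print(horoscope_reply("Read my horoscope"))
    print(horoscope_reply("Ask Astrology Daily for horoscope for Leo"))
```

On a smart display, the same missing-slot case could also show the twelve signs as tappable choices instead of relying on the user to remember how to phrase the request.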
A real challenge you are going to face will be
reconciliation of these important principles
with your marketing department’s “brand voice”
six tips I’ve gathered along the way…
“voice first” design
doesn’t apply to
• Multimodal interfaces aren’t simply
voice interfaces with images.
• Do not have your assistant “read” the
screen. In fact, doing so will aggravate
• Presenting multiple choices on a
single screen means you will need to
build far more complex navigation
paths - not simple linear flows.
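The "do not read the screen" tip can be illustrated by keeping speech and display content separate. The response shape below is a hypothetical dictionary, not the format of any particular assistant platform.

```python
from typing import Dict, List


def build_response(options: List[str]) -> Dict[str, object]:
    """Build a multimodal response: brief speech, full detail on screen."""
    return {
        # The spoken part points at the screen instead of reciting it.
        "speech": "Here are today's featured coffees. Which one would you like?",
        # The screen carries the list the user can scan at their own pace.
        "display": {"title": "Featured coffees", "items": list(options)},
    }


if __name__ == "__main__":
    coffees = ["Pumpkin Spiced Latte", "Nitro Cold Brew", "Blonde Vanilla Latte"]
    response = build_response(coffees)
    print(response["speech"])
    for item in response["display"]["items"]:
        print(" -", item)
```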
user education and support tasks
are particularly well-suited to MMIs
• High level product overviews and comparisons
• Anything that benefits from charts or other data
• Questionnaires, surveys, risk assessments –
tasks that require multiple-choice responses
• Step-by-step tutorials
• If you have forms or need to provide critical,
detailed information, stick with a GUI.
never stop training your NLP application
(Natural Language Processing)
• Just like VUIs, even the most intelligent AI-powered multimodal interfaces have to be
“trained for intent” to reduce error rates. Consider these utterances:
• “What is the balance of my checking account?”
• “What’s my checking account balance?”
• “How much money is in checking?”
• Be prepared to monitor your error logs to test and add phrases to your intents. You can
even include negative phrases (“How do I balance my checking account?”).
• Amusing note:
Remember that linguistics degree your Mom thought was useless? It’s now a 6-figure job.
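The checking-account utterances and the negative phrase above can be sketched with a toy matcher. This is not how production NLP platforms work (they train statistical models); word-overlap similarity, the threshold, and the `check_balance` intent name are assumptions chosen to keep the idea visible.

```python
import re
from typing import Optional, Set

# Training phrases from the bullets above. The last one is a negative phrase:
# it looks similar but must NOT trigger the balance intent.
EXAMPLES = [
    ("what is the balance of my checking account", "check_balance"),
    ("what's my checking account balance", "check_balance"),
    ("how much money is in checking", "check_balance"),
    ("how do i balance my checking account", None),
]


def tokens(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: shared words divided by all words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(utterance: str, threshold: float = 0.3) -> Optional[str]:
    """Return the intent of the closest training phrase, or None."""
    words = tokens(utterance)
    best_score, best_label = 0.0, None
    for phrase, label in EXAMPLES:
        score = similarity(words, tokens(phrase))
        if score > best_score:
            best_score, best_label = score, label
    return best_label if best_score >= threshold else None


if __name__ == "__main__":
    for text in [
        "What's the balance of my checking account",
        "How do I balance my checking account?",
        "Play some jazz",
    ]:
        print(text, "->", classify(text))
```

Each phrase that shows up in the error logs becomes a new entry in `EXAMPLES`, which is the ongoing training work the slide describes.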
when scoping, put in
3x your normal time
"80% of the effort that goes into building these
skills is probably going into testing and
refining the user experience, and the things
that users can say, and how they can say
them, and the different ways they can say
founder and managing director of Dabblelab
contextual and field
testing is important
Multimodal interactions are difficult for
researchers to observe.
• They are heavily dependent on context and
• They often happen in private spaces with
many interruptions and distractions
• They may only last a few seconds
• Try to find some way to include
contextual testing as you build your
prepare your org for
• Designing multimodal interfaces is the
Wild Wild West
• There are lots of suggestions but not a lot
of information on proven best practices
• No matter how carefully you test, keep in
mind that these interfaces are new to
users as well. What they like this month
may completely change next month.
Design and build for easy editing.
• Many orgs do not release non-bug-related
updates quickly, so prepare them mentally
for this shift.
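One possible reading of "design and build for easy editing" is to keep prompt copy in data rather than in code, so wording can change without a code release. The sketch below assumes a hypothetical JSON layout and helper names; in practice the JSON would live in a file or content system rather than a string.

```python
import json
from typing import Dict

# Hypothetical prompt copy, kept separate from application logic.
PROMPTS_JSON = """
{
  "welcome": {
    "speech": "Okay {name}. What can I get you?",
    "display": "Welcome back, {name}"
  }
}
"""


def load_prompts(raw: str) -> Dict[str, Dict[str, str]]:
    """Parse prompt copy from JSON text."""
    return json.loads(raw)


def prompt(prompts: Dict[str, Dict[str, str]], key: str, channel: str, **values: str) -> str:
    """Look up the copy for one channel ("speech" or "display") and fill it in."""
    return prompts[key][channel].format(**values)


if __name__ == "__main__":
    prompts = load_prompts(PROMPTS_JSON)
    print(prompt(prompts, "welcome", "speech", name="Marti"))
    print(prompt(prompts, "welcome", "display", name="Marti"))
```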