FXD 2019 Keynote: Marti Gold, SiriusXM

The Era of Multi-Modal Design - Marti Gold, Director, Automotive Experience Design Group, SiriusXM

  1. the era of multimodal design. October 25, 2019. Financial Experience Design Conference, Boston, MA. #FXD2019
  2. marti gold, Director of UX and Design, SiriusXM. The views and opinions expressed in this presentation are my own and do not necessarily reflect the views of my employer.
  3. what is a multimodal interface?
  4. “Multimodal interfaces allow users to seamlessly integrate two or more of their senses when interacting with a system, so they can engage with that system in much the same way they engage with the physical world.”
  5. #multimodal #FXD2019 smart displays give designers the ability to use voice, sight, and touch to increase user engagement
     • voice
     • touch
     • natural language processors
     • AI
  6. the rapid growth of smart displays
     [Chart: percentage of smart speakers in homes that have displays. Year 1 (2017): < 3%. Year 2 (2018): 12.3%, a 558% increase in year two.]
     Source: Voicebot.ai, 2019
  7. as of september 2019, there are 9.6 million installed smart displays in the U.S.
     • Amazon
        • Introduced smart displays in 2017
        • 6 million installed base
        • 59% of the market
        • Down from 67% in Dec 2018
     • Google (& Partners)
        • Entered market late summer 2018
        • 4.25 million installed base
        • 39% of the market
        • Up from 33% in Dec 2018
     • Facebook has 2% of the market; uses Alexa
     Source: Voicebot.ai, 2019
  8. …and that growth isn’t expected to slow down
     [Chart: number of homes that will have smart displays. 2019: 13m. 2023: 100m, a 769% increase over the next four years.]
     Source: Strategy Analytics, Jan 8, 2019
  9. why such rapid adoption? particularly since so few experiences are optimized for smart displays?
  10. three reasons…
     • We type at ~40 words per minute.
     • We speak at ~130 words per minute.
     • We read at ~250 words per minute.
     Therefore, the optimal experience for interacting with any device may be: VOICE IN | READ OUT, which is precisely what MMI devices offer our users.
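The arithmetic behind the three rates above can be sketched in a few lines of Python. This is only an illustration: the rates are the approximate figures from the slide, and the 100-word message length is an assumption picked to make the numbers easy to compare.

```python
# Approximate rates quoted on the slide, in words per minute.
RATES_WPM = {"type": 40, "speak": 130, "read": 250}

# Hypothetical message length, chosen only for illustration.
MESSAGE_WORDS = 100


def seconds_for(words: int, mode: str) -> float:
    """Seconds needed to get `words` words through the given mode."""
    return words * 60 / RATES_WPM[mode]


if __name__ == "__main__":
    for mode in RATES_WPM:
        print(f"{mode:>5}: {seconds_for(MESSAGE_WORDS, mode):.0f} s")
```

Under these assumptions, speaking a request takes roughly a third of the time of typing it, and reading a reply takes roughly half the time of hearing it spoken, which is the case for VOICE IN | READ OUT.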
  11. let’s look at a task many of us do every day…
  12. let’s order some coffee
  13. how is this done using a graphic user interface (GUI)?
  14. Using a GUI, you can see…
     • categories of beverages
     • options within each of the categories
     • photos to help you identify beverages quickly
     • prices
     • which beverages you’ve already ordered
     • the total beverages in your cart
  15. On one hand, the benefits are self-evident…
     • Discoverable
        • Explore all the options available to you
     • Learnable
        • For example, if you click the left nav, you will see the options on the right change
     • Visual Cues
        • You can differentiate items by photo
     • All the time in the world…
        • You can study and peruse and compare to your heart’s content
  16. But on the other hand…
     • It’s SLOW…
        • To place an order and check out, you may have to visit 5+ screens and fill out multiple form fields
     • Requires personal info to check out
        • Email, phone, address, age, your high school mascot, your first dog’s name…
     • Did I mention S……L……O……W…?
        • Remember typing = ~40wpm
  17. “I know! Let’s make this a voice app or chatbot! It will be so much faster!”
  18. “Hey Google, what are today’s featured coffees?”
     “Okay Marti. We have… Pumpkin Spiced Latte… Salted Caramel Mocha Frappuccino… Nitro Cold Brew… Vanilla Latte… Blonde Vanilla Latte… Blonde Skinny Vanilla Latte…”
  19. Stop! Go Back! That one! Wait!
  20. On one hand, VUIs are ideal IF…
     • You know precisely what you want
     • You know precisely how to say it
     • You are in an environment where your eyes and hands are occupied doing something else
     • You are in an environment that doesn’t have a lot of background noise
  21. But on the other hand…
     • It’s TERRIBLE at presenting choices
        • Any more than 3 options? Users WILL forget.
     • Gives the user NO time to consider options
        • Voice interfaces expect immediate responses
     • Highly limited navigation options
        • Voice interfaces have linear flows. You can’t easily skip steps or jump between areas.
  22. “Voice-only interfaces will certainly have a key role to play for simple interactions such as turning on a lightbulb or listening to music. But voice on its own is not necessarily the best input and output mechanism when it comes to more complex tasks.” David Mercer, Principal Analyst, Strategy Analytics
  23. we are just at the beginning of the multimodal era
  24. #multimodal #FXD2019
  25. AI-powered Chatbots
     Bank of America reported 6.3 million users of its virtual assistant, Erica, in the first quarter of 2019, up from 4.8 million the previous quarter. That’s an additional 1.5 million users.
  26. #multimodal #FXD2019
  27. but we are letting down smart display users by not optimizing our content for these devices.
  28. Users want to love these devices, but…
     • The adoption of non-display smart speakers is still rising slightly faster than smart displays
        • Could be price. Smart speakers cost far less than smart displays
     • Yet the device makers showcase the SAME THREE USE CASES: Cooking, Smart Home control, and Video Chat.
     • There is a potentially huge competitive edge for services that can take advantage of this gap.
  29. So how can you do that? Let’s look at common VUI problems and see how MMIs solve them.
  30. VUI problem #1: cognitive load
     • The immediate and transient nature of voice interfaces requires the user to be fully alert when the system responds
        • They cannot control the speed of the information flow
        • They cannot re-read to gain a better understanding
        • They cannot scan multiple choices
        • They cannot click away
        • They cannot ignore the voice prompt without risking the cancellation of the entire interaction
     • Therefore, they pay close attention
     • This cognitive load requires that all VUI responses be kept short and limited in succession
     • “Peaks of Attention”
  31. peaks of attention: VUIs vs GUIs
     [Figure from “How to Go from Screens to Voice without Overwhelming the User” by Daniel Westerlund]
     Multimodal interfaces provide all the advantages of a VUI but can follow the GUI attention curve.
  32. conversation guidelines: grice’s maxims
     • The maxim of QUANTITY: Give as much information as needed, but NO MORE.
     • The maxim of QUALITY: Be truthful. Information must be supported by evidence.
     • The maxim of RELATION: Be relevant, saying only things that are pertinent to the discussion.
     • The maxim of MANNER: Be clear, brief, and orderly to avoid obscurity and ambiguity.
  33. grice’s maxim example: “what time is it?”
     • “It’s 10:56 AM”
     • “It’s morning”
     • “It’s 10:56 and 46 seconds AM, Eastern Daylight Savings Time, on October 25, 2019”
  34. VUI problem #2: request and response interruptions
     • Voice interfaces respond to requests, but interruptions cause problems
     • You cannot count on user input being an isolated utterance. Requests often come embedded within other conversations
     • Real conversations are NOT scripted exchanges based on decision trees and flow charts
     • In April 2018, Alexa was still experiencing a 50% failure rate
        • Wake word not heard, “I didn’t understand the question”, etc.
        • Note: Amazon says that rate has now been cut by 25%, but much is outside their control.
     • You must always be asking, “How can my app work through interruptions?”
     • A MM interface gives you the opportunity to provide quick visual cues to help the user recover and achieve their task.
  35. VUI problem #3: learnability
     • Sadly, most users don’t know how to use voice commands and have unrealistic expectations about them.
     • Ironically, the better AI/NLP systems become, the more users assume the device always has context.
     • Without context, the device will return poor results, so the users keep playing music and setting timers.
     • Although AI and personalization algorithms are making rapid improvements, users must still be reminded how to express their intents fully, which does not come naturally.
        • They will say: “Read my horoscope” vs “Ask Astrology Daily for horoscope for Leo”
        • Reminder is in the response: “I have horoscopes for today from Astrology Daily. What is your zodiac sign?”
     • Consequence of poor learnability: Only 3% of Alexa skills still have users after 2 weeks. (February 2018)
     • A MM interface can dramatically increase learnability of your skill or action by providing context and visual cues for initial inquiries and follow-up responses.
  36. A real challenge you are going to face will be reconciliation of these important principles with your marketing department’s “brand voice”
  37. #askmehowIknowthis
  38. and finally, six tips I’ve gathered along the way…
  39. Tip 1: “voice first” design doesn’t apply to these devices.
     • Multimodal interfaces aren’t simply voice interfaces with images.
     • Do not have your assistant “read” the screen. In fact, doing so will aggravate your users.
     • Presenting multiple choices on a single screen means you will need to build far more complex navigation paths, not simple linear flows.
  40. Tip 2: user education and support tasks are particularly well-suited to MMIs
     • High level product overviews and comparisons
     • Anything that benefits from charts or other data visualizations
     • Questionnaires, surveys, risk assessments: tasks that require multiple-choice responses
     • Step-by-step tutorials
     • If you have forms or need to provide critical, detailed information, stick with a GUI.
  41. Tip 3: never stop training your NLP (Natural Language Processing) application
     • Just like VUIs, even the most intelligent AI-powered multimodal interfaces have to be “trained for intent” to reduce error rates. Consider these utterances:
        • “What is the balance of my checking account?”
        • “What’s my checking account balance?”
        • “How much money is in checking?”
     • Be prepared to monitor your error logs to test and add phrases to your intents. You can even include negative phrases (“How do I balance my checking account?”).
     • Amusing note: Remember that linguistics degree your Mom thought was useless? It’s now a 6-figure job.
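To make “trained for intent” a little more concrete, here is a minimal Python sketch of one intent with training phrases and a negative phrase. It is not how any particular assistant platform works: the intent name `check_balance`, the `match_intent` helper, and the 0.75 threshold are all hypothetical, and plain string similarity from the standard library stands in for a real NLP model.

```python
import re
from difflib import SequenceMatcher

# Hypothetical intent definition: training phrases, plus negative phrases
# that look similar but must NOT trigger the intent.
INTENTS = {
    "check_balance": {
        "phrases": [
            "what is the balance of my checking account",
            "what's my checking account balance",
            "how much money is in checking",
        ],
        "negative": ["how do i balance my checking account"],
    },
}


def normalize(text: str) -> str:
    """Lowercase, unify apostrophes, and drop other punctuation."""
    text = text.lower().replace("’", "'")
    return re.sub(r"[^a-z0-9' ]", "", text).strip()


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a real model."""
    return SequenceMatcher(None, a, b).ratio()


def match_intent(utterance: str, threshold: float = 0.75):
    """Return the best-matching intent name, or None if nothing qualifies."""
    text = normalize(utterance)
    best_name, best_score = None, 0.0
    for name, spec in INTENTS.items():
        pos = max(similarity(text, p) for p in spec["phrases"])
        neg = max((similarity(text, n) for n in spec["negative"]), default=0.0)
        # Accept only if a training phrase is close enough AND closer
        # than every negative phrase.
        if pos >= threshold and pos > neg and pos > best_score:
            best_name, best_score = name, pos
    return best_name


if __name__ == "__main__":
    for said in ("What’s my checking account balance?",
                 "How do I balance my checking account?"):
        print(said, "->", match_intent(said))
```

In this sketch, utterances that return `None` are the ones you would expect to find in an error log; reviewing them and adding new phrases (or new negative phrases) to `INTENTS` is the ongoing “training” the slide describes.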
  42. Tip 4: when scoping, put in 3x your normal time for testing
     “80% of the effort that goes into building these skills is probably going into testing and refining the user experience, and the things that users can say, and how they can say them, and the different ways they can say them.” Tingiris, founder and managing director of Dabblelab
  43. Tip 5: contextual and field testing is important
     Multimodal interactions are difficult for researchers to observe.
     • They are heavily dependent on context and current activities
     • They often happen in private spaces with many interruptions and distractions
     • They may only last a few seconds
     • Try to find some way to include contextual testing as you build your MM interfaces.
  44. Tip 6: and finally, prepare your org for frequent updates
     • Designing multimodal interfaces is the Wild Wild West
        • There are lots of suggestions but not a lot of information on proven best practices
     • No matter how carefully you test, keep in mind that these interfaces are new to users as well. What they like this month may completely change next month. Design and build for easy editing.
     • Many orgs do not release non-bug-related updates quickly, so prepare them mentally for this shift.
  45. Thank you! @martigold marti.gold@siriusxm.com