Digital Assistants and Chatbots - a Brand’s best friend?

DIGITAL ASSISTANTS/CHATBOTS – A BRAND’S BEST FRIEND?
INTRODUCTION
Background
Microsoft CEO Satya Nadella recently stated at the company’s Worldwide Partner Conference that
“chatbots will fundamentally revolutionize how computing is experienced by everybody”.
Digital Assistants (DAs) and chatbots define a shift to a more conversational style of interacting with
internet-based services.
At its most primitive, a DA might be a simple search interface built upon an existing FAQ database. At
its more advanced, a DA would use contextual information and artificial intelligence (AI) to calculate a
response tailored specifically for an individual.
It is envisaged that these conversational interfaces will not be staffed with human call centre
operators but by cloud based servers running sophisticated artificial intelligence routines. In the short
to medium term, however, it is likely that the AI routines will hand off to a human operator once some
initial conversation filtering has taken place.
Until very recently it was difficult for a brand or a market researcher to participate directly in the
creation and distribution of DAs, as companies such as Apple (SIRI) and Microsoft (Cortana) were
closed platforms. A lot changed when Facebook announced at its 2016 developer conference that
anyone could build their own DA within Facebook Messenger. Facebook is not the only company
opening up to developers. Line (Japan), WeChat (China) and KIK (US teens) also operate chatbot stores.
The use of DAs and chatbots offers a way to reach billions of potential customers with a
consistent conversational experience.
So what is the difference between a chatbot and a Digital Assistant? A DA is akin to an electronic
helper; it is the butler working for you. Chatbots are like dealing with a company representative
helping with a specific task such as booking travel or buying insurance. Facebook’s stance is that they
make little differentiation between conversing with a human and a chatbot. Both are resident and
searchable in Facebook’s contacts book. The fact that Facebook provides an open platform means
that a brand could develop a general purpose DA as a chatbot if they wanted to. For instance, Nike
could create a general fitness DA, broader in scope than solely e-commerce sportswear.
Why should brands be interested?
There are two key opportunities for brands.
• Create a standalone chatbot within a third party platform (such as Facebook Messenger or
WeChat).
• Work with the large internet companies who run general DAs such as Apple (SIRI) or
Microsoft (Cortana) to ensure that they have access to and can interact with the brand’s best
content.
Chatbots offer a number of specific advantages:
• Chatbots will move public brand-to-consumer service conversations, which often occur
publicly on Facebook and Twitter pages, into a private and more controlled chat
environment. Chatbots can learn, and profile the user over time, as Messenger platforms
provide a persistent ID.
• Chatbots offer significant scalability. For instance, a Facebook chatbot can effectively
connect a company’s customer service system with 900 million people within a consistent
User Interface (UI). Chatbots are therefore a potential challenger to telecom customer
care lines (see Figure 1).
• Chatbots allow the seamless sending of documents and forms. For instance, a digital
telecoms sales assistant could forward contract documentation directly to the user, in real
time and within the chat (without needing to ask for a postal or email address).
In short, digital agents play to automation and scale opportunities.
Why should market researchers be interested?
Messenger-based chatbots offer similar advantages to the market research industry.
• Messenger platforms have a significant and increasing population reach. The largest of these
platforms now have more active users than social networks (see Figure 2).
• A significant youth population engages with these platforms.
• Research can now be presented in a UI framework already understood by millions of people
(this will also present a challenge to existing well known question types and formats).
FIGURE 1. GLOBAL MESSAGE VOLUME TELCO VERSUS FACEBOOK MESSENGER AND WHATSAPP
FIGURE 2. MONTHLY ACTIVE USERS FOR TOP 4 SOCIAL NETWORKS AND MESSAGING APPS

Testing the hypothesis
When considering the notion of “the Digital Assistant as a brand’s best friend?”, a number of high-level
questions come to mind:
• Do people want to converse with DAs in the first place?
• How do people converse with brands via a digital channel such as a Facebook brand page?
• How do people converse? Is it the same as with a friend?
• What sort of subjects would someone want to converse with a DA about?
• Even though AI techniques have advanced significantly since the launch of SIRI in 2011, is
the goal of fulfilling a truly open conversation using AI still a twinkle in the developer’s eye?
I had several hypotheses I wanted to test:
• Conversations with a DA are likely to be succinct. This would mean that a DA needs to
accurately interpret intent and have the content on hand to satisfy the request. If this was not
the case it could be to the detriment of the brand experience.
• Conversations between a brand and a consumer via a digital channel are already brief.
• Current interactions with a DA use simple question conventions which do not require
contextual information.
• Given the challenge of creating a conversational UI, it is likely that interactions need to be
highly templated.
As I did not have access to one definitive single source of data to answer all these questions, I carried
out a series of experiments to test the hypotheses and attempt to answer the core question.
• Usability test: 10 London-based panellists were invited to interact with a voice-based DA
(Apple’s SIRI and Google’s Voice Search). Tasks were administered by an on-site moderator.
• SIRI + Messenger analysis: examination of SIRI and Messenger usage on Kantar’s US
mobile behavioural panel. In the case of SIRI, the aim was to deduce the current types of
interactions occurring. For Facebook Messenger, the work involved looking at session
lengths to deduce the amount of time spent in this mobile environment.
• Facebook Brand Page analysis: an examination of interactions taken from 13,658 posts
made to the Facebook pages of 15 UK retailers during the period 1 November to 31
December 2015. The objective was to analyse the types of conversations occurring and the
implications for DAs.
• Messenger chatbot analysis: I created a working chatbot called Kat and had 70 of my
colleagues interact with it. The chatbot had a mixture of closed and open questions. All data
from the interactions were recorded and stored in a database for analysis.

METHOD I: USABILITY TEST “HOW DO PEOPLE INTERACT WITH MOBILE BASED
DIGITAL ASSISTANTS?”
The usability test involved 10 London-based respondents undertaking a series of voice-triggered
brand related tasks using SIRI or Google Voice.
The questionnaire was managed by a moderator, and a video of the exercise was recorded. The
mobile phone, either an iPhone (for SIRI), or a Samsung (Google Voice) was provided to the
respondent.
The script was broken into three sections. At the end of each section the respondent was asked for
their thoughts on the interaction.
Location based questions:
• Can you find the nearest Pizza Express?
• Can you find the nearest Costa Coffee?
• Can you recommend a good Indian restaurant?
• Where is the nearest petrol station?
• Where is the nearest Aldi supermarket?
• Where is the nearest free Wi-Fi hotspot?
Weather and time questions:
• What is today’s weather forecast?
• What is the temperature right now?
• What will the weather be like tomorrow?
• What is the weather forecast for the weekend?
• What is the time in New York?
• What is the time in Moscow?
Information searches:
• Can you find me the best Windows tablets?
• Find me the best baked chicken recipes.
• Find me the best smartphone deals.
• Find me news about Coca Cola.
• Show me upcoming movies.
• Find me new car deals.
• Find me new VW car deals.
• Find me this week’s ASDA deals.
Location Tasks: Summarized Observations
Nearest…
The first two tasks were simple ‘find the nearest’ requests. All respondents had success with this task.
Respondent X insisted on just saying the brand i.e. “Pizza Express”, “Costa Coffee”. His reasoning
was that because he was on a mobile device with GPS, it would automatically know his location and
deduce that he would want the closest one. He was correct as he got the same responses as the
other respondents.
“Good Indian restaurant”…

There was some discussion re whether ‘good’ was the right question, as no one would search for a
‘bad’ Indian restaurant. There were concerns at the sort order of the results. It seemed to sort them in
order of location rather than quality or rating.
SIRI provided TripAdvisor reviews, although some respondents noted that there were so few reviews
for some restaurants that they were not useful.
Respondent V asked for 5 star rated restaurants, instead of a “good restaurant”. He thought stars
were the accepted rating currency of restaurant reviews.
Respondent X would not use the term “good”. He thought that by using this term, Google would return
paid for search results. Instead, he just said “Indian Restaurant” and received the same results as the
others.
Respondent Y didn’t like the fact that SIRI would not allow for further tuning of the response. He
wanted it to be more ‘conversational’ and its response to be able to be tuned.
Overall, in terms of the ‘nearest’ requests, respondents thought that the DAs struggled with more
complex tasks, particularly where weighting is required (i.e. location versus quality of goods).
As regards the weather and time questions, respondents found that the DAs performed very well on
these tasks. Some of the respondents commented that when they voiced a question, they expected a
voice response rather than text on the screen.
In terms of information searches, the information is best summarised as follows:
Best Windows tablets: Respondents found the question too generic. Respondent X said he would
only trust results from reputable sources such as PC Magazine.
Baked chicken recipes: Respondents thought this worked well, though Respondent Y would have
preferred the result to be a YouTube video. Respondent X pointed out that he would be more likely
to use a PC or tablet for this type of task.
Best smartphone deals and best new car deals: It was thought that these questions were too vague.
Both searches linked to comparison sites or local retailers, with the term “best” ignored. One
person struggled with the smartphone question as SIRI repeatedly insisted the only phone worth
having is an iPhone.
New VW car deals: A mixed bag. SIRI linked to VW dealer sites. Google performed better by
linking to the deal section of the VW site. However, every respondent would have preferred to see
the deal first hand rather than having to click on a link.
This week’s ASDA deals: The results of this task were interesting for several reasons. Firstly, for
half the respondents, the DA did not understand what “ASDA” was. Perhaps “ASDA” is phonetically
confusing for an AI. Secondly, when the DA did understand, it linked to a deal site as opposed to
the ASDA website.
Further respondent comments from the usability testing
“If I do a location search I want a map in the result (Siri didn't always do this)"
“I like it when Siri replies to me (via voice), not when it gives me a list of tiny links"

“I want it more conversational, like if SIRI asks me additional questions, to help it get me the best
information"
“I only use Siri at home - I'd be too embarrassed when I go shopping"
“If I ask for the best restaurant near me, it gives all restaurants, no matter how bad the review (which
it also shows in the results). Just give me the ones with good reviews you have this information"
“If I ask for a baked chicken recipe, or how to remove a stain, why not link straight to a 'how to' video
on YouTube from a reputable brand?”
“I need to adjust the way I talk, so Google understands me".
“I like it when there is a photo in the result”

Findings
Half of the respondents already use a DA voice service; however, they only do so at home. For one of
the respondents, English was their second language and the DA would often fail to understand them.
When carrying out such UI tests, it is worthwhile to ensure that respondents are participating in their
first language.
One struggled due to having no prior experience with a DA.
Three key points emerged from the usability work. Firstly, it was notable that more experienced users
would continually reword the question to a format they thought would work best. Often this involved
simplifying the question and trusting the DA (thanks to location sensors and contextual information) to
fill in the gaps.
Secondly, respondents did not expect a DA to handle difficult questions. However, they did expect a
high-quality result: maps if required, deals surfaced, results ordered correctly and
recipes in video form.
Thirdly, they wanted the results to be conversational. In short if they talked to a DA, they wanted the
DA to talk back.
Two respondents expressed their surprise at the improvements in Siri. One commented “Wow, it
understands me way better than it used to”. They had tried SIRI when it was first launched and never
came back to it. All respondents found it frustrating when the DA did not understand them.
Can voice activated Digital Assistants be considered a brand’s best friend?
In the US, Google says that 20% of its queries on a mobile device are voice searches(1). The fact that
a voice DA struggled to understand “ASDA” should be of concern for that brand.
Respondents also clearly wanted information to be surfaced without clicking through to web links.
They expected the DA to filter out content and only provide the best response. DAs did this well for
simple time and weather queries but not for more complex questions.
However, the real issue is that at the time of testing Apple and Google had complete control over the
DA, with no direct way for brands to participate. For instance, ASDA would have to make a formal
request to Google for its speech recognition system to recognize the brand.
Under the current circumstances, the best a brand can do is to format their content in a way that
mirrors the approach used by Google or Apple to find and index content. One possible technique
could be to use Google’s keyword suggestion tool to find the types of searches and frequency that
appear to be conversational.
Finally, the testing shows that while brands provide content, they are not involved in the conversation.
As a result we cannot consider DAs such as SIRI to be a brand’s best friend.
(1) http://searchengineland.com/google-reveals-20-percent-queries-voice-queries-249917

METHOD II: “WHAT CLUES CAN MOBILE BEHAVIOURAL DATA GIVE US ABOUT
DIGITAL ASSISTANT USAGE?”
To answer this question I reviewed app and web data logged from Kantar’s mobile behavioural panel
to isolate and quantify:
• SIRI usage patterns
• Mobile search terms indicative of an interaction that a digital assistant might be involved in
• Mobile Messenger usage patterns
SIRI Usage Patterns
To get an idea of how people are using DAs in practice, I identified a group of more than 3000
panellists who have used SIRI over the past year. Whilst I could not measure when a panellist had
made an internal device call, like setting an alarm, I could evaluate searches that require SIRI to
connect to one of its content partners (i.e. Bing for web searches, Wikipedia for factual information,
Wolfram Alpha if calculations are required).
I was able to capture almost 70K of these SIRI connections and classify their purpose. This revealed
that 63% of the searches from SIRI resulted in a Bing search for information. The next largest
category was Maps/Location at 29% of searches (Figure 3).
FIGURE 3. BREAKDOWN OF SIRI USAGE FROM 70K INTERACTIONS RECORDED FROM 3K US MOBILE PANELISTS
For 10% of SIRI searches processed by Bing (~4.5K photos), a photo was also displayed, and I was
able to capture and analyse these. Photos are often displayed if a factual question is posed to SIRI,
particularly if the response is sourced from Wikipedia (Figure 4).
FIGURE 4. BREAKDOWN OF PHOTOS SENT TO A RESPONDENT AS A RESULT OF A SIRI INTERACTION
(Figure 3 values: Bing searches 63%; Maps/Location 29%; Wolfram calculations 8%; iTunes (find
music) 0%.)
(Figure 4 values: sports faces 48%; sports logos 18%; movies 18%; nature 6%; famous/celeb 4%;
history or maths 3%; corporate logos 2%; music 1%; flags 0%.)

For the photos that were captured, sports data figured prominently. The largest category was that of
portrait photos of sports stars. It is very likely this would have been for searches relating to player
stats and information and that this sports related information service has been heavily integrated into
SIRI. Sports logos and movie posters were the next largest photo categories.
The nature category was also interesting with sharks, spiders and snakes featuring prominently!
However, as interesting as the photos were, they only represented 10% of the Bing calls, and give
only an indication of the things people search for, and the visuals they are used to experiencing.
Mobile Searches
Whilst I did not have access to SIRI/Bing search queries made via SIRI, I did have access to Google
mobile searches. Whilst I could not separate whether these searches were initiated via voice or text, I
could find searches that could be brokered by a DA. To do this I took inspiration from Google’s work
on ‘Micro Moments’, defined as the instant when someone reaches for their mobile device to find
something out. Two of the specific moments that Google said a brand should look for are “How…?”
moments and “Near…?” moments.
After isolating searches that contained “How…?” (3% of searches on our panel) and “Near…?” (1% of
searches), I was able to analyse the results for word frequencies.
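The isolate-then-count step described above can be sketched as follows. This is a minimal illustration only: the stop-word list, the whole-word matching rule and the sample queries are assumptions made for the example, as the paper does not describe how the panel searches were tokenised or filtered.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the paper does not say which words were excluded.
STOP_WORDS = {"how", "near", "me", "to", "do", "i", "a", "the", "my", "on", "from", "in", "is", "of"}

def word_frequencies(queries, trigger):
    """Count words in queries that contain the trigger as a whole word."""
    pattern = re.compile(r"\b" + re.escape(trigger) + r"\b", re.IGNORECASE)
    counts = Counter()
    for query in queries:
        if pattern.search(query):
            words = re.findall(r"[a-z0-9']+", query.lower())
            counts.update(w for w in words if w not in STOP_WORDS)
    return counts

# Invented sample queries, for illustration only.
sample = [
    "hotels near me",
    "cheap hotels near airport",
    "how do i connect youtube from phone to tv",
    "pizza delivery",
]
near_counts = word_frequencies(sample, "near")
how_counts = word_frequencies(sample, "how")
print(near_counts.most_common(3))
```

Matching on whole words means a query such as "nearest petrol station" would not be counted as a “Near” search in this sketch; a looser rule may suit real data better.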
Near..?
Google searches containing “Near..?” related strongly to hotels and restaurants/shopping (figure 5).
FIGURE 5. WORD FREQUENCY FOR MOBILE SEARCHES THAT CONTAIN “NEAR”
(Figure 5 words, as extracted: HOTELS, RESTAURANTS, STORES, STORE, FOOD, APARTMENTS, CAR, PIZZA,
CHEAP, SHOPS, RESTAURANT, SHOP, AIRPORT, MALL, NORTHLAKE, SERVICE, CHINESE, PARKING, BREAKFAST,
OPEN, MOVIE, SOUTH, SALE, ICE, BANK, NEW, ROSA, KOHLS, MOTEL, REPAIR, PARK, TARGET, GAS, RENT,
DRIVE, STATION, BUFFET, CREAM, WATER, BARS, DELIVERY, POST, DEPOT, NY, BEACH, SCHOOLS, JAPANESE,
SALON, UNIVERSITY, BESTBUY, CENTER, GOLF, WALMART.)

How..?
Interestingly, TV was the most searched item (figure 6).
Particularly apparent were technical questions such as:
• “How do I connect YouTube from Phone to TV?” Phone to TV connectivity was a significant
trend in the data.
• “How do I edit my contacts on my Samsung Galaxy?”
Are device makers now ceding the customer relationship to search engines?
What was striking was the number of technical questions relating to mobile devices that are being fed
through search engines. This in turn raises the question of whether device manufacturers should be
on-boarding this information into the device or creating a technical support chatbot.
FIGURE 6. WORD FREQUENCY FOR MOBILE SEARCHES THAT CONTAIN “HOW”
Findings
It was surprising how many SIRI requests are processed by the Bing search engine. I had expected
that requests for directions via Apple would be the largest usage category.
For sports brands it is worth noting that SIRI brokers a significant number of sports-related
questions and that images were often returned as part of the responses.
Once again, as brands are not directly involved in these DA consumer interactions, SIRI cannot be
considered a brand’s best friend.
The brief analysis of ‘How’ and ‘Near’ searches did show how companies are inadvertently ceding
their consumer interactions to search. It’s a risky strategy for two reasons. Firstly, the search engine
could easily pass the enquiry to a competitor as a result of sponsored advertising. Secondly, brands
miss the opportunity to be involved in and learn from these interactions.
(Figure 6 words, as extracted: TV, FIX, MONEY, WORK, REMOVE, CAR, ANDROID, PLAY, PHONE, FREE, HAIR,
IPHONE, COOK, RESET, APP, XBOX, CARD, WATER, CLEAN, UNLOCK, GALAXY, CALORIES, BABY, DOG, HOME,
INSTALL, ONLINE, BECOME, GOOGLE, FACEBOOK, OPEN, WORTH, BOX, GROW, MUSIC, OIL, CONNECT, MOVIES,
PAY, WINDOWS.)

Messenger Data
Given the significant scale of messenger apps and the fact that the platforms have begun opening up
to third party developers via “Bot Stores”, it is highly likely that messengers will be the key distribution
channel for branded DAs.
I thought it worthwhile to quantify the messenger app usage patterns of 3.5K mobile panellists for the
month of March 2016. For this exercise I analysed Facebook Messenger patterns (Figures 7 and 8).
The data shows that the average individual interaction/session out of home is 85 seconds versus in
home 113 seconds. Notable is how brief the average session is, especially when out of home.
FIGURE 7. FB MESSENGER AVERAGE SESSION LENGTH BY MESSENGER IN HOME VERSUS OUT OF HOME
(Figure 7 axes: session seconds, 0 to 140, for Facebook Messenger, Snapchat, Google Talk, Kik,
WhatsApp, GroupMe, TextNow, Android Messenger and Pinger; series label: out of home.)
FIGURE 8. FB MESSENGER FREQUENCY OF INDIVIDUAL MESSENGER ACTIVITY LENGTHS
(Figure 8 bands include 0-30 seconds, 30-60 seconds and 60-90 seconds.)
Findings
The data shows messenger sessions to be very brief, which means that brand-consumer
interactions will need to be succinct. It is conceivable that interactions will occur in chains: while
each link in the chain might be brief, a single conversation could last some time.
Unfortunately our mobile behavioural data can only isolate app usage and not individual
conversational chains.
To address the issue of chain measurement, Ted Livingston, CEO of KIK (a youth-orientated
messenger with 300 million active users), has proposed that messenger conversations require a new
set of metrics:
• Active: A chat on one topic, where interactions and responses happen in rapid fire (i.e. a <= second
interval between messages). This could be an intense chat between girlfriend and boyfriend.
• Passive: An on-again, off-again conversation, e.g. continued tweaking of a travel arrangement.
• Sporadic: Occasional messages sent during the day or week. This might be the style of
conversation you would expect with an entertainment service.
It is likely that messenger-based brand interactions such as customer service would tend to be in the
“Active” camp; however, we do not have data to prove this.
The DA should also be contextually aware and reduce the requirement for user input if it detects that
the individual is out and about and likely to be distracted.
I believe that, due to their sheer reach, chatbots will be a brand’s best friend when used in a
messenger environment. However, the data indicates that interactions will need to be well designed
and accurate. If messenger session times swell due to the chatbot being non-intuitive, it is easy to
imagine it becoming a source of frustration.

METHOD III: WHAT DOES A DIRECT DIGITAL BRAND TO CONSUMER
CONVERSATION LOOK LIKE?
One of the limitations with method II was that the behavioural data did not allow us to analyse
conversation chains. Facebook brand pages, however, provide an avenue to collecting this type of
data.
For this exercise we extracted 13,658 posts made to the Facebook pages of 15 UK retailers during
the period 1 November to 31 December 2015.
We define a post as when a user sends a new comment to a brand page. For each of these posts we
can extract the ensuing conversation chain. Taking both the number of posts and chains together
gave us a total of over 105,000 comments to analyse.
Each comment in a chain is classified as one of these three types:
• u = user (the person who first posted to the page and started a conversation chain)
• p = page owner (i.e. the brand or business that operates the Facebook page)
• o = other users who have decided to comment within the conversation chain
FIGURE 9. EXAMPLE POSTS, COMMENTS, CHAIN LENGTH
Once each post had been tagged, we were left with a conversation signature. We then aggregated
the occurrence of these signatures per brand.
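The tagging and aggregation steps can be illustrated with a short sketch. It assumes each post is available as a (brand, post author, list of comment authors) record and that the page owner comments under the brand name; both are assumptions for the example, and the sample records are invented rather than taken from the study data.

```python
from collections import Counter, defaultdict

def signature(post_author, page_owner, commenters):
    """Build a signature such as 'upu' for one post and its comment chain."""
    letters = ["u"]  # the post itself always comes from the originating user
    for author in commenters:
        if author == post_author:
            letters.append("u")
        elif author == page_owner:
            letters.append("p")
        else:
            letters.append("o")
    return "".join(letters)

def signature_shares(chains):
    """Return {brand: {signature: percentage of that brand's posts}}."""
    counts = defaultdict(Counter)
    for brand, post_author, commenters in chains:
        counts[brand][signature(post_author, brand, commenters)] += 1
    shares = {}
    for brand, counter in counts.items():
        total = sum(counter.values())
        shares[brand] = {sig: 100 * n / total for sig, n in counter.items()}
    return shares

# Invented sample records: (brand, post author, authors of the following comments).
chains = [
    ("Boots", "alice", []),
    ("Boots", "bob", ["Boots"]),
    ("Boots", "carol", ["Boots", "carol"]),
    ("Boots", "dave", ["erin", "Boots"]),
]
shares = signature_shares(chains)
print(shares["Boots"])
```

Counting signatures per brand before converting to percentages keeps brands with very different post volumes comparable.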

Findings
The table below (Figure 10) shows a brief extract of that data. For instance you can see on the Boots
Facebook page the most frequent signature was where the user posted and received no response
(32% of the time).
FIGURE 10. SAMPLE EXTRACT FROM UK RETAILER FB PAGE DATA
Values are % of signature occurrence per brand.
Signature  Boots  LidlUK  AldiUK  Tesco  Marks and Spencer  All Retail Brands Combined
u          32     49      12      7      35                 26
up         24     27      35      29     18                 22
upu        5      11      13      5      6                  7
upup       9      0       0       9      4                  5
uo         2      1       1       1      4                  3
uop        3      0       2       3      2                  2
upo        1      1       3       2      1                  1
upupu      1      0       0       1      1                  1
uou        0      0       1       0      1                  1
upupup     1      0       0       2      1                  1
uoo        1      0       0       0      2                  1
uoup       2      0       0       1      1                  1
upuo       1      1       1       0      0                  1
upuu       0      1       1       0      1                  1
uu         1      0       0       0      1                  0
upou       0      0       1       1      0                  0
uup        0      0       0       1      0                  0
uopu       0      0       1       0      1                  0
uooo       1      0       0       0      1                  0
upuup      0      0       0       1      1                  0
Looking across all retail brands the most common chain was a user post and no brand response
(26%) followed by a user post and a single brand response (22%).

What was notable in the data was the significant long tail of chain types recorded: 942 distinct types
across the 13K posts. There are several factors that need to be considered when making message chain
measurements. For instance, brands that promote their Facebook page as a communication channel
will receive more messages and therefore more variety in conversation chains. Brands which do not
respond to initial user posts would also not be expected to receive a variety of chain types.
Tesco (Figure 11) received the most posts and also had the lowest number of unanswered posts. For
this dedication they also had to manage by far the largest variety of conversation chains. This would
certainly have cost implications.
FIGURE 11. COUNT OF CHAIN TYPES VERSUS % UNANSWERED POSTS
Brands which do not respond to user requests face the issue that the user’s friends will often hold
conversations on the brand’s page without any brand involvement. For example, on the Lidl UK page
there were 10 public conversations between a user and their Facebook friends without any brand
intervention.
A brand’s best friend?
The obvious advantage of chatbots in this instance is that communication is private.
We did not have access to data on the number of private versus public brand page interactions.
The airline KLM shared this graph (Figure 12), which indicated that they receive 7 times more private
Facebook messages than public ones.
FIGURE 12. KLM FACEBOOK SOCIAL CUSTOMER CARE MESSAGE TYPES

However, the data indicate that if a chatbot is well promoted and successful it will garner a significant
number of messages. In addition, if these interactions are not templated there will be a significant
variety of conversations to manage.
To be a best friend a brand will therefore need to develop systems to manage this communication
channel cost effectively.

METHOD IV: MEASURING CHATBOT INTERACTION VIA A PROTOTYPE HEALTH BOT
NAMED KAT
The last method necessitated the development of a Facebook chatbot to gather first hand data of user
interactions.
How do you build said chatbot?
As of April 2016 Facebook allowed third parties to create a chatbot on their Messenger platform. A
chatbot follows a simple messaging format (the same as SMS) where it is able to receive and send
messages. The messages can contain text, multi-media or both.
Facebook also allows messages to contain simple structured elements such as buttons.
The messages can also contain structured template elements (figure 13).
FIGURE 13. FACEBOOK UI ELEMENTS TAKEN FROM HTTPS://DEVELOPERS.FACEBOOK.COM/DOCS/MESSENGER-PLATFORM
(Figure 13 panel label: Buttons)
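As an illustration of a message carrying buttons, the sketch below builds a message body in the shape of the button template described in the Messenger Platform documentation. The field names reflect that template as best understood at the time of writing and should be checked against the current documentation; the function name, recipient ID and payload strings are hypothetical, and nothing is sent over the network. The question and answer labels are taken from the chatbot survey in Figure 14.

```python
import json

def button_message(recipient_id, text, options):
    """Build a message body containing a button template (no network call)."""
    # Assumption: a button template carries between one and three buttons.
    if not 1 <= len(options) <= 3:
        raise ValueError("this sketch assumes between one and three buttons")
    buttons = [
        {"type": "postback", "title": title, "payload": payload}
        for title, payload in options
    ]
    return {
        "recipient": {"id": recipient_id},
        "message": {
            "attachment": {
                "type": "template",
                "payload": {
                    "template_type": "button",
                    "text": text,
                    "buttons": buttons,
                },
            }
        },
    }

# "USER_ID" and the FEEL_* payload strings are placeholders for illustration.
body = button_message(
    "USER_ID",
    "How are you feeling right now?",
    [("Unwell", "FEEL_UNWELL"), ("Cruising", "FEEL_CRUISING"), ("Awesome", "FEEL_AWESOME")],
)
print(json.dumps(body, indent=2))
```

When a user taps a button, the payload string is what the bot receives back, so a closed question can be recorded without parsing free text.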

We took the decision to name our chatbot Kat as we wanted a short name with a connection to the
Kantar brand.
The questions were taken from a Kantar Health diary study (Figure 14). The survey was modified to
make it more akin to a general health survey and finished with an open conversation (figure 15
describes the flow).
FIGURE 14. CHATBOT FIXED QUESTIONS
Original Survey Questions | Chatbot Survey
You are .. (ask once M F) | Delete. Facebook provides this information.
How old are you? (ask once) |
How would you evaluate your overall health? Would you say you are: (ask once) | This was added as an “Additional Question”.
Which of the following best describes your capacities to perform everyday activities: (ask once) |
How would you evaluate your overall level of activity. Would you say you are: (ask once) |
How do you feel today? (ask daily in the morning and allow people to answer this question throughout the day if their mood changes) | How are you feeling right now? Unwell, Cruising, Awesome
Did you get a good sleep last night? | Did you get a good sleep last night? Not great, Average, Great Sleep
Did you exercise today? | Have you had any exercise within the last two hours? I worked out!, A nice walk, No mostly stationary.
 | What have you consumed in the last two hours? Snacks, A Meal, Nothing
FIGURE 15. CHATBOT USER FLOW
Distribution of the chatbot

A link to Kat was sent out to a company mailing list, “Mobile Insight Group”. In the email was a link
that would directly launch Kat.
70 people clicked on the link and launched Messenger and Kat. 10 of the 70 recipients did not interact
with the Chatbot. Four of them experienced technical issues as they were using an older version of
Messenger.
Findings
For four of the respondents the conversation spanned 16 hours! This reflects one of the challenges
of talking with an assistant in a messenger environment: a Facebook chat is not deleted; it sits there
ready to start again where it left off. This is a usability challenge. Should the bot insist on starting
the conversation afresh, or would it be rude to do so?
FIGURE 15. TOTAL DURATION OF A SINGLE CHATBOT CONVERSATION
Messenger does not have a mechanism to remove previous conversations. Deletion is at the
discretion of the user. This obviously has PR implications if the chatbot or a human operator
managing the bot sends an inappropriate message.
After removing outliers such as saying Hi, the average time to answer the four set questions (one
being dynamic if answered again during the day) was 91 seconds.
After the four fixed questions, Kat would attempt to create a more ‘open’ conversation. To do this, it
would look through the respondent’s answers, and based on a decision tree select a response. For
example, if someone had slept badly, Kat would ask why? If a respondent had said they did not have
any exercise, it would ask what their favourite form of exercise was.
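A decision tree of this kind can be sketched in a few lines. The two branches for sleep and exercise follow the examples given above; the order of the branches, the dictionary keys, the third branch and the fallback question are assumptions added for illustration, not a description of Kat's actual rules.

```python
def follow_up(answers):
    """Pick one open follow-up question from the fixed-question answers.

    `answers` maps a hypothetical question key to the button label chosen.
    """
    if answers.get("sleep") == "Not great":
        return "Why did you sleep badly?"
    if answers.get("exercise") == "No mostly stationary":
        return "What is your favourite form of exercise?"
    if answers.get("feeling") == "Unwell":
        # Hypothetical branch, not described in the paper.
        return "What is making you feel unwell?"
    # Hypothetical fallback when no branch matches.
    return "Tell me more about your day."

print(follow_up({"sleep": "Great Sleep", "exercise": "No mostly stationary"}))
```

Because the branches are checked in a fixed order, a respondent who both slept badly and did no exercise is only asked about sleep; a fuller tree would need a rule for combining or sequencing follow-ups.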
(Figure 15 axes: total duration of chat in seconds, 0 to 80,000, by conversation ID, 1 to 67.
Annotations: conversations that lasted 16 hours; conversations that span 1-2 hours; the majority of
conversations, 1-15 minutes.)

FIGURE 16. TIME SPENT ON STRUCTURED QUESTIONS VERSUS DECISION TREE CONVERSATION PER RECORDED
CONVERSATION
The data indicated that respondents who attempted the fixed questions did so efficiently. Most
engaged in conversation with Kat but the limitations of a primitive AI engine would often make for
unsatisfying and frankly confusing conversations.
Did people share photos with the bot? And what was the subject matter?
Kat was designed to encourage some photo uploads.
24 of the 60 respondents uploaded a photo with 108 photos collected in total.
We used an automated, image classification system (clarifai.com) to auto tag the pictures. It would
tag the photos in milliseconds, and then ask a simple question based on the tag.
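The tag-then-ask step might look like the sketch below. It assumes the classifier has already returned a list of tag strings for a photo; the call to the image classification service is deliberately left out, and the tag names and question wording are hypothetical, as the paper does not list the questions Kat asked.

```python
# Hypothetical tag-to-question table; not Kat's actual wording.
QUESTIONS = {
    "food": "That looks tasty. Did you make it yourself?",
    "selfie": "Nice photo. How are you feeling today?",
}
DEFAULT_QUESTION = "Thanks for the photo. What is it of?"

def question_for_tags(tags):
    """Return the question for the first recognised tag, in the order given."""
    for tag in tags:
        question = QUESTIONS.get(tag.lower())
        if question:
            return question
    return DEFAULT_QUESTION

print(question_for_tags(["Plate", "Food", "Table"]))
```

A default question matters here because general-purpose tags will often fall outside any prepared table, as the sticker images discussed next illustrate.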
20 of the 108 were stickers (which a general image system struggles with). If stickers are an important
part of communication, in a messenger environment they will need classification. Of the 20 sticker
images (of which there are hundreds of options), the most common was the “thumbs up”.
This was followed by smiley face variations.
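The step from an image tag to a simple question can be sketched as a lookup. This is an illustrative sketch only: it does not call the clarifai.com service, and it assumes tags arrive as (label, confidence) pairs. The labels, question wording and the 0.8 threshold are hypothetical.

```python
# Hypothetical tag-to-question table; not taken from Kat's implementation.
TAG_QUESTIONS = {
    "food": "That looks tasty. Was it a meal or a snack?",
    "selfie": "Nice photo. How are you feeling today?",
}
FALLBACK = "Thanks for the photo. What is it of?"


def question_for_tags(tags, min_confidence=0.8):
    """Pick a question for the most confident tag we have a question for."""
    for label, confidence in sorted(tags, key=lambda t: t[1], reverse=True):
        if confidence >= min_confidence and label in TAG_QUESTIONS:
            return TAG_QUESTIONS[label]
    # Unrecognised images (stickers, for instance) fall back to an open question.
    return FALLBACK
```

A fallback question matters here because a general image classifier may return no usable tag for stickers.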
[Chart: duration of set questions versus duration of ad-hoc conversation, in seconds (0 to 600).]
The rest of the photos shared were an even mix of food and selfies (Figure 17).
FIGURE 17. SMALL SAMPLE OF FOOD PHOTOS UPLOADED TO KAT

Did respondents build a connection with Kat?
As can be seen from Figure 18, 52% of responses from the internal pilot were one word only. In the
word count table underneath (Figure 19) a rapid drop in frequency is evident.
FIGURE 18. FREQUENCY OF WORD COUNT CONTAINED IN USER VERBATIM RESPONSES
[Chart data: share of responses by number of words. 1: 52%; 2: 15%; 3: 9%; 4: 6%; 5: 5%; 6: 3%; 7: 2%; 9: 2%.]
Of the 684 unique user responses, 40 contained ten or more words. These respondents were testing
Kat out.
FIGURE 19. EXAMPLES OF USER RESPONSES IN DIFFERENT WORD COUNT BANDS
No. of words/symbols in response | Five interesting examples per count
1 | ? (translation: “what the heck are you on about, robot!”)
  | Wha (translation: they meant to say hi)
  | Gym
  | Baby
  | Beer
2 | soup :) (note emoticon symbol)
  | dancing zumba
  | feeling cold
  | did already
  | noise, heat
3 | blocked nose, cold
  | am i done?
  | am i healthy?
  | holiday in Scotland
  | needing the bathroom.
4 | 3 cups of coffee.
  | you getting freaky bot
  | usually to celebrate sunset
  | noises and bad dreams
  | eat sleep rave repeat
5 | enormous amounts of road noise
  | the pretzels or the watermelon?
  | i have a broken ankle
  | no my bike is stolen
  | quinoa bites, and lemon water
6 | baby got unwell during the night
  | pretty hot with the windows closed
  | what would you like to know
  | so what should we do ?
  | ok, i have to go. bye
7 | i don’t like being i in pictures.
  | i mean my job and my house
  | feed it, put it to bed, repeat
  | do you not feel sorry for me
  | after a few fingers of whiskey, yes.
8 | event location one day prior to the event
  | i will after the event in 2 days
9 | go to bed at the same time every night
  | ? slices of bread if thats what you mean?
10 | i like to cycle when i can to stay fit
  | this is the 3rd time you Ave asked me this question
  | probably insomnia, but it could be any number of things
11 | i drank a coke with my lunch which was some curry
  | hmm favourite exercise, walking probably as does not feel like exercise
12 | my son was sick, so he woke up 3 times at time
  | my favourite fruit is mango, but the bar was a strawberry one
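The word-count banding behind Figures 18 and 19 can be reproduced with a few lines of code. This is a minimal sketch that assumes a "word" is any whitespace-separated token, so an emoticon such as ":)" counts as one symbol. The function name `word_count_shares` is hypothetical and the paper does not describe how its counts were produced.

```python
from collections import Counter


def word_count_shares(responses):
    """Map word count -> percentage of responses, ignoring empty responses."""
    counts = Counter(len(r.split()) for r in responses if r.strip())
    total = sum(counts.values())
    return {n: round(100 * c / total, 1) for n, c in sorted(counts.items())}
```

Applied to a log of verbatim responses, the resulting dictionary gives the share of one-word, two-word and longer replies directly.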
Summary of Findings
As Kat was a first-version prototype, there were numerous lessons learnt.
• Verbatim conversation is difficult to manage.
• Keep it simple and use buttons and other templated elements where possible.
• A chatbot needs to respond quickly, otherwise the respondent will leave the conversation or
repeatedly send the same message.
• It is easy for a chatbot with a limited knowledge base to repeat the same response. Users find
this frustrating.
• Kat did not have a sentiment scoring engine, so on several occasions it appeared
insensitive. Empathy is important when holding a conversation.
Chat sample 1
Respondent: “I have a broken ankle.”
Kat: “What do you do to treat it?”
Respondent: “I am wearing a boot.”
Kat: “Has it been a tough few days?”
Respondent: “Yes.”
Kat: “Ok thanks. Please remember to say Hi again!”
Respondent: “Do you not feel sorry for me?”
Kat: “I am a Healthbot called Kat”.
Respondent: “XXXXX!”
Chat sample 2
Kat: “OK do you know what caused the lack of sleep?”
Varti: “I just told you...cold, blocked nose”
Kat: “ ”
• The chatbot owner needs to reflect very carefully about letting artificial intelligence source and
compile responses on the fly.
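One of the lessons above, that a bot with a limited knowledge base easily repeats itself, can be guarded against with a small amount of per-user state. This is a sketch under stated assumptions, not a description of Kat: the class name `ReplyPicker` and the idea of handing off when every candidate would be a repeat are illustrative choices.

```python
class ReplyPicker:
    """Remember the last reply sent to each user and avoid sending it twice in a row."""

    def __init__(self):
        self._last_reply = {}  # user_id -> last reply text

    def pick(self, user_id, candidates):
        """Return the first candidate that differs from the user's last reply."""
        last = self._last_reply.get(user_id)
        for reply in candidates:
            if reply != last:
                self._last_reply[user_id] = reply
                return reply
        # Every candidate would repeat the previous message: signal a hand-off
        # (for example to a human operator) instead of repeating.
        return None
```

For example, calling `pick("u1", ["A", "B"])` twice returns "A" and then "B", so the respondent never receives the same message back to back.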

CONCLUSION – ARE DIGITAL ASSISTANTS OR CHATBOTS A BRAND’S BEST
FRIEND?
Chatbots combined with the vast distribution of Messenger networks offer an unprecedented
opportunity for brands to connect with their customers at scale (see Appendix I for reach and brand
access).
Facebook’s chatbot store only launched in April and already more than 11,000 brand chatbots have
been launched. China’s WeChat opened their messenger to developers well before Facebook and
are reportedly launching thousands of new channels (their term for chatbots) each week.
While chatbots represent a significant opportunity for brands, their creation is not a straightforward
task.
Users will expect brief and accurate interactions. Users are very demanding of new technology as
evidenced by our usability tests.
Everything a chatbot says must be consistent with brand values. Once AI integration with brand DAs
becomes more mainstream, complex tests will be required to ensure the AI’s text generation is
consistent with a brand’s values.
Early adopters and influencers will try to see if they can stump a chatbot. In Microsoft’s case users
gleefully posted the conversation online when their chatbot Tay became confused. This means initial
offerings will be highly templated and often remove the requirement for open text.
Respondents indicated throughout the three studies their willingness to engage with a DA. However, these
conversations can be complicated and nuanced, as indicated by the significant variety of conversation
chains identified in the brand page experiment.
In short, I believe that DAs/chatbots will become a brand’s best friend in the longer term. It is clear,
however, that as brands move into the world of digital conversation, they will need to start with simple
and templated experiences.

Appendix I
The following table details a selection of the leading DAs and DA distribution channels.
Messenger Type | DA Type | User Interface | Reach | Developer Access
Facebook Messenger | Bot (built inside of Messenger) | Primarily text, but multimedia can also be shared with a Bot. | 1 billion monthly users (MAUs) | Developer API (Messenger Developer).
SIRI | General Digital Assistant | Voice, but the voice can trigger website links that can then be interacted with. Voice interface able to integrate apps with SIRI commands, e.g. “SIRI, please ask Easyjet, what is the status of my flight?” | ~500 million | Developer API. Slightly different interpretation,
Whatsapp | None yet | Primarily text | 1 billion MAUs | TBD
WeChat | Bot | Primarily text | 700 million users | Developer API.
Line | Bot | Primarily text | 215 million users | “Bot Store”, create fully featured conversational Bot.
Snapchat | None | | 150 million daily users | None, brands make filters etc.
KIK | Bot | Primarily text | | Developer API
Telegram | Bot | Primarily text | | “Bot Store”, create fully featured conversational Bot.
Microsoft Cortana | General Digital Assistant, text and voice | | | Microsoft Bot Framework
Duer | | | |
Google “Allo” | TBD, launches shortly | TBD | Brand new! | TBD
Amazon Echo | Bot (called Skills) | Voice | 3 million units sold in the US | Developer API (Skills).

Appendix II: Measurements
This table summarises the various DA measurement ideas surfaced throughout this paper.
User Interface Testing
• If you needed to find out X, how would you ask a DA to help you?
• If you needed to find out X, how would you ask a friend to help you?
• What would be the most useful response (to the question) from a friend?
• What would be the most useful response (to the question) from a DA?
• Did it present the information in a useful format?
• Is this the sort of question you would ask a DA?
• Could the information have been presented better and how?
• How would you expect this DA to respond (prior)?
• How did the DA respond?
• How did the DA response make you feel?
• Did the DA meet your needs?
• Did the DA respond to you in a timely manner?
Script/Persona Testing
• How did interacting with the DA make you feel about the brand? (A pre-question would ask about brand favourability.)
• Conversational personification: did the DA require a persona?
• Does the persona fit the brand?
Conversation Measurement (assuming access to behavioural/log data)
• Average interaction length
• Time of day of interaction
• Location of interaction (at home, out of home, or more granular, i.e. at the mall)
DA analytics for DA owner
• Number of unique users (assume filtered by time)
• Number of unique conversations
• Length of conversation
• Notification response time (how quickly someone enters a conversation from a notification or other prompt)
• Repeat usage (and repeat usage frequency)
• Successful conversation completes (i.e. did the conversation successfully conclude?)
• Unsuccessful conversation completes (i.e. the conversation stopped prior to a transaction or prior to information being shared,
or the conversation became cyclic and the user gave up, etc.)
• Confusion versus clarity (i.e. did the user have to ask for clarification during the interaction?)
• Conversational variety (i.e. the different conversation chains, as per the TNS Facebook page analysis: user-brand-user)
• The sentiment of the user interactions at each stage of the conversation
• The sensitivity of the user interactions (for instance, if a user mentioned medical conditions)
• Intensity of conversation (inspired by KIK’s suggestions):
o Active: A chat on one topic, where interactions and responses happen in rapid fire (i.e. a <= second interval between messages).
This could be an intense chat between a girlfriend and boyfriend.
o Passive: An on-again, off-again conversation, e.g. continued tweaking of a travel arrangement.
o Sporadic: Occasional messages sent during the day or week. This might be the style of conversation you would expect with
an entertainment service.
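Several of the analytics above (unique users, unique conversations, length of conversation and intensity) can be derived from a simple message log. This is a sketch under stated assumptions: the log format of (user_id, conversation_id, timestamp_seconds) tuples and the function name `summarise` are hypothetical. The list above does not give numeric cut-offs for intensity, so `active_gap` and `passive_gap` are placeholder parameters rather than recommended values.

```python
from statistics import median


def summarise(messages, active_gap=60, passive_gap=3600):
    """Summarise a log of (user_id, conversation_id, timestamp_seconds) tuples."""
    users = set()
    stamps_by_conversation = {}
    for user_id, conversation_id, timestamp in messages:
        users.add(user_id)
        stamps_by_conversation.setdefault(conversation_id, []).append(timestamp)

    conversations = {}
    for conversation_id, stamps in stamps_by_conversation.items():
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        if not gaps:
            intensity = "single message"
        elif median(gaps) <= active_gap:      # placeholder threshold
            intensity = "active"
        elif median(gaps) <= passive_gap:     # placeholder threshold
            intensity = "passive"
        else:
            intensity = "sporadic"
        conversations[conversation_id] = {
            "length_seconds": stamps[-1] - stamps[0],
            "intensity": intensity,
        }

    return {
        "unique_users": len(users),
        "unique_conversations": len(conversations),
        "conversations": conversations,
    }
```

Using the median gap between messages keeps one long pause from reclassifying an otherwise rapid-fire chat. This is a design choice of the sketch, not something the list above prescribes.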
Artificial Intelligence Script Testing
This is a little futuristic, but I would imagine that purely AI-generated conversations would actually need to be tested by an AI routine with human oversight.
• Conversational personification: a measure of how much people interact with your Bot as if it is human (i.e. Turing test), made by
checking the language of user responses for interactions that are human-like or contain empathy.
• Conversational economy score: when adding AI to a conversation flow, did the AI element assist in achieving a conversational
“task complete”?
• What is the sentiment spread of the AI-generated conversation responses?
• Did the AI responses include stop-words (i.e. words or phrases that a brand would not want to be associated with)?
• Confidence level (%) that the AI will meet brand guidelines for a given subject.
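The stop-word check listed above is straightforward to automate. This is a minimal sketch, assuming the brand supplies its own list of words and phrases. "Stop-words" is used here in the sense given above (terms a brand does not want to be associated with), and the function name `find_stop_words` is hypothetical.

```python
import re


def find_stop_words(response, stop_words):
    """Return the brand stop-words or phrases that appear in an AI-generated response."""
    text = response.lower()
    found = []
    for phrase in stop_words:
        # Match whole words only, so "cheap" does not flag "cheaper".
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        if re.search(pattern, text):
            found.append(phrase)
    return found
```

A response would only be released if the returned list is empty. Anything flagged could be routed to a human reviewer.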
