Millions of users visit Intuit product portals every day. With web analytics, we know what user behavior looks like, but not why. By tapping into in-product search and social data, we began to understand the types of questions, pain points, and suggestions users have. This was made possible with text analytics, via unguided machine learning at scale.
Topic discovery was just the beginning though. Trending, segmentation, integration with clickstream data and association with business goals made voice of customer insights actionable. In this presentation, learn about:
Text analytics at Intuit (case study)
Building decision support around text analytics
Technical approach & scaling
Protecting data privacy
Open source & commercial solutions
Heather Wasserlein is a Senior Product Manager at Intuit, where she partners with Data Science to create data-driven New Business Initiatives. Prior to Intuit, Heather worked on advertising marketplaces and web content classification at Yahoo! Heather holds a Master’s degree in Mechanical Engineering from MIT.
8. Overwhelming data volumes
You can read a few thousand customer comments, but not millions.
And, new themes come up every day..
9. You can pull a “top 1000” list, but..
Is it telling you anything new? Actionable?
Top: hello, help, call, login
Mid: password, cant find pwd, account, multiple accounts, print, import error 5514, phone, printing blank page, phone number, call customer sevice
Long tail: change password, charged twice cancel, print function not working new version of IE error msg 87956, please call back at 555-555-5555
10. Insights often in the tail
Needle-in-the-haystack problem – valuable details hidden in descriptive, tail verbatims
Top: hello, help, call, login
Mid: password, cant find pwd, account, multiple accounts, print, import error 5514, phone, printing blank page, phone number, call customer sevice
Long tail: print function not working, version of IE error msg 87956, change password, charged twice cancel, please call back at 555-555-5555
11. Related topics dispersed
The “top 1000” can be misleading – the most common verbatims may not represent the most common themes
Top: hello, help, call, login
Mid: password, cant find pwd, account, multiple accounts, print, import error 5514, phone, printing blank page, phone number, call customer sevice
Long tail: print function not working new version of IE error msg 87956, change password, charged twice cancel, please call back at 555-555-5555
12. What is text analytics?
With numeric data, you can run summary stats; summarizing textual data is more complex.
Statistics + Linguistics
You can mix and match various statistical and linguistic tools, depending on the problem.
14. Case Studies
Applying text analytics to simple and complex problems at Travelocity, Yahoo! and Intuit
15. Travelocity search
Where is Albekerke?
Example searches: San Jose; San Jose, CA; San Jose, Costa Rica; San Jose Intl Airport; NY; NYC; JFK; New York, NY, USA; NY, New York; Grand Canyon; Disneyland
16. Travelocity search solution
Finite set of airports, but many variations in search
San Jose
San Jose, CA
San Jose International
Mineta San Jose Airport
San Josee Airport
Silicon Valley
SJC
Simple, but manually intensive solution –
Mapping of all known search variations to relevant airport codes. Plus, sound-ex phonetic matching to catch unforeseen misspellings.
“Rules-based” approach – no statistics, minimal linguistics (sounds)
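The phonetic fallback described above can be sketched in a few lines. This is a minimal, generic Soundex plus a hypothetical alias table — an illustration of the rules-based approach, not Travelocity's actual code or data:

```python
def soundex(name):
    """Classic Soundex: first letter plus three digits encoding consonant sounds."""
    codes = {c: d for d, group in enumerate(
        ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1) for c in group}
    name = name.lower()
    digits, prev = [], codes.get(name[0])
    for ch in name[1:]:
        code = codes.get(ch)
        if code is not None and code != prev:
            digits.append(str(code))
        if ch not in "hw":            # h and w do not reset the previous code
            prev = code
    return (name[0].upper() + "".join(digits) + "000")[:4]

# Hypothetical alias table mapping known search variations to an airport code
ALIASES = {"san jose": "SJC", "san jose, ca": "SJC", "sjc": "SJC",
           "albuquerque": "ABQ"}

def resolve(query):
    """Exact alias lookup first; fall back to phonetic match for misspellings."""
    q = query.lower().strip()
    if q in ALIASES:
        return ALIASES[q]
    return next((code for alias, code in ALIASES.items()
                 if soundex(alias) == soundex(q)), None)
```

With this, the misspelling from the previous slide still resolves: `resolve("Albekerke")` phonetically matches "albuquerque" and returns `"ABQ"`.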
17. Yahoo! web site classification
Is this site clean? Does it contain any illegal or sensitive content?
– alcohol
– tobacco
– drug
– online gambling
– violence or weapons
– adult content
Does the web site meet advertiser standards?
18. Yahoo! web site classification solution
Verbose, rapidly-changing data, but finite set of topics.
100,000’s of web sites in Y! and partner Ad Networks.
Training data (human-labeled) –
– 5K positive examples
– 30K negative examples
Multiple approaches –
Classifiers, keyword matching, image matching, and human-review process.
Supervised machine learning –
Pattern detection, phrases and contexts associated with finite set of “risk categories.”
Emphasis on recall, catching true positives.
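A keyword-matching pass — one of the “multiple approaches” above — can be sketched as follows. The category keyword lists here are purely illustrative assumptions, not Yahoo!'s actual lists:

```python
import re

# Illustrative keyword lists per "risk category" (not Yahoo!'s actual lists)
RISK_CATEGORIES = {
    "gambling": ["casino", "poker", "betting"],
    "alcohol": ["beer", "whiskey", "brewery"],
    "weapons": ["rifle", "ammunition", "firearm"],
}

def flag_site(page_text):
    """Return every risk category whose keywords appear in the page text.
    A single hit is enough to flag the site -- biased toward recall,
    with human review downstream to weed out false positives."""
    text = page_text.lower()
    return {cat for cat, words in RISK_CATEGORIES.items()
            if any(re.search(r"\b%s\b" % re.escape(w), text) for w in words)}
```

Flagging on a single keyword hit trades precision for recall, which is why the slide pairs it with a human-review process.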
20. Intuit tax support solution
Millions of questions daily, of all types.
Google-like search, but often in natural language –
PIN number
Where can I find my PIN?
Newly married, file jointly
File married or separately?
Home mortgage deduction
Can I deduct my dog?
Why is 1099-int import slow?
Where’s my refund??
Solution –
Clustering of site searches, topic “discovery”.
Discovered topics: PIN, file married, deduct, 1099int, import, refund
Unsupervised machine learning –
Statistics and linguistics. Part of speech tagging. Detection of words that “go together more often than not”.
21. Results for 3 algorithms
LDA (bag of words) –
File, free, taxes
File, extension, get
File, security, social
Income, state, business
Payment, state, filed
State, refund, check
Lingo (hierarchical clustering) –
File: File 2012, File an extension, File state
Deduction: Deduction car, Deduction sales tax, Deduction standard
Custom (n-gram clustering, in-house solution) –
File extension
Social security
Business income
Sales tax deduction
Refund check
Payment
22. Words + numbers = insights
Emerging Topics
Funnel Analysis
Trending & (pre) Segmentation
Sentiment
Example topics: Refund, deduct, Late legislation, File extension, Error 576, Enter w2, Import error.., Taxes done!, etc.
23. Use Cases
Product Managers –
1. User needs
– Identify product enhancements
– Rapidly diagnose product defects
– Tune site search
– Personalize content
2. Emerging issues
– Early insight to new issues
Customer Care –
1. Common questions
– Train agents & staff appropriately
2. Call routing
– Segment by VOC
Marketing –
1. Address common questions to retain users
2. Segment by sentiment and empower promoters
3. Customer dialogue
– Listen to feedback & respond 1:1 or 1:many
24. Our journey
Y1 – Proof of concept
– Science project
– Clustering 2M searches, 2-day lag
– Emerging issues detection
– Vocal early adopters
– Data volume grew, system crawled
Y2 – Productize
– Transfer from science to eng
– Scaled to 15M searches, 1-day lag
– Report email
– Campaign to grow adoption
– Site search & FAQ tuning
Y3 – Scale..!
– Scaled to 30M searches, next day 9am SLA
– Viral adoption, 50+ users
– “VOC team” meets weekly
– X-functional value: 100’s of items actioned, $10M’s, 2 new products
25. Scaling
Reduce problem size –
1. Pre-process
– de-dup
– remove PII, system generated info, etc.
– remove stop words
– map synonyms
– stemming
2. Reduce data size
– sample
– segment
– narrow time period
– remove tail terms (cautiously)
Add hardware –
1. Add memory
– text clustering is memory constrained
– verbose text is harder
2. Distribute processes
– rule-based categorization scales linearly
– clustering of segments can be run in parallel
– data sourcing
– pre-processing
Optimize algorithm –
1. Tradeoffs & tuning
– Choose approach to balance accuracy vs. performance
– Tune algorithm parameters
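A minimal sketch of the pre-processing column above. The stop-word list, synonym map, and PII pattern are illustrative assumptions, not Intuit's actual rules:

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "i", "my", "to", "at", "where", "can"}
SYNONYMS = {"pwd": "password", "cant": "cannot"}   # illustrative mapping
PHONE_RE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")  # simple PII pattern

def preprocess(queries):
    """Scrub PII, drop stop words, map synonyms, then de-dup the queries."""
    seen, cleaned = set(), []
    for q in queries:
        q = PHONE_RE.sub("<phone>", q.lower())           # remove PII
        tokens = [SYNONYMS.get(t, t) for t in re.findall(r"[a-z<>]+|\d+", q)]
        tokens = [t for t in tokens if t not in STOP_WORDS]
        key = " ".join(tokens)
        if key and key not in seen:                      # de-dup
            seen.add(key)
            cleaned.append(key)
    return cleaned
```

Each step shrinks the term-document matrix before clustering ever runs, which is where most of the scaling win comes from.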
26. Results
1. Faster time to insights
– Customer issues detected up to 1 week earlier
– Search is a leading indicator for call drivers – a canary in the coal mine
2. Better customer experience
– Using text insights to tune search results improved relevancy
– Identifying users with common questions made it possible to personalize the experience
– VOC data + user behavior led to a whole new understanding of product use
3. $10’s of millions in revenue
– Detecting and resolving customer pain points generated $10’s of millions
27. Getting started?
1. Read a sample of verbatims + scope the problem
– Topic discovery or known topics?
– Sources of text and verbosity (few words, sentences, pages)?
– Estimate data volumes and define SLA’s
2. Build vs. buy
– Compare tools, build proofs of concept
– Compare results relative to a “golden set”
3. Start small
– One data source, non-verbose text, small volumes
– 1000’s of documents for statistically valid results
– Beta test reporting, QA topic-verbatim fit
4. Establish business processes
– X-functional process to action insights, let reports go viral
– Scale and incorporate domain knowledge later (“phase 2”)
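For the “golden set” comparison in the build-vs-buy step, a simple precision/recall check against hand-labeled topics is usually enough. A sketch (the labels are hypothetical):

```python
def precision_recall(predicted, golden):
    """Score a tool's topic labels against a hand-labeled "golden set"."""
    predicted, golden = set(predicted), set(golden)
    true_pos = len(predicted & golden)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(golden) if golden else 0.0
    return precision, recall
```

Running each candidate tool's output through the same scorer makes the comparison apples-to-apples.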
31. “Home grown” Algorithm
Unsupervised machine learning / clustering
1. Identify candidate phrases
– Sparse: Identify all combinations of bi-grams, tri-grams, four-grams
– Verbose: Use linguistic approaches to identify phrases
• Split text into sentences + identify part-of-speech for each word (noun, adj, etc.)
• Apply linguistic filters to parse candidate phrases (adj noun, verb adv, etc.)
2. Determine which phrases are “significant”
– Count word frequencies and calculate likelihood ratios
• L1 = words are independent, L2 = words are dependent
• If L2 > L1, the words appear together more often than not
3. Cluster related topics
– Represent n-grams and searches as vectors, calculate similarity (cosine distance), and cluster related topics when similarity > pre-defined threshold
4. Identify topic “title”
– Construct “title” representative of the cluster (ex. most common search)
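The steps above (for the sparse case) can be sketched roughly as follows. This is a toy reconstruction, not Intuit's code: step 2 uses a PMI-style independence score as a stand-in for the full likelihood-ratio test, and step 3 uses a greedy single-pass grouping rather than a production clustering algorithm:

```python
import math
from collections import Counter

def significant_bigrams(queries, threshold=0.5):
    """Steps 1+2: count bi-grams, keep those whose words co-occur more
    often than independence would predict (PMI-style score)."""
    words, bigrams, total = Counter(), Counter(), 0
    for q in queries:
        toks = q.lower().split()
        words.update(toks)
        bigrams.update(zip(toks, toks[1:]))
        total += len(toks)
    return {pair: score for pair, n12 in bigrams.items()
            if (score := math.log(n12 * total /
                                  (words[pair[0]] * words[pair[1]]))) > threshold}

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(queries, min_sim=0.5):
    """Steps 3+4: group queries whose vectors are similar enough; the
    first member of each group serves as the cluster "title"."""
    vecs = [Counter(q.lower().split()) for q in queries]
    groups = []
    for i, v in enumerate(vecs):
        for g in groups:
            if cosine(v, vecs[g[0]]) > min_sim:
                g.append(i)
                break
        else:
            groups.append([i])
    return {queries[g[0]]: [queries[i] for i in g] for g in groups}
```

The real pipeline would title each cluster by its most common search; using the first member keeps the sketch short.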
32. What’s next for text at Intuit?
1. Finalize evaluation of new algorithms (ex. Lingo3G, LDA, etc.)
2. Scale through distributed processing (i.e. move to Hadoop)
3. Support more types of text (ex. verbose)
4. Continue to integrate topics & usage data for complete picture of end-to-end user experience
5. Provide text analytics as a service
6. Semantic search
7. Internationalization (future)
Editor’s notes
In a digital world, businesses give customers many channels to communicate – throughout the end-to-end customer experience of shop, browse, buy, use, etc. Ideally, we’d “listen” equally well across all of these touch points. Yet, much of the analytics focus is either upstream (ex. search engines) or downstream (ex. social media). This provides insight into user intent and feedback, but misses very important insights into customer experience with your product and services. For example, site search, customer support channels (call centers, chat) and communities are valuable sources of insights. Rather than wait for feedback on Yelp or Twitter, there’s an opportunity to be proactive and address customer questions during product use. Also, with many channels, there are many formats for data. A tweet doesn’t look like a blog post. Voice data often gets converted to text (by a machine or an agent summarizing a call, for example).
It is not uncommon to see people trying to read through 1000’s of customer surveys, suggestions, etc. One of my first text analytics requirements sessions was with a sharp User Experience Designer who would spend her Friday afternoons reading as many feedback reports as possible. CEO’s often personally read a subset of emails from customers or listen in on support calls; our CEO does. This is commendable, but doesn’t scale when you receive millions of communications every day. Nor is it possible to keep up with ever-changing topics – today’s customer questions could be completely different than yesterday’s.
While language has some structure, there is ambiguity. Words have multiple meanings, different forms, and can be used in metaphor (ex. can and can, tin can vs. we can; colorful fish vs. let’s go fish vs. a fish out of water). In addition, we are human. We have our own unique way of saying things. Some of us are polite and punctuate. Others misspell and abbreviate.. Sometimes we share TMI, including our PII. With text analytics, all of our data gets thrown in the mix. The goal is to make sense of it all.
In order to accurately “summarize” text data, the trick is to count all related topics across the corpus.
At the most basic level, we’re trying to understand the meaning of words – with uncertainty due to context, morphology, and accuracy (ex. misspellings). More generally, we’re trying to understand user intent, sentiment, etc. Note: as documents become more verbose (ex. a blog is verbose, a tweet is sparse), the more linguistics can help. Linguistics – sounds; words (literal meaning); bi-grams, etc. (words that go together, like “new york”); phrases (“who let the dogs out?”); sentences and part of speech / POS (subject, object, noun, adj, verb, etc.); context within a large block of text. Terminology: corpus; documents (text data, could be a tweet, search query, blog entry, etc.) – called a “verbatim” if in the user’s words; words vs. tokens; topics / themes.
Everyone has a particular writing (and speaking) style. Some people use some vocabulary more than others. I bet you could distinguish a paragraph from NYT vs. Cosmopolitan. Statistics can be used here – to find distributions for every word (ex. how many times is “the” used in general publications) and compare it to your writing (ex. do you use “the” more than the average person)? Note: women use adjectives more than men.
Taxes are complex; people have tons of questions from start to finish.
Intuit also uses Clarabridge (rules-based solution) for categorization of support call logs and Radian6 for monitoring and sentiment analysis of social media. The primary driver for unsupervised clustering of in-product search queries was to capture “emerging issues” – things we couldn’t foresee ahead of time when building rules (ex. a bug introduced in a product launch, late legislation issues with the IRS, etc.). Another benefit of unsupervised approaches is they don’t require human input or maintenance (low effort).
Numbers tell us what is happening, but not why. This is where text completes the story. For example, you may see conversion going up or down. But what’s driving this change? By looking at emerging issues (what people are talking about today), you can see if a bug was introduced in your recent launch, etc. Trending is also valuable – to determine if a particular topic is gaining strength or has gone away (ex. after making a product enhancement). Segmentation enables you to see the types of questions new vs. returning users have. Better yet, questions from non-converters. But, unlike numeric data, where you can slice and dice results after aggregating, with text you get more accurate results if you segment before clustering. Are tax filers procrastinators? ;-) File extension is a perennial top theme the night before tax day. Integrating text into “funnel analysis” was extremely valuable. Clickstream data tells us where users drop off, but not why. Verbatims helped pinpoint user pain points / road blocks. Resolving just one of these pain points was worth $5M. Analysis of adjectives provides a directional gauge for sentiment. Perhaps a more accurate way to gauge sentiment is to segment promoters from detractors and see what each group has to say.
When I began working at Intuit 3 years ago, there were text analytics efforts centered around call logs and social. We used a rules-based categorization tool called Clarabridge to classify logs from call center agents. We obtained a data feed from Facebook, Twitter, blogs, etc. and evaluated results with Radian6, a Salesforce tool. Both of these tools work well for their respective use cases, but we noticed a gap – we didn’t have a good way to detect emerging issues. Thus began a 3-year journey in unguided machine learning for automated topic discovery (i.e. no human input required)..
Pre-processing is 90% of the solution – you can greatly reduce complexity by removing stop words, stemming, mapping synonyms, etc. This reduces the term-doc matrix. With a 30% sampling rate, we saw an equivalent set of “top themes” as with a complete, 100%, data set. Rules-based categorization scales linearly, but clustering is memory constrained, because everything is compared with everything else. Segmentation helps, because segments can be processed in parallel. With 64GB memory, clustering of 5 million searches took < 2 hrs, enabling next-day reporting on yesterday’s clickstream by 9AM. Optimizing upstream processes helps too. Note: as text becomes more verbose, computation time slows, a lot. Using part-of-speech parsing to focus on nouns can help identify what a document is about, although you miss sentiment (adjectives). Rules-based approaches, categorization based on keywords, are also easier. It depends what type of problem you are solving.