The best way to improve customer experience is to listen to customer feedback. But reading customer comments, such as complaints, answers to open-ended survey questions, and forum posts, is prone to bias. NLP, and text analytics in particular, can automate the analysis of customer comments. But how do you choose the right approach?
15. Teaching
• "The lecturers aren't particularly helpful and the learning style is far from perfect." → not helpful teachers, bad learning style
• "I have always found the lecturers to be very helpful and the learning style is perfect." → helpful teachers, good learning style
Same nouns & adjectives, but different feedback!
16. Purposes of Negation
• Reversing polarity
I did not like the learning style → dislike it
• Emphasising negativeness or positiveness
There is nothing I did not like about the learning style → love it
• Making weaker claims
The learning style is not bad → it’s ok
26. How can an NLP solution work on a small dataset?
• Industry-specific dictionaries & rules
But: How to avoid ambiguity errors?
• Pre-defined static categories
But: How to capture emerging themes?
• Creative data gathering
• Re-purpose survey data from related companies
• Re-purpose company-owned resources
27. Example of a related dataset used to model specifics of word meanings
29. [Diagram: evaluating NLP output on two axes. One axis runs from "repeated but has no meaning" through "aspect or general category of business" to "immediately actionable theme"; the other from "trivial, already knew" through "suspected, data-verified" to "insightful, new knowledge". Most NLP solutions, and a 1-hour prototype with an open-source tool, sit in the trivial, non-actionable region; the ideal output from NLP analysis is insightful and actionable.]
30. [Chart: scores drop over time. What happened? Price increase, new product feature ✓, or marketing campaign?]
Themes changing over time explain the reasons behind drops!
31. The NLP Must-Haves
1. Capture ways people talk about the same thing
2. Capture positive & negative attributes separately
3. Capture emerging themes
4. Link to original for verification & action
5. Ensure transparency and ability to edit
6. Work well on small datasets
7. Provide actionable insights
Today I want to talk about customer feedback analysis. We all agree that sentiment analysis plays an important role in understanding customer feedback. But I found there is a disconnect with what's actually happening in the industry.
If you google 'Customer Feedback Analysis software', what you find is an overview of tools that collect people's scores and then present them as pretty dashboards. Or take the answers on Quora to 'What's the best customer analysis tool?'. Most focus on scores, not people's comments.
And sure, if you're a consumer, a quick summary of competitors by score may be all you need, for example to find the best restaurant. But as the owner of a struggling restaurant with a 3-star rating, how would you know what to do? Would you rather have 100 scores, or 10 customer comments explaining why they gave you that score?
We found that comments are quite important to customer insight professionals and this is how they use them.
Comments tracked over time, alongside scores, are particularly valuable. They can explain why scores rise and drop, and when scores stay the same, they provide richer insight.
By looking deeper into the comments, you can find out who should be following up with the customer. Imagine for example capturing all people who want to cancel a service.
And also, if you have made any changes to your offering, for example used a new recipe, did that actually get noticed and affect the score?
To summarise, applying NLP to people's comments helps you gain deeper insight and get to the action of improving customer experience faster.
My background is in NLP, but over the past two years we've spent a lot of time talking to customer insight teams. I noticed that many current NLP solutions do not actually provide the functionality that matters to them. So today I would like to share the needs we discovered while building our NLP solution at Thematic. We may not have cracked all of them yet, but we do believe they are Must-Haves.
If you own an NLP solution for CX or plan to build one, feel free to use the Must-Haves as a guide.
If you are looking to buy a solution, or implement one using open-source, send me an email and I will share with you a report that we found valuable while evaluating different options.
The first Must-Have is about capturing many ways people may be referring to the same thing.
Imagine you have paid for a newspaper delivered to your door. It rained. As you are unsticking the wet pages, you are frustrated that you cannot read it. How many ways do you think there are to complain about a wet newspaper?
There are dozens of possibilities! And if an NLP solution cannot capture them accurately, the importance of this issue may be misrepresented. Many solutions out there use industry dictionaries, or worse, WordNet. But customer comments are messy, and synonyms will be specific to your business. For example, 'paper' and 'newspaper' are rarely a synonym pair outside of publishing. And we found that 'build' and 'buy' could be either synonyms or antonyms depending on the context: real estate or software.
At Thematic we learn synonyms from the data itself. Once, we came across an unusual and, at first glance, incorrect pair: 'airpoints' and 'airport'. Airpoints is the frequent flyer currency of Air NZ; an airport is usually a very different thing. After examining the results closely, we found that the system was right. Autocorrect did not know the word 'airpoints' and changed it to 'airport', which meant that this was a dataset-specific synonym pair.
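The underlying idea of learning synonym candidates from the data itself can be sketched with simple distributional similarity: words that occur in the same contexts in your dataset are likely to refer to the same thing. This is a toy illustration, not Thematic's actual algorithm; all names and the tiny corpus are made up.

```python
from collections import Counter, defaultdict
from math import sqrt

def context_vectors(comments, window=2):
    """Build a bag-of-context-words vector for each word in tokenised comments."""
    vectors = defaultdict(Counter)
    for tokens in comments:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy corpus: 'paper' and 'newspaper' share contexts, so they surface as a
# dataset-specific synonym candidate that a generic dictionary would miss.
comments = [
    "the paper arrived wet today".split(),
    "the newspaper arrived wet today".split(),
    "the paper was soaked again".split(),
    "the newspaper was soaked again".split(),
]
vecs = context_vectors(comments)
print(cosine(vecs["paper"], vecs["newspaper"]))  # high similarity (close to 1.0)
```

On real data you would filter by frequency and use a larger window, but even this crude version shows why learned pairs like 'airpoints'/'airport' can emerge from the dataset itself.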
This is why one size will not fit all.
While you need to capture the many different ways people talk about the same thing, when it comes to attributes, e.g. good coffee vs. bad coffee, Customer Insight professionals often prefer to have them captured separately. This may be relatively easy if the attributes are clear antonyms, e.g. 'fast service' vs. 'slow service'. But negation makes everything much harder.
Here is an actual example of manual categories chosen by a human tagger. An NLP system for customer feedback analysis should ideally recognise that the two sentences, while using the same nouns and adjectives, should be categorised differently.
Most NLP solutions do not deal with negation at all. Those that do simply reverse polarity: 'did not like' = 'dislike'. But negation serves other purposes too, such as emphasis: 'nothing I did not like' means 'loved it'. Or making a weaker claim: 'not bad' does not necessarily mean 'good'; most likely it is a rather neutral statement.
When dealing with negation, parsing will help determine its focus and scope. But the next step is to correctly merge negated statements with non-negated ones. For this, you'll need some form of antonym detection. Only then can a solution accurately determine how many people liked or disliked a certain aspect of the business.
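The three purposes of negation from the slide can be sketched with a minimal rule-based classifier. This is a deliberately crude illustration with hand-picked word lists, not a production approach: a real system would use a dependency parse to find the cue's scope rather than a fixed token window.

```python
# Assumed toy word lists for illustration only.
NEGATORS = {"not", "n't", "never", "nothing"}
MILD_NEGATIVE = {"bad", "terrible", "awful"}

def negation_effect(tokens):
    """Classify the effect of negation in a tokenised clause:
    'reverse' (did not like -> dislike), 'emphasis' (nothing ... not ... -> loved it),
    'weaken' (not bad -> roughly neutral), or None if no negation cue is present."""
    cues = [i for i, t in enumerate(tokens) if t in NEGATORS]
    if not cues:
        return None            # no cue: leave polarity to the base sentiment step
    if len(cues) >= 2:
        return "emphasis"      # double negation, e.g. "nothing I did not like"
    scope = tokens[cues[0] + 1 : cues[0] + 4]   # crude scope: the next few tokens
    if any(t in MILD_NEGATIVE for t in scope):
        return "weaken"        # "not bad" does not mean "good"
    return "reverse"           # plain polarity reversal: "did not like"

print(negation_effect("did not like the learning style".split()))   # reverse
print(negation_effect("nothing i did not like about it".split()))   # emphasis
print(negation_effect("the learning style is not bad".split()))     # weaken
```

Even this sketch shows why simply flipping polarity on every 'not' misclassifies two of the three cases from the slide.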
This is why one size will not fit all.
A common approach to summarising feedback, even when done manually, is to use a static set of categories or themes. The first problem with this is that it reflects the bias of the person who created them. The second problem is that it is, well, static. It's the nature of doing business that there are always changes. There may be changes in the pricing structure or in the competition. If you want to capture people's reactions to these changes, you need a solution where themes can emerge over time.
If you do not do this and, let's say, use supervised categorisation, over time you can end up with a very large 'Other' category, because comments will not fit into any of the pre-defined ones. You will always have people commenting on things that are different from others. But as a rule of thumb, your 'Other' category should not exceed 20%. This is an actual example from one company's data we worked with: we helped them reduce 'Other' to 8%, compared to 54% from their home-grown solution.
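The 20% rule of thumb is easy to monitor automatically. A minimal health check over theme assignments might look like this (the category names and threshold are illustrative, the threshold being the figure from the talk):

```python
from collections import Counter

def other_fraction(theme_assignments):
    """Fraction of comments that fell into the catch-all 'Other' category."""
    counts = Counter(theme_assignments)
    return counts["Other"] / len(theme_assignments)

# Toy example: half the comments land in 'Other', a sign the static
# category set has gone stale and emerging themes are being missed.
assignments = ["Pricing"] * 2 + ["Support"] * 3 + ["Other"] * 6 + ["Delivery"]
frac = other_fraction(assignments)
print(f"{frac:.0%}")  # 50%
if frac > 0.20:       # rule of thumb from the talk
    print("Too many uncategorised comments: revisit the category set")
```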
This is why one size will not fit all.
My next NLP Must Have is about the necessity of having a clear link to the original comment. Context is king, as they say, and without context it is hard to interpret, understand and act upon the results. I have seen several NLP solutions that do not provide that option.
Verification can be painful. Thematic was once tested against a human coder, Kate. We identified that one of the key things students wished were improved at a university was the quality of food. Kate found the same issue, but at a much lower frequency. By being able to pull out all comments on this topic, we verified them and found that Kate was tagging only the key issues in each comment, whereas we tagged all of them. As a result, the university could act on this problem and increase student satisfaction simply by improving the situation with food.
Transparency in how the algorithm came to particular results is also important, because only then can we give somebody like Kate a chance to work with the algorithm and benefit from both of their strengths. Kate knows the domain: what's important to track and what can be ignored.
Sometimes there is a right and a wrong answer. For example, in many countries 'soccer world cup' means the same as 'football world cup'. But in other cases it depends on the customer's priorities, say, whether they want to track the rugby world cup separately from the soccer/football world cup or as the same thing. And they need to be able to change how the system decided to do the grouping.
Small datasets are a big pain for data-driven algorithms. You can't build a language model on Wikipedia or IMDB reviews, because words mean different things in different contexts. And a model built on a small dataset won't work. The solutions are: create industry-specific rules, repurpose data from different clients, or get creative.
At Thematic, we get creative quite often. One of our customers is the DJ software company Serato. They have thousands of users but only get a few hundred short comments per month. So to help them, we built a language model from their community forums, which turned out to have millions of threads, and learned about things like processors, controllers, playback, etc.
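One simple way to mine a large related corpus for domain vocabulary, sketched here with a toy frequency-ratio heuristic rather than Thematic's actual language model, is to rank words by how much more frequent they are in the domain corpus than in a general reference corpus:

```python
from collections import Counter

def domain_terms(domain_tokens, general_tokens, min_count=2, top=5):
    """Rank words by how much more frequent they are in the domain corpus
    than in a general reference corpus (smoothed frequency ratio)."""
    d, g = Counter(domain_tokens), Counter(general_tokens)
    nd, ng = len(domain_tokens), len(general_tokens)
    scores = {
        w: (c / nd) / ((g[w] + 1) / ng)   # +1 smoothing for unseen words
        for w, c in d.items() if c >= min_count
    }
    return sorted(scores, key=scores.get, reverse=True)[:top]

# Toy forum snippet vs. a general snippet: DJ jargon floats to the top.
forum = ("my controller crashed during playback the controller firmware "
         "needs an update playback stutters on this controller").split()
general = ("the weather today is nice and the food here is nice "
           "we went to the beach today").split()
print(domain_terms(forum, general))  # ['controller', 'playback']
```

On millions of forum threads the same idea surfaces the processors, controllers and playback vocabulary mentioned above, which a Wikipedia-trained model would weight very differently.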
Finally, the result of NLP analysis should provide information that is not trivial and is easy to act on. Let's say an NLP system analysed 500 comments for a software company and returned 'product', 'customer service', and the name of the company as the key categories. This is not insightful. Similarly, knowing that customer service has poor sentiment is not actionable.
Keeping this in mind, NLP solutions can be evaluated according to this diagram. On one axis we have the extracted knowledge, categorised by how actionable it is. On the other axis we have how novel it is: trivial; suspected but in need of verification with data; and finally new, insightful knowledge. For example, we can easily guess which words will repeat in customer comments, and those words carry zero meaning. 90% of the NLP solutions I've seen on the market capture the general aspect of what's in a comment and do not return any actionable results. Ideally, an NLP solution should return a mixture of themes, some of which are insightful and actionable. Perhaps only customer insight managers can judge whether something is an insight to them, but in general this is where we want to be.
Coming back to the chart from the beginning of this talk, the correct answer is 'New product feature'. If the NLP solution works correctly, as you move from one month to the next you should be able to see a change in that month's trending themes. In this particular case, the trending theme was 'hard to read', and the company fixed it by changing the font in the UI.
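Spotting a trending theme like 'hard to read' amounts to comparing theme counts month over month. A minimal sketch, with made-up theme labels and counts:

```python
from collections import Counter

def trending_themes(prev_month, this_month, min_rise=2):
    """Themes whose mention count rose sharply compared to the previous month."""
    prev, now = Counter(prev_month), Counter(this_month)
    return {t: now[t] - prev[t] for t in now if now[t] - prev[t] >= min_rise}

# Toy example: 'hard to read' spikes after a release, which is exactly the
# kind of signal that pointed the company at the font problem in the UI.
march = ["pricing", "support", "pricing", "hard to read"]
april = ["pricing", "hard to read", "hard to read", "hard to read", "support"]
print(trending_themes(march, april))  # {'hard to read': 2}
```

On real data you would normalise by monthly comment volume and test the rise for significance, but the month-over-month delta is the core of the idea.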
Here they are again. If I have missed something or you disagree, let’s discuss!
If you would like a report comparing different NLP methods against these Must Haves, please send me an email.