Using the Wisdom of the Crowd to Help Improve Catering Service
Fabio Amaral
17 November 2015
Introduction
Checking the Yelp star rating and user reviews of a business is a useful way to gauge the general
user perception of a service's quality, but it remains a subjective task, and learning why a business is
considered a success or a failure requires a thorough reading of the available written reviews.
The aims of this project were therefore to use the text of available Yelp reviews to predict the star
rating given by each reviewer and, in the process, to learn the topics that customers care about most.
Topic modelling techniques were used for this purpose, and the information learned should help
users make better-informed choices more easily and help business owners and managers identify
potential service improvement opportunities.
Methods and Data
The data set used in this project was downloaded in JSON format from the link provided and comprises
information about local businesses in 10 cities across 4 countries. Detailed information about the data,
and about the ongoing competition of which this data set is part, can be found on the Yelp Dataset
Challenge webpage.
The R packages jsonlite, plyr, dplyr, stringr, ff and ffbase were used for parsing the data set into a standard
data.frame. The data contain a total of 1,569,264 reviews of 61,184 unique businesses covering a
wide variety of service types, such as bookstores, building contractors and drugstores. Restaurants account
for around 36% of the businesses in this data set, so we focused on this category in an attempt to extract
more specific and actionable information from the reviews.
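As a minimal sketch of this filtering step (in Python for illustration; the original analysis was done in R, and the field names below are assumptions about the parsed JSON):

```python
# Hypothetical sketch: keep only parsed business records whose category
# list contains "Restaurants". Field names are illustrative assumptions.
def filter_restaurants(businesses):
    """Return the subset of businesses categorized as restaurants."""
    return [b for b in businesses if "Restaurants" in b.get("categories", [])]

sample = [
    {"business_id": "b1", "categories": ["Restaurants", "Mexican"]},
    {"business_id": "b2", "categories": ["Bookstores"]},
]
restaurants = filter_restaurants(sample)
```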
The variables text (review), review count, star (business) and star (review) were extracted as presented
in the original data set, and a few others were created to assist in rating prediction and in extracting
relevant information from the reviews. The variable delta star represents the deviation of the review
rating from the provided business rating (review star - business star). A related
categorical variable, review effect, with levels positive, negative and neutral, indicates
the effect of the review on the business rating (e.g. if review star - business star > 0 then review
effect = positive). The variable sentiment was created by sentiment analysis of the review text using
the polarity function from the qdap package; it ranges from -0.77720 to 0.99740 and indicates how
negative or positive the words in each review are (Figure 1d).
Given the multinational nature of the data set, a number of reviews were written in languages other than
English, the most evident being German (435), French (334) and Spanish (9). These reviews were
automatically translated via the Microsoft Translator API using the R package translateR. The packages tm,
slam and SnowballC were used for manipulating the text and creating a sparse matrix of word counts for
bag-of-words modelling. The package tau was used for word and n-gram frequency analyses.
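A minimal bag-of-words construction along these lines might look like the following (a Python sketch, not the tm/slam code actually used; stemming via SnowballC is omitted):

```python
# Bag-of-words sketch: lower-case and tokenize each review, then count
# term occurrences per document; the vocabulary is the union of all terms.
from collections import Counter
import re

def bag_of_words(documents):
    """Return (sorted vocabulary, per-document term counts)."""
    counts = [Counter(re.findall(r"[a-z']+", doc.lower())) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    return vocabulary, counts

vocab, counts = bag_of_words(["Great food, great service", "Slow service"])
```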
The package lda was used for rating prediction via topic modelling with supervised latent
Dirichlet allocation (sLDA), as introduced by Blei and McAuliffe (2010), and interactive visualizations were
created with the LDAvis package. Structural topic models, which incorporate relevant metadata such as the
review-related variables described above, as introduced by Roberts et al. (2015), were further estimated
with the stm package, and an interactive visualization was created with the stmBrowser package.
The topic modelling techniques used in this project are modified versions of the more general latent Dirichlet
allocation, which views each document as a mixture of topics, each topic being a latent probability
distribution over terms. The topics are said to form a mixed membership of terms: a term can appear in more
than one topic with different probabilities. In the supervised version used in this work, linear regression
is used to predict the labelled review rating using the inferred per-document topic proportions as predictors.
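The response step can be illustrated as follows (a Python sketch with made-up coefficients; in sLDA the coefficients eta are estimated jointly with the topics rather than supplied by hand):

```python
# Illustrative sketch of the sLDA response model: the predicted rating is
# a linear combination of the document's topic proportions. The topic
# proportions and eta values below are invented for illustration.
def predict_rating(topic_proportions, eta, intercept=0.0):
    """Linear regression prediction from per-document topic proportions."""
    return intercept + sum(p * e for p, e in zip(topic_proportions, eta))

# A review that is 70% a "friendly service" topic and 30% a "long waits" topic:
rating = predict_rating([0.7, 0.3], eta=[4.8, 1.9])
```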
Results
Inspection of the review star rating distribution (figure 1a) shows an imbalance dominated by
positive reviews: ratings of 3 to 5 stars are far more frequent than 1 and 2 stars, with a mean of 3.7 and
a median of 4. This skew is not observed for the cumulative business ratings (figure 1b), which
are approximately normally distributed with mean 3.48 and median 3.5. The distribution of the deviation of
the review rating from the cumulative business rating (figure 1c) is also approximately normal (mean = 0.22,
median = 0.5), with slightly more reviews below the cumulative business rating than above it. However,
sentiment analysis of the review texts indicates that the overall sentiment balance of the reviews is
markedly positive: 83.56% of reviews are considered positive, 11.33% negative and 5.11% neutral.
The discrepancy between the rating-variation and sentiment-score distributions highlights the
complex challenge of integrating the objective star-rating metric with the subjectivity of the written review
texts for the purpose of making predictions.
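The three-way sentiment classification behind these percentages can be sketched as follows (thresholding the polarity score at zero is an assumption of this sketch; the original analysis used qdap's polarity function in R):

```python
# Bucket a continuous polarity score into the three sentiment classes
# reported above. The zero threshold is an illustrative assumption.
def sentiment_class(score):
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```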
[Figure 1 appears here with four panels: a) Review Ratings (density vs. star rating), b) Business Ratings (density vs. star rating), c) Rating Variation (density vs. review star − business star), and d) Sentiment Score Distribution (relative frequency vs. sentiment score).]
Figure 1. Exploratory analysis of business/review ratings and sentiment score distributions.
As with the original LDA algorithm, the structure of the topics modelled by sLDA is mostly influenced by
the number of topics (K), which is chosen arbitrarily, the prior parameter of the Dirichlet distribution over
per-document topic distributions (alpha), and the response parameter (eta). A large grid of
values for these parameters was therefore used to produce multiple models, of which the combination (K=40,
alpha=1, eta=0.1) that yielded the lowest root mean squared error (RMSE = 0.8764) was selected for
further analysis (figure 2).
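The tuning procedure amounts to a grid search over (K, alpha, eta), which can be sketched as follows (Python, with a hypothetical fit_and_score stand-in for training and evaluating an sLDA model):

```python
# Grid-search sketch: fit a model for every (K, alpha, eta) combination
# and keep the one with the lowest RMSE. `fit_and_score` is a hypothetical
# stand-in for training sLDA and evaluating it on held-out reviews.
from itertools import product

def tune(ks, alphas, etas, fit_and_score):
    best = None
    for k, alpha, eta in product(ks, alphas, etas):
        rmse = fit_and_score(k, alpha, eta)
        if best is None or rmse < best[0]:
            best = (rmse, k, alpha, eta)
    return best

# Toy scoring function whose minimum happens to sit at K=40:
best = tune([10, 40, 65], [1], [0.1], lambda k, a, e: abs(k - 40) / 100 + 0.87)
```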
[Figure 2 appears here with two panels of RMSE per trial: best tuning parameters for K = 5−15 (K=15, alpha=1, eta=0.1, RMSE=0.8927) and for K = 10−65 (K=40, alpha=1, eta=0.1, RMSE=0.8764).]
Figure 2. Tuning the parameters of sLDA for improved rating prediction measured by RMSE.
Figure 3 summarizes the correlation between the most frequent words of the 40 modelled topics
and the estimated review rating. Since sLDA uses regression, the estimated ratings for the combinations of
frequent words from each topic span a wider range than the 1 to 5 stars. As shown in
the legends, the size of each spot represents the absolute t-value of the regression coefficient. This
analysis shows that topics involving friendliness, location, a varied choice of desserts and fresh salads, and
restaurant grand openings are most strongly associated with very high review ratings. On the other hand,
poor food taste or quality, extended waiting times and waiters' mistakes are the major causes of customer
dissatisfaction.
[Figure 3 appears here: the 40 modelled topics, each labelled by its five most frequent words (e.g. "location staff friendly lot great", "order minutes time service waiting"), plotted against the estimated rating (Estimate, −10 to 5), with spot sizes proportional to the absolute t-value of the regression coefficient.]
Figure 3. Review rating estimation associated with the 40 modeled topics.
Figure 4 shows how the estimated ratings are distributed relative to the review ratings actually given
by the Yelp users. Estimation is evidently better for higher ratings, as would be expected
from a data set imbalanced with respect to the number of reviews at each star rating. The out-of-bag
estimation RMSE was 0.8579872 on the training set and 0.8842336 on the test set, with an R-squared value
of 0.4428283. This is a much better estimate than that of a baseline model classifying all reviews
as the most frequent rating (4 stars), which would yield an RMSE of 1.2225789 and an R-squared value of
-0.0651453.
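The RMSE and R-squared metrics used for this comparison can be computed as in the following sketch (Python; the toy ratings below are illustrative, not the actual data):

```python
# RMSE and R-squared for a constant-prediction baseline. A baseline that
# always predicts the most frequent rating explains none of the variance,
# so its R-squared is at most zero.
import math
from statistics import mean

def rmse(actual, predicted):
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))

def r_squared(actual, predicted):
    ybar = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - ybar) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

stars = [5, 4, 4, 3, 1]          # toy review ratings
baseline = [4] * len(stars)      # predict 4 stars for everything
```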
Attempts were made to address the review rating imbalance, but with little success. A balanced
training set created by randomly selecting 1,000 reviews of each rating had little to no
effect on the prediction of lower ratings. Another attempted solution, which did seem to reduce the
overlap between the estimated ratings, was to balance the training set, after the train/test split, by
randomly sampling reviews with 1, 2, 3 and 5 star ratings with replacement until each matched the number
of 4-star reviews (nearly 5,000 reviews each). This helped separate the ratings better, especially the
lower ones, but introduced a strong artificial bias towards negative words in the topic
modelling and was therefore not considered for further analysis.
[Figure 4 appears here: density curves of predicted ratings (0 to 6), one curve per actual review rating (1 to 5 stars).]
Figure 4. Predicted rating distributions.
The LDAvis package was used to assist in the exploration, and a representative image of a topic (desserts),
strongly related to high-rating reviews, can be viewed in figure 5. Feel free to explore the topics and their
related terms by clicking on the figure itself or on this link. The principal component analysis on the left
illustrates how close the topics are to each other; clicking on a numbered topic spot adjusts the term
frequency bar plot on the right accordingly. Hovering over the term labels of the bar plot shows the topics
in which each term appears, since terms may belong to multiple topics. The relevance of the terms relative
to each topic can be further tuned by changing the value of lambda with the slider in the top right-hand
corner.
Some of the top predictor topics with respect to the estimated ratings (as per figure 3) are numbers 30 (good
impressions), 36 (Las Vegas), 3 (desserts), 15 (grand openings), 33 (locations), 34 (hot dogs), 39 (Montreal)
and 20 (fresh salads). Some of the topics related to the worst rating estimates are 7 (waiting times), 10
(issues with the food) and 35 (fast-food drive-throughs).
Figure 5. Visualization of the supervised LDA analysis created with the LDAvis package. Click on this link or
on the figure above to open the analysis in a web browser for interactive visualization.
Other metadata, such as sentiment scores, the deviation of the review rating from the cumulative
business rating (delta.star), and complete selected review documents, were further explored by structural
topic modelling and visualized with the help of the stmBrowser package; a representative result of a positive
topic correlation with increased star rating can be viewed in figure 6. To access the interactive plot, please
click on the figure itself or on this link.
To reproduce figure 6, please select the following options from the respective drop-down menus: X-axis =
delta-star, Y-axis = Topic 7, Radius = review.star, Color = sentiment. Each plotted spot represents one
sampled review document, which can be clicked to read its text.
By making use of the delta.star metric, one can more easily isolate individual reviews associated with each
topic and focus on what most influenced the user experience at the time the review was written. It is
particularly informative to focus on the reviews whose rating deviates most from the cumulative business
rating. We can thereby verify and confirm the relevance of the topics within the review texts, i.e. the
themes most associated with the reduction or improvement of a business rating. Figure 6 shows a very
representative selected review commenting on the quality of the food and the friendly and efficient service,
all of which had been identified by sLDA topic modelling as strong predictors of higher review ratings.
Figure 6. Visualization of the structural topic model created with the stmBrowser package. Click on this
link or on the figure above to open the analysis in a web browser for interactive visualization.
Discussion
The difference between the success and failure of a business, or between a good and a bad customer
experience, can come down to details that simple star ratings cannot convey. The availability of user
reviews provides the means of identifying the main reasons for customer satisfaction. With the ever-growing
number of reviews, an automated system to analyse them and isolate actionable information becomes very
important. We have seen that topic modelling techniques can be very effective at this task, and implementing
such analysis on review sites such as Yelp could further increase the usefulness of their rich databases,
both for business owners and managers and for their customers.
We have learned that in the catering business an attentive and friendly service is a key driver of
customer satisfaction, as is a good selection of tasty food, fresh salads and vegetables, and desserts,
especially ice cream. A similar approach could be applied to other business classes to offer equivalent
insights.
One limitation of the analysis presented in this study was the imbalance between review ratings, with far
fewer 1- and 2-star reviews than higher ones. It might be possible to further improve the analysis by using
a more balanced data set or by using a term-weighting scheme to better calibrate the rating predictions.

Contenu connexe

En vedette

Morgan Resume 2016 Revised
Morgan Resume 2016 RevisedMorgan Resume 2016 Revised
Morgan Resume 2016 RevisedMorgan Robinson
 
SULI HYDE J Report
SULI HYDE J ReportSULI HYDE J Report
SULI HYDE J ReportJeremy Hyde
 
Richard Russell resume 2014[1][1]
Richard Russell resume 2014[1][1]Richard Russell resume 2014[1][1]
Richard Russell resume 2014[1][1]Richard Russell
 
India Health Insurance Scenario - September 2012
India Health Insurance Scenario - September 2012India Health Insurance Scenario - September 2012
India Health Insurance Scenario - September 2012Sudip Mukhopadhyay
 
Linminjung ux resume
Linminjung ux resumeLinminjung ux resume
Linminjung ux resumeLin Min-Jung
 
FINAL MICRO PAPER
FINAL MICRO PAPERFINAL MICRO PAPER
FINAL MICRO PAPERGreg Poapst
 
Stephen Kazman - School Trustee Ward 5
Stephen Kazman - School Trustee Ward 5Stephen Kazman - School Trustee Ward 5
Stephen Kazman - School Trustee Ward 5Stephen Kazman
 

En vedette (8)

Morgan Resume 2016 Revised
Morgan Resume 2016 RevisedMorgan Resume 2016 Revised
Morgan Resume 2016 Revised
 
SULI HYDE J Report
SULI HYDE J ReportSULI HYDE J Report
SULI HYDE J Report
 
Richard Russell resume 2014[1][1]
Richard Russell resume 2014[1][1]Richard Russell resume 2014[1][1]
Richard Russell resume 2014[1][1]
 
India Health Insurance Scenario - September 2012
India Health Insurance Scenario - September 2012India Health Insurance Scenario - September 2012
India Health Insurance Scenario - September 2012
 
Yoerges UX Resume
Yoerges UX ResumeYoerges UX Resume
Yoerges UX Resume
 
Linminjung ux resume
Linminjung ux resumeLinminjung ux resume
Linminjung ux resume
 
FINAL MICRO PAPER
FINAL MICRO PAPERFINAL MICRO PAPER
FINAL MICRO PAPER
 
Stephen Kazman - School Trustee Ward 5
Stephen Kazman - School Trustee Ward 5Stephen Kazman - School Trustee Ward 5
Stephen Kazman - School Trustee Ward 5
 

Similaire à Final.Version

Predicting Yelp Review Star Ratings with Language
Predicting Yelp Review Star Ratings with LanguagePredicting Yelp Review Star Ratings with Language
Predicting Yelp Review Star Ratings with LanguageSebastian W. Cheah
 
PredictingYelpReviews
PredictingYelpReviewsPredictingYelpReviews
PredictingYelpReviewsGary Giust
 
Yelp Rating Prediction
Yelp Rating PredictionYelp Rating Prediction
Yelp Rating PredictionKartik Lunkad
 
Scaling in research
Scaling  in researchScaling  in research
Scaling in researchankitsengar
 
Exploratory data analysis and data mining on yelp restaurant review
Exploratory data analysis and data mining on yelp restaurant review Exploratory data analysis and data mining on yelp restaurant review
Exploratory data analysis and data mining on yelp restaurant review PoojaPrasannan4
 
Rating Prediction for Restaurant
Rating Prediction for Restaurant Rating Prediction for Restaurant
Rating Prediction for Restaurant Yaqing Wang
 
Measurement and scaling techniques
Measurement and scaling techniquesMeasurement and scaling techniques
Measurement and scaling techniquesSarfaraz Ahmad
 
Chotu scaling techniques
Chotu scaling techniquesChotu scaling techniques
Chotu scaling techniquesPruseth Abhisek
 
Measurement and scaling techniques
Measurement and scaling techniquesMeasurement and scaling techniques
Measurement and scaling techniquesKritika Jain
 
Customer Satisfaction Data - Multiple Linear Regression Model.pdf
Customer Satisfaction Data -  Multiple Linear Regression Model.pdfCustomer Satisfaction Data -  Multiple Linear Regression Model.pdf
Customer Satisfaction Data - Multiple Linear Regression Model.pdfruwanp2000
 
A Supervised Modeling Approach to Determine Elite Status of Yelp Members
A Supervised Modeling Approach to Determine Elite Status of Yelp MembersA Supervised Modeling Approach to Determine Elite Status of Yelp Members
A Supervised Modeling Approach to Determine Elite Status of Yelp MembersJennifer (Hui) Li
 
Text Data Mining and Predictive Modeling of Online Reviews
Text Data Mining and Predictive Modeling of Online ReviewsText Data Mining and Predictive Modeling of Online Reviews
Text Data Mining and Predictive Modeling of Online ReviewsMark Chesney
 
Measurement and scaling techniques
Measurement  and  scaling  techniquesMeasurement  and  scaling  techniques
Measurement and scaling techniquesUjjwal 'Shanu'
 
T4 measurement and scaling
T4 measurement and scalingT4 measurement and scaling
T4 measurement and scalingkompellark
 
Rating Evaluation Methods through Correlation MTE 2014 Workshop May 2014
Rating Evaluation Methods through Correlation MTE 2014 Workshop May 2014Rating Evaluation Methods through Correlation MTE 2014 Workshop May 2014
Rating Evaluation Methods through Correlation MTE 2014 Workshop May 2014Welocalize
 
Research Method for Business chapter 7
Research Method for Business chapter  7Research Method for Business chapter  7
Research Method for Business chapter 7Mazhar Poohlah
 
Comparative analysis of Retail chains based on SERVQUAL Model
Comparative analysis of Retail chains based on  SERVQUAL ModelComparative analysis of Retail chains based on  SERVQUAL Model
Comparative analysis of Retail chains based on SERVQUAL ModelSuresh Singh
 
IRJET- Survey of Classification of Business Reviews using Sentiment Analysis
IRJET- Survey of Classification of Business Reviews using Sentiment AnalysisIRJET- Survey of Classification of Business Reviews using Sentiment Analysis
IRJET- Survey of Classification of Business Reviews using Sentiment AnalysisIRJET Journal
 

Similaire à Final.Version (20)

Predicting Yelp Review Star Ratings with Language
Predicting Yelp Review Star Ratings with LanguagePredicting Yelp Review Star Ratings with Language
Predicting Yelp Review Star Ratings with Language
 
PredictingYelpReviews
PredictingYelpReviewsPredictingYelpReviews
PredictingYelpReviews
 
Yelp Rating Prediction
Yelp Rating PredictionYelp Rating Prediction
Yelp Rating Prediction
 
Scaling in research
Scaling  in researchScaling  in research
Scaling in research
 
Exploratory data analysis and data mining on yelp restaurant review
Exploratory data analysis and data mining on yelp restaurant review Exploratory data analysis and data mining on yelp restaurant review
Exploratory data analysis and data mining on yelp restaurant review
 
Rating Prediction for Restaurant
Rating Prediction for Restaurant Rating Prediction for Restaurant
Rating Prediction for Restaurant
 
Measurement and scaling techniques
Measurement and scaling techniquesMeasurement and scaling techniques
Measurement and scaling techniques
 
Chotu scaling techniques
Chotu scaling techniquesChotu scaling techniques
Chotu scaling techniques
 
Measurement and scaling techniques
Measurement and scaling techniquesMeasurement and scaling techniques
Measurement and scaling techniques
 
Customer Satisfaction Data - Multiple Linear Regression Model.pdf
Customer Satisfaction Data -  Multiple Linear Regression Model.pdfCustomer Satisfaction Data -  Multiple Linear Regression Model.pdf
Customer Satisfaction Data - Multiple Linear Regression Model.pdf
 
ch 13.pptx
ch 13.pptxch 13.pptx
ch 13.pptx
 
A Supervised Modeling Approach to Determine Elite Status of Yelp Members
A Supervised Modeling Approach to Determine Elite Status of Yelp MembersA Supervised Modeling Approach to Determine Elite Status of Yelp Members
A Supervised Modeling Approach to Determine Elite Status of Yelp Members
 
Text Data Mining and Predictive Modeling of Online Reviews
Text Data Mining and Predictive Modeling of Online ReviewsText Data Mining and Predictive Modeling of Online Reviews
Text Data Mining and Predictive Modeling of Online Reviews
 
Measurement and scaling techniques
Measurement  and  scaling  techniquesMeasurement  and  scaling  techniques
Measurement and scaling techniques
 
T4 measurement and scaling
T4 measurement and scalingT4 measurement and scaling
T4 measurement and scaling
 
Rating Evaluation Methods through Correlation MTE 2014 Workshop May 2014
Rating Evaluation Methods through Correlation MTE 2014 Workshop May 2014Rating Evaluation Methods through Correlation MTE 2014 Workshop May 2014
Rating Evaluation Methods through Correlation MTE 2014 Workshop May 2014
 
Presentation - SERVQUAL
Presentation - SERVQUALPresentation - SERVQUAL
Presentation - SERVQUAL
 
Research Method for Business chapter 7
Research Method for Business chapter  7Research Method for Business chapter  7
Research Method for Business chapter 7
 
Comparative analysis of Retail chains based on SERVQUAL Model
Comparative analysis of Retail chains based on  SERVQUAL ModelComparative analysis of Retail chains based on  SERVQUAL Model
Comparative analysis of Retail chains based on SERVQUAL Model
 
IRJET- Survey of Classification of Business Reviews using Sentiment Analysis
IRJET- Survey of Classification of Business Reviews using Sentiment AnalysisIRJET- Survey of Classification of Business Reviews using Sentiment Analysis
IRJET- Survey of Classification of Business Reviews using Sentiment Analysis
 

Final.Version

  • 1. Using the Wisdom of the Crowd to Help Improve Catering Service Fabio Amaral 17 November 2015 Introduction While checking the Yelp star rating and user reviews of a business is a great way for assessing the general user perception of a service’s quality, it is still a quite subjective task and requires a thorough evaluation of the written reviews available if one would like to learn the reasons for a business being considered a success or a failure. Therefore the aims of this project was to utilize the text of available Yelp reviews to try to predict the star ratings given by the reviewer and to try to learn in the process the topics that the customers care the most. Techniques of topic modelling were used for this purpose and the information learned should be useful for helping users to make better informed choices more easily and for business owners and managers to identify potential service improvement opportunities. Methods and Data The data set used in this project was downloaded in Json format from the link provided, which comprises of information about local businesses in 10 cities across 4 countries. Detailed information about the data and an ongoing competition of which this data set is part of can be found in Yelp Dataset Challenge webpage. The R packages jsonlite, plyr, dplyr, stringr, ff, ffbase were used for parsing the data set into a standard data.frame. The available data contains a total of 1,569,264 reviews from 61,184 unique businesses from a wide variety of service types such as bookstores, building contractors and drugstores. Restaurants comprise around 36 % of the businesses in this data set, therefore we focused on this category in an attempt to extract more specific and actionable information from reviews. 
The variables text (reviews), review counts, star (business), star (review) were extracted as presented in the original data set and a few others were created to assist in the rating prediction and extraction of relevant information from the reviews. The variable delta star was created to represent the rating variation of the review rating in relation to the provided business rating (review star - business star). A similar categorical variable review effect with the levels positive, negative and neutral was created to indicate the effect of the review on the business rating (e.g. if review star - business star > 0 then review effect = positive). The variable sentiment was created by sentiment analysis of the review text using the function polarity from the package qdap and has a value range from -0.77720 to 0.99740 indicating how negative or positive the words in each review are (Figure 1d). Given the multinational nature of the data set, a number of reviews were written in languages other than English with the most evident ones being German (435), French (334) and Spanish (9). These reviews were automatically translated via the Microsoft Translator API using the R package translateR. The packages tm, slam, SnowballC were used for manipulating text and creating a sparce matrix of word counts for bag-of-words modelling. The package tau was used for generating word and n-grams frequency analysis. The package lda was utilized for performing rating prediction via topic modelling with supervised latent Dirichlet allocation sLDA as introduced in Blei and McAuliffe 2010 and interactive visualizations were created with the package LDAvis. Structural topic model estimation was further performed using the package stm by including relevant metadata information such as the review related variables aforementioned as introduced by Roberts et al 2015 and interactive visualization was created with the package stmBrowser. 1
  • 2. The topic modeling techniques used in this project are modified versions of the more general latent Dirichlet allocation which views each document as mixture of topics formed by latent probability of some terms being present. The topics are said to form a mixed membership of term, that is a term can appear in more than one topic with different probabilities. In the supervised algorithm version used in this work linear regression is used to predict the labeled review rating using the infered topic/terms coefficients as predictors. Results Upon inspection of the review star ratings distribution (figure 1a) we can observe an imbalance dominated by positive reviews ratings between 5 and 3 stars with much less frequent 1 and 2 stars with a mean 3.7 and median 4. This skewed distribution bias is not observed for the cumulative business ratings (figure 2b) which is approximately normal with mean 3.48 and median 3.5. The distribution of the variation of review rating in relation of the cumulative business rating is also approximately normal (mean = 0.22 and median = 0.5) with a bit more reviews bellow the cumulative business rating than above. However a sentiment analysis of the review texts indicate that the overall sentiment balance of the reviews is markedly positive. 83.56 % reviews are considered positive, 11.33 % reviews are considered negative and 5.11 % reviews are considered neutral. The discrepancy between the rating variations and sentiment score distributions highlights the complex challenge of integrating the objective metrics of star rating with the subjectivity of written review texts for the purpose of making predictions. 
1 2 3 4 5 a) Review Ratings Density StarRating 0.00 0.10 0.20 0.30 1 1.5 2 2.5 3 3.5 4 4.5 5 b) Business Ratings Density StarRating 0.00 0.05 0.10 0.15 0.20 0.25 c) Rating Variation Review Star − Business Star Density −3 −2 −1 0 1 2 3 0.00.10.20.30.4 −0.5 0.0 0.5 1.0 0.0 0.5 1.0 1.5 2.0 2.5 3.0 d) Sentiment Score Distribution Sentiment Score Probability(RelativeFrequency) Figure 1. Exploratory analysis of busines/review ratings and sentiments score distributions Similarly to the original LDA algorithm, the structure of topics modeled by sLDA is mostly influenced by the number of arbitrarily selected topics (K) and the initialization prior parameter of the Dirichlet on the per-document topic distributions (alpha) and the response parameter (eta). Therefore a large combination of values for these parameters was used for producing multiple models of which the combination (K=40, alpha=1 and eta=0.1) that resulted in the lowest root mean squared error (RMSE = 0.8764) was selected for further analysis (figure 2). 1.0 1.2 1.4 1.6 1.8 0 10 20 30 40 50 trial RMSE variable K=5 K=6 K=7 K=8 K=9 K=10 K=11 K=12 K=13 K=14 K=15 Best Tuning Parameters (K = 5−15) K=15 alpha=1 eta=0.1 RMSE=0.8927 0.88 0.90 0.92 0.94 0.96 1 2 3 4 trial RMSE variable K=10 K=15 K=20 K=25 K=30 K=35 K=40 K=45 K=50 K=55 K=60 K=65 Best Tuning Parameters (K = 10−65) K=40 alpha=1 eta=0.1 RMSE=0.8764 Figure 2. Tuning the parameter of sLDA for improved rating prediction measured by RMSE. Figure 3 shows a summary result of the analysis of correlation between the most frequent words from the modeled 40 topics and the estimated review rating. Since regression is used in sLDA we get a wider range of estimated ratings than 1 to 5 stars for the combination of frequent words from each topic. As shown in 2
  • 3. the legends, the sizes of each spot represent the t-value of the regression coefficients. We can see from this analysis that topics involving friendliness, location, option variety of desert and fresh salads and restaurant grand openings are mostly associated with the very high review ratings. On the other hand the food taste quality, extended service waiting times and waiters mistakes are the major causes of customer dissatisfaction. bad didnt food tasted bland order minutes time service waiting good pretty place decent bad table ordered waitress asked didnt bit make dont meat places drive fast order mcdonalds window counter kids order dont line area place eat part small im people youre dont theyre food service stars fast quality sauce salad red fresh flavor nice inside tables side small chicken fried rice wings sauce nice good menu tea back review star place time im bbq meat pork potato sweet breakfast coffee eggs cafe morning bar beer great hour drinks night music bar late fun chinese soup beef rice pork sandwich sandwiches bread cheese subway pizza italian crust cheese pasta steak dinner meal restaurant appetizer fries burger cheese burgers bacon back wasnt didnt give good mexican tacos taco salsa burrito lunch great quick prices food wine dining room list menu sushi fish roll shrimp seafood food buffet thai dishes restaurant montreal st french la restaurant fresh options salads menu delicious hot dog make style dogs location staff friendly lot great opening staff free open grand cream ice dessert chocolate delicious vegas restaurant great strip recommend great place food family love im time day place friend ive love place awesome amazing −10 −5 0 5 Estimate Topics abs(t.value) 20 40 60 80 −10 −5 0 5 Estimate Figure 3. Review rating estimation associated with the 40 modeled topics. Figure 4 shows how the estimated ratings are distributed in relation to the review ratings actually given by the Yelp users. 
We can see a clearly better estimation for the higher ratings, as would be expected from a data set imbalanced with respect to the number of reviews at each star rating. The out-of-bag RMSE on the training set was 0.8580, and on the test set 0.8842 with an R-squared of 0.4428. This is a much better estimate than that of a baseline model classifying every review as the most frequent rating (4 stars), which would result in an RMSE of 1.2226 and an R-squared of −0.0651.

Attempts were made to address the review rating imbalance, but with little success. A balanced training set was created by randomly selecting 1,000 reviews of each rating, but this had little to no effect on the prediction of the lower ratings. Another attempt, which did seem to reduce the overlap between the estimated ratings, was to balance the training set by randomly sampling, with replacement, reviews rated 1, 2, 3 and 5 stars until each matched the number of 4-star reviews (nearly 5,000 each), after the train and test sets had already been split. This helped separate the lower ratings in particular, but caused a strong artificial bias towards negative words in the topic modeling and was therefore not considered for further analysis.

Figure 4. Predicted rating distributions (density of predicted ratings, grouped by actual review rating from 1 to 5 stars).

The package LDAvis was used to assist in the exploration, and a representative image of a topic (desserts), highly related to high-rating reviews, can be viewed in Figure 5. Feel free to explore the topics and their related terms by clicking on the figure itself or on this link. The principal component analysis on the left illustrates how close the topics are to each other, and clicking on a numbered topic spot adjusts the term frequency bar plot on the right accordingly. By hovering over the term labels of the bar plot it is possible to see the topics in which each term appears, since a term may be part of multiple topics. The relevance of the terms relative to each topic can be further tuned by changing the value of lambda with the slider in the top right-hand corner. Some of the top predictor topics for the estimated ratings (as per Figure 3) are 30 (good impressions), 36 (Las Vegas), 3 (desserts), 15 (grand openings), 33 (locations), 34 (hot dogs), 39 (Montreal) and 20 (fresh salads). Some of the topics related to the worst rating estimates are 7 (waiting times), 10 (issues with the food) and 35 (fast-food drive-throughs).

Figure 5. Visualization of the supervised LDA analysis created with the LDAvis package. Click on this link or on the figure above to open the analysis in a web browser for interactive visualization.

Other metadata, such as the sentiment scores, the rating variation between a review and the cumulative business rating (delta.star), and the complete selected review documents, were further explored by structural topic modeling and visualized with the help of the stmBrowser package; a representative result of a topic positively correlated with increased star ratings can be viewed in Figure 6. To access the interactive plot, please click on the figure itself or on this link. To reproduce Figure 6, select the following options from the respective drop-down menus: X-axis = delta.star, Y-axis = Topic 7, Radius = review.star, Color = sentiment. Each plotted spot represents one sampled review document, which can be clicked to read its text.
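As a concrete illustration of the delta.star metric, the sketch below (in Python; the field names are assumptions for illustration, not the original data schema) computes it for a few made-up reviews and ranks them by how far they deviate from the cumulative business rating:

```python
# delta.star: the review's star rating minus the business's cumulative
# star rating. Reviews with a large |delta.star| deviate most from the
# general perception of the business. All values below are made up.
reviews = [
    {"business_id": "b1", "review_star": 5, "business_star": 2.5},
    {"business_id": "b1", "review_star": 2, "business_star": 2.5},
    {"business_id": "b2", "review_star": 1, "business_star": 4.5},
]

for r in reviews:
    r["delta_star"] = r["review_star"] - r["business_star"]

# Focus on the reviews that deviate most from the cumulative rating.
outliers = sorted(reviews, key=lambda r: abs(r["delta_star"]), reverse=True)
print([r["delta_star"] for r in outliers])  # [-3.5, 2.5, -0.5]
```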
By making use of the delta.star metric, one can more easily isolate the individual reviews associated with each topic and focus on what most influenced the user experience at the time the review was written. It is particularly informative to focus on the reviews whose ratings deviate most from the cumulative business rating, as these let us verify, within the review texts themselves, the relevance of the modeled topics, that is, the themes most associated with the reduction or improvement of a business's rating. Figure 6 shows a very representative selected review commenting on the quality of the food and the friendly, efficient service, all of which had been identified by the sLDA topic modeling as strong predictors of higher review ratings.

Figure 6. Visualization of the structural topic model created with the stmBrowser package. Click on this link or on the figure above to open the analysis in a web browser for interactive visualization.

Discussion

The difference between the success and the failure of a business, or between a good and a bad customer experience, can come down to details that simple star ratings cannot convey. The availability of user reviews provides the means to identify the main reasons for customer satisfaction, and with the ever-growing number of reviews, an automated system to analyse them and isolate actionable information becomes very important. We have seen that topic modeling techniques can be very effective at this task, and implementing such an analysis on review sites such as Yelp could further increase the usefulness of their rich databases, both for business owners and managers and for their customers. We have learned that in the catering business an attentive and friendly service is a key indicator of customer satisfaction, as is a good selection of tasty food, fresh salads and vegetables, and desserts, especially ice cream. A similar approach could be applied to other business classes to offer equivalent insights.

One possible limitation of the analysis presented in this study was the imbalance between review ratings, with far fewer 1- and 2-star reviews than higher-rated ones. It might be possible to further improve the analysis by using a more balanced data set or a term-weighting scheme to better calibrate the rating predictions.
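The oversampling scheme discussed earlier, sampling the minority ratings with replacement until every class matches the most frequent one, can be sketched as follows (in Python; the rating counts here are simulated, not the actual Yelp counts):

```python
import random
from collections import Counter

random.seed(42)
# Imbalanced ratings, roughly mimicking the skew toward 4-5 stars.
ratings = [1] * 50 + [2] * 80 + [3] * 150 + [4] * 500 + [5] * 400

counts = Counter(ratings)
target = max(counts.values())  # size of the largest class (4 stars here)

balanced = []
for star in sorted(counts):
    pool = [r for r in ratings if r == star]
    # Sample with replacement up to the target class size.
    balanced.extend(random.choices(pool, k=target))

print(Counter(balanced))  # every class now has `target` members
```

As noted above, this kind of resampling can separate the lower ratings better while biasing the learned topics toward negative vocabulary, so it should be validated against a held-out, untouched test set.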