5. User’s Opinions in Hotel
Identify Potential Hotel
Predict what ASPECTS customers like
Sales and Margin
Sentiment Analysis
6. Some Limitations of machines
Unable to read like a human
Cannot detect sarcasm
Expression of sentiments differs across topics and domains
Polarity analysis
Facts Vs Opinion
7. Some machine limitation examples
“The service is as good as none”. The negation is not obvious to
a machine.
“Swimming pool is big enough to swim with comfort”,
“There is a big crowd at the counter complaining”. The polarity
of the same word (“big”) might change with context.
“The room is warmer than the lobby”. Comparisons are
hard to classify.
11. Cleaning The “Dirty” Reviews
Frequent problem: data inconsistencies
Duplicate data
Spelling errors (not simply trimmed from the data)
Foreign accent and characters
Singular / Plural conversion
Punctuation removal / replacement
Noise and incomplete data
Naming conventions misused: same name but different meaning
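A minimal sketch of how a few of the cleaning steps listed above (duplicates, foreign accents, punctuation, singular / plural) might be applied to raw review text. The function names and the crude plural rule are assumptions made for illustration, not the method used in the study; a real pipeline would likely use a lemmatizer and a spell checker.

```python
import re
import unicodedata

def clean_review(text):
    """Normalize one review: strip accents, lowercase, drop punctuation, collapse spaces."""
    # Decompose accented characters and drop the combining marks (e.g. "é" -> "e").
    decomposed = unicodedata.normalize("NFKD", text)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not a word character or whitespace with a space.
    no_punct = re.sub(r"[^\w\s]", " ", plain.lower())
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", no_punct).strip()

def singularize(word):
    """Very rough plural-to-singular rule (an assumption, not a real lemmatizer)."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith(("ss", "us", "is")) and len(word) > 3:
        return word[:-1]
    return word

def clean_reviews(reviews):
    """Clean every review and drop duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for review in reviews:
        normalized = " ".join(singularize(w) for w in clean_review(review).split())
        if normalized and normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned
```

Duplicates are detected only after normalization, so "Rooms are clean." and "rooms are clean" count as the same review.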
13. Data Preprocessing
Polarity tagging using sentiment lexicon
The Word: BEST
Occurrence: HIGH
Sentiment Lexicon Tag: +VE
Part of Speech Tag: ADJ
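Polarity tagging by lexicon lookup can be sketched as below. The tiny lexicon, its tag names, and the function `tag_polarity` are hypothetical and stand in for a real sentiment lexicon; the input is assumed to be words that already carry a part-of-speech tag.

```python
# Hypothetical miniature lexicon; a real one holds thousands of entries.
SENTIMENT_LEXICON = {
    "best": "+VE",
    "clean": "+VE",
    "dirty": "-VE",
    "noisy": "-VE",
    "big": "NEUTRAL",
}

def tag_polarity(tagged_words, lexicon=SENTIMENT_LEXICON):
    """Attach a polarity tag to each (word, pos) pair; None marks a lexicon miss."""
    result = []
    for word, pos in tagged_words:
        # Lookup is case-insensitive; the original spelling is kept in the output.
        polarity = lexicon.get(word.lower())
        result.append((word, pos, polarity))
    return result
```

For example, `tag_polarity([("BEST", "ADJ")])` yields the word together with its POS tag and the `+VE` lexicon tag, while a word missing from the lexicon gets `None`.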
14. Findings
Part of Speech (POS) tagging using Brill Tagger: NO
PROBLEM
95% accuracy of POS tagging of words after data cleaning
15. Findings
Polarity tagging using sentiment lexicon: BIG PROBLEM
40% of sentiment words are not found in the sentiment lexicon
10% of sentiment words with a positive or negative polarity
are found in the neutral section of the sentiment lexicon
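Figures like these could be measured with a small coverage check such as the sketch below. The function name, the `"NEUTRAL"` tag, and the sample lexicon in the usage note are assumptions for illustration; the study's actual lexicon and word list are not shown in the slides.

```python
def lexicon_coverage(sentiment_words, lexicon):
    """Return the fractions of words that are missing, neutral, or polar in the lexicon."""
    total = len(sentiment_words)
    if total == 0:
        return {"missing": 0.0, "neutral": 0.0, "polar": 0.0}
    # Words with no lexicon entry at all.
    missing = sum(1 for w in sentiment_words if w.lower() not in lexicon)
    # Words that are present but filed under the neutral section.
    neutral = sum(1 for w in sentiment_words if lexicon.get(w.lower()) == "NEUTRAL")
    polar = total - missing - neutral
    return {
        "missing": missing / total,
        "neutral": neutral / total,
        "polar": polar / total,
    }
```

Usage, with a made-up lexicon: `lexicon_coverage(["best", "cozy", "big"], {"best": "+VE", "big": "NEUTRAL"})` reports one third of the words in each of the three groups.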
16. Problems
Sentiment lexicon not comprehensive
Domain Independent Sentiment Words
Domain Dependent Sentiment Words
20. Analysis - Bayesian
To determine polarity of sentiments
P(X | Y) = P(X) P(Y | X) / P(Y)
Probability that a sentiment is positive or negative, given
its contents
P(sentiment | sentence) = P(sentiment) P(sentence |
sentiment) / P(sentence)
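One common way to apply this rule is a naive Bayes classifier, sketched below. The slides do not say how P(sentence | sentiment) is estimated, so the word-independence assumption, the add-one smoothing, and the function names `train` and `classify` are all assumptions made for illustration. P(sentence) is the same for every label, so it is dropped when comparing scores.

```python
import math
from collections import Counter, defaultdict

def train(labelled_sentences):
    """Count label frequencies and per-label word frequencies."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for sentence, label in labelled_sentences:
        label_counts[label] += 1
        for word in sentence.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary

def classify(sentence, model):
    """Return the label maximizing log P(sentiment) + sum of log P(word | sentiment)."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior P(sentiment)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in sentence.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero the product.
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.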
21. Validation
• Precision = N (agree & found) / N (found)
• High precision means most of the sentiment words found
by the system are correctly labelled
• Recall = N (agree & found) / N (agree)
• High recall means most of the correct sentiment words
are found by the system
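The two ratios can be computed from sets, as in this sketch. Here `found` is assumed to be the set of items the system returned and `agreed` the set the human judges accept as correct; both names and the function `precision_recall` are illustrative.

```python
def precision_recall(found, agreed):
    """Compute precision and recall from the system's items and the agreed items."""
    found, agreed = set(found), set(agreed)
    # Items that the system found and the judges agree with.
    hits = len(found & agreed)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(agreed) if agreed else 0.0
    return precision, recall
```

If the system returns four words of which two are agreed, and the judges list three correct words in total, precision is 2/4 and recall is 2/3.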
23. Validation Results
It is found that, out of the 350 aspect-unlabelled sentiment
word pairs,
294 are found by the method. Thus, the precision is
about 84%.
The recall: 276 words are correctly labelled by the
system, which is about 78%.
Process of exploration and analysis
By automatic / semi automatic means
With little or no human interactions
To discover meaningful patterns and rules
Exponential growth of user’s opinions
Limitations of human analysis
Accuracy of human analysis
Machines can be trained to take over human analysis with advanced computer technology and it is done with LOW COST
Increase in social media and web user
Increase in valuable opinion oriented data in Hotel due to web expansion
Identify potential hotel to stay by looking at the aspects
Identify best prospects (ASPECTS), and retain customers
Predict what ASPECTS customers like and promote accordingly
Learn parameters influencing trends in sales and margins
Identification of opinions for customers