Fypca5

Mining User’s Opinions
in Hotel
TEY JUN HONG U095074X

Content
 Background

 Formulating the problem

 Data Mining Process

 Techniques

 Analysis

What is Data Mining?

 Extraction of patterns

 Automatic Means

 Little human interaction

User’s Opinions in Hotel

 Identify potential hotels

 Predict what ASPECTS customers like

 Sales and Margin

Sentiment Analysis

Some Limitations of machines

 Unable to read like a human

 Cannot detect sarcasm

 Sentiments are expressed differently across topics and domains

 Polarity analysis

 Facts Vs Opinion

Some machine limitation examples

 “The service is as good as none”. The negation is not obvious to a
machine.

 “Swimming pool is big enough to swim with comfort”,
“There is a big crowd at the counter complaining”. The polarity of
“big” changes with context.

 “The room is warmer than the lobby”. Comparisons are
hard to classify.

Sentiment Analysis

 Prediction of sentence polarity

 Classification of polarity for sentiment lexicon

 Detection of relations

Cleaning The “Dirty” Reviews
 Frequent problem: data inconsistencies

 Duplicate data

 Spelling errors: correct them rather than trimming them from the data

 Foreign accent and characters

 Singular / Plural conversion

 Punctuations removal / replacement

 Noise and incomplete data

 Naming convention misused, same name but different meaning
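
A few of the cleaning steps listed above (duplicates, foreign accents, punctuation, singular/plural conversion) can be sketched in Python. This is a minimal illustration only: the function names are hypothetical, the plural rule is deliberately crude (it will mangle words such as "this"), and spelling correction is left out.

```python
import re
import unicodedata

def strip_accents(text):
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def naive_singular(word):
    # Crude plural-to-singular rule (an assumption for illustration);
    # a real pipeline would use a lemmatizer instead.
    if len(word) > 3 and word.endswith("ies"):
        return word[:-3] + "y"
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def clean_review(text):
    text = strip_accents(text).lower()
    text = re.sub(r"[^\w\s]", " ", text)  # replace punctuation with spaces
    return " ".join(naive_singular(w) for w in text.split())

def clean_reviews(reviews):
    # Clean every review and keep only the first copy of each duplicate.
    seen, cleaned = set(), []
    for review in reviews:
        c = clean_review(review)
        if c and c not in seen:
            seen.add(c)
            cleaned.append(c)
    return cleaned
```

For example, `clean_reviews(["Nice pool.", "nice pool", "Big lobby"])` keeps two reviews, because the first two become identical after cleaning.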

Data Preprocessing

 Part Of Speech Tags

Data Preprocessing

 Polarity tagging using sentiment lexicon

The Word: BEST
Part of Speech Tag: ADJ
Sentiment Lexicon Tag: +VE
Occurrence: HIGH
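
Polarity tagging with a lexicon amounts to a dictionary lookup on each POS-tagged word. The sketch below assumes a tiny, hypothetical lexicon (`LEXICON`) and hypothetical function names; a real sentiment lexicon would be far larger.

```python
# Hypothetical toy lexicon, for illustration only.
LEXICON = {
    "best": "+VE",
    "good": "+VE",
    "dirty": "-VE",
    "rude": "-VE",
}

def tag_polarity(tagged_words, lexicon=LEXICON):
    """Attach a lexicon polarity to each (word, pos) pair.

    Words missing from the lexicon get None.
    """
    return [(word, pos, lexicon.get(word.lower())) for word, pos in tagged_words]

def missing_rate(tagged):
    """Share of tagged words that have no lexicon entry."""
    if not tagged:
        return 0.0
    return sum(1 for _, _, pol in tagged if pol is None) / len(tagged)
```

Calling `tag_polarity([("BEST", "ADJ"), ("hotel", "NOUN")])` tags "BEST" as `+VE` and leaves "hotel" without a polarity.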

Findings

 Part of Speech Tagging (POS) using Brill Tagger - NO
PROBLEM

- 95% accuracy in POS tagging of words after data cleaning

Findings

 Polarity tagging using sentiment lexicon – BIG PROBLEM

- 40% of sentiment words are not found in the sentiment lexicon

- 10% of sentiment words with a positive or negative polarity
are listed in the neutral section of the sentiment lexicon

Problems

 Sentiment lexicon not comprehensive

 Domain Independent Sentiment Words

 Domain Dependent Sentiment Words

Solutions

 Rule Based Mining

 Relation Based Mining

Analysis - Bayesian
 To determine polarity of sentiments

P(X | Y) = P(X) P(Y | X) / P(Y)

 Probability that a sentiment is positive or negative, given
its contents
 P(sentiment | sentence) = P(sentiment)P(sentence |
sentiment) / P(sentence)
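
One common way to apply the rule above is a naive Bayes classifier. The sketch below is an assumption, not necessarily the method used in this project: it treats words as independent given the sentiment, uses add-one smoothing, and trains on a few made-up sentences. P(sentence) is dropped because it is the same for every sentiment.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (sentence, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sentence, label in examples:
        label_counts[label] += 1
        for w in sentence.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab

def classify(sentence, model):
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log P(sentiment)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in sentence.lower().split():
            # log P(word | sentiment), with add-one smoothing
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training sentences, invented for this example.
examples = [
    ("the room is clean and comfortable", "positive"),
    ("friendly staff and great service", "positive"),
    ("the room is dirty and noisy", "negative"),
    ("rude staff and poor service", "negative"),
]
model = train(examples)
```

With this toy model, `classify("clean room and great staff", model)` picks the sentiment whose training sentences share the most words with the input.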

Validation

• Precision = N (agree & found) / N (found)

• High precision means most of the sentiment words found by
the system are correctly labeled

• Recall = N (agree & found) / N (agree)

• High recall means most of the correct sentiment words are
found by the system
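
The two ratios can be computed directly from sets. This sketch assumes that `agree` is the set of sentiment words marked by human annotators and `found` is the set returned by the system; the slides do not define these terms, so this reading is an interpretation.

```python
def precision_recall(agree, found):
    """Return (precision, recall) for two collections of items."""
    agree, found = set(agree), set(found)
    hits = len(agree & found)  # N(agree & found)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(agree) if agree else 0.0
    return precision, recall

# Made-up example sets, for illustration only.
agree = {"clean", "friendly", "rude", "noisy"}
found = {"clean", "rude", "big"}
precision, recall = precision_recall(agree, found)
```

Here two of the three found words are in the agreed set, and two of the four agreed words were found.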

Validation Results

 Out of the 350 aspect-unlabelled sentiment word pairs,
294 are found by the methods. Thus, the precision is
about 84%.

 Recall: 276 words are correctly labelled by the
system, which is about 78%.

Application

 Reviews Rating

 Aspect Rating

 Summary of reviews
