Mining User’s Opinions in Hotel
TEY JUN HONG U095074X
Content

• Background

• Formulating the problem

• Data Mining Process

• Techniques

• Analysis
What is Data Mining?

• Extraction of patterns

• Automatic means

• Little human interaction
The Web



http://www
User’s Opinions in Hotel

• Identify Potential Hotel

• Predict what ASPECTS customers like

• Sales and Margin


Sentiment Analysis
Some Limitations of Machines

• Unable to read like a human

• Cannot detect sarcasm

• Sentiments are expressed differently across topics and domains

• Polarity analysis

• Facts vs. opinion
Some machine limitation examples

• “The service is as good as none”. Negation is not obvious to a machine.

• “Swimming pool is big enough to swim with comfort”, “There is a big crowd at the counter complaining”. Polarity might change with context.

• “The room is warmer than the lobby”. Comparisons are hard to classify.
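
The first and third examples above can be reproduced with a very small word-counting scorer. This is only a sketch: the word lists `POSITIVE` and `NEGATIVE` and the function `naive_polarity` are hypothetical names invented for illustration, not part of the project.

```python
# Hypothetical toy word lists, for illustration only.
POSITIVE = {"good", "comfort", "clean"}
NEGATIVE = {"bad", "complaining", "dirty"}


def naive_polarity(sentence):
    """Score a sentence by counting positive and negative words."""
    words = sentence.lower().replace(".", " ").replace(",", " ").split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# "good" is counted, but the negating phrase "as good as none" is not understood.
print(naive_polarity("The service is as good as none"))     # positive
# The comparison carries no word from either list.
print(naive_polarity("The room is warmer than the lobby"))  # neutral
```

A scorer of this kind sees isolated words only, which is why negation and comparisons slip past it.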
Project
Sentiment Analysis

• Prediction of sentence polarity

• Classification of polarity for sentiment lexicon

• Detection of relations
Data Mining Process
Cleaning The “Dirty” Reviews
• Frequent problem: data inconsistencies

• Duplicate data

• Spelling Errors != Trim from data

• Foreign accents and characters

• Singular / plural conversion

• Punctuation removal / replacement

• Noise and incomplete data

• Naming conventions misused: same name but different meaning
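
A few of the cleaning steps listed above (accent folding, punctuation replacement, duplicate removal) might be sketched as follows. The function names and the sample reviews are hypothetical, and the slides do not say how the project actually implemented these steps.

```python
import re
import unicodedata


def clean_review(text):
    """Normalise one review: fold accents, replace punctuation, collapse spaces."""
    # Decompose accented characters, then drop the combining marks (é -> e).
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace punctuation with a space so neighbouring words stay separated.
    no_punct = re.sub(r"[^\w\s]", " ", no_accents)
    # Collapse runs of whitespace and lowercase the result.
    return re.sub(r"\s+", " ", no_punct).strip().lower()


def drop_duplicates(reviews):
    """Keep the first occurrence of each cleaned review, preserving order."""
    seen = set()
    unique = []
    for review in reviews:
        cleaned = clean_review(review)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)
    return unique


# Hypothetical sample reviews, for illustration only.
reviews = ["The café was GREAT!!!", "the cafe was great", "Rooms: small, but clean."]
cleaned_reviews = drop_duplicates(reviews)
print(cleaned_reviews)  # ['the cafe was great', 'rooms small but clean']
```

Cleaning before comparing is a deliberate choice here: the first two sample reviews differ only in case, accent and punctuation, so they are treated as duplicates.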
Data Preprocessing

• Part Of Speech Tags
Data Preprocessing

• Polarity tagging using sentiment lexicon

[Diagram: the word BEST has occurrence HIGH, sentiment lexicon tag +VE, and part-of-speech tag ADJ]
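
Polarity tagging with a sentiment lexicon amounts to a dictionary lookup per word. The sketch below assumes a tiny hypothetical lexicon (`SENTIMENT_LEXICON`) and an `UNKNOWN` tag for words the lexicon does not contain; neither comes from the slides.

```python
# Hypothetical toy lexicon; a real sentiment lexicon would be far larger.
SENTIMENT_LEXICON = {
    "best": "+VE",
    "good": "+VE",
    "clean": "+VE",
    "dirty": "-VE",
    "rude": "-VE",
}


def tag_polarity(tokens):
    """Return (token, tag) pairs; tokens missing from the lexicon get 'UNKNOWN'."""
    return [(tok, SENTIMENT_LEXICON.get(tok.lower(), "UNKNOWN")) for tok in tokens]


def lexicon_coverage(sentiment_words):
    """Fraction of the given sentiment words that the lexicon contains."""
    if not sentiment_words:
        return 0.0
    found = sum(1 for w in sentiment_words if w.lower() in SENTIMENT_LEXICON)
    return found / len(sentiment_words)


tagged = tag_polarity(["The", "BEST", "hotel"])
print(tagged)  # [('The', 'UNKNOWN'), ('BEST', '+VE'), ('hotel', 'UNKNOWN')]

# "cosy" and "spacious" are sentiment words that this toy lexicon lacks.
coverage = lexicon_coverage(["best", "rude", "cosy", "spacious"])
print(coverage)  # 0.5
```

The coverage figure is one simple way to see how many sentiment words a lexicon misses for a given domain.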
Findings

• Part of Speech (POS) tagging using Brill Tagger: NO PROBLEM

  - 95% accuracy of POS tagging of words after data cleaning
Findings

• Polarity tagging using sentiment lexicon: BIG PROBLEM

  - 40% of sentiment words are not found in the sentiment lexicon

  - 10% of sentiment words with a positive or negative polarity are found in the neutral section of the sentiment lexicon
Problems

• Sentiment lexicon not comprehensive

• Domain Independent Sentiment Words

• Domain Dependent Sentiment Words
Solutions

• Rule Based Mining

• Relation Based Mining
Rule Based Mining
Relation Based Mining
Analysis - Bayesian
• To determine the polarity of sentiments

                  P(X | Y) = P(X) P(Y | X) / P(Y)

• Probability that a sentiment is positive or negative, given its contents
• P(sentiment | sentence) = P(sentiment) P(sentence | sentiment) / P(sentence)
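
One common way to apply this rule is a naive Bayes classifier, sketched below. The slides state only Bayes' rule, so the word-independence assumption, the add-one smoothing, the function names and the four training sentences are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical labelled sentences, for illustration only.
TRAINING = [
    ("the room is clean and the staff is friendly", "positive"),
    ("great location and good breakfast", "positive"),
    ("the room is dirty and the staff is rude", "negative"),
    ("bad service and noisy room", "negative"),
]


def train(examples):
    """Count sentences per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for sentence, label in examples:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in sentence.split():
            counts[word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def posterior(sentence, class_counts, word_counts, vocab):
    """Return P(sentiment | sentence) for each class via Bayes' rule."""
    total = sum(class_counts.values())
    log_joint = {}
    for label, n in class_counts.items():
        logp = math.log(n / total)  # log P(sentiment)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in sentence.split():
            if word in vocab:  # ignore words never seen in training
                # log P(word | sentiment) with add-one smoothing
                logp += math.log((word_counts[label][word] + 1) / denom)
        log_joint[label] = logp
    # P(sentence) is the sum of the joint probabilities over all classes.
    evidence = sum(math.exp(v) for v in log_joint.values())
    return {label: math.exp(v) / evidence for label, v in log_joint.items()}


model = train(TRAINING)
probs = posterior("the staff is rude", *model)
predicted = max(probs, key=probs.get)
print(predicted)  # negative
```

Dividing by the summed joint probabilities plays the role of P(sentence) in the formula, so the returned values add up to 1.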
Validation

•   Precision = N (agree & found) / N (found)

•   High precision means most of the sentiment words found by the
    system are correctly labeled

•   Recall = N (agree & found) / N (agree)

•   High recall means most of the correct sentiment words are
    found by the system
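
The two formulas can be computed directly from sets of words. In this sketch `found` stands for the words the system extracted and `agreed` for the words human annotators accept as correct; both sets and the function name are hypothetical and are not the project's data.

```python
def precision_recall(found, agreed):
    """Compute precision and recall from two sets of sentiment words.

    found  -- words the system extracted
    agreed -- words human annotators agree are correct
    """
    hits = len(found & agreed)  # N(agree & found)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(agreed) if agreed else 0.0
    return precision, recall


# Hypothetical example sets, not the project's data.
found = {"clean", "rude", "big", "friendly"}
agreed = {"clean", "rude", "friendly", "noisy", "cosy"}
precision, recall = precision_recall(found, agreed)
print(precision, recall)  # 0.75 0.6
```

Three of the four found words are correct (precision 0.75), while three of the five correct words were found (recall 0.6).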
Validation Results
Validation Results

• Out of the 350 aspect-unlabelled sentiment word pairs, 294 are found by the methods. Thus, the precision is about 84%.

• Recall: 276 words are correctly labelled by the system, which is about 78%.
Application

• Reviews Rating

• Aspect Rating

• Summary of reviews


Editor's notes

  1. • Process of exploration and analysis
     • By automatic or semi-automatic means
     • With little or no human interaction
     • To discover meaningful patterns and rules
     • Exponential growth of users’ opinions
     • Limitations of human analysis
     • Accuracy of human analysis
     • Machines can be trained to take over human analysis with advanced computer technology, and this is done at LOW COST
  2. • Increase in social media and web users
     • Increase in valuable opinion-oriented data on hotels due to web expansion
     • Identify potential hotels to stay at by looking at the aspects
     • Identify best prospects (ASPECTS), and retain customers
     • Predict what ASPECTS customers like and promote accordingly
     • Learn parameters influencing trends in sales and margins
     • Identification of opinions for customers