An overview of text mining and sentiment analysis for Decision Support System
1. An Overview of Text Mining and Sentiment Analysis for Decision Support System
Gan Keng Hoon
School of Computer Sciences
Universiti Sains Malaysia
12 May 2015
2. Outline
1. Decision Support Systems
2. Overview of Text Mining & Sentiment Analysis
   - Techniques in Text Mining
   - Techniques in Sentiment Analysis
3. Applications and Challenges Ahead
3. Decision Support System
As end users, we need to make decisions every day ..
What to eat for lunch?
What subject to choose?
Which hotel to stay at?
4. Decision Support System
For a business provider, crucial decisions need to be made every hour, minute, or second ..
Source: http://attunelive.com/blog/how-a-screening-prompted-by-clinical-decision-support-system-helped-save-a-patients-life/
5. Decision Support System
A decision maker in a company checks the sales before deciding which product to promote ..
Source: http://www.informationbuilders.com/decision-support-systems-dss
6. Decision Support System
A hotelier wants to know why ..
If the location is good, how can I take advantage ..
7. Why Are They/We Using Decision Support Systems?
Business provider
- Improve customer experience
- Improve products and services
- More returns …
End user
- Better purchasing choice
- Better value
- Happier ..
8. Sample Decision Support System
Looks good: 155 people say Very Good…
Not bad: customers rated 4 stars and above for location, cleanliness ..
http://www.tripadvisor.com.my
10. Many Questions …
Mr X: How is the condition of the Wifi?
Miss Y: Is the toilet really dirty?
Family Z: Any convenience store nearby?
Manager of Hotel: I want to know all the complaints about the toilet!
11. Harnessing Web and Social Texts
Very influential.
Latest and most updated.
The truth (but sometimes not).
Free (most of the time).
Source: Hotel Review Sites: What’s the ‘Truth’ About Fairness? http://www.hospitalitynet.org/news/4056065.html
12. However, With No Automation Methods
It is impossible to scan through each of the reviews.
Important details could be missed.
It is hard to visualize or summarize all the texts manually.
It is impossible to digest the new reviews generated each day.
*There are 344 reviews (as of 10/5/2015) for the mentioned hotel.
13. Overview of Text Mining & Sentiment Analysis
Is the toilet really dirty?
Text Mining
- Let’s mine some texts to answer the question.
1. in the bathroom, used toiletries (shampoo & soap) were not thrown and were left in the shower area
2. dirty sink, and very very dirty shower glass wall.
3. the shower, it's clean...
Sentiment Analysis
- Let’s find some sentiments about these texts.
14. Techniques in Text Mining
What is text mining?
To exploit information contained in textual documents in various ways.
Natural Language Processing
Information Retrieval
15. Information Retrieval
- Find relevant sentences.
Document Collection Processing
1. Text Preprocessing
   - Sentence Tokenizer
   - Stop Word Removal
2. Feature Selection
   - Bag-of-Words Approach
   - Term Frequency-Inverse Document Frequency (TF-IDF)
3. Inverted Index Creation
   - Term – Doc Posting
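The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the method used in the talk: the stop-word list is a tiny made-up sample, the sentence tokenizer is a naive regular expression, each sentence is treated as one "document", and the function names (`split_sentences`, `tokenize`, `build_index`) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "in", "it", "is", "s", "were", "was", "of", "to"}

def split_sentences(text):
    """Naive sentence tokenizer: split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Lowercase, keep alphabetic tokens, and remove stop words."""
    return [t for t in re.findall(r"[a-z]+", sentence.lower()) if t not in STOP_WORDS]

def build_index(sentences):
    """Return (inverted index, TF-IDF vectors), treating each sentence as a document."""
    bags = [Counter(tokenize(s)) for s in sentences]   # bag-of-words per sentence
    df = Counter(term for bag in bags for term in bag)  # document frequency per term
    n = len(sentences)
    inverted = defaultdict(list)                        # term -> posting list of sentence ids
    for doc_id, bag in enumerate(bags):
        for term in bag:
            inverted[term].append(doc_id)
    tfidf = [{t: tf * math.log(n / df[t]) for t, tf in bag.items()} for bag in bags]
    return dict(inverted), tfidf

sentences = [
    "in the bathroom, used toiletries (shampoo & soap) were not thrown and were left in the shower area",
    "dirty sink, and very very dirty shower glass wall.",
    "the shower, it's clean...",
]
inverted, tfidf = build_index(sentences)
print(inverted["dirty"])   # posting list for the term "dirty"
```

With this plain TF-IDF formula, a term such as "shower" that occurs in every sentence gets weight zero, while "dirty", which occurs in only one sentence, is weighted highly there.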
16. Information Retrieval
- Find relevant sentences.
Query Processing
1. Intention as Query
2. Query Preprocessing
   - Tokenization
   - Expansion using Synonyms
3. Query-Doc Matching
   - Ranking
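A rough sketch of the query side follows, assuming a hand-made synonym table and a simple term-overlap score for ranking. The table `SYNONYMS`, the stop-word list, and the functions `expand_query` and `rank` are hypothetical names introduced only for this illustration; a real system would more likely use a thesaurus and a weighted ranking function.

```python
import re
from collections import Counter

# Illustrative stop words and synonym table (both are assumptions).
STOP_WORDS = {"is", "the", "a", "an", "and", "in", "it", "s", "were"}
SYNONYMS = {"toilet": {"bathroom", "washroom"}, "dirty": {"filthy", "unclean"}}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, and remove stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def expand_query(query):
    """Tokenize the query and add the synonyms of each query term."""
    terms = set(tokenize(query))
    for term in list(terms):
        terms |= SYNONYMS.get(term, set())
    return terms

def rank(query, sentences):
    """Return (score, sentence) pairs with score > 0, highest score first."""
    terms = expand_query(query)
    scored = []
    for sentence in sentences:
        counts = Counter(tokenize(sentence))
        score = sum(counts[t] for t in terms)  # number of matching term occurrences
        if score > 0:
            scored.append((score, sentence))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

sentences = [
    "in the bathroom, used toiletries (shampoo & soap) were not thrown and were left in the shower area",
    "dirty sink, and very very dirty shower glass wall.",
    "the shower, it's clean...",
]
for score, sentence in rank("Is the toilet really dirty?", sentences):
    print(score, sentence)
```

Without the synonym expansion, the first sentence would not match the query at all, because it mentions "bathroom" rather than "toilet".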
17. Information Retrieval
- Find relevant sentences.
Simple and fast
Quickly retrieves all relevant sentences or documents given some keywords.
But loses details such as sentence structure and word order.
Context is not captured.
E.g. the term “cold” may refer to the air conditioning being cold or to the receptionist being cold.
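The limitation can be seen in a small sketch. The review sentences below are made up for illustration, and `bag_of_words` and `keyword_match` are hypothetical helper names.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Count lowercase alphabetic tokens, ignoring order and punctuation."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def keyword_match(keyword, sentences):
    """Return the sentences whose bag of words contains the keyword."""
    return [s for s in sentences if keyword in bag_of_words(s)]

# Made-up review sentences, for illustration only.
REVIEWS = ["the air cond is cold", "the receptionist is cold"]
SAME_WORDS = ("the room is clean, not dirty", "the room is dirty, not clean")

# Both reviews match "cold", although they describe very different things.
print(keyword_match("cold", REVIEWS))

# Opposite meanings, identical bags of words: word order is lost.
print(bag_of_words(SAME_WORDS[0]) == bag_of_words(SAME_WORDS[1]))  # True
```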
19. Natural Language Processing
Difficult because we assume the hearer has some background knowledge.
Surface analysis of the text alone is not enough.
Common sense analysis is needed.
E.g. “I can write words on that dusty table top.” (The reader has to infer that the table is very dirty.)
20. Techniques in Sentiment Analysis
[Figure: Part of Summarev Framework for Entity’s Text Processing and Sentiment Analysis. Labels in the figure: Pre-processing, Entity Detection, Post-processing, Sentence Extractor, Tokenization, Boundary Detection, Sentence Selector, Entity Dictionary, Sentence Categorization, Sentiment Dictionary, Sentiment Extraction, Entity Extraction, Prediction Rating, MySQL Database, Browser.]
http://ir.cs.usm.my/siir/project_summarev.php
21. Entity Detection (or Aspect Selection)
Texts
1. in the bathroom, used toiletries (shampoo & soap) were not thrown and were left in the shower area
2. dirty sink, and very very dirty shower glass wall.
3. the shower, it's clean...
…
Aspect
1. Bathroom
2. Toiletries
3. Shower area
4. Sink
5. Shower
6. Hair dryer
7. Wifi
8. Bed
...
- POS Tagging
- Noun Phrase Selection
- Term Weighting
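The idea of selecting noun phrases as aspect candidates and weighting them can be sketched as below. To keep the example self-contained, a small hand-made noun list (`NOUNS`, an assumption) stands in for a real POS tagger, noun phrases are taken to be maximal runs of consecutive nouns, and term weighting is reduced to a raw frequency count. None of this is claimed to be how Summarev does it.

```python
import re
from collections import Counter

# Hypothetical noun lexicon standing in for a real POS tagger.
NOUNS = {"bathroom", "toiletries", "shampoo", "soap", "shower", "area", "sink", "glass", "wall"}

def tokenize(text):
    """Lowercase and split into words and single punctuation marks."""
    return re.findall(r"[a-z]+|[^a-z\s]", text.lower())

def noun_phrases(text):
    """Return maximal runs of consecutive noun tokens as candidate aspects."""
    phrases, current = [], []
    for token in tokenize(text):
        if token in NOUNS:
            current.append(token)
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases

def weight_aspects(texts):
    """Weight each candidate aspect by how often it occurs across all texts."""
    return Counter(p for t in texts for p in noun_phrases(t))

texts = [
    "in the bathroom, used toiletries (shampoo & soap) were not thrown and were left in the shower area",
    "dirty sink, and very very dirty shower glass wall.",
    "the shower, it's clean...",
]
print(noun_phrases(texts[1]))  # ['sink', 'shower glass wall']
```

Punctuation is kept as separate tokens so that "toiletries (shampoo & soap)" yields three candidates instead of one long phrase.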
22. Sentiment Extraction
Texts
1. in the bathroom, used toiletries (shampoo & soap) were not thrown and were left in the shower area
2. dirty sink, and very very dirty shower glass wall.
3. the shower, it's clean...
…
Aspect - Sentiment
1. Sink – dirty
2. Shower – clean
3. Shower glass wall – dirty
- POS Tagging
- Adjective Phrase Selection
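One simple way to pair aspects with sentiment words, sketched here under strong assumptions, is to attach each aspect to the nearest adjective in the same text. The noun list and the two-word sentiment list are hand-made stand-ins for a POS tagger and a sentiment dictionary, and the nearest-word heuristic is an illustration rather than the technique used in the talk.

```python
import re

# Hypothetical lexicons standing in for a POS tagger and a sentiment dictionary.
NOUNS = {"bathroom", "toiletries", "shampoo", "soap", "shower", "area", "sink", "glass", "wall"}
SENTIMENT_WORDS = {"dirty", "clean"}

def tokenize(text):
    """Lowercase and split into words and single punctuation marks."""
    return re.findall(r"[a-z]+|[^a-z\s]", text.lower())

def aspect_spans(tokens):
    """Return (start, end, phrase) for each maximal run of noun tokens."""
    spans, start = [], None
    for i, token in enumerate(tokens + [""]):  # empty sentinel closes a trailing run
        if token in NOUNS:
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i, " ".join(tokens[start:i])))
            start = None
    return spans

def extract_pairs(text):
    """Pair each aspect with the nearest sentiment word, measured in tokens."""
    tokens = tokenize(text)
    adjectives = [i for i, t in enumerate(tokens) if t in SENTIMENT_WORDS]
    pairs = []
    for start, end, phrase in aspect_spans(tokens):
        if not adjectives:
            continue  # no sentiment word in this text, so nothing to pair
        nearest = min(adjectives, key=lambda i: min(abs(i - start), abs(i - (end - 1))))
        pairs.append((phrase, tokens[nearest]))
    return pairs

print(extract_pairs("dirty sink, and very very dirty shower glass wall."))
# [('sink', 'dirty'), ('shower glass wall', 'dirty')]
```

The first review text produces no pairs under this heuristic, since it contains no word from the sentiment list. This matches the slide, where only the sink, shower, and shower glass wall receive a sentiment.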
23. Sentiment Scoring
Texts
1. in the bathroom, used toiletries (shampoo & soap) were not thrown and were left in the shower area
2. dirty sink, and very very dirty shower glass wall.
3. the shower, it's clean...
…
Aspect - Sentiment
1. Sink – dirty (N:0.75)
2. Shower – clean (P:0.5)
3. Shower glass wall – dirty (N:0.75)
Source: sentiwordnet.isti.cnr.it
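Scoring can then be sketched as a dictionary lookup. The two entries below simply copy the scores shown on the slide; a real system would look the words up in a lexicon such as SentiWordNet. The averaging in `overall_score` is an added assumption to show one possible way of summarizing the result, not something stated in the talk.

```python
# Polarity and strength copied from the slide (N = negative, P = positive).
LEXICON = {"dirty": ("N", 0.75), "clean": ("P", 0.5)}

def score_pair(aspect, word):
    """Format an aspect-sentiment pair with its polarity and strength."""
    polarity, strength = LEXICON.get(word, ("O", 0.0))  # "O" marks an unknown word
    return f"{aspect} - {word} ({polarity}:{strength})"

def signed_score(word):
    """Map a sentiment word to a signed number: negative words score below zero."""
    polarity, strength = LEXICON.get(word, ("O", 0.0))
    if polarity == "N":
        return -strength
    if polarity == "P":
        return strength
    return 0.0

def overall_score(pairs):
    """Average the signed scores of all pairs (an illustrative summary only)."""
    if not pairs:
        return 0.0
    return sum(signed_score(word) for _, word in pairs) / len(pairs)

pairs = [("Sink", "dirty"), ("Shower", "clean"), ("Shower glass wall", "dirty")]
for aspect, word in pairs:
    print(score_pair(aspect, word))
print(overall_score(pairs))  # negative overall, since two of the three pairs are "dirty"
```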
25. Challenges Ahead
How to detect more in-depth sentiment.
How to differentiate spam from credible reviews.
Language problems
- Usage of mixed languages.
- Usage of non-standard languages.
26. Challenges Ahead
Last but not least, the challenge is to put the research and solutions into real use.