These slides present a text segmentation system based on the sentiments expressed in the text. The system takes plain text as input (a product review, for instance) and uses two different resources for tagging the sentiment words: a sentiment words dictionary and SentiWordNet. Once the sentiment words are identified, the initial text is annotated with segmentation markers where the polarity shifts. The system also outputs the counts of positive and negative sentiment words found in the text and can optionally annotate them with their valence.
Sentiment-based text segmentation
1. Politehnica University of Bucharest
Faculty of Automatic Control and Computers
Computer Science Department
Sentiment-Based Text Segmentation
• Costin-Gabriel Chiru • Ştefan Trăuşan-Matu
Costin-Gabriel CHIRU
Politehnica University of Bucharest
E-mail: costin.chiru@cs.pub.ro
Asmelash Teka HADGU
Erasmus Mundus master, Politehnica University of Bucharest
E-mail: asmelashtk@gmail.com
2. Contents
• Introduction
• Literature Review
• Proposed Solution
• System Architecture
• Results
• Conclusions
Sentiment-Based Text Segmentation, 02/26/19, ICSCS 2013
3. Introduction
• Goal: Help users decide what products to buy
• How?
– Using the social knowledge available for those products;
– And NLP (text mining) techniques for detecting polarity and summarizing opinions regarding those products or different aspects of them.
4. Other Approaches
• Surveys on opinion mining & sentiment analysis:
– Sentiment Analysis and Subjectivity – Liu, 2010
– Opinion mining and sentiment analysis – Pang and Lee, 2008
• Opinion mining / sentiment analysis: used to identify the sentiment orientation of the opinions in a document
• Most applications use
– Ontologies/thesauri: SentiWordNet, General Inquirer;
– Different annotated corpora;
– Linguistic heuristics or a pre-selected set of seed words;
– Search engine results (Turney, 2002)
to learn specific features that can be used to classify other texts.
• Text segmentation: extensively studied, starting with Allan et al., 1998
– BUT not text segmentation according to sentiments.
5. Proposed Solution (I)
• Our solution for sentiment-based text segmentation in the context of product reviews:
– The identification of product features;
– The extraction of opinions associated with these features;
– Sentiment polarity classification.
[Figure: system overview in three stages: Identification and Extraction of Opinion Words (POS Tagging, Heuristics; Product Features, Opinion words), Sentiment Polarity Classification (Sentiment Lexicon, Assign Polarity), and Segmentation and Visualization (Text Segments, Visualization)]
6. Proposed Solution (II)
• The identification of product features
– Identify the nouns and noun phrases in the reviews using POS tagging → possible product features
– Use the TF-IDF technique to keep the most frequent (highest-weighted) ones → probable product features
– Use WordNet to exploit the relationships between synsets
• We have built the word cloud for the most important terms extracted from reviews of digital cameras (http://www.photographyreview.com).
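A minimal sketch of the TF-IDF step, assuming the candidate nouns have already been extracted from each review by a POS tagger. The review lists are made-up placeholder data and `tfidf_scores` is a hypothetical helper, not code from the original system. Each review is treated as one document; a smoothed IDF is used so that nouns occurring in every review keep a non-zero weight, and the per-review weights are summed into one score per noun.

```python
import math
from collections import Counter

def tfidf_scores(docs):
    """Sum each term's TF-IDF weight over all documents (one document per review)."""
    n_docs = len(docs)
    # Document frequency: in how many reviews each noun occurs at least once.
    df = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)                               # normalized term frequency
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1   # smoothed IDF
            scores[term] += tf * idf
    return scores

# Hypothetical nouns already extracted from three reviews.
reviews = [
    ["camera", "battery", "lens", "battery"],
    ["camera", "pictures", "price"],
    ["battery", "zoom", "pictures", "lens"],
]

scores = tfidf_scores(reviews)
print(scores.most_common(3))  # "battery" ranks first
```

Keeping the top-ranked nouns then gives the probable product features; the cut-off (a threshold or a fixed top-k) would have to be chosen empirically.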
7. Proposed Solution (III)
• The extraction of opinions associated with the extracted features
– We extracted the adjectives that appear close to the words denoting the product features.
– A deeper analysis can use parse information and manually or semi-automatically developed rules or sentiment-relevant lexicons.
• Sentiment polarity classification
– Once the (product feature, reviewer opinion) pairs are known, we can evaluate the polarity of the sentiments expressed by these opinions.
– Once each opinion is tagged, we use the majority value (positive or negative) to decide whether that feature has a positive or a negative impact on the reviewers.
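The two steps above (pairing features with nearby adjectives, then taking a majority vote) can be sketched as follows. This is an illustration under assumptions, not the authors' code: the input is assumed to be already POS-tagged with Penn Treebank tags, "close to" is interpreted as a fixed window of `window` tokens on either side of the feature, and `LEXICON` is a hypothetical toy stand-in for a real sentiment lexicon.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon (+1 positive, -1 negative), standing in for
# SentiWordNet or the Hu and Liu (2004) lexicon.
LEXICON = {"great": 1, "sharp": 1, "poor": -1, "short": -1}

def feature_opinions(tagged, features, window=3):
    """Pair every feature noun with the adjectives (JJ, JJR, JJS) within `window` tokens."""
    pairs = []
    for i, (word, _) in enumerate(tagged):
        if word.lower() not in features:
            continue
        for adj, tag in tagged[max(0, i - window): i + window + 1]:
            if tag.startswith("JJ"):
                pairs.append((word.lower(), adj.lower()))
    return pairs

def feature_verdicts(pairs):
    """Majority vote over the polarities of the opinions attached to each feature."""
    votes = defaultdict(Counter)
    for feature, adj in pairs:
        polarity = LEXICON.get(adj, 0)
        if polarity:  # adjectives missing from the lexicon are ignored
            votes[feature]["positive" if polarity > 0 else "negative"] += 1
    # On a tie, most_common keeps the label that was counted first.
    return {feature: c.most_common(1)[0][0] for feature, c in votes.items()}

# Made-up, hand-tagged review sentences.
reviews = [
    [("The", "DT"), ("lens", "NN"), ("is", "VBZ"), ("sharp", "JJ")],
    [("Great", "JJ"), ("lens", "NN"), (",", ","), ("but", "CC"), ("the", "DT"),
     ("battery", "NN"), ("life", "NN"), ("is", "VBZ"), ("short", "JJ")],
    [("The", "DT"), ("battery", "NN"), ("is", "VBZ"), ("poor", "JJ")],
]
features = {"lens", "battery"}
pairs = [p for r in reviews for p in feature_opinions(r, features)]
verdicts = feature_verdicts(pairs)
print(verdicts)  # {'lens': 'positive', 'battery': 'negative'}
```

A fixed window is the simplest reading of "close to"; it can attach one adjective to two nearby features, which is one reason the slide mentions parse information as a deeper alternative.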
8. System Architecture
• Four steps:
– POS tagging → adjectives / BOW (bag-of-words) + dictionary of sentiment words
– Opinion word extraction
– Sentiment assessment → SentiWordNet / the lexicon designed by Hu and Liu, 2004, enriched with domain-specific words (using TF-IDF, POS tagging and manual annotation)
– Segmentation → put segmentation markers (||) where the polarity shifts
[Figure: system architecture with the blocks Get Text (reviews), POS Tagging, BOW approach, Identify the Sentiment Words, Sentiment Words, Assign Polarity and Text segmentation]
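A minimal sketch of the BOW path of this architecture: tokens are looked up in a sentiment word list, positive and negative hits are counted, and a `||` marker is inserted when the polarity shifts. `POSITIVE` and `NEGATIVE` are a tiny hypothetical excerpt holding only the three words reported on the Results slide, not the full Hu and Liu lexicon. The slides do not state exactly where the marker goes; this sketch assumes it is placed just before each sentiment word whose polarity differs from that of the previous sentiment word, so its output differs slightly from the BOW output on the Results slide (which puts the second marker right after "blurred").

```python
# Hypothetical excerpt of a sentiment word dictionary (the real one is much larger).
POSITIVE = {"great", "awesome"}
NEGATIVE = {"blurred"}

def bow_segment(text):
    """Insert '||' before each sentiment word whose polarity differs from the
    previous sentiment word, and count positive/negative words."""
    out, counts, previous = [], {"positive": 0, "negative": 0}, None
    for token in text.split():
        key = token.strip(".,;:!?\"'").lower()  # drop surrounding punctuation for lookup
        if key in POSITIVE:
            polarity = "positive"
        elif key in NEGATIVE:
            polarity = "negative"
        else:
            polarity = None
        if polarity:
            counts[polarity] += 1
            if previous is not None and polarity != previous:
                out.append("||")  # polarity shift detected
            previous = polarity
        out.append(token)
    return " ".join(out), counts

segmented, counts = bow_segment(
    "This is a great camera. Though the pictures can get a bit blurred "
    "at times, it's awesome for the price.")
print(segmented)
# This is a great camera. Though the pictures can get a bit || blurred at times, it's || awesome for the price.
print(counts)  # {'positive': 2, 'negative': 1}
```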
9. Results
• Test text: This is a great camera. Though the pictures can get a bit blurred at times, it's awesome for the price.
• BOW method results (three sentiment words: great, blurred and awesome; two of them positive and the third, blurred, negative):
– This is a great camera. Though the pictures can get a bit || blurred || at times, it's awesome for the price.
• POS tagging method results:
– POS tagging: This/DT is/VBZ a/DT great/JJ camera/NN ./. Though/IN the/DT pictures/NNS can/MD get/VB a/DT bit/NN blurred/VBD at/IN times/NNS ,/, it/PRP 's/VBZ awesome/JJ for/IN the/DT price/NN ./.
– The adjectives are identified (great and awesome; blurred is tagged VBD, so it is not considered) and their valences are evaluated according to SentiWordNet: “great” is considered to be objective and “awesome” positive → the whole phrase is categorized as positive because no polarity shifts have been detected.
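The POS tagging path on this example can be sketched in a few lines: split the tagged string from above, keep the adjectives (tags starting with JJ), label them, and count polarity shifts among the non-objective labels. `FIRST_SENSE_LABEL` is a hypothetical stand-in that simply hard-codes the two first-sense verdicts quoted above instead of querying SentiWordNet.

```python
TAGGED = ("This/DT is/VBZ a/DT great/JJ camera/NN ./. Though/IN the/DT pictures/NNS "
          "can/MD get/VB a/DT bit/NN blurred/VBD at/IN times/NNS ,/, it/PRP 's/VBZ "
          "awesome/JJ for/IN the/DT price/NN ./.")

# Hypothetical stand-in for a SentiWordNet first-sense lookup: it only encodes
# the two verdicts reported on the slide.
FIRST_SENSE_LABEL = {"great": "objective", "awesome": "positive"}

def adjectives(tagged):
    """Return the words tagged JJ, JJR or JJS in a 'word/TAG' string."""
    pairs = (token.rsplit("/", 1) for token in tagged.split())
    return [word for word, tag in pairs if tag.startswith("JJ")]

def classify(tagged):
    """Label the adjectives, drop objective ones, and count polarity shifts."""
    labels = [FIRST_SENSE_LABEL.get(w.lower(), "objective") for w in adjectives(tagged)]
    polar = [label for label in labels if label != "objective"]
    shifts = sum(1 for a, b in zip(polar, polar[1:]) if a != b)
    return polar, shifts

print(adjectives(TAGGED))  # ['great', 'awesome']  ("blurred" is VBD, so it is missed)
print(classify(TAGGED))    # (['positive'], 0): no shift, the phrase is positive
```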
10. Improving Results (I)
• Improving the sentiment word recognition:
– POS tagging method: use the average valence of a given word instead of simply considering its first sense → still not powerful enough
– Combine the two methods by building an extended list comprising the words from the sentiment words dictionary along with the adjectives from SentiWordNet → if still not powerful enough:
– Enhance this list with the words having other POS tags than the ones already considered (for example, adverbs and verbs).
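The first two improvements can be sketched as follows. The per-sense scores are invented for illustration (they are not real SentiWordNet values) and are chosen so that, as on the Results slide, the first sense of "great" is objective. Valence is taken here as the positive score minus the negative score, which is an assumption of this sketch. `extended_lexicon` is a hypothetical helper showing the proposed combined word list.

```python
from statistics import mean

# INVENTED per-sense (positive, negative) scores in sense order; NOT SentiWordNet data.
SENSES = {"great": [(0.0, 0.0), (0.75, 0.0), (0.5, 0.125), (0.625, 0.0)]}

def first_sense_valence(word):
    """Valence of the first sense only (what the original POS method used)."""
    pos, neg = SENSES[word][0]
    return pos - neg

def average_valence(word):
    """Valence averaged over all senses of the word."""
    return mean(pos - neg for pos, neg in SENSES[word])

def extended_lexicon(dictionary_words, swn_adjectives, other_pos_words=()):
    """Union of the sentiment dictionary, the SentiWordNet adjectives and,
    optionally, words with other POS tags (e.g. adverbs and verbs)."""
    return set(dictionary_words) | set(swn_adjectives) | set(other_pos_words)

print(first_sense_valence("great"))  # 0.0 -> objective
print(average_valence("great"))      # 0.4375 -> positive
print(sorted(extended_lexicon({"awesome", "blurred"}, {"great", "awesome"})))
# ['awesome', 'blurred', 'great']
```

Averaging lets the positive senses outweigh a neutral first sense, which is exactly the failure seen for "great" in the POS tagging results.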
11. Improving Results (II)
• Improving segmentation:
– Use the Stanford Parser to place the boundaries at natural places rather than exactly where the shifts are detected: go up from the sentiment words until reaching the first conflict, and classify each sub-tree according to the expressed sentiment.
[Figure: parse tree of “Though the pictures can get a bit blurred at times, it's awesome for the price.”]
The final segmentation would be:
This is a great camera. || Though the pictures can get a bit blurred at times ||, it's awesome for the price.
(ROOT
(S
(NP (DT This))
(VP (VBZ is)
(NP (DT a) (JJ great) (NN camera)))
(. .)))
(ROOT
(S
(SBAR (IN Though)
(S
(NP (DT the) (NNS pictures))
(VP (MD can)
(VP (VB get)
(SBAR
(S
(NP (DT a) (NN bit))
(VP (VBD blurred)
(PP (IN at)
(NP (NNS times))))))))))
(, ,)
(NP (PRP it))
(VP (VBZ 's)
(ADJP (JJ awesome)
(PP (IN for)
(NP (DT the) (NN price)))))
(. .)))
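One possible reading of the parse-tree rule, sketched in Python: a node whose leaves contain sentiment words of both polarities is a conflict, so it is split into its child sub-trees; conflict-free sub-trees are classified by the sentiment they contain; neutral pieces (such as the comma and "it") are attached to the following polar piece; and `||` is placed wherever the polarity changes. The small bracket parser and the three-word `LEXICON` are hypothetical helpers written for this sketch (with NLTK, `nltk.Tree.fromstring` could parse the trees instead). On the two trees above it reproduces the final segmentation, with tokens separated by spaces.

```python
import re

# Hypothetical three-word lexicon taken from the example (+1 positive, -1 negative).
LEXICON = {"great": 1, "awesome": 1, "blurred": -1}

def parse(text):
    """Parse a bracketed (Penn Treebank style) tree into (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0
    def node():
        nonlocal pos
        pos += 1                              # skip "("
        label, children = tokens[pos], []
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a leaf word
                pos += 1
        pos += 1                              # skip ")"
        return (label, children)
    return node()

def leaves(tree):
    return [tree] if isinstance(tree, str) else [w for c in tree[1] for w in leaves(c)]

def polarities(tree):
    return {LEXICON[w.lower()] for w in leaves(tree) if w.lower() in LEXICON}

def split_at_conflicts(tree):
    """Split at conflict nodes; return (polarity, words) pieces, 0 meaning neutral."""
    pols = polarities(tree)
    if len(pols) < 2:                         # conflict-free: the whole sub-tree is one piece
        return [(pols.pop() if pols else 0, leaves(tree))]
    return [piece for child in tree[1] for piece in split_at_conflicts(child)]

def merge(pieces):
    """Attach neutral pieces to the next polar piece and join equal neighbours."""
    merged, pending = [], []
    for pol, words in pieces:
        if pol == 0:
            pending += words
            continue
        words, pending = pending + words, []
        if merged and merged[-1][0] == pol:
            merged[-1] = (pol, merged[-1][1] + words)
        else:
            merged.append((pol, words))
    if pending:                               # trailing neutral words join the last piece
        if merged:
            merged[-1] = (merged[-1][0], merged[-1][1] + pending)
        else:
            merged.append((0, pending))
    return merged

TREES = [
    "(ROOT (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (JJ great) (NN camera))) (. .)))",
    """(ROOT (S (SBAR (IN Though) (S (NP (DT the) (NNS pictures))
       (VP (MD can) (VP (VB get) (SBAR (S (NP (DT a) (NN bit))
       (VP (VBD blurred) (PP (IN at) (NP (NNS times))))))))))
       (, ,) (NP (PRP it)) (VP (VBZ 's) (ADJP (JJ awesome)
       (PP (IN for) (NP (DT the) (NN price))))) (. .)))""",
]
segments = merge([p for t in TREES for p in split_at_conflicts(parse(t))])
print(" || ".join(" ".join(words) for _, words in segments))
# This is a great camera . || Though the pictures can get a bit blurred at times || , it 's awesome for the price .
```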
12. Conclusions
• We implemented two approaches for sentiment-based text segmentation:
– One based on POS tagging and some heuristics for identifying the sentiment words' valence using SentiWordNet;
– One based on the bag-of-words approach and a sentiment words dictionary provided by Hu and Liu.
• Since the results were not satisfactory, we considered several ways of improving them:
– Combining the two methods, or
– Using other existing resources (such as ANEW), or
– Including the words with other POS tags in our analysis, and
– Using phrase parse trees to segment the text better.