Rule-based approach to sentiment analysis at ROMIP’11 (slides)
1. Rule-based approach to
sentiment analysis at ROMIP’11
Dmitry Kan
dmitry.kan@gmail.com
Twitter: @DmitryKan
AlphaSense Inc
Dialogue, 2012
2. Outline
• Problem definition
• Base level for accuracy
• Towards shallow parsing of input text
• Rule-based algorithm
• Object-oriented sentiment detection
• Performance
• Open problems
3. Problem definition
• What is sentiment for people:
– Mood of the author? Mood of the reader? Personal
attitude?
– Opinion about the target object (product etc)?
– Something else, defined by an annotator’s boss?
• What is sentiment for a computer:
– General polarity background
– General opinion mining
– Object (product) oriented opinion mining
– Polarity strength detection
4. Base level for accuracy
• cross-annotator agreement gives 80% [1]
• Real performance of the system is the one it
shows when used on un-annotated data
• Real example: ”CEO of the company turned
50” (was marked as positive -> why?)
• Some machine learning (ML) methods can
give 90% and more on test data
• Hard (if not impossible) to do object-oriented
sentiment detection with ML
5. Towards shallow parsing of input text
• Example 1: Majority likes this, but I do not like this
– Subclause 1: ”Majority likes this”
– Opposite conjunction: ”but”
– Subclause 2: ”I do not like this” (negation: ”not”)
– NOT(polarity) = opposite_polarity
– totalSentimentScore = totalPositiveScore – totalNegativeScore –
(½ * sentimentCount, if an opposite conjunction is found;
0, if no opposite conjunction is found)
• Example 2: I liked new iPhone, but GalaxyS is not easy to use
– Subclause 1: ”I liked new iPhone”
– Opposite conjunction: ”but”
– Subclause 2: ”GalaxyS is not easy to use” (negation: ”not”)
– Object: iPhone, Sentiment: positive
– Object: GalaxyS, Sentiment: negative
– Object: -, Sentiment: neutral (mixed)
6. Rule-based algorithm flow on an example sentence
Majority likes this, but I do not like this.
Phase1 (negations): posScore = 0 – negation weight = -2
Phase2 (individual words):
Word ”likes”: posScore = -2 + 1 = -1
Word ”not”: negScore = 0 + 1 = 1
Word ”like”: posScore = -1 + 1 = 0
Phase3 (opposite conjunctions): sentimentCount = 3
totalScore = posScore – negScore – ½ * sentimentCount =
0 – 1 – 3/2 = -5/2
Sentiment: Negative
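The three phases above can be sketched in Python. This is a minimal illustration, not the author's implementation: the English word lists, the function names `score_sentence` and `label`, and the sign-based mapping from score to label are assumptions made for the example. The negation weight of 2 is inferred from the Phase1 line, and "not" is treated as a negative word because the walkthrough counts it that way.

```python
import re

# Toy English lexicons, assumed for illustration only; the system in the
# slides uses Russian polarity dictionaries and lemmatized words.
POSITIVE = {"like", "likes"}
NEGATIVE = {"not"}              # the walkthrough counts "not" as a negative word
NEGATIONS = {"not"}
OPPOSITE_CONJUNCTIONS = {"but"}
NEGATION_WEIGHT = 2             # inferred from "posScore = 0 - negation weight = -2"


def score_sentence(sentence):
    """Return the total sentiment score of one sentence."""
    tokens = re.findall(r"\w+", sentence.lower())

    # Phase 1: every negation lowers the positive score by the negation weight.
    pos_score = -NEGATION_WEIGHT * sum(t in NEGATIONS for t in tokens)
    neg_score = 0

    # Phase 2: add up individual polarity words and count them.
    sentiment_count = 0
    for t in tokens:
        if t in POSITIVE:
            pos_score += 1
            sentiment_count += 1
        elif t in NEGATIVE:
            neg_score += 1
            sentiment_count += 1

    # Phase 3: an opposite conjunction removes half of the sentiment count.
    total = pos_score - neg_score
    if any(t in OPPOSITE_CONJUNCTIONS for t in tokens):
        total -= sentiment_count / 2
    return total


def label(total):
    """Map a score to a label by its sign (an assumed convention)."""
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


print(score_sentence("Majority likes this, but I do not like this."))  # -2.5
```

With these toy lists the example sentence gets -5/2 and the label "negative", matching the walkthrough.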
7. Rule-based algorithm #1/3
• Suits micro-posts (twitter) or individual sentences
• Polarity dictionaries for Russian (1739 positive
and 2338 negative words)
• All words are lemmatized (using A. Zaliznyak’s grammatical dictionary [2])
• Set of Russian negations that tend to noticeably
affect the polarity of the connected word(s):
не плохо (not bad); gaps between the negation and
the word are also handled, for example: Я не
сильно люблю это (I do not strongly like this)
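One plausible reading of the negation handling, including the gap case, is that a negation flips the polarity of a polarity word that follows it within a small window. The sketch below assumes that reading. The window size `MAX_GAP`, the surface-form lexicon (the slides use lemmas) and the function name `word_polarities` are hypothetical.

```python
import re

# Toy surface-form lexicon (assumption); the slides use lemmatized dictionaries.
POLARITY = {"плохо": -1, "люблю": +1}
NEGATIONS = {"не"}
MAX_GAP = 1  # assumed: words allowed between the negation and its target


def word_polarities(sentence):
    """Return each polarity word with its polarity after applying negations."""
    tokens = re.findall(r"\w+", sentence.lower())
    result = {}
    for i, tok in enumerate(tokens):
        if tok not in POLARITY:
            continue
        polarity = POLARITY[tok]
        # Look back over the preceding MAX_GAP + 1 tokens for a negation.
        window = tokens[max(0, i - MAX_GAP - 1):i]
        if any(w in NEGATIONS for w in window):
            polarity = -polarity  # NOT(polarity) = opposite_polarity
        result[tok] = polarity
    return result


print(word_polarities("не плохо"))               # {'плохо': 1}
print(word_polarities("Я не сильно люблю это"))  # {'люблю': -1}
```

A fixed window is the simplest way to tolerate an intervening word such as сильно. A real system would probably also need to stop the window at subclause boundaries.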
8. Rule-based algorithm #2/3
• Set of Russian opposite conjunctions, which
affect the polarity of a sentence’s subclauses in
relation to each other: Большинству это всё
нравится, а мне нет (Majority likes this, but I do
not)
• totalScore = positiveScore – negativeScore –
oppositeConjunctionSentimentScore, where
oppositeConjunctionSentimentScore removes the
polarity mass from a sentence with an opposite
conjunction and equals sentimentWordCount / 2
9. Rule-based algorithm #3/3
• Object-oriented sentiment detection
• First, each sentence of the input text is examined for the
presence of the keywords of the object
• If such a sentence is found, it is checked for the presence of
conjunctions or other subclause boundaries (such as
punctuation)
• If there is no boundary found, the sentiment of the entire
found sentence is detected according to the algorithm
described above
• If there is a boundary, the subclause containing the
keywords is identified and sentiment of the subclause is
detected according to the algorithm described above
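The steps above can be sketched as follows. This is an illustration under stated assumptions, not the system's code: `toy_score` is a deliberately simplified stand-in for the scoring algorithm described earlier, and the lexicon, the boundary pattern and the function names are hypothetical.

```python
import re

POLARITY = {"liked": 1, "easy": 1}   # toy lexicon (assumption)
NEGATIONS = {"not"}
# Subclause boundaries: punctuation and an opposite conjunction (toy list).
BOUNDARY = re.compile(r"[,;:]|\bbut\b", re.IGNORECASE)


def toy_score(text):
    """Simplified stand-in scorer: a negation flips the next polarity word."""
    score, negate = 0, False
    for tok in re.findall(r"\w+", text.lower()):
        if tok in NEGATIONS:
            negate = True
        elif tok in POLARITY:
            score += -POLARITY[tok] if negate else POLARITY[tok]
            negate = False
    return score


def object_sentiment(text, keyword):
    """Detect the sentiment expressed about the object named by `keyword`."""
    key = keyword.lower()
    total = 0
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if key not in sentence.lower():
            continue  # the sentence does not mention the object
        # With no boundary, the only clause is the whole sentence.
        clauses = [c for c in BOUNDARY.split(sentence) if c.strip()]
        target = next((c for c in clauses if key in c.lower()), sentence)
        total += toy_score(target)
    return "positive" if total > 0 else "negative" if total < 0 else "neutral"


text = "I liked new iPhone, but GalaxyS is not easy to use"
print(object_sentiment(text, "iPhone"))   # positive
print(object_sentiment(text, "GalaxyS"))  # negative
```

Scoring the whole sentence with the same toy scorer gives 0, which corresponds to the neutral (mixed) case when no object is specified.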
10. Performance
• Test data: text reviews (many sentences)
• Accuracy of 64%
• 92% precision and 69% recall for positive class
when two annotators have agreed
• Much lower precision and recall for negative class
(too few dictionary entries; text-level sentiment
still to be defined)
• Worked slightly better for 2-way classifier
ensemble with Multinomial Naive Bayes [3]
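For reference, per-class precision and recall like the figures above can be computed from gold and predicted labels. This is a generic sketch. The function name and the four-item label lists are made up for illustration and are not ROMIP data.

```python
def precision_recall(gold, predicted, target):
    """Precision and recall of class `target` over paired label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == target and p == target for g, p in pairs)  # true positives
    fp = sum(g != target and p == target for g, p in pairs)  # false positives
    fn = sum(g == target and p != target for g, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels, for illustration only.
gold = ["pos", "pos", "neg", "pos"]
predicted = ["pos", "neg", "neg", "pos"]
print(precision_recall(gold, predicted, "pos"))
```

Restricting `gold` to the items on which two annotators agreed would correspond to the evaluation setting mentioned for the positive class.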
11. Open problems
• Multi-sentence sentiment detection
• Domain adaptation: mining polarity words [4]
• Adding more rules for shallow parsing
• Trying out formal syntactic parsing
• Automatic detection of product names
(Named Entity Recognition)
13. Bibliography
• [1] Bermingham, A. and Smeaton, A.F. (2009).
A study of interannotator agreement for
opinion retrieval. In SIGIR, 784-785.
• [2] Andrey Zaliznyak. Grammaticheskij slovar'
russkogo jazyka. Moskva, 1977, (further
editions are 1980, 1987, 2003).
• [3] Poroshin V. (2012). Proof of concept
statistical sentiment classification at ROMIP
2011. In Dialogue.
14. Bibliography
• [4] Chetverkin I., Loukachevitch N. (2010).
Automatic Extraction of Domain-specific
Opinion Words. Dialogue.
• [5] Minqing Hu, Bing Liu. (2004). Mining and
summarizing customer reviews. In Proc. of the
tenth ACM SIGKDD international conference
on Knowledge discovery and data mining.