Rule based approach to sentiment analysis at romip’11 slides

Rule-based approach to
sentiment analysis at ROMIP’11
Dmitry Kan
dmitry.kan@gmail.com
Twitter: @DmitryKan
AlphaSense Inc
Dialogue, 2012

Outline
• Problem definition
• Base level for accuracy
• Towards shallow parsing of input text
• Rule-based algorithm
• Object-oriented sentiment detection
• Performance
• Open problems

Problem definition
• What is sentiment for people:
– Mood of the author? Mood of the reader? Personal
attitude?
– Opinion about the target object (product etc)?
– Something else, defined by an annotator’s boss?
• What is sentiment for a computer:
– General polarity background
– General opinion mining
– Object (product) oriented opinion mining
– Polarity strength detection

Base level for accuracy

• Cross-annotator agreement gives 80% [1]
• The real performance of a system is what it
shows on un-annotated data
• Real example: "CEO of the company turned
50" (was marked as positive; why?)
• Some machine learning (ML) methods can
give 90% or more on test data
• Object-oriented sentiment detection is hard
(if not impossible) to do with ML

Towards shallow parsing of input text
Example: "Majority likes this, but I do not like this"
– Subclause 1: "Majority likes this"
– Opposite conjunction: "but"
– Subclause 2: "I do not like this" (contains the negation "not")
• NOT(polarity) = opposite_polarity
• totalSentimentScore = totalPositiveScore – totalNegativeScore – penalty, where
– penalty = ½ * sentimentCount, if an opposite conjunction is found
– penalty = 0, if no opposite conjunction is found

Example: "I liked new iPhone, but GalaxyS is not easy to use"
– Subclause 1 (before the opposite conjunction "but"): Object: iPhone, Sentiment: positive
– Subclause 2 (contains the negation "not"): Object: GalaxyS, Sentiment: negative
– Whole sentence: Object: -, Sentiment: neutral (mixed)

Rule-based algorithm flow on an example
sentence
Majority likes this, but I do not like this.
Phase1 (negations): posScore = 0 – negation weight = -2
Phase2 (individual words):
Word "likes": posScore = -2 + 1 = -1
Word "not": negScore = 0 + 1 = 1
Word "like": posScore = -1 + 1 = 0
Phase3 (oppositeConjuctions): sentimentCount = 3

totalScore = posScore – negScore – ½ * sentimentCount =
0 – 1 – 3/2 = -5/2

Sentiment: Negative
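
The three phases above can be sketched in Python to make the arithmetic reproducible. This is a minimal illustration, not the system's implementation: the English word lists, the lemma map, the MAX_GAP window and the score-to-label thresholds are all assumptions made for this example, and the negation weight of 2 is inferred from the Phase1 line.

```python
import re

# Toy resources, all hypothetical: the real system uses Russian dictionaries.
POSITIVE = {"like"}
NEGATIVE = {"not"}               # the worked example counts "not" as a negative word
NEGATIONS = {"not"}
OPPOSITE_CONJUNCTIONS = {"but"}
LEMMAS = {"likes": "like", "liked": "like"}   # stand-in for real lemmatization
NEGATION_WEIGHT = 2              # inferred from "posScore = 0 - negation weight = -2"
MAX_GAP = 1                      # assumed: words allowed between negation and target


def tokenize(sentence):
    """Lowercase, split into words and map each word to its toy lemma."""
    return [LEMMAS.get(w, w) for w in re.findall(r"\w+", sentence.lower())]


def score_sentence(sentence):
    words = tokenize(sentence)
    pos_score = 0.0
    neg_score = 0.0

    # Phase 1: a negation shortly followed by a positive word lowers posScore.
    for i, word in enumerate(words):
        if word in NEGATIONS:
            window = words[i + 1 : i + 2 + MAX_GAP]
            if any(t in POSITIVE for t in window):
                pos_score -= NEGATION_WEIGHT

    # Phase 2: every dictionary word adds 1 to its own score.
    sentiment_count = 0
    for word in words:
        if word in POSITIVE:
            pos_score += 1
            sentiment_count += 1
        elif word in NEGATIVE:
            neg_score += 1
            sentiment_count += 1

    # Phase 3: an opposite conjunction removes half of the polarity mass.
    has_conjunction = any(w in OPPOSITE_CONJUNCTIONS for w in words)
    penalty = sentiment_count / 2 if has_conjunction else 0.0

    return pos_score - neg_score - penalty


def label(score):
    """Assumed thresholds: sign of the score decides the class."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


example = "Majority likes this, but I do not like this."
print(score_sentence(example), label(score_sentence(example)))
```

Run on the example sentence, the sketch reaches the same totalScore of -5/2 as the slide and labels the sentence negative.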

Rule-based algorithm #1/3
• Suits micro-posts (twitter) or individual sentences
• Polarity dictionaries for Russian (1739 positive
and 2338 negative words)
• All words are lemmatized (A. Zaliznyak [2])
• A set of Russian negations that noticeably
affect the polarity of the connected word(s):
не плохо (not bad); gaps between the negation
and the word are also handled, for example: Я не
сильно люблю это (I do not strongly like this)


• A set of Russian opposite conjunctions, which
affect the polarity of a sentence's subclauses in
relation to each other: Большинству это всё
нравится, а мне нет (Majority likes this, but I do
not)
• totalScore = positiveScore – negativeScore –
oppositeConjuctionSentimentScore, where
oppositeConjuctionSentimentScore removes
polarity mass from a sentence that contains an
opposite conjunction and equals sentimentWordCount / 2

• Object-oriented sentiment detection

• First, each sentence of the input text is examined for the
presence of the object's keywords
• If such a sentence is found, it is checked for the presence of
conjunctions or other subclause boundaries (such as
punctuation)
• If no boundary is found, the sentiment of the entire
sentence is detected according to the algorithm
described above
• If there is a boundary, the subclause containing the
keywords is identified and the sentiment of that subclause is
detected according to the algorithm described above
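
The steps above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the system's code: the boundary pattern, the English toy word lists, the substring keyword match and the stand-in scorer toy_polarity are all hypothetical, and any sentence-level scorer could be passed in instead.

```python
import re

# Hypothetical subclause boundaries: clause punctuation and opposite conjunctions.
BOUNDARY = re.compile(r"\s*(?:[,;:]|\bbut\b|\bhowever\b)\s*", re.IGNORECASE)
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

# Toy English lexicon (assumption); the real dictionaries are Russian.
POSITIVE = {"liked", "like", "easy"}
NEGATIVE = {"bad"}
NEGATIONS = {"not"}


def toy_polarity(text):
    """Stand-in scorer: sums word polarities, flipping the one after a negation."""
    score, negate = 0, False
    for word in re.findall(r"\w+", text.lower()):
        if word in NEGATIONS:
            negate = True
        elif word in POSITIVE or word in NEGATIVE:
            value = 1 if word in POSITIVE else -1
            score += -value if negate else value
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def object_sentiment(text, keywords, polarity=toy_polarity):
    """Return one sentiment label per sentence or subclause mentioning the object."""
    keywords = [k.lower() for k in keywords]
    results = []
    for sentence in SENTENCE_END.split(text.strip()):
        # Step 1: keep only sentences that mention the object.
        if not any(k in sentence.lower() for k in keywords):
            continue
        # Step 2: look for subclause boundaries.
        clauses = [c for c in BOUNDARY.split(sentence) if c]
        if len(clauses) == 1:
            # Step 3: no boundary, score the whole sentence.
            results.append(polarity(sentence))
        else:
            # Step 4: score only the subclauses that contain the keywords.
            for clause in clauses:
                if any(k in clause.lower() for k in keywords):
                    results.append(polarity(clause))
    return results


text = "I liked new iPhone, but GalaxyS is not easy to use"
print(object_sentiment(text, ["iPhone"]))
print(object_sentiment(text, ["GalaxyS"]))
```

On the iPhone/GalaxyS sentence the sketch assigns positive to iPhone and negative to GalaxyS, while scoring the whole sentence with the same toy scorer gives the mixed, neutral result.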

Performance
• Test data: text reviews (many sentences)
• Accuracy of 64%
• 92% precision and 69% recall for positive class
when two annotators have agreed
• Much lower precision and recall for the negative class
(not enough dictionary entries; text-level sentiment
is yet to be defined)
• Worked slightly better in a 2-way classifier
ensemble with Multinomial Naive Bayes [3]

Open problems
• Multi-sentence sentiment detection
• Domain adaptation: mining polarity words [4]
• Adding more rules for shallow parsing
• Trying out formal syntactic parsing
• Automatic detection of product names
(Named Entity Recognition)

Bibliography
• [1] Bermingham, A. and Smeaton, A.F. (2009).
A study of interannotator agreement for
opinion retrieval. In SIGIR, 784-785.
• [2] Andrey Zaliznyak. Grammaticheskij slovar'
russkogo jazyka. Moskva, 1977, (further
editions are 1980, 1987, 2003).
• [3] Poroshin V. (2012). Proof of concept
statistical sentiment classification at ROMIP
2011. In Dialogue.

• [4] Chetverkin I., Loukachevitch N. (2010).
Automatic Extraction of Domain-specific
Opinion Words. Dialogue.
• [5] Minqing Hu, Bing Liu. (2004). Mining and
summarizing customer reviews. In Proc. of the
tenth ACM SIGKDD international conference
on Knowledge discovery and data mining.
