Intro to sentiment analysis

“Information about people’s opinions plays important role for more accurate
decision making in a number of domains. There are number of opinion-rich
resources available on the internet e.g. personal blogs, reviews, specialized
websites which helps in understanding the opinions of others.”
Yenkar, Pranali Sawarkar, SD - A conceptual study of Sentiment Mining, 2012

“The term opinion mining appears in a paper by Dave et al. that was published in the
proceedings of the 2003 WWW conference; the publication venue may explain the popularity
of the term within communities strongly associated with Web search or information
retrieval. According to Dave et al., the ideal opinion-mining tool would “process a set of
search results for a given item, generating a list of product attributes (quality, features, etc.)
and aggregating opinions about each of them (poor, mixed, good)”. Much of the subsequent
research self-identified as opinion mining fits this description in its emphasis on extracting
and analyzing judgments on various aspects of given items. However, the term has recently
also been interpreted more broadly to include many different types of analysis of evaluative
text.
The history of the phrase sentiment analysis parallels that of “opinion mining” in certain
respects. The term “sentiment” used in reference to the automatic analysis of evaluative text
and tracking of the predictive judgments therein appears in 2001 papers by Das and Chen
[66] and Tong [297], due to these authors’ interest in analyzing market sentiment. It
subsequently occurred within 2002 papers by Turney [299] and Pang et al. [235], which were
published in the proceedings of the annual meeting of the Association for Computational
Linguistics (ACL) and the annual conference on Empirical Methods in Natural Language
Processing (EMNLP). Moreover, Nasukawa and Yi [221] entitled their 2003 paper, “Sentiment
analysis: Capturing favorability using natural language processing”, and a paper in the same
year by Yi et al. [324] was named “Sentiment Analyzer: Extracting sentiments about a given
topic using natural language processing techniques”. These events together may explain the
popularity of “sentiment analysis” among communities self-identified as focused on NLP. A
sizeable number of papers mentioning “sentiment analysis” focus on the specific application
of classifying reviews as to their polarity (either positive or negative), a fact that appears to
have caused some authors to suggest that the phrase refers specifically to this narrowly
defined task. However, nowadays many construe the term more broadly to mean the
computational treatment of opinion, sentiment, and subjectivity in text.”
B. Pang, L. Lee - Opinion Mining and Sentiment Analysis, 2008

References:
• Sanjiv Das and Mike Chen. Yahoo! for Amazon: Extracting market sentiment
from stock message boards. In Proceedings of the Asia Pacific Finance Association
Annual Conference (APFA), 2001.
• Richard M. Tong. An operational system for detecting and tracking opinions in
on-line discussion. In Proceedings of the Workshop on Operational Text Classification
(OTC), 2001.
• Satoshi Morinaga, Kenji Yamanishi, Kenji Tateishi, and Toshikazu Fukushima. Mining
product reputations on the web. In Proceedings of the ACM SIGKDD Conference on
Knowledge Discovery and Data Mining (KDD), pages 341–349, 2002. Industry track.
• Kushal Dave, Steve Lawrence, and David M. Pennock. Mining the peanut gallery:
Opinion extraction and semantic classification of product reviews. In Proceedings of
WWW, pages 519–528, 2003.
• Tetsuya Nasukawa and Jeonghee Yi. Sentiment analysis: Capturing favorability using
natural language processing. In Proceedings of the Conference on Knowledge Capture
(K-CAP), 2003.

Themoodmap.uk
• real time
• uses ANEW in a Python implementation
• green marks a happy tweet, red a sad one

Twittermood.org
This widely advertised application shows a real-time Twitter mood map, this
time for the US. Like the Twitter UK mood map, it is based on the ANEW word
list. The algorithm does not simply average the sentiment values of a sentence
but works with relative fractions. An example of how it works is found in the
research paper by O’Connor and Balasubramanyan: to determine the emotion of a
sentence, they take the number of positive words in that sentence and divide it
by the total number of words in the sentence. The real-time mood map displays
dark blue and orange circles. Dark blue circles indicate negative posts and
orange circles indicate positive posts. The diameter of a circle reflects the
number of posts in that area. On this mood map one cannot read the actual
tweets.
• B. O’Connor and R. Balasubramanyan. From Tweets to Polls: Linking Text Sentiment
to Public Opinion Time Series. ICWSM, 11:122–129, 2010.
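The relative-fraction idea described above can be sketched in a few lines of Python. This is only an illustration, not the code behind Twittermood.org: the tiny word list POSITIVE_WORDS and the function name positive_fraction are made up for the example, and the naive whitespace tokenization is an assumption.

```python
# Hypothetical mini-lexicon for illustration; a real system would load a full word list.
POSITIVE_WORDS = {"good", "great", "happy", "love"}


def positive_fraction(sentence: str) -> float:
    """Return the share of words in the sentence that are in the positive word list."""
    words = sentence.lower().split()  # assumption: naive whitespace tokenization
    if not words:
        return 0.0
    positive = sum(1 for word in words if word in POSITIVE_WORDS)
    return positive / len(words)


# "love" and "great" are positive, so 2 of the 5 words count.
print(positive_fraction("I love this great phone"))  # 0.4
```

A sentence with no words is given a fraction of 0.0 here so that the division never fails; that choice is also an assumption.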

Machine learning is now everywhere in our lives:
• the simplest example is Amazon's product recommendations: it learns
from the products you bought previously and recommends new items
• more generally, wherever there are lots and lots of data, a machine
can try to make sense of them according to the answers we need:
learning about diseases from medical records, about customer
behavior from web click data, and so on
• machine learning can be used to recognize your handwriting
on an envelope
• text classification

ANEW contains around 2500 rated English words

Valence scale: 1-4 negative, 4-6 neutral, 6-9 positive
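As a rough sketch, the three valence ranges can be turned into labels as follows. The ranges on the slide share their end points, so the sketch assumes that the boundary values 4 and 6 count as neutral; the function name valence_label is hypothetical.

```python
def valence_label(valence: float) -> str:
    """Map a valence rating on the 1-9 scale to a coarse sentiment label."""
    # Assumption: the boundary values 4 and 6 are treated as neutral.
    if valence < 4:
        return "negative"
    if valence <= 6:
        return "neutral"
    return "positive"


for rating in (2.5, 5.0, 8.2):
    print(rating, valence_label(rating))
```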

Themoodmap.uk
This real-time mood map for the UK is based on Twitter posts. You can watch how
Twitter users feel at this very moment and also see what they post. Behind the
scenes, the algorithm is based on the ANEW word list and is implemented in
Python. Take the following sentence as an example:
I have an example sentence have.
We can separate the sentence into words like this: [i, have, an, example, sentence,
have]
such that its token list contains [(i,1), (have,2),
(an,1), (example,1), (sentence,1)].
The sentence is rated by applying the following two updates to every token in a
loop:
emotion = emotion + valence(word) * number of times the word appears
count = count + number of times the word appears in the sentence
and, once all tokens have been processed, dividing by the total count:
emotion = emotion / count
The UK map displays tweets on a scale from red (negative) to dark green
(positive). Clicking a point shows the live tweet that was just posted.
• P. Patel and D. Saul. Mood map Python code. URL
https://github.com/priyeshpatel/moodmap/blob/master/moodmap/sentiment.py.
Accessed: 2014-02-16.
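The rating loop described for the mood map can be sketched as below. This is not the code from the repository cited above, just a minimal reading of the three update rules. The valence numbers in VALENCE are invented for the example and are not real ANEW ratings. Skipping words that have no rating is an assumption, as is returning None when no rated word is found.

```python
import re
from collections import Counter

# Hypothetical valence values for illustration only; not real ANEW ratings.
VALENCE = {"example": 6.0, "sentence": 5.0}


def sentence_emotion(sentence, valence=VALENCE):
    """Average the valence of the rated words, weighted by how often each appears."""
    tokens = Counter(re.findall(r"[a-z']+", sentence.lower()))
    emotion = 0.0
    count = 0
    for word, times in tokens.items():
        if word in valence:  # assumption: words without a rating are skipped
            emotion += valence[word] * times
            count += times
    # Assumption: a sentence without any rated word gets no score at all.
    return emotion / count if count else None


# Only "example" (6.0) and "sentence" (5.0) are rated: (6.0 + 5.0) / 2
print(sentence_emotion("I have an example sentence have."))  # 5.5
```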

What does a bad road mean?
How can we rate the sentence/text?

Davidov, Dmitry, Oren Tsur, and Ari Rappoport. "Semi-supervised recognition of
sarcastic sentences in twitter and amazon." Proceedings of the Fourteenth
Conference on Computational Natural Language Learning. Association for
Computational Linguistics, 2010.
http://aclweb.org/anthology//W/W10/W10-2914.pdf
1. “thank you Janet Jackson for yet another year of Super Bowl classic rock!” (Twitter)
2. “He’s with his other woman: XBox 360. It’s 4:30 fool. Sure I can sleep through the
gunfire” (Twitter)
3. “Wow GPRS data speeds are blazing fast.” (Twitter)
4. “[I] Love The Cover” (book, amazon)
5. “Defective by design” (music player, amazon)

• Precision - of the items we flagged, how many are actually right?
• Recall - of all the relevant items, how many did we find?
• Accuracy - the percentage of correct responses out of the total.
Wikipedia: http://en.wikipedia.org/wiki/Precision_and_recall
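For a two-class sentiment task, the three measures can be computed from a list of true labels and a list of predicted labels, as in this small sketch. The function name, the label strings and the toy data are all made up for the example. Returning 0.0 when a denominator is zero is an assumption.

```python
def precision_recall_accuracy(y_true, y_pred, positive="positive"):
    """Compute precision, recall and accuracy for one class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    # Assumption: a measure with an empty denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Toy data: three positive and two negative tweets, one positive tweet is missed.
y_true = ["positive", "positive", "positive", "negative", "negative"]
y_pred = ["positive", "positive", "negative", "negative", "negative"]
print(precision_recall_accuracy(y_true, y_pred))
```

On this toy data nothing is wrongly flagged as positive, so precision is perfect, while recall drops because one positive tweet was not found.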
