A PROJECT REPORT ON
SARCASM ANALYSIS USING
MACHINE LEARNING
Submitted by:
Subhadarsini prusty
B212045, IIIT BBSR
ABSTRACT
Sarcasm analysis, one of the toughest challenges in Natural Language
Processing (NLP), has been an active topic of research in recent
years. A lot of work has already been done in the field of sentiment
analysis, but enormous challenges remain. The property of sarcasm
that makes it difficult to analyze and detect is the gap between its
literal and intended meaning. Detecting sentiment in social media
texts such as Facebook posts, tweets, online blogs and reviews has
become an important task because they influence every business
organization. In this work I present the detection of sarcasm in
tweets using a machine learning approach.
INTRODUCTION
Wikipedia defines sentiment analysis as the process that “aims to
determine the attitude of a speaker or a writer with respect to some
topic”. Social media and social networking have fueled the online
space. Ratings, reviews, comments, etc. are everywhere. The need for
clear, reliable information about consumer preferences has led to
increasing interest in high-level analysis of online social media
content. For many businesses, online opinion has turned into a kind of
virtual currency that can make or break a product in the marketplace.
In practice, sentiment analysis means monitoring social media posts and
discussions, then figuring out how participants are reacting to a brand
or event.
The word sarcasm comes, through French and Latin, from the Greek
“sarkazein”, which means “to tear flesh” or “grind the teeth”. In simple
words it means to speak bitterly. In sarcasm the literal meaning differs
from what the speaker intends to say. As defined by Wikipedia sources,
sarcasm is a “sharp, bitter, or cutting expression or remark; a bitter
gibe or taunt”. Sarcasm is also defined as a contrast between a positive
sentiment and a negative situation, or vice versa. For example, “I love
working on holidays”. In this sentence “love” expresses a positive
sentiment, but “working on holidays” refers to a negative situation,
since people generally expect to relax on holidays. Sarcasm is also
known as “the activity of saying or writing the opposite of what you
mean, or of speaking in a way intended to make someone else feel stupid
or show them that you are angry” (Macmillan, 2007). Example: “it feels
great being bored”. The sentence contains both a positive word (“great”)
and a negative word (“bored”). Because of this contrast, it can be
classified as a sarcastic sentence.
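The contrast idea in the two examples above can be sketched as a simple rule. This is only an illustration: the word lists POSITIVE_WORDS and NEGATIVE_WORDS and the function has_sentiment_contrast are hypothetical and are not part of the model built in this report.

```python
import string

# Tiny hand-made lexicons, assumed for illustration only.
POSITIVE_WORDS = {"love", "great", "enjoy", "happy"}
NEGATIVE_WORDS = {"bored", "ignored", "stuck", "late"}

def has_sentiment_contrast(sentence):
    """Return True if the sentence has at least one positive and one negative cue."""
    words = [w.strip(string.punctuation).lower() for w in sentence.split()]
    has_positive = any(w in POSITIVE_WORDS for w in words)
    has_negative = any(w in NEGATIVE_WORDS for w in words)
    return has_positive and has_negative

print(has_sentiment_contrast("it feels great being bored"))  # True
print(has_sentiment_contrast("it feels great being home"))   # False
```

A rule like this misses sarcasm whose negative side is a situation rather than a single word (as in “working on holidays”), which is one reason richer features are needed.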
For the task of sarcasm analysis it is important to have a basic
knowledge of Natural Language Processing (NLP). NLP aims to acquire,
understand and generate human languages such as English and Chinese.
Natural language analysis goes through several stages, namely
tokenization, lexical analysis, syntactic analysis, semantic analysis
and pragmatic analysis. Lexical analysis converts a sequence of
characters into a sequence of tokens, i.e. meaningful character
strings. Syntactic analysis determines the order and structure of each
sentence in the text. Semantic analysis finds the literal meaning, and
pragmatic analysis determines the meaning of the text in context.
These tasks are further broken down into parsing, POS-tagging, etc.
There are challenges in every field of NLP, and extensive research is
going on to address them. Some of the challenges include POS-tagging,
named entity recognition, sentiment analysis, coreference resolution,
parsing, machine translation, information translation, text
summarization, etc.
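As a small illustration of the lexical-analysis stage, the sketch below splits a sentence into word and punctuation tokens with a regular expression. The function name simple_tokenize and the pattern are assumptions made for this example; a real system would normally rely on a library tokenizer.

```python
import re

def simple_tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of letters, digits or underscores;
    # [^\w\s] matches one character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_tokenize("I love working on holidays!")
print(tokens)  # ['I', 'love', 'working', 'on', 'holidays', '!']
```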
Sarcasm can be detected by considering certain features. There are
pragmatic and lexical features, and features can also be developed from
certain patterns. Features based on oral or gestural clues, such as
emoticons, onomatopoeic expressions for laughter, positive
interjections, quotation marks and the use of punctuation, can also
help in detecting sarcasm. But all these features are not enough to
identify sarcasm in a piece of text. There is also a need for context:
a common ground between tweeters, and general world knowledge. The
machine should be aware of the context of the text and relate it to
general world knowledge in order to identify sarcasm more accurately.
In my work I have taken some of these features, namely intensifiers and
interjection words, and checked how well the model performs with
different machine learning algorithms.
LITERATURE REVIEW
(Francesco Barbieri, Francesco Ronzano, Horacio Saggion) considered 7
sets of lexical features that aim to detect sarcasm through its inner
structure, such as unexpectedness, intensity of the terms or imbalance
between registers. They did not use any patterns of words as features.
Irony and sarcasm have recently been approached as a computational
problem by (Carvalho et al.), who created an automatic system for
detecting irony relying on emoticons and special punctuation. They
focused on detection of ironic style in newspaper articles. (Lukin and
Walker) used bootstrapping to improve the performance of sarcasm and
nastiness classifiers for online dialogue, and (Liebrecht et al.)
designed a model to detect sarcasm in Dutch tweets. Finally, (Riloff et
al.) built a model to detect sarcasm with a bootstrapping algorithm that
automatically learns lists of positive sentiment phrases and negative
situation phrases from sarcastic tweets, in order to detect the
characteristic of sarcasm of being a contrast between positive
sentiment and negative situation. (Pang and Lee) used a supervised
approach to classify movie reviews into two classes after performing
subjective feature extraction. They achieved an in-language
classification accuracy of 86% using a unigram model. In contrast,
(Dave et al.) found that bigram-based features produce better results.
(Chaumartin et al.) used SentiWordNet to find the polarity of newspaper
headings. (Verma and Bhattacharyya et al.) used the same resource to
develop both resource-based and machine learning-based methods for
classification of movie reviews.
METHODOLOGY
The most important task in sarcasm detection is to prepare a feature
list with which the machine is trained to distinguish sarcastic from
non-sarcastic tweets. I have taken a list of interjection words and
intensifiers to build my feature vector. The algorithm for the feature
vector is as follows:
Input: preprocessed sarcastic tweets.
Output: interjection words and intensifiers found in the sarcastic tweets.
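def getFeatureVector(tweet):
    # Interjections and intensifiers used as features.
    # The list is abbreviated ("....") in the report; only the listed words appear here.
    words_in = ["wow", "aww", "poo", "yay", "oh", "yeah",
                "very", "really", "extremely"]
    featureVector = []
    # The tweet is assumed to be preprocessed already (see the steps below).
    for w in tweet.split():
        if w in words_in:
            featureVector.append(w)
    return featureVector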
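A possible variant of the routine above, given as a sketch only: it strips punctuation from each word before the lookup, so that a token such as "really!!" still matches. The name get_feature_vector_stripped is hypothetical, the word list is limited to the words actually listed in the report, and lower-casing is assumed to have been done during preprocessing.

```python
import string

# Only the words listed explicitly in the report; the full list is not given there.
FEATURE_WORDS = {"wow", "aww", "poo", "yay", "oh", "yeah",
                 "very", "really", "extremely"}

def get_feature_vector_stripped(tweet):
    """Return interjections/intensifiers found in a lower-cased tweet, in order."""
    features = []
    for word in tweet.split():
        cleaned = word.strip(string.punctuation)  # "really!!" -> "really"
        if cleaned in FEATURE_WORDS:
            features.append(cleaned)
    return features

print(get_feature_vector_stripped("oh really!! i did not know that before."))
# ['oh', 'really']
print(get_feature_vector_stripped("being ignored makes me extremely happy. yeah!!"))
# ['extremely', 'yeah']
```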
Once the feature vector is ready, the remaining steps are
straightforward. The quality of the result depends largely on the
feature vector. The entire process is as below.
1. Tweets are preprocessed, i.e. tweets are converted to lower case,
URLs and hashtags are removed, @username is replaced by ATUSER,
etc.
2. Stopwords are removed and the feature vector is made as shown above.
3. From the file, the tweets and labels are extracted and a feature
vector is obtained from them.
4. A feature list is made that consists of all feature words extracted
from the tweets.
5. After this, the training set is prepared with the help of the Natural
Language Toolkit (NLTK), using the features extracted from the
tweets.
6. Once the training set is ready, the tweets are classified using
different classifiers, namely Naïve Bayes, Maximum entropy and
Decision tree classifiers, and the accuracies are noted and analyzed.
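The preprocessing and feature-building steps listed above could look roughly like the sketch below. It is not the code used for this report: the regular expressions, the small STOPWORDS set, and the names preprocess_tweet, remove_stopwords and extract_features are all assumptions made for illustration.

```python
import re

# A few common English stopwords, assumed here; the report does not list its own.
STOPWORDS = {"i", "a", "an", "the", "on", "of", "to", "is", "that", "me", "my"}

def preprocess_tweet(tweet):
    """Lower-case a tweet, drop URLs and hashtags, and mask user mentions."""
    tweet = tweet.lower()
    tweet = re.sub(r"(https?://\S+|www\.\S+)", "", tweet)  # remove URLs
    tweet = re.sub(r"#\w+", "", tweet)                      # remove hashtags
    tweet = re.sub(r"@\w+", "ATUSER", tweet)                # replace @username
    return re.sub(r"\s+", " ", tweet).strip()               # collapse whitespace

def remove_stopwords(tweet):
    """Drop stopwords from an already preprocessed tweet."""
    return " ".join(w for w in tweet.split() if w not in STOPWORDS)

def extract_features(tweet_words, feature_list):
    """Map every word of the feature list to True/False for one tweet."""
    return {"contains(%s)" % w: (w in tweet_words) for w in feature_list}

raw = "Aww! I love working on holidays #sarcasm @boss http://example.com"
clean = remove_stopwords(preprocess_tweet(raw))
print(clean)  # aww! love working holidays ATUSER
print(extract_features(["aww"], ["aww", "yay"]))
# {'contains(aww)': True, 'contains(yay)': False}
```

Dictionaries of this shape, each paired with its label ("sarcastic" or "non-sarcastic"), are the kind of labeled feature sets that NLTK classifiers such as nltk.NaiveBayesClassifier.train accept as training input.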
RESULTS
The feature vector will appear as

SARCASTIC TWEETS                                   FEATURE WORDS
oh really!! I did not know that before.            ‘oh’, ‘really’
  Thanks for making me informed.
aww! I love working on holidays.                   ‘aww’
yay! Thoroughly enjoy the smell of pot             ‘yay’
  permeating through my house while clean,
  thanks neighbours.
being ignored makes me extremely happy. yeah!!     ‘extremely’, ‘yeah’

For training, 5000 sarcastic and 5000 non-sarcastic tweets are taken;
for good results it is important to have a balanced corpus. For testing,
2000 sarcastic and 2000 non-sarcastic tweets are taken. The results of
the classifiers are found as

CLASSIFIERS        ACCURACY   RECALL   PRECISION   F-SCORE
Naïve Bayes        0.55       0.43     0.41        0.42
Maximum entropy    0.60       0.47     0.46        0.46
Decision Tree      0.63       0.51     0.42        0.46
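The F-score column of the classifier results can be checked from the precision and recall columns. The sketch below assumes that the reported F-score is the usual F1 measure, i.e. the harmonic mean of precision and recall; the report does not state this explicitly, and the function name f_score is chosen for this example.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs copied from the classifier results.
reported = {
    "Naive Bayes": (0.41, 0.43),
    "Maximum entropy": (0.46, 0.47),
    "Decision Tree": (0.42, 0.51),
}

for name, (p, r) in reported.items():
    print(name, round(f_score(p, r), 2))
# Naive Bayes 0.42
# Maximum entropy 0.46
# Decision Tree 0.46
```

Under that assumption the recomputed values agree with the reported F-scores to two decimal places.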
CONCLUSION
The classifiers reached accuracies of only about 55% to 63%, so the
model can be made far better by incorporating better and more detailed
features. The better the feature vector, the better the model. Another
important factor for the model’s performance is the corpus, which
should be chosen carefully to get maximum efficiency and accuracy.
FUTURE WORK
I would try to bring in more detailed features to detect sarcasm more
precisely. Beyond this, there are other aspects of sarcasm analysis
to be worked on:
• Coreference resolution.
• Identifying context in written data for more accurate sarcasm
detection.
• Identifying sarcasm in text for Indian languages.
REFERENCES
[1] Carvalho, P., Silva, M., Sarmento, L. and De Oliveira, E. 2009. Clues
for Detecting Irony in User-Generated Contents: Oh...!! It’s “so easy" ;-).
[pdf] Hong Kong, China: ACM. Available through: Google Scholar
http://xldb.di.fc.ul.pt/xldb/publications/Carvalho09:Clues:Detecting:Irony_document.pdf
[Accessed: 4 March 2013].
[2] R. J. Kreuz and G. M. Caucci, “Lexical influences on the perception of
sarcasm,” in Proceedings of the Workshop on computational
approaches to Figurative Language, ACL, pp. 1–4, 2007.
[3] C. Liebrecht, F. Kunneman, and A. van den Bosch, “The perfect
solution for detecting sarcasm in tweets #not,” Association for
Computational Linguistics, pp. 29–37, 2013.
[4] E. Riloff, A. Qadir, P. Surve, L. De Silva, N. Gilbert, and R. Huang,
“Sarcasm as contrast between a positive sentiment and negative
situation,” in EMNLP, pp. 704–714, 2013.
[5] E. Lunando and A. Purwarianti, “Indonesian social media sentiment
analysis with sarcasm detection,” in Advanced Computer Science and
Information Systems (ICACSIS), 2013 International Conference on
IEEE, pp. 195–198, 2013.
[6] F. Barbieri, H. Saggion, and F. Ronzano, “Modelling sarcasm in
twitter, a novel approach,” in Association for Computational Linguistics,
pp. 50–58, 2014.
[7] D. Davidov, O. Tsur, and A. Rappoport, “Semi-supervised recognition
of sarcastic sentences in twitter and amazon,” in Proceedings of the
Fourteenth Conference on Computational Natural Language Learning,
pp. 107–116, 2010, Association for Computational Linguistics.
[8] E. Filatova, “Irony and sarcasm: Corpus generation and analysis
using crowdsourcing.,” in LREC, pp. 392–398, 2012.
[9] R. Justo, T. Corcoran, S. M. Lukin, M. Walker, and M. I. Torres,
“Extracting relevant knowledge for the detection of sarcasm and
nastiness in the social web,” Knowledge-Based Systems, vol. 69,
pp. 124–133, 2014.
[10] F. Kunneman, C. Liebrecht, M. van Mulken, and A. van den Bosch,
“Signaling sarcasm: From hyperbole to hashtag,” Information
Processing & Management (in press), 2014.
[11] P. Liu, W. Chen, G. Ou, T. Wang, D. Yang, and K. Lei, “Sarcasm
detection in social media based on imbalanced classification,” in
Web-Age Information Management, pp. 459–471, 2014.
[12] D. Maynard and M. A. Greenwood, “Who cares about sarcastic
tweets? Investigating the impact of sarcasm on sentiment analysis,” in
Proceedings of LREC, pp. 4238–4243, 2014.
[13] O. Tsur, D. Davidov, and A. Rappoport, “ICWSM-a great catchy
name: Semi-supervised recognition of sarcastic sentences in online
product reviews,” in ICWSM, pp. 162–169, 2010.
[14] D. Tayal, S. Yadav, K. Gupta, B. Rajput, and K. Kumari, “Polarity
detection of sarcastic political tweets,” in Computing for Sustainable
Global Development (INDIACom), 2014 International Conference on
IEEE, pp. 625–628, 2014.
[15] A. Rajadesingan, R. Zafarani, and H. Liu, “Sarcasm detection on
twitter: A behavioral modeling approach,” in Proceedings of the Eighth
ACM International Conference on Web Search and Data Mining,
pp. 97–106, 2015.
[16] P. Tungthamthiti, K. Shirai, and M. Mohd, “Recognition of sarcasm
in tweets based on concept level sentiment analysis and supervised
learning approaches,” pp. 404–413, 2014.
[17] R. J. Kreuz and R. M. Roberts, “Two cues for verbal irony:
Hyperbole and the ironic tone of voice,” Metaphor and Symbol, vol. 10,
no. 1, pp. 21–31, 1995.
[18] A. Utsumi, “Verbal irony as implicit display of ironic environment:
Distinguishing ironic utterances from nonirony,” Journal of Pragmatics,
vol. 32, no. 12, pp. 1777–1806, 2000.
[19] J. W. Pennebaker, M. E. Francis, and R. J. Booth, “Linguistic inquiry
and word count: LIWC 2001,” Mahwah: Lawrence Erlbaum Associates,
vol. 71, pp. 1–11, 2001.
[20] C. Strapparava, A. Valitutti, et al., “Wordnet affect: an affective
extension of wordnet.,” in LREC, vol. 4, pp. 1083–1086, 2004.
[21] B. Liu, “Sentiment analysis and opinion mining,” Synthesis Lectures
on Human Language Technologies, vol. 5, no. 1, pp. 1–167, 2012.

More Related Content

What's hot

Noa Ha'aman - 2017 - MojiSem: Varying Linguistic Purposes of Emoji in (Twitte...
Noa Ha'aman - 2017 - MojiSem: Varying Linguistic Purposes of Emoji in (Twitte...Noa Ha'aman - 2017 - MojiSem: Varying Linguistic Purposes of Emoji in (Twitte...
Noa Ha'aman - 2017 - MojiSem: Varying Linguistic Purposes of Emoji in (Twitte...Association for Computational Linguistics
 
Nautral Langauge Processing - Basics / Non Technical
Nautral Langauge Processing - Basics / Non Technical Nautral Langauge Processing - Basics / Non Technical
Nautral Langauge Processing - Basics / Non Technical Dhruv Gohil
 
MTech Seminar Presentation [IIT-Bombay]
MTech Seminar Presentation [IIT-Bombay]MTech Seminar Presentation [IIT-Bombay]
MTech Seminar Presentation [IIT-Bombay]Sagar Ahire
 
A performance of svm with modified lesk approach for word sense disambiguatio...
A performance of svm with modified lesk approach for word sense disambiguatio...A performance of svm with modified lesk approach for word sense disambiguatio...
A performance of svm with modified lesk approach for word sense disambiguatio...eSAT Journals
 
SemEval - Aspect Based Sentiment Analysis
SemEval - Aspect Based Sentiment AnalysisSemEval - Aspect Based Sentiment Analysis
SemEval - Aspect Based Sentiment AnalysisAditya Joshi
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processingdhruv_chaudhari
 
Sentiment analyzer and opinion mining
Sentiment analyzer and opinion miningSentiment analyzer and opinion mining
Sentiment analyzer and opinion miningAnkush Mehta
 
Sentiment Analysis
Sentiment Analysis Sentiment Analysis
Sentiment Analysis prnk08
 
The Editor as EAP Instructor
The Editor as EAP InstructorThe Editor as EAP Instructor
The Editor as EAP InstructorLawrie Hunter
 
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf EremyanDataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyanrudolf eremyan
 
Recent Advances in NLP
  Recent Advances in NLP  Recent Advances in NLP
Recent Advances in NLPAnuj Gupta
 

What's hot (14)

Noa Ha'aman - 2017 - MojiSem: Varying Linguistic Purposes of Emoji in (Twitte...
Noa Ha'aman - 2017 - MojiSem: Varying Linguistic Purposes of Emoji in (Twitte...Noa Ha'aman - 2017 - MojiSem: Varying Linguistic Purposes of Emoji in (Twitte...
Noa Ha'aman - 2017 - MojiSem: Varying Linguistic Purposes of Emoji in (Twitte...
 
Nautral Langauge Processing - Basics / Non Technical
Nautral Langauge Processing - Basics / Non Technical Nautral Langauge Processing - Basics / Non Technical
Nautral Langauge Processing - Basics / Non Technical
 
sentiment analysis
sentiment analysis sentiment analysis
sentiment analysis
 
MTech Seminar Presentation [IIT-Bombay]
MTech Seminar Presentation [IIT-Bombay]MTech Seminar Presentation [IIT-Bombay]
MTech Seminar Presentation [IIT-Bombay]
 
A performance of svm with modified lesk approach for word sense disambiguatio...
A performance of svm with modified lesk approach for word sense disambiguatio...A performance of svm with modified lesk approach for word sense disambiguatio...
A performance of svm with modified lesk approach for word sense disambiguatio...
 
SemEval - Aspect Based Sentiment Analysis
SemEval - Aspect Based Sentiment AnalysisSemEval - Aspect Based Sentiment Analysis
SemEval - Aspect Based Sentiment Analysis
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Blenderbot
BlenderbotBlenderbot
Blenderbot
 
Sentiment analyzer and opinion mining
Sentiment analyzer and opinion miningSentiment analyzer and opinion mining
Sentiment analyzer and opinion mining
 
Sentiment Analysis
Sentiment Analysis Sentiment Analysis
Sentiment Analysis
 
Evaluation Datasets for Twitter Sentiment Analysis: A survey and a new datase...
Evaluation Datasets for Twitter Sentiment Analysis: A survey and a new datase...Evaluation Datasets for Twitter Sentiment Analysis: A survey and a new datase...
Evaluation Datasets for Twitter Sentiment Analysis: A survey and a new datase...
 
The Editor as EAP Instructor
The Editor as EAP InstructorThe Editor as EAP Instructor
The Editor as EAP Instructor
 
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf EremyanDataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
 
Recent Advances in NLP
  Recent Advances in NLP  Recent Advances in NLP
Recent Advances in NLP
 

Viewers also liked

Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...Diana Maynard
 
NLP Asignment Final Presentation [IIT-Bombay]
NLP Asignment Final Presentation [IIT-Bombay]NLP Asignment Final Presentation [IIT-Bombay]
NLP Asignment Final Presentation [IIT-Bombay]Sagar Ahire
 
Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]
Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]
Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]Sagar Ahire
 
Text Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEText Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEDiana Maynard
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment AnalysisSagar Ahire
 
Sentiments Analysis using Python and nltk
Sentiments Analysis using Python and nltk Sentiments Analysis using Python and nltk
Sentiments Analysis using Python and nltk Ashwin Perti
 
Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEDiana Maynard
 
Sentiment analysis-by-nltk
Sentiment analysis-by-nltkSentiment analysis-by-nltk
Sentiment analysis-by-nltkWei-Ting Kuo
 
Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment AnalysisJaganadh Gopinadhan
 
Sentiment analysis of twitter data
Sentiment analysis of twitter dataSentiment analysis of twitter data
Sentiment analysis of twitter dataBhagyashree Deokar
 
Tutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisTutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisFabio Benedetti
 
Sentiment analysis of tweets
Sentiment analysis of tweetsSentiment analysis of tweets
Sentiment analysis of tweetsVasu Jain
 
Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14Rachit Goel
 
Sentiment Analysis in Twitter
Sentiment Analysis in TwitterSentiment Analysis in Twitter
Sentiment Analysis in TwitterAyushi Dalmia
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSumit Raj
 

Viewers also liked (16)

Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
 
NLP Asignment Final Presentation [IIT-Bombay]
NLP Asignment Final Presentation [IIT-Bombay]NLP Asignment Final Presentation [IIT-Bombay]
NLP Asignment Final Presentation [IIT-Bombay]
 
Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]
Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]
Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]
 
Python NLTK
Python NLTKPython NLTK
Python NLTK
 
Text Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEText Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATE
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 
Sentiments Analysis using Python and nltk
Sentiments Analysis using Python and nltk Sentiments Analysis using Python and nltk
Sentiments Analysis using Python and nltk
 
Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATE
 
Sentiment analysis-by-nltk
Sentiment analysis-by-nltkSentiment analysis-by-nltk
Sentiment analysis-by-nltk
 
Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment Analysis
 
Sentiment analysis of twitter data
Sentiment analysis of twitter dataSentiment analysis of twitter data
Sentiment analysis of twitter data
 
Tutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisTutorial of Sentiment Analysis
Tutorial of Sentiment Analysis
 
Sentiment analysis of tweets
Sentiment analysis of tweetsSentiment analysis of tweets
Sentiment analysis of tweets
 
Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14
 
Sentiment Analysis in Twitter
Sentiment Analysis in TwitterSentiment Analysis in Twitter
Sentiment Analysis in Twitter
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
 

Similar to sent_analysis_report

NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
Aspect-Level Sentiment Analysis On Hotel Reviews
Aspect-Level Sentiment Analysis On Hotel ReviewsAspect-Level Sentiment Analysis On Hotel Reviews
Aspect-Level Sentiment Analysis On Hotel ReviewsKimberly Pulley
 
NLP - updated (Natural Language Processing))
NLP - updated (Natural Language Processing))NLP - updated (Natural Language Processing))
NLP - updated (Natural Language Processing))Jitendra Kumar Yadav
 
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...CITE
 
Supervised Sentiment Classification using DTDP algorithm
Supervised Sentiment Classification using DTDP algorithmSupervised Sentiment Classification using DTDP algorithm
Supervised Sentiment Classification using DTDP algorithmIJSRD
 
SENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEWSENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEWJournal For Research
 
Metaphic or the art of looking another way.
Metaphic or the art of looking another way.Metaphic or the art of looking another way.
Metaphic or the art of looking another way.Suresh Manian
 
Dictionary Based Approach to Sentiment Analysis - A Review
Dictionary Based Approach to Sentiment Analysis - A ReviewDictionary Based Approach to Sentiment Analysis - A Review
Dictionary Based Approach to Sentiment Analysis - A ReviewINFOGAIN PUBLICATION
 
A Hybrid Method of Long Short-Term Memory and AutoEncoder Architectures for S...
A Hybrid Method of Long Short-Term Memory and AutoEncoder Architectures for S...A Hybrid Method of Long Short-Term Memory and AutoEncoder Architectures for S...
A Hybrid Method of Long Short-Term Memory and AutoEncoder Architectures for S...AhmedAdilNafea
 
Live Twitter Sentiment Analysis and Interactive Visualizations with PyLDAvis ...
Live Twitter Sentiment Analysis and Interactive Visualizations with PyLDAvis ...Live Twitter Sentiment Analysis and Interactive Visualizations with PyLDAvis ...
Live Twitter Sentiment Analysis and Interactive Visualizations with PyLDAvis ...IRJET Journal
 
A Survey on Sentiment Mining Techniques
A Survey on Sentiment Mining TechniquesA Survey on Sentiment Mining Techniques
A Survey on Sentiment Mining TechniquesKhan Mostafa
 
A Context-Based Algorithm For Sentiment Analysis
A Context-Based Algorithm For Sentiment AnalysisA Context-Based Algorithm For Sentiment Analysis
A Context-Based Algorithm For Sentiment AnalysisRichard Hogue
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingBhavya Chawla
 

Similar to sent_analysis_report (20)

Sarcasm Detection
Sarcasm DetectionSarcasm Detection
Sarcasm Detection
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
Aspect-Level Sentiment Analysis On Hotel Reviews
Aspect-Level Sentiment Analysis On Hotel ReviewsAspect-Level Sentiment Analysis On Hotel Reviews
Aspect-Level Sentiment Analysis On Hotel Reviews
 
NLP(Natural Language Processing)
NLP(Natural Language Processing)NLP(Natural Language Processing)
NLP(Natural Language Processing)
 
NLP - updated (Natural Language Processing))
NLP - updated (Natural Language Processing))NLP - updated (Natural Language Processing))
NLP - updated (Natural Language Processing))
 
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
 
Supervised Sentiment Classification using DTDP algorithm
Supervised Sentiment Classification using DTDP algorithmSupervised Sentiment Classification using DTDP algorithm
Supervised Sentiment Classification using DTDP algorithm
 
Sentiment analysis on_unstructured_review-1
Sentiment analysis on_unstructured_review-1Sentiment analysis on_unstructured_review-1
Sentiment analysis on_unstructured_review-1
 
call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...
 
SENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEWSENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEW
 
Metaphic or the art of looking another way.
Metaphic or the art of looking another way.Metaphic or the art of looking another way.
Metaphic or the art of looking another way.
 
Dictionary Based Approach to Sentiment Analysis - A Review
Dictionary Based Approach to Sentiment Analysis - A ReviewDictionary Based Approach to Sentiment Analysis - A Review
Dictionary Based Approach to Sentiment Analysis - A Review
 
A Hybrid Method of Long Short-Term Memory and AutoEncoder Architectures for S...
A Hybrid Method of Long Short-Term Memory and AutoEncoder Architectures for S...A Hybrid Method of Long Short-Term Memory and AutoEncoder Architectures for S...
A Hybrid Method of Long Short-Term Memory and AutoEncoder Architectures for S...
 
2
22
2
 
N01741100102
N01741100102N01741100102
N01741100102
 
Live Twitter Sentiment Analysis and Interactive Visualizations with PyLDAvis ...
Live Twitter Sentiment Analysis and Interactive Visualizations with PyLDAvis ...Live Twitter Sentiment Analysis and Interactive Visualizations with PyLDAvis ...
Live Twitter Sentiment Analysis and Interactive Visualizations with PyLDAvis ...
 
A Survey on Sentiment Mining Techniques
A Survey on Sentiment Mining TechniquesA Survey on Sentiment Mining Techniques
A Survey on Sentiment Mining Techniques
 
Sentiment analysis on unstructured review
Sentiment analysis on unstructured reviewSentiment analysis on unstructured review
Sentiment analysis on unstructured review
 
A Context-Based Algorithm For Sentiment Analysis
A Context-Based Algorithm For Sentiment AnalysisA Context-Based Algorithm For Sentiment Analysis
A Context-Based Algorithm For Sentiment Analysis
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 

sent_analysis_report

  • 1. A PROJECT REPORT ON SARCASM ANALYSIS USING MACHINE LEARNING Submitted by: Subhadarsini prusty B212045, IIIT BBSR
  • 2. ABSTRACT Sarcasm analysis being one of the toughest challenges in Natural Language Processing (NLP) , has been a hot topic of research these years. A lot of work has already been done in the field of sentiment analysis but still there are enormous challenges to it. The property of sarcasm that makes it difficult to analyze and detect is the gap between its literal and intended meaning. Detecting sentiment in social media texts like facebook, twitter, online blogs, reviews has become an important task as they influence every business organizations. In this work I have presented the detection of sarcasm in tweets using machine learning approach. INTRODUCTION Wikipedia defines sentiment analysis as the process that “aims to determine the attitude of a speaker or a writer with respect to some topic”. Social media and social networking have fueled the online space. Ratings, reviews, comments etc are everywhere. The need for clear, reliable information about consumer preferences has led to increasing interests in high level analysis of online social media content. For many businesses, online opinion has turned into a kind of virtual currency that can make or break a product in the marketplace. Sentiment analysis actually means monitoring social media posts and discussions, then figuring out how participants are reacting to a brand or event. Sarcasm is derived from the French word “Sarcasmor” that means “tear flesh” or “grind the teeth”. In simple words it means to speak bitterly. The literal meaning is different than what the speaker intends to say
  • 3. through sarcasm. As defined by Wikipedia sources sarcasm is a “sharp, bitter, or cutting expression or remark; a bitter gibe or taunt”. Sarcasm is also defined as a contrast between a positive sentiment and negative situation and vice versa. For example, “I love working on holidays”. In this sentence “love” is giving a positive sentiment but “working on holidays” is referring to a negative situation since people generally used to relax on holidays. Sarcasm is known as “the activity of saying or writing the opposite of what you mean, or of speaking in a way intended to make someone else feel stupid or show them that you are angry” (Macmilan,2007). Example: “it feels great being bored”. The sentence contains both positive (“great”) and negative word (“bored”) in a sentence. Therefore, it can be classified as a sarcastic sentence. For the task of sarcasm analysis it is very important to have a rudimentary knowledge of Natural Language Processing (NLP). NLP aims to acquire, understand and generate the human languages such as English, Chinese etc. The natural language analysis goes through many stages namely tokenization, lexical analysis, syntactic analysis, semantic analysis, and pragmatic analysis. Lexical analysis is converting a sequence of characters into a sequence of tokens i.e. meaningful character strings. Syntactic analysis provides an order and structure of each sentence in the text. Semantic analysis is to find the literal meaning, and pragmatic analysis is to determine the meaning of the text in context. These tasks are further broken down into parsing, pos- tagging etc. There are challenges in every field of NLP and extensive research works are going on for these. Some of the challenges include: POS-tagging, named entity recognition, sentiment analysis, coreference resolution, parsing, machine translation, information translation, text summarization etc.
  • 4. Sarcasm can be detected by considering certain features. There are pragmatic and lexical features, and features can also be developed from certain patterns. Features based on oral or gestural clues, such as emoticons, onomatopoeic expressions for laughter, positive interjections, quotation marks and punctuation, can also help in detecting sarcasm. But all these features are not enough to identify sarcasm in a piece of text. There is a need for context, a common ground between tweeters, and general world knowledge. The machine should be aware of the context of the text and relate it to general world knowledge in order to identify sarcasm more accurately. In my work I have taken features such as intensifiers and interjection words and checked, with different machine learning algorithms, how well the model performs.
LITERATURE REVIEW
(Francesco Barbieri, Francesco Ronzano, Horacio Saggion) considered 7 sets of lexical features that aim to detect sarcasm through its inner structure, such as unexpectedness, intensity of the terms, or imbalance between registers. They did not use any patterns of words as features. Irony and sarcasm have recently been approached as a computational problem by (Carvalho et al), who created an automatic system for detecting irony relying on emoticons and special punctuation. They focused on the detection of ironic style in newspaper articles. (Lukin and Walker) used bootstrapping to improve the performance of sarcasm and nastiness classifiers for online dialogue, and (Liebrecht et al) designed a model to detect sarcasm in Dutch tweets. Finally, (Riloff) built a model to detect sarcasm with a bootstrapping algorithm that automatically learns lists of positive sentiment phrases and negative situation phrases from
  • 5. sarcastic tweets, in order to detect the characteristic of sarcasm of being a contrast between positive sentiment and negative situation. (Pang and Lee) used a supervised approach to classify movie reviews into two classes after performing subjective feature extraction. They achieve an in-language classification accuracy of 86% using a unigram model. In contrast, (Dave et al) find that bigram-based features produce better results. (Chaumartin et al) use SentiWordNet to find the polarity of newspaper headings. (Verma and Bhattacharyya et al) used the same resource for developing both resource-based and machine learning-based methods for classification of movie reviews.
METHODOLOGY
The most important task for sarcasm detection is to prepare a feature list according to which the machine will be trained to distinguish sarcastic and non-sarcastic tweets. I have taken a list of interjection words and intensifiers to build my feature vector. The algorithm for the feature vector is as follows:
Input: preprocessed sarcastic tweets.
Output: interjection words and intensifiers from the sarcastic tweets.
    def getFeatureVector(tweet):
        # Interjections and intensifiers used as cues (the list is truncated in the original: "….")
        words_in = ['wow', 'aww', 'poo', 'yay', 'oh', 'yeah', 'very', 'really', 'extremely']
        featureVector = []
        # Keep every word of the tweet that appears in the cue list
        for w in tweet.split():
            if w in words_in:
                featureVector.append(w)
        return featureVector
  • 6. Once the feature vector is ready, the rest becomes easy. The quality of the result depends largely on the feature vector. The entire process is as below.
1. Tweets are preprocessed, i.e. tweets are converted to lower case, URLs and hashtags are removed, @username is replaced by ATUSER etc.
2. Stopwords are removed and the feature vector is made as shown above.
3. From the file, the tweets and labels are extracted and a feature vector is obtained from them.
4. A feature list is made that consists of all feature words extracted from the tweets.
5. The training set is then prepared with the help of the Natural Language Toolkit (NLTK), using the features extracted from the tweets.
6. Once the training set is ready, the tweets are classified using different classifiers such as Naïve Bayes, Maximum Entropy and Decision Tree classifiers, and the accuracies are noted and analyzed.
RESULTS
The feature vector will appear as
  • 7.
SARCASTIC TWEETS | FEATURE WORDS
oh really!! I did not know that before. Thanks for making me informed. | ‘oh’, ‘really’
aww! I love working on holidays. | ‘aww’
yay! Thoroughly enjoy the smell of pot permeating through my house while clean, thanks neighbours. | ‘yay’
being ignored makes me extremely happy. yeah!! | ‘extremely’, ‘yeah’
For training, 5000 sarcastic and 5000 non-sarcastic tweets are taken. For good results it is important to have a balanced corpus. For testing, 2000 sarcastic and 2000 non-sarcastic tweets are taken. The accuracies of the classifiers are found as
CLASSIFIERS | ACCURACY | RECALL | PRECISION | F-SCORE
Naïve Bayes | 0.55 | 0.43 | 0.41 | 0.42
Maximum entropy | 0.60 | 0.47 | 0.46 | 0.46
Decision Tree | 0.63 | 0.51 | 0.42 | 0.46
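As a consistency check on the classifier results reported above, the F-score can be recomputed from precision and recall, assuming it is the usual harmonic mean F = 2PR / (P + R). The report does not state which formula was used, so this is an assumption; the function name `f_score` is hypothetical.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (assumed definition of F-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs copied from the results table
reported = {
    "Naive Bayes": (0.41, 0.43),
    "Maximum entropy": (0.46, 0.47),
    "Decision Tree": (0.42, 0.51),
}

for name, (p, r) in reported.items():
    print(name, round(f_score(p, r), 2))
```

Under this assumption the recomputed values agree with the F-score column to two decimal places.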
  • 8. CONCLUSION
I obtained an accuracy of roughly 50%-60% on average, which can be improved considerably by incorporating better and more detailed features. The better the feature vector, the better the model. Another important factor for the model’s performance is the corpus, which should be chosen wisely to get maximum efficiency and accuracy.
FUTURE WORK
I would try to bring in more detailed features to detect sarcasm more precisely. Beyond this, there are other aspects of sarcasm analysis to be worked upon:
- Coreference resolution.
- Identifying context in written data for more accurate sarcasm detection.
- Identifying sarcasm in text for Indian languages.
REFERENCES
[1] Carvalho, P., Silva, M., Sarmento, L. and De Oliveira, E. 2009. Clues for Detecting Irony in User-Generated Contents: Oh...!! It’s “so easy” ;-). [pdf] Hong Kong, China: ACM. Available through: Google Scholar http://xldb.di.fc.ul.pt/xldb/publications/Carvalho09:Clues:Detecting:Irony_document.pdf [Accessed: 4 March 2013].
[2] R. J. Kreuz and G. M. Caucci, “Lexical influences on the perception of
  • 9. sarcasm,” in Proceedings of the Workshop on Computational Approaches to Figurative Language, ACL, pp. 1–4, 2007.
[3] C. Liebrecht, F. Kunneman, and A. van den Bosch, “The perfect solution for detecting sarcasm in tweets #not,” Association for Computational Linguistics, pp. 29–37, 2013.
[4] E. Riloff, A. Qadir, P. Surve, L. De Silva, N. Gilbert, and R. Huang, “Sarcasm as contrast between a positive sentiment and negative situation,” in EMNLP, pp. 704–714, 2013.
[5] E. Lunando and A. Purwarianti, “Indonesian social media sentiment analysis with sarcasm detection,” in Advanced Computer Science and Information Systems (ICACSIS), 2013 International Conference on IEEE, pp. 195–198, 2013.
[6] F. Barbieri, H. Saggion, and F. Ronzano, “Modelling sarcasm in twitter, a novel approach,” in Association for Computational Linguistics, pp. 50–58, 2014.
[7] D. Davidov, O. Tsur, and A. Rappoport, “Semi-supervised recognition of sarcastic sentences in twitter and amazon,” in Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pp. 107–116, 2010, Association for Computational Linguistics.
[8] E. Filatova, “Irony and sarcasm: Corpus generation and analysis using crowdsourcing,” in LREC, pp. 392–398, 2012.
[9] R. Justo, T. Corcoran, S. M. Lukin, M. Walker, and M. I. Torres, “Extracting relevant knowledge for the detection of sarcasm and nastiness in the social web,” Knowledge-Based Systems, vol. 69, pp. 124–133, 2014.
  • 10. [10] F. Kunneman, C. Liebrecht, M. van Mulken, and A. van den Bosch, “Signaling sarcasm: From hyperbole to hashtag,” Information Processing & Management (in press), 2014.
[11] P. Liu, W. Chen, G. Ou, T. Wang, D. Yang, and K. Lei, “Sarcasm detection in social media based on imbalanced classification,” in Web-Age Information Management, pp. 459–471, 2014.
[12] D. Maynard and M. A. Greenwood, “Who cares about sarcastic tweets? Investigating the impact of sarcasm on sentiment analysis,” in Proceedings of LREC, pp. 4238–4243, 2014.
[13] O. Tsur, D. Davidov, and A. Rappoport, “ICWSM-a great catchy name: Semi-supervised recognition of sarcastic sentences in online product reviews,” in ICWSM, pp. 162–169, 2010.
[14] D. Tayal, S. Yadav, K. Gupta, B. Rajput, and K. Kumari, “Polarity detection of sarcastic political tweets,” in Computing for Sustainable Global Development (INDIACom), 2014 International Conference on IEEE, pp. 625–628, 2014.
[15] A. Rajadesingan, R. Zafarani, and H. Liu, “Sarcasm detection on twitter: A behavioral modeling approach,” in Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pp. 97–106, 2015.
[16] P. Tungthamthiti, K. Shirai, and M. Mohd, “Recognition of sarcasm in tweets based on concept level sentiment analysis and supervised learning approaches,” pp. 404–413, 2014.
  • 11. [17] R. J. Kreuz and R. M. Roberts, “Two cues for verbal irony: Hyperbole and the ironic tone of voice,” Metaphor and Symbol, vol. 10, no. 1, pp. 21–31, 1995.
[18] A. Utsumi, “Verbal irony as implicit display of ironic environment: Distinguishing ironic utterances from nonirony,” Journal of Pragmatics, vol. 32, no. 12, pp. 1777–1806, 2000.
[19] J. W. Pennebaker, M. E. Francis, and R. J. Booth, “Linguistic inquiry and word count: LIWC 2001,” Mahwah: Lawrence Erlbaum Associates, vol. 71, pp. 1–11, 2001.
[20] C. Strapparava, A. Valitutti, et al., “Wordnet affect: an affective extension of wordnet,” in LREC, vol. 4, pp. 1083–1086, 2004.
[21] B. Liu, “Sentiment analysis and opinion mining,” Synthesis Lectures on Human Language Technologies, vol. 5, no. 1, pp. 1–167, 2012.