Sarcasm Detection on Twitter
May 2016
Hao Lyu, MSIS Student
Guided by Dr. Byron Wallace
7/7/2016
Content
1. Introduction
2. Data
3. Feature Models (machine learning)
4. Experimental settings
5. Results and discussion
Why social media?
Mining and analyzing data in blogs, postings, and tweets
can:
• Support marketing and customer service activities
• Help decision making
• Enhance the products and services
• Improve the competitive advantage of companies
Twitter is one of the most important social media
resources.
It supports different types of data: text, pictures, videos
Sarcasm poses problems for
algorithms in U.S. election 2016
In the race for the White House in 2016, election
campaigns rely on social media analysis to help
them tailor advertising and other outreach to
particular groups of voters.
Average follower growth,
Jan 26 to Feb 26
1. @realDonaldTrump 20,900
2. @BernieSanders 10,400
3. @HillaryClinton 10,300
4. @MarcoRubio 5,320
5. @TedCruz 3,950
6. @RealBenCarson 1,870
7. @JohnKasich 1,440
Stay Classy
A predictive analysis firm examined tweets
containing the expression "classy" and found
that 72 percent of them used it in a positive way.
But when used near the name of Republican
presidential candidate Donald Trump, around three
quarters of tweets citing "classy" were negative.
What is Sarcasm on Twitter
A sarcastic tweet. The speaker is clearly not
welcoming allergy season back.
Lexical clues could provide enough knowledge to
detect sarcasm.
What is Sarcasm on Twitter
Another sarcastic tweet. The speaker actually
supports the Democrats.
Detecting whether this one is sarcastic requires
contextual information surrounding the post.
Sarcasm Detection on Twitter
The state-of-the-art method combines lexical and contextual
information to achieve robust classification performance.
In this project, I re-implement a recent method for automatic
sarcasm detection due to Bamman and Smith (2015).
I use multiple approaches to extract a large amount of data and
apply machine learning models to classify tweets as sarcastic or
non-sarcastic.
DATA
Bamman dataset: 19,534 tweets, around half of them
sarcastic and the other half non-sarcastic.
Bamman shares the IDs of those tweets.
Tweets disappear over time because users may quit
Twitter, make their accounts private, or delete
tweets. After crawling, I collected 17,926 tweets.
DATA
The labels of tweets are inferred from self-declaration
of sarcasm: a tweet is marked as sarcastic if it contains
the hashtag #sarcasm or #sarcastic, and as non-sarcastic
otherwise.
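The labeling rule above can be sketched in a few lines. This is a minimal illustration, not the author's code: the function name `label_tweet` is hypothetical, and matching the hashtags case-insensitively is an assumption.

```python
import re

# Hashtags treated as self-declared sarcasm (from the slide).
# Case-insensitive matching is an assumption, not stated in the source.
SARCASM_TAGS = re.compile(r"#sarcas(?:m|tic)\b", re.IGNORECASE)


def label_tweet(text):
    """Return 'sarcastic' if the tweet carries #sarcasm or #sarcastic."""
    return "sarcastic" if SARCASM_TAGS.search(text) else "non-sarcastic"


print(label_tweet("Allergy season is back, how lovely #sarcasm"))
print(label_tweet("Allergy season is back"))
```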
DATA
Historical (past) tweets and profiles of users
DATA
Audience (the user who responded to the target
tweet, or was mentioned in the target tweet)
Original Tweet (the tweet to which the target tweet
responded)
DATA EXTRACTION
Static web crawling
Dynamic web crawling
Twitter Stream API
DATA EXTRACTION
Static web crawling: Scrapes static web pages
and extracts text from the HTML markup,
such as the profile
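As a rough sketch of what extracting text from static HTML can look like, the snippet below uses only Python's standard `html.parser`. The markup in `sample_html` is made up for illustration and does not reflect Twitter's real page structure, and the helper names are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the non-empty text fragments found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical profile markup, for illustration only.
sample_html = (
    '<div class="profile"><h1>Jane Doe</h1>'
    "<p>Bio: data nerd</p><script>var x = 1;</script></div>"
)
print(extract_text(sample_html))
```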
DATA EXTRACTION
Dynamic web crawling: Focuses on the data sent from the
Twitter server when I interact with the website, e.g., when I scroll
down the page to view more tweets from a user
DATA EXTRACTION
Twitter Stream API: Makes it efficient to collect
public tweets. Twitter provides an interface to
developers through its API.
Limit: 1% of public tweets
DATA PROCESSING
Remove tweets that are:
• Not English
• Shorter than 3 words
• Retweets
Replace URLs and user mentions
Remove the hashtags #sarcastic and #sarcasm from the sarcastic
tweets
Normalize profile data, e.g.,
time zone data are mapped to geographic areas using the Google
geocoder package
Numbers on Twitter are displayed as strings, such as '22K' or
'2 Million', and are converted to a numeric type.
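The cleaning steps above could look roughly like the following sketch. Several details are assumptions rather than the author's choices: the placeholder tokens `URL` and `USER`, detecting retweets by a leading `RT @`, applying the three-word minimum after cleaning, and the suffix table in `MULTIPLIERS`. Language filtering and geocoding are left out because they need external services.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
LABEL_TAG_RE = re.compile(r"#sarcas(?:m|tic)\b", re.IGNORECASE)

# Assumed suffixes for displayed counts such as '22K' or '2 Million'.
MULTIPLIERS = {"k": 1_000, "m": 1_000_000, "million": 1_000_000}


def clean_tweet(text):
    """Return the cleaned tweet text, or None if the tweet should be dropped."""
    if text.startswith("RT @"):           # assumed retweet marker
        return None
    text = LABEL_TAG_RE.sub("", text)     # remove the label hashtags
    text = URL_RE.sub("URL", text)        # placeholder token is an assumption
    text = MENTION_RE.sub("USER", text)   # placeholder token is an assumption
    text = " ".join(text.split())         # collapse leftover whitespace
    if len(text.split()) < 3:             # drop tweets shorter than 3 words
        return None
    return text


def parse_count(display):
    """Convert a displayed count such as '22K' or '2 Million' to an int."""
    match = re.fullmatch(r"\s*([\d.,]+)\s*([A-Za-z]*)\s*", display)
    if not match:
        raise ValueError(f"unrecognized count: {display!r}")
    number = float(match.group(1).replace(",", ""))
    suffix = match.group(2).lower()
    if suffix and suffix not in MULTIPLIERS:
        raise ValueError(f"unknown suffix: {suffix!r}")
    return int(round(number * MULTIPLIERS.get(suffix, 1)))


print(clean_tweet("Love Mondays #sarcasm http://t.co/x @amy"))
print(parse_count("22K"), parse_count("2 Million"))
```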
FEATURE ENGINEERING
In machine learning and pattern recognition, a feature is an
individual measurable property of a phenomenon being observed.
Similar concept: the explanatory variable used in statistical
techniques such as linear regression
FEATURE ENGINEERING
Tweet Features
Represent the lexical and grammatical
information of the target tweet.
Using only the text of the target tweet
Author Features
Capture information about the author of
the target tweet.
Using historical tweets and profile
information of the author
Audience Features
Encode information about the addressee
of the tweet.
Using historical tweets, profile information
of the audience, and the communication
between the audience and the author
Response Features
Consider the interaction between the
target tweet and the tweet that it is
responding to.
Using the text of the original tweet
TWEET FEATURES
Bag of Words: In this model, a text (such as a sentence or a
document) is represented as the bag (multiset) of its words,
disregarding grammar and even word order but keeping
multiplicity.
Stop words are removed.
Vocabulary: get, am, work, not, love, new
“Get in am at work (not) #Work” → 1 1 1 1 0 0
“Love my new work #Work” → 0 0 1 0 1 1
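A minimal sketch of the bag-of-words encoding, using the two example tweets from this slide. The stop-word list here is a tiny stand-in chosen for the example, and dropping hashtags before counting is an assumption made so that the vectors match the ones shown; neither is confirmed as the author's setup.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list, not the author's actual list.
STOP_WORDS = {"in", "at", "my"}


def tokenize(text):
    """Lowercase, drop hashtags (an assumption), and remove stop words."""
    text = re.sub(r"#\w+", " ", text)
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


def build_vocabulary(texts):
    """Collect distinct tokens in first-seen order."""
    vocab = []
    for text in texts:
        for tok in tokenize(text):
            if tok not in vocab:
                vocab.append(tok)
    return vocab


def bag_of_words(text, vocab):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]


tweets = ["Get in am at work (not) #Work", "Love my new work #Work"]
vocab = build_vocabulary(tweets)
vectors = [bag_of_words(tweet, vocab) for tweet in tweets]
print(vocab)
print(vectors)
```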
Pronunciation features: Twitter users have specific writing styles,
e.g., RT (Retweet), CHK (Check) and IIRC (If I recall correctly).
I count the number of words that only have alphabetic characters
but no vowels, and the words with more than three syllables.
Example: “Wow! wtf man? RT @latimes: Gov. Brown signs bills to
raise smoking age to 21, restrict e-cigarettes”
→ 2 words without vowels, 0 words with more than three syllables
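The two counts can be approximated as follows. This is a sketch under stated assumptions: tokens are split on whitespace with surrounding punctuation stripped, `y` is treated as a vowel, and syllables are estimated by counting runs of vowels, which is a crude heuristic rather than a real pronunciation dictionary.

```python
import re
import string

VOWELS = set("aeiouy")  # assumption: 'y' counts as a vowel


def count_syllables(word):
    """Estimate syllables as the number of vowel runs (rough heuristic)."""
    return len(re.findall(r"[aeiouy]+", word.lower()))


def pronunciation_features(text):
    """Return (words with no vowels, words with more than three syllables)."""
    words = [tok.strip(string.punctuation) for tok in text.split()]
    words = [w for w in words if w.isalpha()]  # alphabetic-only words
    no_vowel = sum(1 for w in words if not (set(w.lower()) & VOWELS))
    long_words = sum(1 for w in words if count_syllables(w) > 3)
    return no_vowel, long_words


example = (
    "Wow! wtf man? RT @latimes: Gov. Brown signs bills to "
    "raise smoking age to 21, restrict e-cigarettes"
)
print(pronunciation_features(example))
```

On the example tweet, the two vowel-free words found by this sketch are "wtf" and "RT".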
AUTHOR FEATURES
Author historical topics: Historical topic features are inferred
under LDA with 100 topics over all historical tweets.
LDA, short for Latent Dirichlet Allocation, is a generative
statistical model that allows sets of observations to be explained
by unobserved groups that explain why some parts of the data are
similar (Blei, Ng, and Jordan 2003).
Topic 1, Topic 2, …, Topic 100
Author 1 (tweet01, tweet11, …, tweetX1): 0.3232, 0.932, …, 0.1522
Author 2 (tweet02, tweet12, …, tweetX2): 0.4232, 0.3322, …, 0.5522
Each topic is defined by multiple words, e.g.,
Topic 1 : basketball, StephCurry, Stadium, fans, awesome,
champion…
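To make the idea of a per-author topic distribution concrete, here is a toy collapsed Gibbs sampler for LDA in plain Python. It is only a sketch: the slides do not say which LDA implementation was used, the hyperparameters `alpha` and `beta` and the tiny corpus are made up, and a real run over all historical tweets with 100 topics would call for an optimized library.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler; returns one topic distribution per document.

    docs is a list of token lists, e.g. one list per author's tweet history.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Smoothed per-document topic proportions.
    return [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]


# Made-up author histories, already tokenized.
histories = [
    "basketball fans stadium champion basketball fans".split(),
    "election vote campaign vote election debate".split(),
]
author_topics = lda_gibbs(histories, num_topics=2, iterations=50)
print(author_topics)
```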
AUDIENCE FEATURES
Author/Audience interactional topics: This feature measures the
similarity of the historical topics of the audience and the author.
I take the element-wise product of the author's and the audience's
historical topic distributions. Topics that both discuss heavily
receive higher values in the product.
Topic 1, Topic 2, …, Topic 100
Author historical topics: 0.1, 0.9, …, 0.1
Audience historical topics: 0.5, 0.9, …, 0.1
Element-wise product: 0.05, 0.81, …, 0.01
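The element-wise product is a one-liner. The sketch below uses the three values visible on the slide for each distribution, with the elided middle topics left out; the function name is hypothetical.

```python
def interaction_topics(author_topics, audience_topics):
    """Element-wise product of two topic distributions of equal length."""
    if len(author_topics) != len(audience_topics):
        raise ValueError("topic distributions must have the same length")
    return [a * b for a, b in zip(author_topics, audience_topics)]


author = [0.1, 0.9, 0.1]     # visible values from the slide
audience = [0.5, 0.9, 0.1]
product = interaction_topics(author, audience)
print([round(p, 2) for p in product])
```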
RESPONSE FEATURES
Bag of Words: Here we use the BoW from the original tweet (the
tweet to which the target tweet is responding)
EXPERIMENTAL SETTING
Data → meaningful features
Machine learning model: Logistic Regression
Tune set → LR model → optimized parameters
Train set → LR model → fit
Test set → evaluate
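The tune/train/test workflow might be sketched as below with a small logistic regression written in plain Python. Everything specific here is an assumption: the split fractions, the L2 grid, the learning rate, and the choice to pick the regularization strength by fitting on the train set and scoring on the tune set. The slides do not say how the tune set was actually used, and the toy one-feature data is made up.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_lr(X, y, l2, epochs=200, lr=0.1):
    """Fit L2-regularized logistic regression with per-sample gradient steps."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi
            w = [wj - lr * (err * xj + l2 * wj) for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def accuracy(model, X, y):
    w, b = model
    hits = sum((score(w, b, xi) > 0) == (yi == 1) for xi, yi in zip(X, y))
    return hits / len(y)


def three_way_split(X, y, tune_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then cut into (tune, test, train) parts."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    n_tune, n_test = int(len(idx) * tune_frac), int(len(idx) * test_frac)
    parts = (idx[:n_tune], idx[n_tune:n_tune + n_test], idx[n_tune + n_test:])
    return [([X[i] for i in p], [y[i] for i in p]) for p in parts]


def run_experiment(X, y, l2_grid=(0.0, 0.01, 0.1)):
    tune, test, train = three_way_split(X, y)
    # Choose the L2 strength that scores best on the tune set.
    best_l2 = max(l2_grid, key=lambda l2: accuracy(fit_lr(*train, l2), *tune))
    model = fit_lr(*train, best_l2)       # fit on the train set
    return best_l2, accuracy(model, *test)  # evaluate on the test set


# Made-up, well-separated toy data with a single feature.
X = [[i / 10.0] for i in range(-50, -9)] + [[i / 10.0] for i in range(10, 51)]
y = [0] * 41 + [1] * 41
best_l2, test_accuracy = run_experiment(X, y)
print(best_l2, test_accuracy)
```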
Results
[Figure: results chart. Values shown: 69.1%, 73.3%, 75.7%, 75.3%, 77.6%, 78.3%; the labels for each value are not recoverable.]
Discussion
• Combining the lexical information of the text with contextual
information yields the best accuracy in detecting
sarcasm.
• Collecting historical tweets is very expensive in both time
and computation. Not very practical!
• I suggest using less contextual information about the author,
favoring data that can be collected easily and quickly.
E.g., the profile information of the author and the response
features are relatively effective and cost less.
Discussion
• Extract only the historical tweets around the target tweet.
Intuitively, tweets posted close in time to the target tweet
are more likely to be about the same subject.
• Randomly sampling from the historical tweets can also
produce the topic distribution while reducing cost.
Questions?
lyuhao@utexas.edu
5127183100
