Discovering Agreement and Disagreement between Users within a Twitter Conversation Thread
PRESENTATION BY:
ARVIND KRISHNAA JAGANNATHAN
Objective
“Given a thread of conversation among multiple users on Twitter and the initiating statement (i.e., the tweet) of one user (the initiator), automatically identify the responses that agree with the initiator's statement and those that disagree.”
Phase 1: Baseline Setup
The Experimental Setup
Phase 1: Supervised Classifier – The Labor-Intensive Baseline
• Training Set: initial tweet + response pairs from all Twitter threads with 10 to 15 responses (inclusive), hand-annotated as “Agreement”, “Disagreement” or “Neither”
  • Around 10,000 manually annotated pairs
• Test/Development Set: tweet + response pairs from threads with more than 15 responses
  • Around 1,500 pairs
• Classifier: MIRA (Margin Infused Relaxed Algorithm), implemented in Python
Results:
• 81.47% accuracy on the test/development data
• This serves as the baseline for comparison
[Figure: Baseline. Accuracy on development data vs. accuracy on training data]
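The slides do not show the baseline code. As a hedged illustration only, the following is a minimal single-constraint (1-best) multiclass MIRA sketch, assuming each tweet/response pair is already turned into a sparse feature dict. The function names, the unit margin, the step-size cap `C` and the toy features are all hypothetical, not the author's implementation.

```python
from collections import defaultdict

# Label set taken from the annotation scheme on the slide.
LABELS = ("Agreement", "Disagreement", "Neither")


def score(weights, label, feats):
    """Dot product of one label's weight vector with a sparse feature dict."""
    w = weights[label]
    return sum(w.get(f, 0.0) * v for f, v in feats.items())


def predict(weights, feats):
    """Return the highest-scoring label (ties go to the earlier label in LABELS)."""
    return max(LABELS, key=lambda y: score(weights, y, feats))


def mira_train(data, epochs=10, C=1.0):
    """1-best multiclass MIRA (hypothetical sketch).

    data: list of (feature_dict, gold_label) pairs.
    C: cap on the step size (aggressiveness parameter).
    """
    weights = {y: defaultdict(float) for y in LABELS}
    for _ in range(epochs):
        for feats, gold in data:
            # Strongest competing (wrong) label under the current weights.
            rival = max((y for y in LABELS if y != gold),
                        key=lambda y: score(weights, y, feats))
            margin = score(weights, gold, feats) - score(weights, rival, feats)
            loss = 1.0 - margin  # hinge loss with a required margin of 1
            # ||phi(x, gold) - phi(x, rival)||^2 = 2 * ||x||^2 for per-label weight blocks
            norm_sq = 2.0 * sum(v * v for v in feats.values())
            if loss <= 0.0 or norm_sq == 0.0:
                continue
            tau = min(C, loss / norm_sq)  # smallest step that fixes the margin, capped at C
            for f, v in feats.items():
                weights[gold][f] += tau * v
                weights[rival][f] -= tau * v
    return weights


def accuracy(weights, data):
    """Fraction of pairs whose predicted label matches the gold label."""
    return sum(predict(weights, f) == y for f, y in data) / len(data)


if __name__ == "__main__":
    train = [({"completely_disagree": 1.0}, "Disagreement"),
             ({"love_you": 1.0}, "Agreement"),
             ({"lol": 1.0}, "Neither")]
    w = mira_train(train)
    print(accuracy(w, train))  # prints 1.0
```

Updating against the single strongest rival keeps each step a closed-form scalar; k-best MIRA variants solve a small quadratic program instead.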
Top 10 Lexical Features by Weight Vector
1. f**k
2. completely_disagree
3. lol
4. roflmao
5. ROFL
6. K_RT
7. _RT
8. love_you
9. yeah_right
10. #truth
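Features such as completely_disagree and yeah_right look like word bigrams joined with an underscore. A hedged sketch of that kind of extraction, plus ranking features by learned weight, follows. Whitespace tokenization, case preservation and both function names are assumptions. For a multiclass model, one would pass a single label's weight vector (for example, the “Disagreement” weights); the slides do not say which vector was ranked.

```python
from collections import Counter


def extract_features(text):
    """Unigram + bigram counts; bigrams joined with "_" (e.g. "completely_disagree").

    Whitespace tokenization and case preservation are assumptions.
    """
    tokens = text.split()
    feats = Counter(tokens)
    feats.update(f"{a}_{b}" for a, b in zip(tokens, tokens[1:]))
    return dict(feats)


def top_k_features(weight_vector, k=10):
    """Feature names sorted by descending weight (ties broken alphabetically)."""
    ranked = sorted(weight_vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    print(extract_features("I completely disagree"))
    print(top_k_features({"lol": 0.4, "completely_disagree": 1.3, "the": -0.2}, k=2))
    # prints ['completely_disagree', 'lol']
```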
Phase 2: Structural Correspondence Learning
Structural Correspondence Learning
• Domain adaptation technique that leverages abundant labeled data in one (source) domain to improve performance in a target domain with little or no labeled data
• Source Domain: 14 annotated meeting threads from the AMI Meeting Corpus
  • Around 10k statement-response adjacency pairs
• Target Domain: initial tweet-response pairs from threads with 10-15 responses (~10k)
Structural Correspondence Learning Algorithm: Pivot Features
Phase 2: SCL Implementation
• Choose $m$ pivot features from the source and target domains such that they:
  • occur frequently in both domains
  • are characteristic of the task (i.e., indicate agreement or disagreement)
• Pivots are chosen using labeled source data and unlabeled source and target data
• Pivot features: the 50 most frequently occurring terms in pairs annotated as “agreement”, “disagreement” and “backchannel” (AMI) / “neither” (Twitter)
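A hedged sketch of the pivot selection just described: rank terms by frequency in source pairs carrying one of the pivot labels, keep only terms that are also frequent in both domains' unlabeled data, and take the top `m`. Treating all labels as one pool (rather than 50 per label), the `min_count` threshold and the function name are assumptions.

```python
from collections import Counter

# Label names follow the slide; "backchannel" is AMI's counterpart of Twitter's "neither".
PIVOT_LABELS = {"agreement", "disagreement", "backchannel", "neither"}


def select_pivots(labeled_source, unlabeled_source, unlabeled_target, m=50, min_count=5):
    """Pick up to m pivot terms (hypothetical sketch).

    labeled_source: list of (tokens, label) pairs from the source domain.
    unlabeled_source, unlabeled_target: lists of token lists.
    A term qualifies if it occurs in pairs carrying one of PIVOT_LABELS and
    appears at least min_count times in each domain's unlabeled data.
    Candidates are ranked by their frequency in the labeled pairs.
    """
    labeled_counts = Counter(
        tok for toks, label in labeled_source if label in PIVOT_LABELS for tok in toks
    )
    source_counts = Counter(tok for toks in unlabeled_source for tok in toks)
    target_counts = Counter(tok for toks in unlabeled_target for tok in toks)
    ranked = sorted(labeled_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    pivots = [
        term for term, _ in ranked
        if source_counts[term] >= min_count and target_counts[term] >= min_count
    ]
    return pivots[:m]
```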
Structural Correspondence Learning Algorithm
• Step 1: Construct $m$ pivot feature vectors for the source and target domains
• Step 2: Construct one binary prediction problem per pivot feature, using the adjacency pairs of the source domain as instances
  • Binary prediction question: for a given adjacency pair, does pivot feature $m_i$ occur in the response?
• Train a classifier on the annotated AMI corpus to obtain a weight vector $W$ such that $W_i$ is the weight assigned to the $i$-th feature of an adjacency pair for a particular pivot feature
• Hence there is one weight vector $W$ per pivot feature
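The per-pivot predictors can be sketched as follows, assuming adjacency pairs are rows of a dense feature matrix. Blitzer et al. train these predictors with a modified Huber loss; this sketch swaps in closed-form ridge regression on ±1 targets purely for brevity, and the function name and `reg` value are hypothetical. The slides train on AMI pairs, whereas Blitzer et al. use unlabeled data from both domains; the sketch simply uses whatever rows are passed in.

```python
import numpy as np


def train_pivot_predictors(X, pivot_idx, reg=1.0):
    """Fit one linear predictor per pivot feature (SCL Steps 1-2, sketched).

    X: (n_pairs, n_features) array of feature values for adjacency pairs.
    pivot_idx: column indices of the pivot features.
    reg: ridge penalty (hypothetical stand-in for Blitzer et al.'s modified Huber loss).
    Returns L, an (n_features, m) matrix whose columns are the predictors' weights.
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    # Hide the pivots from the inputs so each predictor must rely on non-pivot features.
    X_masked = X.copy()
    X_masked[:, pivot_idx] = 0.0
    # The regularized Gram matrix is shared by all pivot problems.
    A = X_masked.T @ X_masked + reg * np.eye(n_features)
    columns = []
    for p in pivot_idx:
        # Binary question for each pair: does pivot p occur in the response? (+1 / -1)
        y = np.where(X[:, p] > 0, 1.0, -1.0)
        w = np.linalg.solve(A, X_masked.T @ y)  # closed-form ridge regression
        columns.append(w)
    return np.column_stack(columns)
```

Because pivot columns are masked, the returned weights on the pivot features themselves are zero, which matches the intent that each predictor learns from co-occurring non-pivot features.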
Structural Correspondence Learning
[Figure: SCL pipeline. Source domain (AMI annotated meeting corpus) → extract features that strongly correlate with agreement/disagreement → source feature vector → common latent space via SVD ($USV^{T}$) → 3. obtain the mapping matrix $U^{T}$ → project onto the target domain: Twitter corpus → target feature vector → MIRA classifier → labels]
Structural Correspondence Learning Algorithm: Application in Target Domain
• Step 3: Construct a matrix $L$ whose column vectors are the pivot predictor weight vectors
• Step 4: Perform SVD on $L$, i.e., $L = U D V^{T}$, and set $\theta = U^{T}$, a projection from the original feature space to a latent space common to both source and target domains
• Step 5: Apply the features from each row of $\theta$ to the Twitter adjacency pairs and the AMI adjacency pairs (i.e., map each pair's feature vector $x$ to $\theta x$)
• Step 6: Through Step 5, induce correspondences between features indicating agreement/disagreement in the AMI corpus and in the Twitter corpus
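Steps 4 and 5 can be sketched with NumPy's SVD. The optional truncation to the top `h` singular vectors and the choice to append the projected features to the original ones follow Blitzer et al.'s formulation; whether these slides did the same is an assumption, and both function names are hypothetical.

```python
import numpy as np


def scl_projection(L, h=None):
    """SCL Step 4: SVD of the pivot-predictor matrix L = U D V^T, theta = U^T.

    h (optional, hypothetical here) keeps only the top-h left singular vectors;
    h=None keeps all of them.
    """
    U, _, _ = np.linalg.svd(L, full_matrices=False)
    # Columns of U are ordered by decreasing singular value, so slicing keeps the top h.
    return U.T if h is None else U[:, :h].T


def project(X, theta, augment=True):
    """SCL Step 5: map each pair's feature vector x to theta @ x.

    X: (n_pairs, n_features). With augment=True the latent features are
    appended to the original ones; otherwise only the latent features are returned.
    """
    Z = X @ theta.T  # row i is theta @ x_i
    return np.hstack([X, Z]) if augment else Z
```

The same `theta` is applied to both AMI and Twitter pairs, which is what lets a classifier trained in the shared latent space transfer across domains.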
Results
Visualizing the correspondences between source and target domains
AMI Corpus: features strongly associated with the feature “disagree”
• disagree
• wrong
• incorrect
• Uh
• obviously
• though
• tend_to
• um
Twitter Corpus: corresponding features
• disagree
• completely
• #stupid
• ROFL
• liar
• have_to
• hate
• #WTF
For comparison, the baseline's top 10 lexical features:
1. f**k
2. completely_disagree
3. lol
4. roflmao
5. ROFL
6. K_RT
7. _RT
8. love_you
9. yeah_right
10. #truth
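The slides do not say how these correspondence lists were produced. One hypothetical way is to treat each column of the projection matrix $\theta$ as a feature's latent vector and rank features by cosine similarity to a query such as “disagree”, with AMI and Twitter features sharing one vocabulary. The function name and toy values below are illustrative only.

```python
import numpy as np


def nearest_features(theta, vocab, query, k=5):
    """Features whose latent vectors (columns of theta) are most cosine-similar to query.

    theta: (h, n_features) projection matrix, e.g. U^T from the SVD in Step 4.
    vocab: list of feature names, aligned with theta's columns.
    """
    F = theta.T  # one h-dimensional latent vector per feature
    norms = np.linalg.norm(F, axis=1)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero columns
    q = vocab.index(query)
    sims = (F @ F[q]) / (norms * norms[q])
    order = [i for i in np.argsort(-sims, kind="stable") if i != q]
    return [vocab[i] for i in order[:k]]


if __name__ == "__main__":
    vocab = ["disagree", "wrong", "lol"]
    theta = np.array([[1.0, 0.9, 0.0], [0.0, 0.1, 1.0]])
    print(nearest_features(theta, vocab, "disagree", k=1))  # prints ['wrong']
```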
Results
• Three instances of the target classifier were set up:
  1. Labeled source-domain data; unlabeled target-domain data
  2. Labeled source-domain data; unlabeled data from both source and target
     • Unlabeled source-domain data augments the extraction of corresponding features
     • Annotated adjacency pairs from 10 meeting threads (~8k)
  3. Labeled source-domain data; unlabeled target- and source-domain data; a small amount of labeled target-domain data
     • Twitter conversation threads with exactly 10 responses (~2k)
• Features extracted from the target domain are fed to a MIRA classifier, and accuracy is computed for each of the three scenarios
Results: Comparison with Baseline
SCL: Accuracy on Twitter Test Data
| Scenario | Accuracy (%) |
|---|---|
| Labeled Source + Unlabeled Target | 77.61 |
| Labeled Source + Unlabeled Target + Unlabeled Source | 80.74 |
| Labeled Source + Unlabeled Target + Unlabeled Source + Labeled Target (~2k) | 83.54 |
Baseline (Phase 1 supervised MIRA classifier): 81.47%
Results: Comparison with Baseline
Varying the Size of Labeled Target Data
| Number of labeled target examples | Accuracy (%) |
|---|---|
| 500 | 81.03 |
| 750 | 82.16 |
| 1000 | 82.79 |
| 1500 | 83.24 |
| 2000 | 83.54 |
Discussion
Salient Points of Discussion
• Without labeled target data, SCL gives classification accuracy close to, but below, the baseline (77.61% and 80.74% vs. 81.47%)
  - The gain is modest compared with the gains reported for SCL applied to POS tagging
  - Blitzer et al.'s* task used a significantly larger corpus
  - Conversational turns in both the AMI and Twitter corpora are generally short (AMI: around 10-12 words; Twitter: at most 140 characters)
  - Certain Twitter-specific constructs, especially retweets, were not leveraged
  - The two domains use significantly different vocabularies to convey similar sentiments (for instance, a single swear word followed by a retweet)
• SCL beats the baseline with only a small amount of annotated target-domain data (82.16% with 750 labeled examples vs. 81.47%)
• The current implementation does not take the initial statement/tweet into account
Future Work
• Use more unlabeled data to see whether the baseline can be beaten without any labeled target-domain data
• Incorporate the words used in the initial statement into the model
• Restrict the Twitter conversations to particular domains/personalities, which may lead to better results
• Clean up the code and make it ready for public distribution!