Stance and Gender Detection in
Tweets on Catalan Independence
@Ibereval 2017
Viviana Patti, Cristina Bosco, Università degli Studi di Torino
Mariona Taulé, M. Antònia Martí, Universitat de Barcelona
Francisco Rangel, Autoritas & Universitat Politècnica de València
Paolo Rosso, Universitat Politècnica de València
StanceCat
• Introduction and Motivations:
Stance vs. Polarity detection
• StanceCat: Task Description
• TW-CaSe corpus
• Evaluation Metrics
• Overview of the submitted approaches
http://stel.ub.edu/Stance-IberEval2017/index.html
StanceCat: Introduction & Motivation
The rise of social media is encouraging users to voice and
share their views, generating a large amount of
social data
They can be a great opportunity to investigate
communicative behaviors and conversational contexts,
for extracting knowledge about
all real life domains
The shared task proposal is situated within the
wider context of research on communication in
online political debates on Twitter
To develop resources and tools for under-resourced
languages
StanceCat: Introduction & Motivation
Online debates are a large source of informal and opinion-
sharing dialogue on current socio-political issues
Several works rely on finer-grained sentiment analysis
techniques to analyze such debates.
Some of these works are dedicated to the classification of
users’ stance, i.e. the detection of positions pro or con a
particular target entity that users assume within debates
Dual-sided debates where two possible polarizing sides
can be taken by participants
StanceCat: Introduction & Motivation
Stance detection, formalized as the task of identifying the
speaker’s opinion towards a particular target, has recently
attracted the attention of researchers in sentiment analysis
Applied to data from microblogging platforms such as Twitter
Monitoring sentiment in a specific political debate
Stance detection does not only provide information for improving
the performance of a sentiment analysis system, but can help
to better understand the way in which people communicate
ideas to highlight their point of view towards a target entity.
StanceCat: Introduction & Motivation
Being able to detect stance in user-generated content can
provide useful insights to discover novel information about
social network structures (Lai et al. CLEF 2017)
Detecting stance in social media could become a helpful tool for
journalism, companies, government
Politics is an especially good application domain: focusing on
stance is interesting when the target entity is a controversial
issue, e.g., political reforms, or a polarizing person, e.g.,
candidates in political elections, and we observe the
interaction between polarized communities.
StanceCat: Introduction & Motivation
Semeval 2016 - Track II. Sentiment analysis Task 6
Detecting Stance in Tweets
http://alt.qcri.org/semeval2016/task6/
“Given a tweet text and a target entity (person, organization,
movement, policy, etc.), automatic natural language systems must
determine whether the tweeter is in favor of the target, against the
given target, or whether neither inference is likely”
Stance detection: automatically determining from text whether
the author is in favor / against / neutral-none w.r.t. a target
StanceCat: Stance vs Sentiment
Stance detection is of course related to sentiment analysis
BUT there are significant differences.
In a classical sentiment analysis task, systems have to
determine whether a piece of text is positive, negative or
neutral.
In stance detection, systems have to determine the
favorability towards a given target entity of interest,
where the target may not be explicitly mentioned in the
text.
StanceCat: Stance vs Sentiment
Example [source: training set of SemEval-2016 Task 6]
Support #independent #BernieSanders because he’s not a liar.
#POTUS #libcrib #democrats #tlot #republicans #WakeUpAmerica
#SemST.
• Target: Hillary Clinton [context: presidential primaries
of the Democratic and Republican parties in the US]
• The tweeter expresses a positive opinion towards an adversary of
the target (Sanders)
• We can infer that the tweeter expresses a negative stance towards
the target, i.e. she/he is likely unfavorable towards Hillary Clinton
• Important: tweet does not contain any explicit clue to find the target
• In many cases the stance must be inferred
• For a deeper exploration of the relation between sentiment and
stance in the Semeval dataset see:
Saif M. Mohammad, Parinaz Sobhani, Svetlana Kiritchenko:
Stance and Sentiment in Tweets. ACM Trans. Internet Techn. 17(3):
26:1-26:23 (2017)
• An interactive visualization of the dataset is available at:
http://www.saifmohammad.com/WebPages/StanceDataset.htm
a useful tool to explore the stance-target combinations present in
the annotated dataset and the relations between stance and
sentiment.
StanceCat: Stance vs Sentiment
Our focus and stance target: Catalan Independence
Corpus collected from Twitter, filtered by the hashtags #independencia and #27S
Time span: end of September 2015 – December 2015
27S: September 27, 2015, regional elections in Catalonia,
a de facto referendum on independence
#independencia #27S
two of the hashtags which have been accepted within the dialogical
and social context growing around the topic; largely exploited in the
debate
StanceCat: Introduction & Motivation
Multilingual perspective
different socio-political debates
French: #mariagepourtous
Debate on same-sex marriage
in France (Bosco et al. @LREC2016)
Italian: #labuonascuola
Debate on the reform of the education sector
in Italy (Stranisci et al. @LREC2016)
StanceCat: Introduction & Motivation
Multilingual perspective
different socio-political debates
English: #brexit
Debate on British Exit from EU
(Lai et al. @CLEF2017)
StanceCat: Introduction & Motivation
• http://stel.ub.edu/Stance-IberEval2017/index.html
StanceCat: Task Description
StanceCat Task
SubTask 1- Stance Detection
Deciding whether each message
is neutral, in favor or against the
target: ‘Catalan Independence’
SubTask 2- Gender
Detection
Identification of the gender of
the author of the message
Languages: Catalan and Spanish
Stance detection: SemEval-2016, Task-6; author profiling: PAN@CLEF.
Novelty: The two tasks have never been performed together for Spanish
and Catalan as part of one single task.
Results will be of interest not only for sentiment analysis but also for
author profiling and for socio-political studies
StanceCat: Corpus
• TW-CaSe corpus: 10,800 tweets
  Hashtags: #Independencia, #27S
  Tool: Cosmos (by Autoritas)

  TW-CaSe    Female   Male    Total   Training (80%)   Test (20%)
  Catalan    2,700    2,700   5,400   4,319            1,081
  Spanish    2,700    2,700   5,400   4,319            1,081
StanceCat: Corpus
• Annotation Scheme:
Stance Tags
–AGAINST: Negative stance
–FAVOR: Positive stance
–NONE: Neutral stance + stance cannot be inferred
Gender Tags
–FEMALE
–MALE
StanceCat: Corpus
• Example:
Language: Catalan
Stance: FAVOR
Gender: FEMALE
Tweet: 15 diplomàtics internacional observen les plesbiscitàries, será que
interessen a tothom menys a Espanya #27
‘ 15 international diplomats observe the plebiscite, perhaps it is of
interest to everybody except to Spain #27’
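As an illustration only, one annotated tweet could be held in a small typed record like the sketch below. The class and field names and the language codes are assumptions made for this example; they are not the release format of TW-CaSe.

```python
from dataclasses import dataclass
from enum import Enum


class Stance(Enum):
    AGAINST = "AGAINST"  # negative stance
    FAVOR = "FAVOR"      # positive stance
    NONE = "NONE"        # neutral stance, or stance cannot be inferred


class Gender(Enum):
    FEMALE = "FEMALE"
    MALE = "MALE"


class Language(Enum):
    # The short codes are an assumption of this sketch.
    CATALAN = "ca"
    SPANISH = "es"


@dataclass(frozen=True)
class AnnotatedTweet:
    """Hypothetical container for one annotated tweet."""
    text: str
    language: Language
    stance: Stance
    gender: Gender


# The example tweet from the slide above, stored as a record.
example = AnnotatedTweet(
    text=("15 diplomàtics internacional observen les plesbiscitàries, "
          "será que interessen a tothom menys a Espanya #27"),
    language=Language.CATALAN,
    stance=Stance.FAVOR,
    gender=Gender.FEMALE,
)
```

Using enumerations rather than free strings means a misspelled tag fails when the record is built, not later during evaluation.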
StanceCat: Corpus
• Criteria:
– Written text: emoticons, @mentions and #hashtags √
– Links (webpages, photographs, videos…): NO in TW-CaSe 0.1,
YES in TW-CaSe 1.0
StanceCat: Corpus
• Annotation procedure:
1. Three trained annotators tagged the stance in 500 Catalan tweets and in
500 Spanish tweets in parallel
2. Interannotator Agreement Test (IAT)
3. Annotation of the whole corpus individually.
Annotators: 3 trained annotators + 2 senior researchers
Meetings: once a week; problematic cases were solved by
consensus
StanceCat: Corpus
• Interannotator Agreement Test: Results
  Pairwise agreement
  Annotator pair       TW-CaSe-CA   TW-CaSe-ES
  A-B                  75.78%       76.40%
  A-C                  79.54%       77.80%
  B-C                  82.46%       81%
  Average agreement    79.26%       78.40%
  Fleiss’ Kappa        0.60         0.60
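For readers who want to reproduce this kind of figure, the sketch below computes pairwise agreement and Fleiss' kappa with the standard textbook formulas. It is not the organizers' script, and the four toy annotations are invented for illustration; they are not taken from TW-CaSe.

```python
from collections import Counter


def pairwise_agreement(a, b):
    """Share of items on which two annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def fleiss_kappa(items):
    """items[i] lists the labels that every annotator gave to item i."""
    n_items = len(items)
    n_raters = len(items[0])
    counts = [Counter(item) for item in items]
    categories = sorted({label for item in items for label in item})
    # Observed agreement, averaged over items.
    p_bar = sum(
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_items
    # Agreement expected by chance, from the overall label proportions.
    p_e = sum(
        (sum(c[cat] for c in counts) / (n_items * n_raters)) ** 2
        for cat in categories
    )
    if p_e == 1.0:  # every annotator used one single label everywhere
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy annotations: 4 tweets, annotators A, B and C.
items = [
    ("FAVOR", "FAVOR", "FAVOR"),
    ("AGAINST", "AGAINST", "NONE"),
    ("NONE", "NONE", "NONE"),
    ("FAVOR", "NONE", "FAVOR"),
]
a, b, c = zip(*items)  # one label sequence per annotator
pairs = {
    "A-B": pairwise_agreement(a, b),
    "A-C": pairwise_agreement(a, c),
    "B-C": pairwise_agreement(b, c),
}
average_agreement = sum(pairs.values()) / len(pairs)
kappa = fleiss_kappa(items)
```

Kappa is lower than raw agreement because it discounts the agreement that annotators would reach by chance given how often each label is used.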
StanceCat: Corpus
• Disagreements: Communicative intentions are unclear
Language: Spanish
Stance: NONE (A = AGAINST; B and C = NONE)
Gender: MALE
Tweet: #27 voy a denunciar a todo aquel q me siga insultando usando ls
red. Yo no soy imbécil, ni mi bandera es n trapo
‘#27 I’m going to report anyone who keeps insulting me on the web. I’m
not stupid, nor is my flag a rag’
StanceCat: Corpus
• Disagreements: Communicative intentions are unclear
Language: Catalan
Stance: NONE (A = AGAINST; B = FAVOR; C = NONE)
Gender: MALE
Tweet: La @cupnacional t la clau de Matrix
‘The @cupnacional holds the key to the Matrix’
StanceCat: Corpus
• Distribution of labels for stance, gender and language

                 Female                     Male
Lang  Dataset    favor  against  none      favor  against  none     Total
Cat   training   1,456  57       646       1,192  74       894      4,319
Cat   test       365    14       162       298    18       224      1,081
Spa   training   145    693      1,322     190    753      1,216    4,319
Spa   test       36     173      331       48     188      305      1,081
StanceCat: Evaluation Metrics
• Macro-average on F-score (FAVOR &
AGAINST) to evaluate Stance
(Semeval 2016)
• Accuracy to evaluate Gender
(PAN@CLEF)
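The two measures can be sketched in a few lines of plain Python. This is an illustrative reading of the slide, not the official scorer: the function names are invented here, and the label strings follow the annotation scheme (FAVOR, AGAINST, NONE). NONE still affects the stance score indirectly, because a NONE tweet predicted as FAVOR counts as a false positive for FAVOR.

```python
def f1_for_class(gold, pred, label):
    """F-score of one class, treating `label` as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def stance_score(gold, pred):
    """Macro-average of F(FAVOR) and F(AGAINST); F(NONE) is not averaged in."""
    return (f1_for_class(gold, pred, "FAVOR")
            + f1_for_class(gold, pred, "AGAINST")) / 2


def accuracy(gold, pred):
    """Share of correct predictions, used here for gender."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Hypothetical toy predictions, invented for illustration.
gold_stance = ["FAVOR", "FAVOR", "AGAINST", "NONE"]
pred_stance = ["FAVOR", "NONE", "AGAINST", "AGAINST"]
toy_stance_score = stance_score(gold_stance, pred_stance)

gold_gender = ["FEMALE", "MALE", "MALE", "FEMALE"]
pred_gender = ["FEMALE", "FEMALE", "MALE", "FEMALE"]
toy_gender_accuracy = accuracy(gold_gender, pred_gender)
```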
StanceCat: Baselines
• Majority class: a trivial baseline that always
returns the majority class.
• LDR (Low Dimensionality Representation):
– The key concept is the probability of occurrence
(weight) of each word in the training set in
each of the possible classes.
– The distribution of weights for a document
should be more similar to the distribution of
weights of its corresponding class.
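The two baselines can be illustrated with a deliberately simplified sketch. The majority-class function is standard. The weight functions only mirror the idea stated on the slide (per-class probability of occurrence of each word, then a per-class summary for a document); the whitespace tokenization, the use of a mean as the summary, and all names are assumptions of this example, not the LDR implementation used by the organizers.

```python
from collections import Counter, defaultdict


def majority_class(train_labels):
    """Label that occurs most often in the training set."""
    return Counter(train_labels).most_common(1)[0][0]


def term_weights(train_docs, train_labels):
    """weights[word][cls]: share of the word's training occurrences in cls."""
    per_word = defaultdict(Counter)
    for doc, label in zip(train_docs, train_labels):
        for word in doc.lower().split():  # naive tokenization (assumption)
            per_word[word][label] += 1
    weights = {}
    for word, counts in per_word.items():
        total = sum(counts.values())
        weights[word] = {cls: n / total for cls, n in counts.items()}
    return weights


def class_profile(doc, weights, classes):
    """Mean weight of the document's known words, one value per class."""
    known = [w for w in doc.lower().split() if w in weights]
    if not known:
        return {cls: 0.0 for cls in classes}
    return {
        cls: sum(weights[w].get(cls, 0.0) for w in known) / len(known)
        for cls in classes
    }


# Hypothetical toy training data, invented for illustration.
train_docs = ["independence now", "unity always",
              "independence vote", "vote today"]
train_labels = ["FAVOR", "AGAINST", "FAVOR", "NONE"]
classes = ["FAVOR", "AGAINST", "NONE"]

baseline_label = majority_class(train_labels)
weights = term_weights(train_docs, train_labels)
profile = class_profile("independence vote", weights, classes)
predicted = max(profile, key=profile.get)
```

A document whose words occur mostly in FAVOR training tweets gets a profile that leans towards FAVOR, which is the similarity the slide describes.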
StanceCat: Participation
10 PARTICIPANTS, 31 RUNS
          STANCE      GENDER
          CA   ES     CA   ES
          9    10     4    5
StanceCat: Approaches
• CLASSIFICATION APPROACHES
– SVM
– DECISION TREES
– RANDOM FOREST
– LOGISTIC REGRESSION
– MULTINOMIAL NB
– NEURAL NETWORKS
– MULTILAYER PERCEPTRON
– LSTM
– CNN
– MLP
– FASTTEXT
– KIM
– BI-LSTM
• PARTICIPANTS
– ltl_uni_due
– iTACOS
– ARA1337
– ELiRF-UPV
– LTRC_IIITH
– atoppe
– LuSer
– deepCybErNet
• FEATURES
– Word n-grams
– Character n-grams
– POS
– Hashtags
– Stylistic features (number of hashtags, number of words…)
– (Stance&gender) Specific tokens
– Word embeddings
– N-gram embeddings
– One-hot vectors
StanceCat: Stance Results
StanceCat: Gender Results
StanceCat: Stance vs. Gender (Catalan)
StanceCat: Stance vs. Gender (Spanish)
StanceCat: Error Analysis
• More errors in case of males.
• In Catalan, more errors from Against to
Favor. In Spanish, more errors from Favor
to Against.
• In Spanish, errors from Against to Favor
are minimal (2%).
StanceCat: Error Analysis
[Figure: error analysis; panels: females, males]
StanceCat: Error Analysis
[Figure: error analysis; panels: females, males]
StanceCat: Conclusions
• Stance and gender identification shared
task
• High participation
– 10 teams, 5 countries, 31 runs
• Challenging task
– F-measures below 50%
• Dataset released to the community
– Spanish and Catalan
StanceCat: Credits
R&D Programme: TIN2015-71147
Thank you!
patti@di.unito.it
bosco@di.unito.it
francisco.rangel@autoritas.es prosso@dsic.upv.es
amarti@ub.edu
mtaule@ub.edu

Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN 2017

  • 1. Stance and Gender Detection in Tweets on Catalan Independence @Ibereval 2017 Viviana Patti, Cristina Bosco, Università degli Studi di Torino Mariona Taulé, M. Antònia Martí, Universitat de Barcelona Francisco Rangel, Autoritas & Universitat Politècnica de València Paolo Rosso, Universitat Politècnica de València
  • 2. StanceCat • Introduction and Motivations: Stance vs. Polarity detection • StanceCat: Task Description • TW-CaSe corpus • Evaluation Metrics • Overview of the submitted approaches http://stel.ub.edu/Stance-IberEval2017/index.html
  • 3. StanceCat: Introduction & Motivation The rise of social media is encouraging users to voice and share their views, generating a large amount of social data. These data are a great opportunity to investigate communicative behaviors and conversational contexts, and to extract knowledge about all real-life domains. The shared task is situated within the wider context of research on communication in online political debates on Twitter. A further aim is to develop resources and tools for under-resourced languages.
  • 4. StanceCat: Introduction & Motivation Online debates are a large source of informal, opinion-sharing dialogue on current socio-political issues. Several works rely on finer-grained sentiment analysis techniques to analyze such debates. Some of these works are dedicated to the classification of users’ stance, i.e. the detection of the positions, pro or con a particular target entity, that users assume within debates. The focus is on dual-sided debates, where participants can take one of two polarizing sides.
  • 5. StanceCat: Introduction & Motivation Stance detection, formalized as the task of identifying the speaker’s opinion towards a particular target, has recently attracted the attention of researchers in sentiment analysis Applied to data from microblogging platforms such as Twitter Monitoring sentiment in a specific political debate Stance detection does not only provide information for improving the performance of a sentiment analysis system, but can help to better understand the way in which people communicate ideas to highlight their point of view towards a target entity.
  • 6. StanceCat: Introduction & Motivation Being able to detect stance in user-generated content can provide useful insights to discover novel information about social network structures (Lai et al. CLEF 2017) Detecting stance in social media could become a helpful tool for journalism, companies, government Politics is an especially good application domain: focusing on stance is interesting when the target entity is a controversial issue, e.g., political reforms, or a polarizing person, e.g., candidates in political elections, and we observe the interaction between polarized communities.
  • 7. StanceCat: Introduction & Motivation Semeval 2016 - Track II. Sentiment analysis Task 6 Detecting Stance in Tweets http://alt.qcri.org/semeval2016/task6/ “Given a tweet text and a target entity (person, organization, movement, policy, etc.), automatic natural language systems must determine whether the tweeter is in favor of the target, against the given target, or whether neither inference is likely” Stance detection: automatically determining from text whether the author is in favor / against / neutral-none w.r.t. a target
  • 8. StanceCat: Stance vs Sentiment Stance detection is of course related to sentiment analysis, BUT there are significant differences. In a classical sentiment analysis task, systems have to determine whether a piece of text is positive, negative or neutral. In stance detection, systems have to determine the favorability towards a given target entity of interest, where the target may not be explicitly mentioned in the text.
  • 9. StanceCat: Stance vs Sentiment Example [source: training set of SemEval-2016 Task 6] Support #independent #BernieSanders because he’s not a liar. #POTUS #libcrib #democrats #tlot #republicans #WakeUpAmerica #SemST. • Target: Hillary Clinton [context: presidential primaries of the Democratic and Republican parties in the US] • The tweeter expresses a positive opinion towards an adversary of the target (Sanders) • We can infer that the tweeter expresses a negative stance towards the target, i.e. she/he is likely unfavorable towards Hillary Clinton • Important: the tweet does not contain any explicit clue to find the target • In many cases the stance must be inferred
  • 10. StanceCat: Stance vs Sentiment • For a deeper exploration of the relation between sentiment and stance in the Semeval dataset see: Saif M. Mohammad, Parinaz Sobhani, Svetlana Kiritchenko: Stance and Sentiment in Tweets. ACM Trans. Internet Techn. 17(3): 26:1-26:23 (2017) • An interactive visualization of the dataset is available at: http://www.saifmohammad.com/WebPages/StanceDataset.htm, a useful tool to explore the stance-target combinations present in the annotated dataset and the relations between stance and sentiment.
  • 11. StanceCat: Introduction & Motivation Our focus and stance target: Catalan Independence. Corpus from Twitter, filtered by the hashtags #independencia and #27S. Time span: end of September 2015 – December 2015. 27S: September 27, 2015, regional elections in Catalonia, a de facto referendum on independence. #independencia and #27S are two of the hashtags that were accepted within the dialogical and social context growing around the topic and were largely exploited in the debate.
  • 12. StanceCat: Introduction & Motivation Multilingual perspective, different socio-political debates. French: #mariagepourtous, debate on same-sex marriage in France (Bosco et al. @LREC2016). Italian: #labuonascuola, debate on the reform of the education sector in Italy (Stranisci et al. @LREC2016)
  • 13. StanceCat: Introduction & Motivation Multilingual perspective, different socio-political debates. English: #brexit, debate on the British exit from the EU (Lai et al. @CLEF2017)
  • 15. StanceCat: Task Description StanceCat Task SubTask 1- Stance Detection Deciding whether each message is neutral, in favor or against the target: ‘Catalan Independence’ SubTask 2- Gender Detection Identification of the gender of the author of the message Languages: Catalan and Spanish
  • 16. StanceCat: Task Description StanceCat Task SubTask 1- Stance Detection Deciding whether each message is neutral, in favor or against the target: ‘Catalan Independence’ SubTask 2- Gender Detection Identification of the gender of the author of the message Languages: Catalan and Spanish Stance detection: SemEval-2016, Task-6; author profiling: PAN@CLEF. Novelty: The two tasks have never been performed together for Spanish and Catalan as part of one single task. Results will be of interest not only for sentiment analysis but also for author profiling and for socio-political studies
  • 18. StanceCat: Corpus • TW-CaSe corpus: 10,800 tweets, #Independencia #27S, Cosmos tool (by Autoritas)
       TW-CaSe   Female   Male    Total   Training (80%)   Test (20%)
       Catalan   2,700    2,700   5,400   4,319            1,081
       Spanish   2,700    2,700   5,400   4,319            1,081
  • 19. StanceCat: Corpus • Annotation Scheme: Stance Tags –AGAINST: Negative stance –FAVOR: Positive stance –NONE: Neutral stance + stance cannot be inferred Gender Tags –FEMALE –MALE
  • 20. StanceCat: Corpus • Example: Language: Catalan Stance: FAVOR Gender: FEMALE Tweet: 15 diplomàtics internacional observen les plesbiscitàries, será que interessen a tothom menys a Espanya #27 ‘ 15 international diplomats observe the plebiscite, perhaps it is of interest to everybody except to Spain #27’
  • 21. StanceCat: Corpus • Criteria: – Writing text: emoticons, @mentions and #hashtags √ – Links (webpages, photographs, videos…) (NO, in TW-CaSe 0.1) (YES, in TW-CaSe 1.0)
  • 22. StanceCat: Corpus • Annotation procedure: 1. Three trained annotators tagged the stance in 500 Catalan tweets and in 500 Spanish tweets in parallel 2. Interannotator Agreement Test (IAT) 3. Annotation of the whole corpus individually. Annotators: 3 trained annotators + 2 senior researchers. Meetings: once a week; problematic cases solved by common consensus
  • 23. StanceCat: Corpus • Interannotator Agreement Test: Results (pairwise agreement)
       Annotator pair      TW-CaSe-CA   TW-CaSe-ES
       A-B                 75.78%       76.40%
       A-C                 79.54%       77.80%
       B-C                 82.46%       81%
       Average agreement   79.26%       78.40%
       Fleiss’ Kappa       0.60         0.60
  • 24. StanceCat: Corpus • Disagreements: Communicative intentions are unclear Language: Spanish Stance: NONE (A = AGAINST; B and C = NONE) Gender: MALE Tweet: #27 voy a denunciar a todo aquel q me siga insultando usando ls red. Yo no soy imbécil, ni mi bandera es n trapo ‘#27 I’m going to report anyone who keeps insulting me on the web. I’m not stupid, nor is my flag a rag’
  • 25. StanceCat: Corpus • Disagreements: Communicative intentions are unclear Language: Catalan Stance: NONE (A = AGAINST; B = FAVOR; C = NONE) Gender: MALE Tweet: La @cupnacional té la clau de Matrix ‘The @cupnacional has the key to the Matrix’
  • 26. StanceCat: Corpus • Distribution of labels for stance, gender and language
                           Female                    Male
       Lang   Dataset      favor   against   none    favor   against   none    Total
       Cat    training     1,456   57        646     1,192   74        894     4,319
       Cat    test         365     14        162     298     18        224     1,081
       Spa    training     145     693       1,322   190     753       1,216   4,319
       Spa    test         36      173       331     48      188       305     1,081
  • 27. StanceCat: Evaluation Metrics • Macro-average on F-score (FAVOR & AGAINST) to evaluate Stance (Semeval 2016) • Accuracy to evaluate Gender (PAN@CLEF)
  • 28. StanceCat: Baselines • Majority class: A trivial baseline that always returns the majority class. • LDR (Low Dimensionality Representation): – The key concept is the probability of occurrence (weight) of each word of the training set in each of the possible classes. – The distribution of weights for a document should be most similar to the distribution of weights of its corresponding class.
  • 29. StanceCat: Participation • 10 participants, 31 runs • Stance: 9 (CA), 10 (ES) • Gender: 4 (CA), 5 (ES)
  • 30. StanceCat: Approaches • Classification approaches: SVM, decision trees, random forest, logistic regression, multinomial NB, neural networks, multilayer perceptron, LSTM, CNN, MLP, FastText, KIM, Bi-LSTM • Participants: ltl_uni_due, iTACOS, ARA1337, ELiRF-UPV, LTRC_IIITH, atoppe, LuSer, deepCybErNet • Features: word n-grams, character n-grams, POS, hashtags, stylistic features (number of hashtags, number of words…), (stance & gender) specific tokens, word embeddings, n-gram embeddings, one-hot vectors
  • 33. StanceCat: Stance vs. Gender (Catalan)
  • 34. StanceCat: Stance vs. Gender (Spanish)
  • 35. StanceCat: Error Analysis • More errors in case of males. • In Catalan, more errors from Against to Favor. In Spanish, more errors from Favor to Against. • In Spanish, errors from Against to Favor are minimal (2%).
  • 38. StanceCat: Conclusions • Stance and gender identification shared task • High participation – 10 teams, 5 countries, 31 runs • Challenging task – F-measures below 50% • Dataset released to the community – Spanish and Catalan
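
The evaluation metrics on slide 27 can be illustrated with a minimal Python sketch. It assumes gold and predicted labels are plain lists of strings using the tags from the annotation scheme (FAVOR, AGAINST, NONE; FEMALE, MALE). The function names are hypothetical and this is not the organizers' official scorer. NONE tweets still affect the stance score indirectly, because a wrong prediction on them counts as a false positive or false negative for the other two classes.

```python
from typing import Sequence


def f1_for_class(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F1 score of a single class, computed from raw counts."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No true positives: precision and recall are 0 (or undefined),
        # so the F1 score is taken to be 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def stance_macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Macro-average of the F-scores of FAVOR and AGAINST only (NONE is not averaged)."""
    return (f1_for_class(gold, pred, "FAVOR") + f1_for_class(gold, pred, "AGAINST")) / 2


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of items whose predicted label equals the gold label (used for gender)."""
    return sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)


def majority_class_predictions(train_labels: Sequence[str], n_test: int) -> list:
    """Majority-class baseline: always predict the most frequent training label."""
    majority = max(set(train_labels), key=list(train_labels).count)
    return [majority] * n_test
```

Under this metric a majority-class baseline scores at most one non-zero F-score out of the two averaged classes, which is one reason a macro-average is a stricter measure than accuracy on a skewed label distribution such as the one on slide 26.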