SlideShare une entreprise Scribd logo
1  sur  22
Télécharger pour lire hors ligne
UNIBA at EVALITA 2014-SENTIPOLC Task: 
Predicting tweet sentiment polarity combining 
micro-blogging, lexicon and semantic features 
Pierpaolo Basile and Nicole Novielli 
pierpaolo.basile@uniba.it, nicole.novielli@uniba.it 
Department of Computer Science 
University of Bari Aldo Moro (ITALY) 
EVALITA 2014 
Pisa, 11th December 2014 
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 1 / 21
Introduction The Task 
Task Overview 
EVALITA 2014 SENTIment POLarity Classi
cation (SENTIPOLC) 
Subtask A - Subjectivity Classi
cation: classify subjectivity/objectivity of the tweet's 
content 
La Yamaha e buona solo ad alzar polemiche. E la Honda a crearne. ! subj 
Domani sera a Palazzo Chigi incontro informale tra Mario Monti e parti sociali: I leader 
di Cgil, Cisl, Uil e Ug... http: // bit. ly/ rNWoB0 ! obj 
Subtask B - Polarity Classi
cation: classify tweet's polarity: positive, negative, neutral 
and tweets expressing both positive and negative sentiment 
#Maroni La politica del governo Monti e sbagliata ! neg 
Coma iniziare un buon sabato? Guardandosi South Park il
lm! ! pos 
Pilot Task, Irony Detection 
governo #monti : piu equita o piu equitalia? 
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 2 / 21
The System Our Approach 
Our Approach 
Supervised approach relying on three kinds of features 
 Keywords and micro-blogging properties of tweets 
 Content representation in a distributional semantic model 
 Sentiment lexicon 
Our contribution 
 Represent both the tweets and the polarity classes in the word space 
 Automatically develop a sentiment lexicon for the Italian starting form 
SentiWordNet 
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 3 / 21
The System Features 
Building a Sentiment Lexicon for Italian 
Automatic development of a sentiment lexicon starting form SentiWordNet 
Lexicon-based features 
 features based on sentiment scores of tokens 
 sentiment variation in the tweet 
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 4 / 21
The System Features 
Building the Word Space from Unlabeled Tweets 
Homogenous representation of both tweets and polarity classes in the word space 
Distributional features 
 tweet vector 
!t 
as semantic feature in training our classi
ers 
 similarity between 
!t 
and each prototype vector 
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 5 / 21
The System Features 
Building the Word Space from Unlabeled Tweets 
Homogenous representation of both the tweets and the polarity classes in the word space 
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 6 / 21
The System Features 
Tweets downloading 
Idea 
Using four lexicons extracted from training data for each class (pos, neg, subj, obj) 
Probability assigned to each token: 
P(tjci ) = 
#t + 1 
#toti + jVj 
(1) 
Rank tokens in descending order according to the Kullback-Leibler divergence (KLD): 
KLD = P(tjcs )  log 
P(tjcs ) 
P(tjco) 
(2) 
Top terms (50) in the rank for each class are used as seeds for downloading the same 
number of tweets for each lexicon 
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 7 / 21
The System Features 
Keywords and Micro-blogging features 
Keywords 
 exploit tokens occurring in the tweets (unigrams) 
 replace the user mentions, URLs and hashtags with three metatokens:  USER , 
 URL  and  TAG . 
Micro-blogging 
 the use of upper case and character repetitions 
 positive and negative emoticons 
 informal expressions of laughters (i.e., sequences of ah) 
 the presence of exclamation and interrogative marks 
 occurrences of adversative words, disjunctive words, conclusive words and 
explicative words 
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 8 / 21
Learning 
Learning step 
 Constrained run, only the provided training data can be used, but lexicons are 
allowed: extract the features from the training data and run the learning algorithm 
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 9 / 21
Learning 
Learning step 
 Constrained run, only the provided training data can be used, but lexicons are 
allowed: extract the features from the training data and run the learning algorithm 
 Unconstrained run, additional training data can be included: investigate a 
co-training approach to automatically add new examples to the training set 
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 9 / 21
System Setup 
System Setup: overview 
 Learning algorithm 
 SVM with the RBF kernel 
 C=4 selected after a 10-folds validation on training data 
 total number of features: 12,117 
 Completely developed in JAVA 
 Weka library is adopted for learning 
 Tweets are tokenized using Twitter NLP and Part-of-Speech Tagging 
 Word space: download 10 million tweets using the Twitter Streaming API and 
build the space using word2vec 
 continuous Bag-of-Words Model (CBOW) 
 200 vector dimensions 
 remove the terms with less than ten occurrences (about 200,000 terms 
overall) 
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 10 / 21
Evaluation 
Evaluation 
Evaluating systems on their ability in 
 Task A: decide whether a given tweet is subjective or objective 
 Task B: decide the tweet polarity with respect to four classes: positive, negative, 
neutral and mixed sentiment (both positive and negative) 
Dataset 
 Training: 4,513 manually annotated tweets (495 tweets are not available for the 
download and are removed) 
 Test: 1,935 manually annotated tweets (1,748 available at the time of the 
evaluation) 
 Metrics: systems are compared against the gold standard in terms of F measure 
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 11 / 21
Evaluation Results 
Results 
Setting Task F Rank Imp. 
baseline 
Task A 0.4005 - - 
Task B 0.3718 - - 
constrained 
Task A 0.7140 1 78% 
Task B 0.6771 1 82% 
unconstrained 
Task A 0.6892 2 72% 
Task B 0.6638 1 79% 
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 12 / 21
Evaluation Discussion 
Results Discussion 
 The system always obtains the best task performance in all settings 
 Co-training approach seems to introduce noise 
 Deep error analysis 
 the co-training system slightly improves the performance in classifying 
positive tweets 
 bias in our classi
er due to the domain-speci
c lexicon about political 
topics (governo, Monti, crisi) 
 classi
er could be improved by enriching our lexicon with jargon and 
idiomatic expressions 
 ablation test: investigate the predictive power of the features in our 
model 
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 13 / 21

Contenu connexe

Similaire à UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

0.0 sds course introduction vezzoli 13-14
0.0 sds course introduction vezzoli 13-140.0 sds course introduction vezzoli 13-14
0.0 sds course introduction vezzoli 13-14LeNS_slide
 
Building a suite of online resources to support academic vocabulary learning
Building a suite of online resources to support academic vocabulary learningBuilding a suite of online resources to support academic vocabulary learning
Building a suite of online resources to support academic vocabulary learningStefania Spina
 
Ongoing Analysis and Reflection Action research methodology can pr.docx
Ongoing Analysis and Reflection Action research methodology can pr.docxOngoing Analysis and Reflection Action research methodology can pr.docx
Ongoing Analysis and Reflection Action research methodology can pr.docxkarlacauq0
 
Ongoing Analysis and ReflectionAction research methodology can p.docx
Ongoing Analysis and ReflectionAction research methodology can p.docxOngoing Analysis and ReflectionAction research methodology can p.docx
Ongoing Analysis and ReflectionAction research methodology can p.docxhopeaustin33688
 
Engaging the ELearner: Weapons of Mass Instruction 2.0
Engaging the ELearner: Weapons of Mass Instruction 2.0Engaging the ELearner: Weapons of Mass Instruction 2.0
Engaging the ELearner: Weapons of Mass Instruction 2.0coachfeliciab
 
Measuring User Experience: Case Study Student Centered e-Learning Environment
Measuring User Experience: Case Study Student Centered e-Learning EnvironmentMeasuring User Experience: Case Study Student Centered e-Learning Environment
Measuring User Experience: Case Study Student Centered e-Learning EnvironmentWorld Information Architecture Day 2016
 
Slides, SURF presentation, nj&ct, draft5, 14 mar2011
Slides, SURF presentation, nj&ct, draft5, 14 mar2011Slides, SURF presentation, nj&ct, draft5, 14 mar2011
Slides, SURF presentation, nj&ct, draft5, 14 mar2011Nick Jankowski
 
Goal-based Recommendation utilizing Latent Dirichlet Allocation
Goal-based Recommendation utilizing Latent Dirichlet AllocationGoal-based Recommendation utilizing Latent Dirichlet Allocation
Goal-based Recommendation utilizing Latent Dirichlet AllocationSebastien Louvigne
 
PhD Thesis: Operationalization of Collaborative Blended Learning Scripts
PhD Thesis: Operationalization of Collaborative Blended Learning ScriptsPhD Thesis: Operationalization of Collaborative Blended Learning Scripts
PhD Thesis: Operationalization of Collaborative Blended Learning ScriptsMar Pérez-Sanagustín
 
Webinar Online Learning Myths & Engaging (Distance) Learners!
Webinar Online Learning Myths & Engaging (Distance) Learners!Webinar Online Learning Myths & Engaging (Distance) Learners!
Webinar Online Learning Myths & Engaging (Distance) Learners!Sara Valla
 
Learning Analytics Dashboards
Learning Analytics DashboardsLearning Analytics Dashboards
Learning Analytics DashboardsSten Govaerts
 
Using iPads in the Classroom for Children with Special Needs
Using iPads in the Classroom for Children with Special NeedsUsing iPads in the Classroom for Children with Special Needs
Using iPads in the Classroom for Children with Special NeedsEric Sailers
 
Maximising the potential of IVLE: A showcase of good practices
Maximising the potential of IVLE: A showcase of good practicesMaximising the potential of IVLE: A showcase of good practices
Maximising the potential of IVLE: A showcase of good practicesCIT, NUS
 
Making Thinking Visible with Digital Resources
Making Thinking Visible with Digital ResourcesMaking Thinking Visible with Digital Resources
Making Thinking Visible with Digital ResourcesDiana Beabout
 
Promise 2011: "Are Change Metrics Good Predictors for an Evolving Software Pr...
Promise 2011: "Are Change Metrics Good Predictors for an Evolving Software Pr...Promise 2011: "Are Change Metrics Good Predictors for an Evolving Software Pr...
Promise 2011: "Are Change Metrics Good Predictors for an Evolving Software Pr...CS, NcState
 

Similaire à UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features (20)

0.0 sds course introduction vezzoli 13-14
0.0 sds course introduction vezzoli 13-140.0 sds course introduction vezzoli 13-14
0.0 sds course introduction vezzoli 13-14
 
Building a suite of online resources to support academic vocabulary learning
Building a suite of online resources to support academic vocabulary learningBuilding a suite of online resources to support academic vocabulary learning
Building a suite of online resources to support academic vocabulary learning
 
Ongoing Analysis and Reflection Action research methodology can pr.docx
Ongoing Analysis and Reflection Action research methodology can pr.docxOngoing Analysis and Reflection Action research methodology can pr.docx
Ongoing Analysis and Reflection Action research methodology can pr.docx
 
Ongoing Analysis and ReflectionAction research methodology can p.docx
Ongoing Analysis and ReflectionAction research methodology can p.docxOngoing Analysis and ReflectionAction research methodology can p.docx
Ongoing Analysis and ReflectionAction research methodology can p.docx
 
Engaging the ELearner: Weapons of Mass Instruction 2.0
Engaging the ELearner: Weapons of Mass Instruction 2.0Engaging the ELearner: Weapons of Mass Instruction 2.0
Engaging the ELearner: Weapons of Mass Instruction 2.0
 
Measuring User Experience: Case Study Student Centered e-Learning Environment
Measuring User Experience: Case Study Student Centered e-Learning EnvironmentMeasuring User Experience: Case Study Student Centered e-Learning Environment
Measuring User Experience: Case Study Student Centered e-Learning Environment
 
CV_academic
CV_academicCV_academic
CV_academic
 
Slides, SURF presentation, nj&ct, draft5, 14 mar2011
Slides, SURF presentation, nj&ct, draft5, 14 mar2011Slides, SURF presentation, nj&ct, draft5, 14 mar2011
Slides, SURF presentation, nj&ct, draft5, 14 mar2011
 
Goal-based Recommendation utilizing Latent Dirichlet Allocation
Goal-based Recommendation utilizing Latent Dirichlet AllocationGoal-based Recommendation utilizing Latent Dirichlet Allocation
Goal-based Recommendation utilizing Latent Dirichlet Allocation
 
PhD Thesis: Operationalization of Collaborative Blended Learning Scripts
PhD Thesis: Operationalization of Collaborative Blended Learning ScriptsPhD Thesis: Operationalization of Collaborative Blended Learning Scripts
PhD Thesis: Operationalization of Collaborative Blended Learning Scripts
 
Webinar Online Learning Myths & Engaging (Distance) Learners!
Webinar Online Learning Myths & Engaging (Distance) Learners!Webinar Online Learning Myths & Engaging (Distance) Learners!
Webinar Online Learning Myths & Engaging (Distance) Learners!
 
Learning Analytics Dashboards
Learning Analytics DashboardsLearning Analytics Dashboards
Learning Analytics Dashboards
 
Resume_wp_UXD_pabloq
Resume_wp_UXD_pabloqResume_wp_UXD_pabloq
Resume_wp_UXD_pabloq
 
Library orientation journal paper
Library orientation journal paper Library orientation journal paper
Library orientation journal paper
 
Cmc 2 27
Cmc 2 27Cmc 2 27
Cmc 2 27
 
townhall
townhalltownhall
townhall
 
Using iPads in the Classroom for Children with Special Needs
Using iPads in the Classroom for Children with Special NeedsUsing iPads in the Classroom for Children with Special Needs
Using iPads in the Classroom for Children with Special Needs
 
Maximising the potential of IVLE: A showcase of good practices
Maximising the potential of IVLE: A showcase of good practicesMaximising the potential of IVLE: A showcase of good practices
Maximising the potential of IVLE: A showcase of good practices
 
Making Thinking Visible with Digital Resources
Making Thinking Visible with Digital ResourcesMaking Thinking Visible with Digital Resources
Making Thinking Visible with Digital Resources
 
Promise 2011: "Are Change Metrics Good Predictors for an Evolving Software Pr...
Promise 2011: "Are Change Metrics Good Predictors for an Evolving Software Pr...Promise 2011: "Are Change Metrics Good Predictors for an Evolving Software Pr...
Promise 2011: "Are Change Metrics Good Predictors for an Evolving Software Pr...
 

Plus de Nicole Novielli

A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Towards Supporting Emotion Awareness of Software Developers
Towards Supporting Emotion Awareness of Software DevelopersTowards Supporting Emotion Awareness of Software Developers
Towards Supporting Emotion Awareness of Software DevelopersNicole Novielli
 
Keynote@QUATIC - Recognizing Developer's Emotions: Advances and Open Challenges
Keynote@QUATIC - Recognizing Developer's Emotions: Advances and Open ChallengesKeynote@QUATIC - Recognizing Developer's Emotions: Advances and Open Challenges
Keynote@QUATIC - Recognizing Developer's Emotions: Advances and Open ChallengesNicole Novielli
 
To Label or Not? Advances and Open Challenges in SE-specific Sentiment Analysis
To Label or Not? Advances and Open Challenges in SE-specific Sentiment AnalysisTo Label or Not? Advances and Open Challenges in SE-specific Sentiment Analysis
To Label or Not? Advances and Open Challenges in SE-specific Sentiment AnalysisNicole Novielli
 
Emotion Detection Using Noninvasive Low-cost Sensors
Emotion Detection Using Noninvasive Low-cost SensorsEmotion Detection Using Noninvasive Low-cost Sensors
Emotion Detection Using Noninvasive Low-cost SensorsNicole Novielli
 
Evalita2018 iListen - itaLIan Speech acT labEliNg
Evalita2018 iListen - itaLIan Speech acT labEliNgEvalita2018 iListen - itaLIan Speech acT labEliNg
Evalita2018 iListen - itaLIan Speech acT labEliNgNicole Novielli
 
A Benchmark Study on Sentiment Analysis for Software Engineering Research
A Benchmark Study on Sentiment Analysis for Software Engineering ResearchA Benchmark Study on Sentiment Analysis for Software Engineering Research
A Benchmark Study on Sentiment Analysis for Software Engineering ResearchNicole Novielli
 
The Challenges of Affect Detection in the Social Programmer Ecosystem
The Challenges of Affect Detection in the Social Programmer EcosystemThe Challenges of Affect Detection in the Social Programmer Ecosystem
The Challenges of Affect Detection in the Social Programmer EcosystemNicole Novielli
 
Deep Tweets: from Entity Linking to Sentiment Analysis
Deep Tweets: from Entity Linking to Sentiment AnalysisDeep Tweets: from Entity Linking to Sentiment Analysis
Deep Tweets: from Entity Linking to Sentiment AnalysisNicole Novielli
 
Towards Discovering the Role of Emotions in Stack Overflow
Towards Discovering the Role of Emotions in Stack OverflowTowards Discovering the Role of Emotions in Stack Overflow
Towards Discovering the Role of Emotions in Stack OverflowNicole Novielli
 
A Preliminary Investigation of the Effect of Social Media on Affective Trust ...
A Preliminary Investigation of the Effect of Social Media on Affective Trust ...A Preliminary Investigation of the Effect of Social Media on Affective Trust ...
A Preliminary Investigation of the Effect of Social Media on Affective Trust ...Nicole Novielli
 
Social Network Analysis for Global Software Engineering: Exploring relationsh...
Social Network Analysis for Global Software Engineering: Exploring relationsh...Social Network Analysis for Global Software Engineering: Exploring relationsh...
Social Network Analysis for Global Software Engineering: Exploring relationsh...Nicole Novielli
 

Plus de Nicole Novielli (12)

A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Towards Supporting Emotion Awareness of Software Developers
Towards Supporting Emotion Awareness of Software DevelopersTowards Supporting Emotion Awareness of Software Developers
Towards Supporting Emotion Awareness of Software Developers
 
Keynote@QUATIC - Recognizing Developer's Emotions: Advances and Open Challenges
Keynote@QUATIC - Recognizing Developer's Emotions: Advances and Open ChallengesKeynote@QUATIC - Recognizing Developer's Emotions: Advances and Open Challenges
Keynote@QUATIC - Recognizing Developer's Emotions: Advances and Open Challenges
 
To Label or Not? Advances and Open Challenges in SE-specific Sentiment Analysis
To Label or Not? Advances and Open Challenges in SE-specific Sentiment AnalysisTo Label or Not? Advances and Open Challenges in SE-specific Sentiment Analysis
To Label or Not? Advances and Open Challenges in SE-specific Sentiment Analysis
 
Emotion Detection Using Noninvasive Low-cost Sensors
Emotion Detection Using Noninvasive Low-cost SensorsEmotion Detection Using Noninvasive Low-cost Sensors
Emotion Detection Using Noninvasive Low-cost Sensors
 
Evalita2018 iListen - itaLIan Speech acT labEliNg
Evalita2018 iListen - itaLIan Speech acT labEliNgEvalita2018 iListen - itaLIan Speech acT labEliNg
Evalita2018 iListen - itaLIan Speech acT labEliNg
 
A Benchmark Study on Sentiment Analysis for Software Engineering Research
A Benchmark Study on Sentiment Analysis for Software Engineering ResearchA Benchmark Study on Sentiment Analysis for Software Engineering Research
A Benchmark Study on Sentiment Analysis for Software Engineering Research
 
The Challenges of Affect Detection in the Social Programmer Ecosystem
The Challenges of Affect Detection in the Social Programmer EcosystemThe Challenges of Affect Detection in the Social Programmer Ecosystem
The Challenges of Affect Detection in the Social Programmer Ecosystem
 
Deep Tweets: from Entity Linking to Sentiment Analysis
Deep Tweets: from Entity Linking to Sentiment AnalysisDeep Tweets: from Entity Linking to Sentiment Analysis
Deep Tweets: from Entity Linking to Sentiment Analysis
 
Towards Discovering the Role of Emotions in Stack Overflow
Towards Discovering the Role of Emotions in Stack OverflowTowards Discovering the Role of Emotions in Stack Overflow
Towards Discovering the Role of Emotions in Stack Overflow
 
A Preliminary Investigation of the Effect of Social Media on Affective Trust ...
A Preliminary Investigation of the Effect of Social Media on Affective Trust ...A Preliminary Investigation of the Effect of Social Media on Affective Trust ...
A Preliminary Investigation of the Effect of Social Media on Affective Trust ...
 
Social Network Analysis for Global Software Engineering: Exploring relationsh...
Social Network Analysis for Global Software Engineering: Exploring relationsh...Social Network Analysis for Global Software Engineering: Exploring relationsh...
Social Network Analysis for Global Software Engineering: Exploring relationsh...
 

Dernier

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 

Dernier (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 

UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features

  • 1. UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features Pierpaolo Basile and Nicole Novielli pierpaolo.basile@uniba.it, nicole.novielli@uniba.it Department of Computer Science University of Bari Aldo Moro (ITALY) EVALITA 2014 Pisa, 11th December 2014 Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 1 / 21
  • 2. Introduction The Task Task Overview EVALITA 2014 SENTIment POLarity Classi
  • 3. cation (SENTIPOLC) Subtask A - Subjectivity Classi
  • 4. cation: classify subjectivity/objectivity of the tweet's content La Yamaha e buona solo ad alzar polemiche. E la Honda a crearne. ! subj Domani sera a Palazzo Chigi incontro informale tra Mario Monti e parti sociali: I leader di Cgil, Cisl, Uil e Ug... http: // bit. ly/ rNWoB0 ! obj Subtask B - Polarity Classi
  • 5. cation: classify tweet's polarity: positive, negative, neutral and tweets expressing both positive and negative sentiment #Maroni La politica del governo Monti e sbagliata ! neg Coma iniziare un buon sabato? Guardandosi South Park il
  • 6. lm! ! pos Pilot Task, Irony Detection governo #monti : piu equita o piu equitalia? Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 2 / 21
  • 7. The System Our Approach Our Approach Supervised approach relying on three kinds of features Keywords and micro-blogging properties of tweets Content representation in a distributional semantic model Sentiment lexicon Our contribution Represent both the tweets and the polarity classes in the word space Automatically develop a sentiment lexicon for the Italian starting form SentiWordNet Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 3 / 21
  • 8. The System Features Building a Sentiment Lexicon for Italian Automatic development of a sentiment lexicon starting form SentiWordNet Lexicon-based features features based on sentiment scores of tokens sentiment variation in the tweet Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 4 / 21
  • 9. The System Features Building the Word Space from Unlabeled Tweets Homogenous representation of both tweets and polarity classes in the word space Distributional features tweet vector !t as semantic feature in training our classi
  • 10. ers similarity between !t and each prototype vector Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 5 / 21
  • 11. The System Features Building the Word Space from Unlabeled Tweets Homogenous representation of both the tweets and the polarity classes in the word space Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 6 / 21
  • 12. The System Features Tweets downloading Idea Using four lexicons extracted from training data for each class (pos, neg, subj, obj) Probability assigned to each token: P(tjci ) = #t + 1 #toti + jVj (1) Rank tokens in descending order according to the Kullback-Leibler divergence (KLD): KLD = P(tjcs ) log P(tjcs ) P(tjco) (2) Top terms (50) in the rank for each class are used as seeds for downloading the same number of tweets for each lexicon Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 7 / 21
  • 13. The System Features Keywords and Micro-blogging features Keywords exploit tokens occurring in the tweets (unigrams) replace the user mentions, URLs and hashtags with three metatokens: USER , URL and TAG . Micro-blogging the use of upper case and character repetitions positive and negative emoticons informal expressions of laughters (i.e., sequences of ah) the presence of exclamation and interrogative marks occurrences of adversative words, disjunctive words, conclusive words and explicative words Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 8 / 21
  • 14. Learning Learning step Constrained run, only the provided training data can be used, but lexicons are allowed: extract the features from the training data and run the learning algorithm Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 9 / 21
  • 15. Learning Learning step Constrained run, only the provided training data can be used, but lexicons are allowed: extract the features from the training data and run the learning algorithm Unconstrained run, additional training data can be included: investigate a co-training approach to automatically add new examples to the training set Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 9 / 21
  • 16. System Setup System Setup: overview Learning algorithm SVM with the RBF kernel C=4 selected after a 10-folds validation on training data total number of features: 12,117 Completely developed in JAVA Weka library is adopted for learning Tweets are tokenized using Twitter NLP and Part-of-Speech Tagging Word space: download 10 million tweets using the Twitter Streaming API and build the space using word2vec continuous Bag-of-Words Model (CBOW) 200 vector dimensions remove the terms with less than ten occurrences (about 200,000 terms overall) Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 10 / 21
  • 17. Evaluation Evaluation Evaluating systems on their ability in Task A: decide whether a given tweet is subjective or objective Task B: decide the tweet polarity with respect to four classes: positive, negative, neutral and mixed sentiment (both positive and negative) Dataset Training: 4,513 manually annotated tweets (495 tweets are not available for the download and are removed) Test: 1,935 manually annotated tweets (1,748 available at the time of the evaluation) Metrics: systems are compared against the gold standard in terms of F measure Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 11 / 21
  • 18. Evaluation Results Results Setting Task F Rank Imp. baseline Task A 0.4005 - - Task B 0.3718 - - constrained Task A 0.7140 1 78% Task B 0.6771 1 82% unconstrained Task A 0.6892 2 72% Task B 0.6638 1 79% Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 12 / 21
  • 19. Evaluation Discussion Results Discussion The system always obtains the best task performance in all settings Co-training approach seems to introduce noise Deep error analysis the co-training system slightly improves the performance in classifying positive tweets bias in our classi
  • 20. er due to the domain-speci
  • 21. c lexicon about political topics (governo, Monti, crisi) classi
  • 22. er could be improved by enriching our lexicon with jargon and idiomatic expressions ablation test: investigate the predictive power of the features in our model Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 13 / 21
  • 23. Evaluation Ablation test Ablation Test Decrease of F by removing each feature group, compared to the complete feature setting Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 14 / 21
  • 24. Conclusions and Future Work Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 15 / 21
  • 25. Conclusions and Future Work Conclusions and Future Work Conclusions The combination of keyword/micro-blogging, semantic, and lexicon features results in best performance Semantic features (distributional approach) have more predictive power Future Work Validate and generalize our
  • 26. ndings: further data and languages Fix some issues in co-training approach Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 16 / 21
  • 27. Thank you! Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 17 / 21
  • 28. For Further Reading For Further Reading I Barbosa, L., Feng, J.: Robust sentiment detection on twitter from biased and noisy data. In: Proc. of the 23rd Intl Conf. on Computational Linguistics: Posters, COLING '10, pp. 36{44. Association for Computational Linguistics, Stroudsburg, PA, USA (2010) Basile, V., Bolioli, A., Nissim, M., Patti, V., Rosso, P.: Overview of the Evalita 2014 SENTIment POLarity Classi
  • 29. cation Task. In: Proc. of EVALITA 2014. Pisa, Italy (2014) Basile, V., Nissim, M.: Sentiment analysis on italian tweets. In: Proc. of WASSA 2013, pp. 100{107 (2013) Davidov, D., Tsur, O., Rappoport, A.: Enhanced sentiment learning using twitter hashtags and smileys. In: Proc. of the 23rd Intl Conf. on Computational Linguistics: Posters, COLING '10, pp. 241{249. Association for Computational Linguistics, Stroudsburg, PA, USA (2010) Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 18 / 21
  • 30. For Further Reading For Further Reading II Esuli, A., Sebastiani, F.: Sentiwordnet: A publicly available lexical resource for opinion mining. In: Proc. of LREC, pp. 417{422 (2006) Go, A., Bhayani, R., Huang, L.: Twitter sentiment classi
  • 31. cation using distant supervision. Processing pp. 1{6 (2009) Jansen, B.J., Zhang, M., Sobel, K., Chowdury, A.: Twitter power: Tweets as electronic word of mouth. J. Am. Soc. Inf. Sci. Technol. 60(11), 2169{2188 (2009) Kouloumpis, E., Wilson, T., Moore, J.D.: Twitter sentiment analysis: The good the bad and the omg! In: Proc. of ICWSM 2011, pp. 538{541 (2011) Mikolov, T., Chen, K., Corrado, G., Dean, J.: Ecient estimation of word representations in vector space. In: Proc. of ICLR Work. (2013) Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 19 / 21
  • 32. For Further Reading For Further Reading III O'Connor, B., Balasubramanyan, R., Routledge, B., Smith, N.: From tweets to polls: Linking text sentiment to public opinion time series. In: Intl AAAI Conf. on Weblogs and Social Media (ICWSM), vol. 11, pp. 122{129 (2010) Pak, A., Paroubek, P.: Twitter as a corpus for sentiment analysis and opinion mining. In: Proc. of the Seventh Intl Conf. on Language Resources and Evaluation (LREC'10) (2010) Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2(1-2), 1{135 (2008) Pianta, E., Bentivogli, L., Girardi, C.: Multiwordnet: developing an aligned multilingual database. In: Proc. 1st Intl Conf. on Global WordNet, pp. 293{302 (2002) Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 20 / 21
  • 33. For Further Reading For Further Reading IV Smolensky, P.: Tensor product variable binding and the representation of symbolic structures in connectionist systems. Arti
  • 34. cial Intelligence 46(1-2), 159{216 (1990) Vanzo, A., Croce, D., Basili, R.: A context-based model for sentiment analysis in twitter. In: Proc. of COLING 2014 (2014) Wilson, T., Wiebe, J., Homann, P.: Recognizing contextual polarity: An exploration of features for phrase-level sentiment analysis. Comput. Linguist. 35(3), 399{433 (2009) Zanchetta, E., Baroni, M.: Morph-it!: a free corpus-based morphological resource for the italian language. Proc. of the Corpus Linguistics Conf. 2005 (2005) Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec. '14 21 / 21