Entity-oriented sentiment analysis of tweets: results and problems
Natalia Loukachevitch
Lomonosov Moscow State University
Yuliya Rubtsova
A.P. Ershov Institute of Informatics Systems
Entity-oriented analysis of tweets: reputation monitoring

SentiRuEval 2014-2015: testing of sentiment analysis systems for Russian texts
Sentiment analysis, from the general task to entity-oriented analysis:
• Aspect-oriented analysis of reviews
  • Restaurants
  • Cars
• Entity-oriented analysis of tweets: reputation monitoring
  • Banks [8]
  • Telecom companies [7]
SentiRuEval: entity-oriented analysis of tweets
Task: to determine the sentiment towards the mentioned company.
A reputation-oriented tweet may express:
• a positive or negative opinion about a company;
• a positive or negative fact concerning a company.
Participation: 9 participants, 33 runs.
Training collection (July to August 2014):
• 5000 banking tweets
• 5000 telecom tweets
Test collection (December 2013 to February 2014, i.e. collected before the training collection):
• 4549 banking tweets
• 3845 telecom tweets
Expert annotation
• 0: tweet considered neutral
• 1: positive fact or opinion
• -1: negative fact or opinion
• +-: positive and negative sentiment in the same tweet
• --: meaningless tweet
Annotation problem
Test data were annotated using a voting scheme (agreement between 2 or 3 annotators). Percentages are relative to the 5,000 tweets annotated in each domain.

| Domain | Tweets with the same label from at least 2 assessors | Full agreement | Final number of tweets in the test collection |
|---|---|---|---|
| Telecom | 4 503 (90.06%) | 2 233 (44.66%) | 3 845 |
| Banks | 4 915 (98.3%) | 3 818 (76.36%) | 4 549 |
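The voting scheme can be illustrated with a minimal Python sketch. It assumes three annotators per tweet and the label codes listed under Expert annotation; `vote` and `agreement_stats` are hypothetical helpers, and the source does not say how the organizers handled tweets without a majority beyond excluding them.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# Label codes from the expert annotation scheme:
# "0" neutral, "1" positive, "-1" negative, "+-" mixed, "--" meaningless.
VALID_LABELS = {"0", "1", "-1", "+-", "--"}


def vote(labels: Sequence[str], min_agree: int = 2) -> Optional[str]:
    """Return the label given by at least `min_agree` annotators, else None."""
    unknown = [label for label in labels if label not in VALID_LABELS]
    if unknown:
        raise ValueError(f"unknown labels: {unknown}")
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agree else None


def agreement_stats(annotations: List[Sequence[str]]) -> Tuple[int, int]:
    """Count tweets with a majority label and tweets with full agreement."""
    majority = sum(1 for labels in annotations if vote(labels) is not None)
    full = sum(1 for labels in annotations if len(set(labels)) == 1)
    return majority, full


annotations = [["1", "1", "1"], ["0", "-1", "0"], ["1", "0", "-1"]]
print([vote(labels) for labels in annotations])  # ['1', '0', None]
print(agreement_stats(annotations))              # (2, 1)
```

The third tweet receives three different labels, so it has no majority and would not enter the test collection under this reading of the scheme.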
Distribution of messages in the collections by sentiment class

| Collection | Neutral | Positive | Negative |
|---|---|---|---|
| Telecom, training collection | 2397 | 973 | 1667 |
| Telecom, gold standard test collection | 2816 | 413 | 944 |
| Banks, training collection | 3569 | 410 | 2138 |
| Banks, gold standard test collection | 3592 | 350 | 670 |
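Quality measure
Macro-average F-measure over the two sentiment classes:

$$F_{macro} = \frac{F_{pos} + F_{neg}}{2}$$

where $F_{pos}$ and $F_{neg}$ are the F-measures of the positive and negative classes. The F-measure of the neutral class is ignored, but this does not reduce the task to two-class prediction: neutral remains a possible answer, so labeling a positive tweet as neutral lowers the recall of the positive class, and labeling a neutral tweet as positive lowers its precision.
Additionally, micro-average F-measures were calculated for the two sentiment classes.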
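A minimal Python sketch of both measures, assuming one gold and one predicted label per item from {1, 0, -1}. The official SentiRuEval scorer may treat mixed "+-" labels and entity-level items differently; the function names here are hypothetical.

```python
from typing import Sequence, Tuple

# Neutral (0) can be predicted, but only these two classes are averaged.
SENTIMENT_CLASSES = (1, -1)  # positive, negative


def class_counts(gold: Sequence[int], pred: Sequence[int], cls: int) -> Tuple[int, int, int]:
    """True positives, false positives and false negatives for one class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    return tp, fp, fn


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), written equivalently as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float]:
    per_class = [class_counts(gold, pred, cls) for cls in SENTIMENT_CLASSES]
    # Macro: average the per-class F-measures of positive and negative.
    macro = sum(f_measure(*c) for c in per_class) / len(per_class)
    # Micro: pool the counts of the two sentiment classes, then compute one F.
    tp, fp, fn = (sum(c[i] for c in per_class) for i in range(3))
    micro = f_measure(tp, fp, fn)
    return macro, micro


gold = [1, 1, 0, -1, -1, 0]
pred = [1, 0, 1, -1, 0, 0]
macro, micro = macro_micro_f(gold, pred)
print(round(macro, 3), round(micro, 3))  # 0.583 0.571
```

In this toy example the positive tweet predicted as neutral (second item) still costs positive-class recall, which is why ignoring the neutral F-measure does not turn the task into a two-class problem.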
Results

Top 3 results for telecom tweets

| Run id | Macro F | Micro F |
|---|---|---|
| Baseline | 0.1823 | 0.337 |
| 2 | 0.4882 | 0.5355 |
| 3 | 0.4804 | 0.5094 |
| 4 | 0.467 | 0.506 |

Top 3 results for bank tweets

| Run id | Macro F | Micro F |
|---|---|---|
| Baseline | 0.1267 | 0.2377 |
| 4 | 0.3598 | 0.343 |
| 10 | 0.352 | 0.337 |
| 2 | 0.3354 | 0.3656 |
Manual labeling by a participant (telecom domain):
Macro-F: 0.703
Micro-F: 0.7487
Classification methods (run id: approach)
• 2: lemmas and syntactic links represented as triples (head word, dependent word, relation type)
• 3: rule-based approach that accounts for syntactic relations between sentiment words and the target entities
• 4: maximum entropy method based on word n-grams, character n-grams, and topic modeling results
• 10: word n-grams, letter n-grams, emoticons, punctuation marks, smileys, a manual sentiment vocabulary, and an automatically generated sentiment list based on the pointwise mutual information (PMI) of word occurrences in the positive and negative training subsets
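The PMI-based sentiment list in the last bullet can be sketched as follows. This is an assumption-laden illustration, not the participant's code: it scores each word as PMI(w, positive) − PMI(w, negative), uses whitespace tokenization without lemmatization, tweet-level counts, and additive smoothing; `pmi_lexicon` and its parameters are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def pmi_lexicon(pos_texts: List[str], neg_texts: List[str],
                min_count: int = 2, smoothing: float = 0.5) -> Dict[str, float]:
    """Score each word as PMI(w, positive) - PMI(w, negative).

    Since PMI(w, c) = log P(w | c) / P(w), the difference reduces to
    log2(P(w | pos) / P(w | neg)). Each probability is estimated as the
    smoothed share of tweets in that subset containing the word.
    """
    pos_df = Counter(w for text in pos_texts for w in set(text.lower().split()))
    neg_df = Counter(w for text in neg_texts for w in set(text.lower().split()))
    n_pos, n_neg = len(pos_texts), len(neg_texts)
    lexicon = {}
    for word in pos_df.keys() | neg_df.keys():
        if pos_df[word] + neg_df[word] < min_count:
            continue  # too rare for a reliable estimate
        p_pos = (pos_df[word] + smoothing) / (n_pos + 2 * smoothing)
        p_neg = (neg_df[word] + smoothing) / (n_neg + 2 * smoothing)
        lexicon[word] = math.log2(p_pos / p_neg)
    return lexicon


pos = ["отличный банк", "отличный сервис"]  # "excellent bank", "excellent service"
neg = ["ужасный банк", "ужасный сервис"]    # "terrible bank", "terrible service"
lexicon = pmi_lexicon(pos, neg)
print(round(lexicon["отличный"], 2), round(lexicon["ужасный"], 2), lexicon["банк"])
# 2.32 -2.32 0.0
```

Words spread evenly across both subsets (here "банк") get a score near zero, so the list keeps only words whose distribution is skewed towards one polarity.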
In summary:
• SVM + syntactic relations
• linguistic syntax-based patterns (without machine learning)
• MaxEnt and SVM with various features
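The syntax-based idea (sentiment words linked to the target entity, without machine learning) can be illustrated with a toy sketch. The triples are written by hand rather than produced by a parser, the relation names follow Universal Dependencies conventions by assumption, and `entity_polarity` is a hypothetical helper, not a participant's system; it considers only direct head–dependent links.

```python
from typing import Dict, Iterable, Tuple

# (head word, dependent word, relation type), as in the triple representation above.
Triple = Tuple[str, str, str]


def entity_polarity(triples: Iterable[Triple], entity: str,
                    lexicon: Dict[str, int]) -> int:
    """Polarity (-1, 0, 1) of `entity` from sentiment words directly linked to it."""
    score = 0
    for head, dependent, _relation in triples:
        if head == entity and dependent in lexicon:
            score += lexicon[dependent]
        elif dependent == entity and head in lexicon:
            score += lexicon[head]
    return (score > 0) - (score < 0)


# Invented, hand-parsed (lemmatized) example tweet:
# "Сбербанк подвёл, а ВТБ помог" ("Sberbank let me down, but VTB helped").
triples = [("подвести", "Сбербанк", "nsubj"),
           ("помочь", "ВТБ", "nsubj"),
           ("подвести", "помочь", "conj")]
lexicon = {"подвести": -1, "помочь": 1}
print(entity_polarity(triples, "Сбербанк", lexicon))  # -1
print(entity_polarity(triples, "ВТБ", lexicon))       # 1
```

Because polarity is attached through syntactic links, the two entities in one tweet can receive opposite labels; a realistic pattern set would also need longer paths and negation handling.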
Explaining the difference in performance between the two domains
The best results in the banking and telecom domains differ: 0.36 vs. 0.488 (macro F).
The difference between the training and test collections was measured with the Kullback-Leibler divergence.
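A minimal sketch of such a comparison, in Python. The slide does not say which distributions were compared; this sketch assumes unigram word distributions over a shared vocabulary with add-one smoothing, and the direction of the divergence is also an assumption. The function names and sample tweets are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set


def unigram_distribution(texts: Iterable[str], vocab: Set[str],
                         alpha: float = 1.0) -> Dict[str, float]:
    """Add-alpha smoothed word distribution over a shared vocabulary."""
    counts = Counter(w for text in texts for w in text.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p_texts: List[str], q_texts: List[str], alpha: float = 1.0) -> float:
    """D_KL(P || Q) in nats between the word distributions of two collections."""
    vocab = {w for text in p_texts + q_texts for w in text.lower().split()}
    p = unigram_distribution(p_texts, vocab, alpha)
    q = unigram_distribution(q_texts, vocab, alpha)
    return sum(p[w] * math.log(p[w] / q[w]) for w in vocab)


test_tweets = ["банк помог с кредитом", "хороший сервис"]
train_tweets = ["санкции против банка", "банк под санкциями", "хороший сервис"]
print(kl_divergence(test_tweets, test_tweets))       # 0.0
print(kl_divergence(train_tweets, test_tweets) > 0)  # True
```

Smoothing keeps every probability positive, so words that occur in only one collection (such as "санкции") do not make the divergence infinite.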
The topics of reputation-oriented tweets depend heavily on positive or negative events concerning the target entities.
Problems of reputation analysis of tweets
At any moment, events influencing reputation can occur, and such events are absent from the training data.
• Test collections (December 2013 to February 2014): the events in Ukraine had not yet influenced the target entities.
• Training collections in both domains (July to August 2014): collected after the Ukraine events of 2013-2014, including sanctions against banks and communication problems in Crimea.
Analyzing difficult tweets
• 71 tweets in the banking domain were wrongly classified by all participants.
• 85 tweets in the telecom domain were difficult for almost all participants (at most 2 systems were correct).
First group. 1.1
Tweets contain evident sentiment words (such as понравиться, "to like") that were absent from the training set.
A general vocabulary of Russian sentiment words could help.
First group. 1.2
Tweets contain words describing well-known positive or negative situations, such as theft or murder, that are absent from the training collection.
A general vocabulary of connotative words would be useful.
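Groups 1.1 and 1.2 both suggest backing off to general vocabularies when a word is unknown to the training data. A minimal Python sketch of that idea, assuming lemmatized input and toy vocabularies (all entries and the name `tweet_polarity` are hypothetical); consulting the training lexicon first is a design choice that lets domain-specific evidence override general word polarity.

```python
from typing import Iterable, Mapping


def tweet_polarity(lemmas: Iterable[str],
                   train_lexicon: Mapping[str, float],
                   general_sentiment: Mapping[str, float],
                   connotations: Mapping[str, float]) -> int:
    """Sum word scores, backing off from the training lexicon to general vocabularies."""
    score = 0.0
    for lemma in lemmas:
        if lemma in train_lexicon:          # learned from the training collection
            score += train_lexicon[lemma]
        elif lemma in general_sentiment:    # e.g. понравиться "to like"
            score += general_sentiment[lemma]
        elif lemma in connotations:         # e.g. кража "theft", убийство "murder"
            score += connotations[lemma]
    return (score > 0) - (score < 0)


train_lexicon = {"банк": 0.0, "кредит": 0.1}
general_sentiment = {"понравиться": 1.0}
connotations = {"кража": -1.0, "убийство": -1.0}
print(tweet_polarity(["я", "понравиться", "банк"], train_lexicon,
                     general_sentiment, connotations))  # 1
print(tweet_polarity(["кража", "деньги", "карта"], train_lexicon,
                     general_sentiment, connotations))  # -1
```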
First group. 1.3
Tweets contain words and phrases describing current events from the news flow.
Parallel analysis of current news could help, revealing correlations between tweet words and general sentiment and connotation vocabularies in news texts.
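One way to picture the parallel news analysis is to score each word by the sentiment words it co-occurs with in news sentences. This is a hypothetical sketch, not a method from the source: it uses sentence-level co-occurrence, whitespace tokens, and a toy general lexicon, and `news_connotations` is an invented name. Scores lie in [-1, 1].

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def news_connotations(news_sentences: Iterable[str],
                      sentiment_lexicon: Mapping[str, int],
                      min_count: int = 2) -> Dict[str, float]:
    """Score each non-lexicon word by the polarity of sentiment words it co-occurs with."""
    freq, pos_co, neg_co = Counter(), Counter(), Counter()
    for sentence in news_sentences:
        tokens = set(sentence.lower().split())
        has_pos = any(sentiment_lexicon.get(t, 0) > 0 for t in tokens)
        has_neg = any(sentiment_lexicon.get(t, 0) < 0 for t in tokens)
        for token in tokens:
            if token in sentiment_lexicon:
                continue  # only estimate connotation for words outside the lexicon
            freq[token] += 1
            pos_co[token] += int(has_pos)
            neg_co[token] += int(has_neg)
    return {t: (pos_co[t] - neg_co[t]) / freq[t]
            for t in freq if freq[t] >= min_count}


news = ["санкции плохо", "санкции плохо банк", "санкции ужасно",
        "банк хорошо", "кредит хорошо"]
lexicon = {"плохо": -1, "ужасно": -1, "хорошо": 1}
scores = news_connotations(news, lexicon)
print(scores["санкции"], scores["банк"])  # -1.0 0.0
```

Here "санкции" (sanctions) acquires a negative connotation from the news flow, which a classifier trained on earlier tweets could not have learned.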
Second group
Misclassified tweets that are genuinely complicated:
• they mention more than one entity with different attitudes;
• they contain several sentiment words of different polarity;
• they contain irony.
Vocabularies / ML framework
30% of tweets in the banking collection
15% of tweets in the telecom collection
Were the systems entity-oriented?
Test tweets mentioning two or more entities:
• 58 tweets in the banking domain (15 of them with different polarity labels for different entities);
• 232 tweets in the telecom domain (71 with different polarity labels).
Only 3 of the 9 participants treated the task as entity-oriented.
• The other participants always assigned the same polarity class to all entities mentioned in a tweet.
Performance on these tweets
• Worse than on all tweets, on average
• Entity-oriented approaches did not achieve better results
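The counts on this slide, and the check of whether a run is entity-oriented at all, can be sketched in Python. The input format (tweet id mapped to per-entity polarities) and the entity names are hypothetical; `distinguishes_entities` returning False corresponds to a run that always assigned the same polarity class to all entities in a tweet.

```python
from typing import Mapping, Tuple

# tweet id -> {entity: polarity}, polarity in {-1, 0, 1}
Annotations = Mapping[str, Mapping[str, int]]


def multi_entity_stats(gold: Annotations) -> Tuple[int, int]:
    """Number of tweets with 2+ entities, and how many of them have differing labels."""
    multi = [labels for labels in gold.values() if len(labels) >= 2]
    mixed = [labels for labels in multi if len(set(labels.values())) > 1]
    return len(multi), len(mixed)


def distinguishes_entities(pred: Annotations) -> bool:
    """True if a run ever gives different polarities to entities in the same tweet."""
    return any(len(set(labels.values())) > 1 for labels in pred.values())


gold = {"t1": {"bank_A": -1, "bank_B": 1},
        "t2": {"telecom_A": 0, "telecom_B": 0},
        "t3": {"telecom_C": 1}}
uniform_run = {"t1": {"bank_A": -1, "bank_B": -1},
               "t2": {"telecom_A": 0, "telecom_B": 0},
               "t3": {"telecom_C": 1}}
print(multi_entity_stats(gold))             # (2, 1)
print(distinguishes_entities(uniform_run))  # False
print(distinguishes_entities(gold))         # True
```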
Conclusion
We described the tasks, approaches, and results of the SentiRuEval evaluation.
– High dependence on the training collections
– High impact of current dramatic events
– The capability of the systems to perform entity-oriented analysis is quite limited
– Results could be improved substantially by integrating a general sentiment vocabulary and a general vocabulary of connotative words
– Most participants solved the general task of tweet classification
– Entity-oriented approaches did not achieve better results
All prepared materials are available for research purposes:
http://goo.gl/qHeAVo
Thank you!
You can help us assess tweets for SentiRuEval-2016:
http://sentimeter.ru/assess/texts/
Yuliya Rubtsova

Botany 4th semester series (krishna).pdf
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 

Entity-oriented sentiment analysis of tweets: results and problems

  • 1. Entity-oriented sentiment analysis of tweets: results and problems Natalia Loukachevitch Lomonosov Moscow State University Yuliya Rubtsova A.P. Ershov Institute of Informatics Systems
  • 2. Entity-oriented analysis of tweets: reputation monitoring. Sentiment analysis: in general vs. entity-oriented
  • 3. SentiRuEval 2014-2015: testing of sentiment analysis systems for Russian texts. Aspect-oriented analysis of reviews: restaurants, cars. Entity-oriented analysis of tweets (reputation monitoring): banks [8], telecom companies [7]
  • 4. SentiRuEval: entity-oriented analysis of tweets. Task: to determine the sentiment towards the mentioned company. A reputation-oriented tweet may express a positive or negative opinion about a company, or a positive or negative fact concerning a company. Participation: 9 participants, 33 runs
  • 5. SentiRuEval: entity-oriented analysis of tweets. Training collection (July to August 2014): 5000 banking tweets, 5000 telecom tweets. Test collection (December 2013 to February 2014): 4549 banking tweets, 3845 telecom tweets
  • 6. Expert annotation labels: 0 (tweet considered neutral); 1 (positive fact or opinion); -1 (negative fact or opinion); +- (positive and negative sentiment in the same tweet); -- (meaningless)
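  • 7. Annotation problem. Test data were annotated using a voting scheme (agreement between 2 or 3 annotators). Telecom: 4 503 tweets (90.06%) received the same label from at least 2 assessors, 2 233 (44.66%) had full agreement, and the final test collection contains 3 845 tweets. Banks: 4 915 tweets (98.3%) received the same label from at least 2 assessors, 3 818 (76.36%) had full agreement, and the final test collection contains 4 549 tweets. Percentages are of the 5 000 labeled tweets per domain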
  • 8. Distribution of messages in the collections by sentiment class. Telecom: training collection, 2397 neutral / 973 positive / 1667 negative; gold standard test collection, 2816 neutral / 413 positive / 944 negative. Banks: training collection, 3569 neutral / 410 positive / 2138 negative; gold standard test collection, 3592 neutral / 350 positive / 670 negative
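  • 9. Quality measure: the macro-average F-measure over the two sentiment classes, $F_{macro} = \frac{F_{pos} + F_{neg}}{2}$, where $F_{pos}$ and $F_{neg}$ are the F-measures of the positive and negative classes. The F-measure of the neutral class is ignored, but this does not reduce the task to two-class prediction: sentiment tweets wrongly labeled neutral still lower the recall of the sentiment classes, and neutral tweets labeled positive or negative lower their precision. Additionally, micro-average F-measures were calculated for the two sentiment classes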
  • 10. Results. Top 3 results for telecom tweets (run id: macro F / micro F): baseline 0.1823 / 0.337; run 2: 0.4882 / 0.5355; run 3: 0.4804 / 0.5094; run 4: 0.467 / 0.506. Top 3 results for bank tweets (run id: macro F / micro F): baseline 0.1267 / 0.2377; run 4: 0.3598 / 0.343; run 10: 0.352 / 0.337; run 2: 0.3354 / 0.3656. Manual labeling by a participant for the telecom domain: macro F 0.703, micro F 0.7487
  • 11. Classification methods. Run 2: lemmas and syntactic links represented as triples (head word, dependent word, type of relation). Run 3: a rule-based approach accounting for syntactic relations between sentiment words and the target entities. Run 4: the maximum entropy method based on word n-grams, symbol (character) n-grams, and topic modeling results. Run 10: word n-grams, letter n-grams, emoticons, punctuation marks, smileys, a manual sentiment vocabulary, and an automatically generated sentiment list based on the pointwise mutual information (PMI) of word occurrences in the positive or negative training subsets
  • 12. Classification methods: SVM with syntactic relations; linguistic syntax-based patterns (without machine learning); MaxEnt and SVM using various features
  • 13. Explaining the difference in performance between the two domains. The best results in the banking and telecom domains differ markedly: 0.36 vs. 0.488. The difference between the training and test collections was measured with the Kullback-Leibler divergence
  • 14. Explaining the difference in performance between the two domains. The topics of reputation-oriented tweets depend heavily on positive or negative events concerning the target entities
  • 15. Problems of reputation analysis of tweets. At any moment, events influencing reputation can occur, and such events are absent from the training data. Test collections: December 2013 to February 2014; the Ukraine events did not influence the target entities. Training collections in both domains: July to August 2014, after the Ukraine events of 2013-2014, with sanctions against banks and communication problems in Crimea
  • 16. Analyzing difficult tweets: 71 tweets in the banking domain were misclassified by all participants; 85 tweets in the telecom domain were difficult for almost all participants (at most 2 systems were correct)
  • 17. First group, 1.1: the tweet contains evident sentiment words (such as понравиться, "to like") that were absent from the training set. A general vocabulary of Russian sentiment words could help
  • 18. First group, 1.2: the tweet contains words denoting well-known positive or negative situations, such as theft or murder, that are absent from the training collection. A general vocabulary of connotative words would be useful
  • 19. First group, 1.3: the tweet contains words and phrases describing current events in the news flow. Parallel analysis of the current news, revealing correlations between tweet words and general sentiment and connotation vocabularies in news texts, could help
  • 20. Second group: misclassified tweets that are genuinely complicated. Such tweets mention more than one entity with different attitudes, contain several sentiment words of different polarity, or contain irony
  • 21. Vocabularies in the ML framework: 30% (tweets in the bank collection), 15% (tweets in the telecom collection)
  • 22. Were the systems entity-oriented? Test tweets mentioning two or more entities: 58 tweets in the banking domain (15 with different polarity labels) and 232 tweets in the telecom domain (71 with different polarity labels). Only 3 of 9 participants treated the task as entity-oriented; the other participants always assigned the same polarity class to all entities mentioned in a tweet. Performance on these tweets was on average worse than on all tweets, and entity-oriented approaches did not achieve better results
  • 23. Conclusion. We described the tasks, approaches and results of the SentiRuEval testing: results depend heavily on the training collections; current dramatic events have a strong impact; the capability of systems to perform entity-oriented analysis is quite limited; results could be improved considerably by integrating a general sentiment vocabulary and a general vocabulary of connotative words; most participants solved the general task of tweet classification; entity-oriented approaches did not achieve better results. All prepared materials are accessible for research purposes: http://goo.gl/qHeAVo
  • 24. Thank you! You can help us to assess tweets for SentiRuEval-2016 http://sentimeter.ru/assess/texts/ Yuliya Rubtsova
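The quality measure from slide 9, applied to a majority-class baseline, can be sketched in Python. This is a minimal illustration, not the official SentiRuEval scorer: it assumes one gold and one predicted label per tweet-entity pair, encoded as 1 (positive), -1 (negative) and 0 (neutral) as in the annotation scheme, and it ignores the "+-" and "--" labels. All function names are hypothetical.

```python
# Labels follow the expert annotation scheme: 1 = positive, -1 = negative, 0 = neutral.
SENTIMENT_CLASSES = (1, -1)  # the neutral class gets no F-measure of its own


def f_measure(tp, fp, fn):
    """Balanced F-measure from raw counts; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def class_counts(gold, pred, label):
    """True positives, false positives and false negatives for one class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    return tp, fp, fn


def sentiment_f_scores(gold, pred):
    """Return (macro F, micro F) over the positive and negative classes.

    Neutral labels still count against the sentiment classes: a positive
    tweet labeled 0 is a false negative for the positive class, and a neutral
    tweet labeled -1 is a false positive for the negative class.
    """
    counts = [class_counts(gold, pred, label) for label in SENTIMENT_CLASSES]
    macro = sum(f_measure(*c) for c in counts) / len(counts)
    # Micro-averaging pools the counts of both sentiment classes first.
    tp, fp, fn = (sum(column) for column in zip(*counts))
    micro = f_measure(tp, fp, fn)
    return macro, micro


def majority_baseline(gold, label=-1):
    """Label every tweet with one reputation-oriented class (negative by default)."""
    return [label] * len(gold)


if __name__ == "__main__":
    gold = [0, 0, 0, 1, -1, -1]  # hypothetical toy labels
    print(sentiment_f_scores(gold, majority_baseline(gold)))
```

On these toy labels the all-negative baseline gets F = 0.5 for the negative class and 0 for the positive class, so its macro F is 0.25 and its micro F is about 0.444. Ignoring the neutral F-measure keeps a system from scoring well simply by labeling the dominant neutral class correctly.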

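One participant (run 10 on slide 11) used an automatically generated sentiment list based on the PMI of word occurrences in the positive and negative training subsets. The slides do not give the exact formula, so this sketch assumes the common score PMI(w, pos) - PMI(w, neg), which reduces to log2(P(w|pos) / P(w|neg)), computed from tweet-level occurrence counts with add-one smoothing. The function names, the smoothing constant and the minimum-count cutoff are hypothetical choices.

```python
import math
from collections import Counter


def document_frequencies(tweets):
    """Count in how many tweets each token occurs (each tweet counted once)."""
    df = Counter()
    for tokens in tweets:
        df.update(set(tokens))
    return df


def pmi_lexicon(positive, negative, smoothing=1.0, min_count=2):
    """Score each word as PMI(w, pos) - PMI(w, neg) = log2(P(w|pos) / P(w|neg)).

    `positive` and `negative` are lists of tokenized tweets. Add-`smoothing`
    counts keep words seen in only one subset finite; words occurring in fewer
    than `min_count` tweets overall are dropped as unreliable.
    """
    df_pos = document_frequencies(positive)
    df_neg = document_frequencies(negative)
    n_pos, n_neg = len(positive), len(negative)
    lexicon = {}
    for word in set(df_pos) | set(df_neg):
        if df_pos[word] + df_neg[word] < min_count:
            continue
        p_pos = (df_pos[word] + smoothing) / (n_pos + 2 * smoothing)
        p_neg = (df_neg[word] + smoothing) / (n_neg + 2 * smoothing)
        lexicon[word] = math.log2(p_pos / p_neg)
    return lexicon


if __name__ == "__main__":
    # Hypothetical toy training subsets.
    pos = [["great", "service"], ["great", "rate"]]
    neg = [["slow", "service"], ["slow", "app"], ["slow"]]
    for word, score in sorted(pmi_lexicon(pos, neg).items(), key=lambda kv: kv[1]):
        print(f"{word}\t{score:+.3f}")
```

Positive scores mark words that lean towards the positive subset and negative scores the opposite; such a list can then be used as extra classifier features. Note that a lexicon built this way only covers words seen in training, which is exactly the limitation discussed for the first group of difficult tweets.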
Editor's notes

  1. In general: sentiment of the whole document, fragment, or sentence. Entity-oriented: sentiment about a specific entity (a politician, a political party, a company, etc.), or about specific parts or properties of an entity (aspects). Example: Переходи в Билайн. «Все за 300» — отличный тариф! ("Switch to Beeline. 'Everything for 300' is a great plan!")
  2. The goal of the Twitter sentiment analysis at SentiRuEval was to find tweets influencing the reputation of a company in two domains
  3. The datasets were collected with the Twitter Streaming API.
  4. To prepare the datasets, 20,000 messages were labeled: 5,000 messages per domain for each of the training and test collections. Each collection was labeled by at least two assessors, and the gold standard test collections were labeled by three assessors. Irrelevant or unclear messages were removed from the training and test sets.
  5. To avoid inconsistencies and disputes, the voting scheme was applied when labeling the test collections.
  6. We noticed that users sometimes do not want to sound rude and add positive emoticons to clearly negative or ironic messages. That is why simple methods based on extracting emoticons, which are used for classification at the whole-tweet level, do not work well.
  7. Main quality measure:
  8. The baselines assign the majority reputation-oriented category (the negative one in this case) to every tweet. One of the participants performed independent expert labeling of the telecom tweets, which can be considered the maximum possible performance of automated systems in this task.
  9. Most participants used the SVM classification method.
  10. Most participants used the SVM classification method.
  11. We computed the Kullback-Leibler divergence to compare the word probability distributions of the test collections with those of the training collections.
  12. The first group includes tweets that were misclassified because of the limited size of the training collection, which did not contain appropriate training examples.
  13. These words are usually considered neutral and non-opinionated, but they have positive or negative associations (so-called connotations). To solve these problems, a general vocabulary of connotative words would be useful, because the appearance of these words in connection with a company influences its reputation.
  14. Problematic tweets contain words and phrases describing current events in the news flow. The appearance of such events and their influence on a company's reputation are very difficult to predict, so mentions of them will always be absent from the training collection. In this case, parallel analysis of the current news, revealing correlations between tweet words and general sentiment and connotation vocabularies in news texts, can help.
  15. It means that integration of various vocabularies into the machine-learning framework can improve the performance of reputation-oriented automatic systems
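The Kullback-Leibler comparison of training and test collections (slide 13, note 11) can be sketched as follows. The slides do not describe the exact setup, so this assumes unigram distributions over tokenized tweets, add-one smoothing over the union of both vocabularies, and the direction D_KL(P_test || P_train) measured in bits; all names are hypothetical.

```python
import math
from collections import Counter


def unigram_distribution(tweets, vocabulary, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(token for tokens in tweets for token in tokens)
    total = sum(counts.values()) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def kl_divergence(test_tweets, train_tweets, alpha=1.0):
    """D_KL(P_test || P_train) in bits, over the union of both vocabularies.

    Smoothing gives every word a non-zero probability in both distributions,
    so words that occur only in the test period (for example, new events)
    keep the result finite while still contributing large terms.
    """
    vocabulary = {
        token
        for collection in (test_tweets, train_tweets)
        for tokens in collection
        for token in tokens
    }
    p = unigram_distribution(test_tweets, vocabulary, alpha)
    q = unigram_distribution(train_tweets, vocabulary, alpha)
    return sum(p[w] * math.log2(p[w] / q[w]) for w in vocabulary)


if __name__ == "__main__":
    # Hypothetical toy collections.
    train = [["rate", "deposit"], ["card", "rate"]]
    print(kl_divergence([["rate", "card"]], train))         # similar vocabulary
    print(kl_divergence([["sanctions", "crimea"]], train))  # shifted vocabulary
```

A test collection whose vocabulary has drifted away from the training period yields a larger divergence, which is the kind of shift the slides use to explain why the best scores differ between the banking and telecom domains.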