Introduction
Framework
Sentiment analysis
Case studies
Conclusions
A Descriptive Analysis of Twitter Activity Around
Boston Terror Attacks
Álvaro Cuesta David F. Barrero María D. R-Moreno
Computer Engineering Department
Universidad de Alcalá, Spain
ICCCI 2013
Craiova, Romania
September 11, 2013
ICCCI 2013, Craiova, Romania A Descriptive Analysis of Twitter Activity 1 / 25
Summary
1 Introduction
Motivation
Objectives
Case studies
2 Framework
Framework overview
Framework messaging
Framework components
3 Sentiment analysis
Overview
Classifier
4 Case studies
Boston Terror Attack
Political analysis
5 Conclusions and future work
Introduction
Motivation
Rapid expansion of social networks in recent years
One of the most successful is Twitter
Microblogging platform
Short messages known as tweets
Open nature
Twitter offers great research opportunities
Open nature
Distributed human sensor network
Easy data extraction, difficult data processing
Twitter + sentiment analysis
Lack of tools for sentiment analysis in Spanish
Introduction
Objectives
Twitter offers an excellent API ... however, some infrastructure is still needed (mainly storage and reporting)
Objectives
1 Develop a framework for Twitter data extraction and analysis
2 Provide reporting tools
3 Foundation for sentiment analysis in Spanish
Introduction
Case studies
To assess the framework, we include two case studies
Event driven - Boston terror attack
Regular usage - Political activity on Twitter in Spanish
Framework architecture
Overview
Requirements
Easy to use, extensible, massive data processing
Design decisions
Modular design: Collection of independent scripts
Focus on open data formats
Built around the database: MongoDB
Set of independent scripts exchanging data
Framework architecture
Framework messaging
Framework architecture
Framework components: Miner
Miner
Extracts and stores tweets
Stream API
Several filters
Written in Python
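As a rough illustration of what a miner filter might look like, the sketch below keeps only the tweets whose text contains every term of a filter phrase and appends them to a store. It makes no network calls: the in-memory `sample_stream`, the function names `matches_phrase` and `mine`, and the matching rule (all terms present, ignoring case and order) are assumptions made for this example, not the framework's actual code.

```python
# Illustrative sketch only: the tweet layout, the function names and the
# matching rule are assumptions, not the framework's implementation.

def matches_phrase(text, phrase):
    """Return True if every term of `phrase` occurs in `text`, ignoring case and order."""
    words = set(text.lower().split())
    return all(term in words for term in phrase.lower().split())


def mine(stream, phrases, store):
    """Append to `store` every tweet from `stream` that matches any phrase."""
    for tweet in stream:
        if any(matches_phrase(tweet["text"], p) for p in phrases):
            store.append(tweet)
    return store


# A hypothetical in-memory stand-in for the streaming connection.
sample_stream = [
    {"id": 1, "text": "Explosiones en el Maratón de Boston"},
    {"id": 2, "text": "Hoy llueve en Madrid"},
    {"id": 3, "text": "boston suspende el maratón de mañana"},
]
stored = mine(sample_stream, ["Maratón de Boston"], [])
```

In a real deployment the list passed as `store` would be replaced by a database insert and `sample_stream` by the live stream.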
Framework architecture
Framework components: Database
Database
Storage for further processing
MongoDB
NoSQL database
High performance
Framework architecture
Framework components: Reporting
Reporting
CSV export for further processing
R processing
Extensibility
Powerful libraries
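A CSV export step of this kind could be sketched as follows. The column choice, the function name `tweets_to_csv` and the sample tweets are assumptions for illustration; the field names (`id`, `user.screen_name`, `created_at`, `retweeted_status`, `text`) follow Twitter's v1.1 tweet JSON.

```python
import csv
import io


def tweets_to_csv(tweets, out):
    """Write one row per tweet to the file-like object `out` (hypothetical column set)."""
    writer = csv.writer(out)
    writer.writerow(["id", "user", "created_at", "is_retweet", "text"])
    for t in tweets:
        writer.writerow([
            t["id"],
            t["user"]["screen_name"],
            t["created_at"],
            int("retweeted_status" in t),    # 1 for retweets, 0 otherwise
            t["text"].replace("\n", " "),    # keep each tweet on one CSV line
        ])


# Made-up sample data, only to exercise the function.
sample = [
    {"id": 1, "user": {"screen_name": "alice"},
     "created_at": "Tue Apr 16 00:43:10 +0000 2013", "text": "Hola, mundo"},
    {"id": 2, "user": {"screen_name": "bob"},
     "created_at": "Tue Apr 16 00:50:00 +0000 2013", "text": "RT @alice: Hola, mundo",
     "retweeted_status": {"id": 1}},
]
buffer = io.StringIO()
tweets_to_csv(sample, buffer)
csv_text = buffer.getvalue()
```

A file written this way can be loaded in R with `read.csv`, which is where the reporting scripts would take over.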
Framework architecture
Framework components: Sentiment analysis
Sentiment analysis
Supervised learning
Need for labeling
Tools for labeling
Classifier building
Classifier testing
Sentiment analysis
Overview
Supervised learning with Natural Language Toolkit (NLTK)
Three classes: “positive”, “negative” and “neutral”
Need for a labeled corpus
Several in English ...
... none in Spanish
Need for thousands of manually classified tweets
Collaborative labeling
Web application to label tweets
Sentiment analysis
Classifier
Naïve Bayes classifier
Stop words removed
Some parameters to set
Optimal parameter setting depends on the dataset
Need for classifier evaluation
Tester
Cross validation
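To make the classifier step concrete, here is a minimal pure-Python sketch of a multinomial Naïve Bayes text classifier with add-one smoothing and stop-word removal. It is a stand-in written for illustration and is not the NLTK classifier used in the framework; the stop-word list, the class name `NaiveBayes` and the training sentences are all assumptions.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption, not the authors' list).
STOP_WORDS = {"el", "la", "de", "que", "y", "en", "es", "un", "una"}


def tokenize(text):
    """Lowercase, split on whitespace and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)            # log prior
            for w in tokenize(text):
                if w in self.vocab:                # ignore unseen words
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training sentences, only to exercise the classifier.
train_texts = [
    "me encanta la propuesta",
    "gran trabajo del equipo",
    "que vergüenza de gestión",
    "es un desastre total",
]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_texts, train_labels)
```

Calling `model.predict("un gran equipo")` returns the label whose training vocabulary best explains the remaining words after stop-word removal.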
Case study
Boston Terror Attack
Main objective
Evaluate the platform
Secondary objective
Describe activity around an event
Stream by string filter
The event
Terror attack on 15 Apr 2013 14:49 (GMT-4) in Boston
Internet witch-hunt motivated by the release of some photos
Shooting and manhunt
Data acquisition
Begin: Tue, 16 Apr 2013 00:43 (GMT)
End: Tue, 23 Apr 2013 00:43 (GMT)
Filter: “Maratón de Boston” (Boston Marathon in Spanish)
Case study
Boston Terror Attack: Dataset description
               Value        Relative    Average
Tweets         28,892                   1.16/user
Non-retweets   16,029       55.48 %
Retweets       12,863       44.52 %
Geolocalized   255          0.88 %
Users          24,989
Mentions       18,937       65.54 %
Replies        849          2.94 %
Non-replies    18,088       62.61 %
Size           96.39 MB                 3.38 KB/tweet
Index size     0.91 MB
Disk           132.99 MB
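The counts in the table above can be reproduced from stored tweets with a few passes over the data. For example, 12,863 retweets out of 28,892 tweets gives 44.52 %, and 28,892 tweets from 24,989 users gives 1.16 tweets per user. The sketch below computes the same kind of figures for a list of tweet dictionaries. The field names follow Twitter's v1.1 tweet JSON, but how the authors defined each metric is not stated in the slides, so the definitions here (for example, a mention is any tweet with a non-empty `user_mentions` list) are assumptions, as are the helper names.

```python
# Illustrative sketch: metric definitions are assumptions, not the authors'
# documented ones. Field names follow Twitter's v1.1 tweet JSON.

def describe(tweets):
    """Return descriptive counts and percentages for a list of tweet dicts."""
    total = len(tweets)

    def pct(n):
        return round(100.0 * n / total, 2)

    users = {t["user"]["id"] for t in tweets}
    retweets = sum(1 for t in tweets if "retweeted_status" in t)
    geolocated = sum(1 for t in tweets if t.get("coordinates"))
    mentions = sum(1 for t in tweets if t["entities"]["user_mentions"])
    replies = sum(1 for t in tweets if t.get("in_reply_to_status_id") is not None)
    return {
        "tweets": total,
        "users": len(users),
        "tweets_per_user": round(total / len(users), 2),
        "retweets": (retweets, pct(retweets)),
        "geolocated": (geolocated, pct(geolocated)),
        "mentions": (mentions, pct(mentions)),
        "replies": (replies, pct(replies)),
    }


def make_tweet(user_id, **extra):
    """Build a minimal hypothetical tweet dict for the demonstration."""
    tweet = {"user": {"id": user_id}, "entities": {"user_mentions": []}}
    tweet.update(extra)
    return tweet


mention = {"user_mentions": [{"screen_name": "someone"}]}
sample = [
    make_tweet(1),
    make_tweet(1, retweeted_status={"id": 99}, entities=mention),
    make_tweet(2, coordinates={"type": "Point", "coordinates": [-71.06, 42.36]},
               entities=mention, in_reply_to_status_id=10),
    make_tweet(3),
]
stats = describe(sample)
```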
Case study
Boston Terror attack: activity
[Figure: activity over time (x-axis: Time, ticks from Apr 17 to Apr 23) in three panels: Tweets, Tweets (excluding RTs), Retweets]
Dashed line: Bombing
Dotted line: Photo release
Solid line: Shooting
Gray background: Manhunt
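Activity curves like these come from counting tweets per time bin. The sketch below groups tweets by hour using the `created_at` timestamp format of Twitter's v1.1 tweet JSON. The one-hour bin width and the function name `tweets_per_hour` are assumptions, since the slides do not state the resolution used for the plots.

```python
from collections import Counter
from datetime import datetime

# Timestamp layout used by Twitter's v1.1 API, e.g. "Tue Apr 16 00:43:10 +0000 2013".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_per_hour(tweets):
    """Return a sorted list of (hour_start, count) pairs (hourly bins are an assumption)."""
    counts = Counter()
    for t in tweets:
        ts = datetime.strptime(t["created_at"], CREATED_AT_FORMAT)
        counts[ts.replace(minute=0, second=0)] += 1
    return sorted(counts.items())


# Made-up timestamps, only to exercise the function.
sample = [
    {"created_at": "Tue Apr 16 00:43:10 +0000 2013"},
    {"created_at": "Tue Apr 16 00:59:59 +0000 2013"},
    {"created_at": "Tue Apr 16 01:05:00 +0000 2013"},
]
series = tweets_per_hour(sample)
```

Filtering the input by whether a tweet carries `retweeted_status` before binning would give the separate retweet and non-retweet series.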
Case study
Boston Terror attack: activity
[Figure: detail of activity over time (x-axis: Time, from Thu 23:00 to Sat 00:00) in three panels: Tweets, Tweets (excluding RTs), Retweets]
Dotted line: Photo release
Solid line: Shooting
Gray background: Manhunt
Case study
Political analysis: Overview
Main objective
Evaluate sentiment analysis
Secondary objective
Describe regular Twitter activity
Stream by user filter
Selection of Spanish political actors
Selected by activity and controversy
Account owner            Accounts
Political party          @PPopular, @PSOE, @iunida, @UPyD
Politician               @agarzon, @EduMadina, @ToniCanto1, @RevillaMiguelA, @ccifuentes, @_Rubalcaba_
Journalist               @jordievole, @iescolar
Activist organization    @LA_PAH
Data acquisition
Begin: Tue, 16 Apr 2013 00:00 (GMT)
End: Thu, 18 Apr 2013 04:00 (GMT)
Filter: Account name (“@account”)
Case study
Political analysis: Dataset description
               Value        Relative    Average
Tweets         65,043                   1.9/user
Non-retweets   28,175       43.32 %
Retweets       36,868       56.68 %
Geolocalized   528          0.81 %
Users          34,195
Mentions       56,713       87.19 %
Replies        9,732        14.96 %
Non-replies    46,981       72.23 %
Size           227.51 MB                3.58 KB/tweet
Index size     2.05 MB
Disk           237.95 MB
Case study
Political analysis: Activity
[Figure: activity over time (x-axis: Time, Tue to Thu) in three panels: Tweets, Tweets (excluding RTs), Retweets]
Case study
Political analysis: Sentiment analysis
9,884 tweets were manually classified collaboratively
4,739 non-neutral tweets
1,062 positive, 3,677 negative
Unbalanced dataset
We tried several parameters for the Naïve Bayes classifier
N-grams: {1}, {2}, {3}, {1, 2}, {1, 3} and {2, 3}
Minimum score: 0, 1, 2, 3, 4, 5, 6 and 10
10-fold cross-validation
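The n-gram sets and the 10-fold split listed above can be sketched in a few lines. Here `ngram_features` builds the features for a set of n-gram sizes such as {1, 2}, and `k_fold_indices` yields train/test index lists for k folds. Both function names are hypothetical. The "minimum score" parameter is not defined in the slides and is deliberately left out.

```python
def ngram_features(tokens, sizes):
    """Return all n-grams of the given sizes, e.g. sizes={1, 2} for unigrams and bigrams."""
    features = []
    for n in sorted(sizes):
        for i in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[i:i + n]))
    return features


def k_fold_indices(n_items, k):
    """Yield (train, test) index lists; each item is in the test fold exactly once."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


features = ngram_features(["no", "me", "gusta"], {1, 2})
splits = list(k_fold_indices(20, 10))
```

With `sizes={1, 2}` the example produces the three unigrams followed by the two bigrams. A real evaluation would normally shuffle the labeled tweets before splitting, then train on each `train` list and average the accuracy over the ten `test` folds.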
Case study
Political analysis: Sentiment analysis
Accuracy
NaiveBayes-1_2-min3 0.8543
NaiveBayes-1-min3 0.8510
NaiveBayes-1_3-min3 0.8507
NaiveBayes-1-min4 0.8476
NaiveBayes-1_3-min5 0.8474
NaiveBayes-1_2-min4 0.8469
NaiveBayes-1_3-min4 0.8467
NaiveBayes-1_3-min1 0.8459
NaiveBayes-1-min6 0.8452
NaiveBayes-1-min1 0.8448
NaiveBayes-1_2-min5 0.8446
NaiveBayes-1_3-min6 0.8438
NaiveBayes-1_2-min6 0.8436
NaiveBayes-1-min5 0.8406
NaiveBayes-1_2-min1 0.8389
NaiveBayes-2_3-min6 0.8385
Case study
Political analysis: Normalized sentiment
[Figure: normalized positive sentiment over time (x-axis: Time, Tue to Thu; y-axis: Positive, from 0.0 to 1.0)]
Conclusions and future work
We developed a framework that eases data extraction and
analysis on Twitter
Ready for production
It will be released soon under a free licence
We briefly described two case studies
Event driven activity - Boston terror attacks
Regular activity - Political activity
Sentiment analysis is intrinsically difficult
Future work
Lemmatization
Natural language processing
Time series analysis
Thanks for your attention!
David F. Barrero
david@aut.uah.es
@dfbarrero

Contenu connexe

En vedette

Temas para tesis de diseño gráfico
Temas para tesis de diseño gráfico Temas para tesis de diseño gráfico
Temas para tesis de diseño gráfico bastiano10
 
Temas de diseño grafico
Temas de diseño graficoTemas de diseño grafico
Temas de diseño graficorapero1115
 
Memoria descriptiva aplicada el rediseño del sitio web Canaima GNU/Linux
Memoria descriptiva aplicada el rediseño del sitio web Canaima GNU/LinuxMemoria descriptiva aplicada el rediseño del sitio web Canaima GNU/Linux
Memoria descriptiva aplicada el rediseño del sitio web Canaima GNU/LinuxMaximiliano Vilchez
 
Resumen de propuestas de tesis
Resumen de propuestas de tesisResumen de propuestas de tesis
Resumen de propuestas de tesisEdgardo Vegega
 
Memoria Proyecto de Título Diseño Industrial - Referencial DuocUC
Memoria Proyecto de Título Diseño Industrial - Referencial DuocUCMemoria Proyecto de Título Diseño Industrial - Referencial DuocUC
Memoria Proyecto de Título Diseño Industrial - Referencial DuocUCRodrigo Moren Pizarro
 
Diseño Gráfico Digital en Software Libre v3.21
Diseño Gráfico Digital en Software Libre v3.21Diseño Gráfico Digital en Software Libre v3.21
Diseño Gráfico Digital en Software Libre v3.21Leonardo J. Caballero G.
 
Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment AnalysisJaganadh Gopinadhan
 
Memorias descriptivas
Memorias descriptivasMemorias descriptivas
Memorias descriptivaszonibri
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By MatlabAnkit Gujrati
 

En vedette (10)

Memoria descriptiva maqueta sobre burgos
Memoria descriptiva maqueta sobre burgosMemoria descriptiva maqueta sobre burgos
Memoria descriptiva maqueta sobre burgos
 
Temas para tesis de diseño gráfico
Temas para tesis de diseño gráfico Temas para tesis de diseño gráfico
Temas para tesis de diseño gráfico
 
Temas de diseño grafico
Temas de diseño graficoTemas de diseño grafico
Temas de diseño grafico
 
Memoria descriptiva aplicada el rediseño del sitio web Canaima GNU/Linux
Memoria descriptiva aplicada el rediseño del sitio web Canaima GNU/LinuxMemoria descriptiva aplicada el rediseño del sitio web Canaima GNU/Linux
Memoria descriptiva aplicada el rediseño del sitio web Canaima GNU/Linux
 
Resumen de propuestas de tesis
Resumen de propuestas de tesisResumen de propuestas de tesis
Resumen de propuestas de tesis
 
Memoria Proyecto de Título Diseño Industrial - Referencial DuocUC
Memoria Proyecto de Título Diseño Industrial - Referencial DuocUCMemoria Proyecto de Título Diseño Industrial - Referencial DuocUC
Memoria Proyecto de Título Diseño Industrial - Referencial DuocUC
 
Diseño Gráfico Digital en Software Libre v3.21
Diseño Gráfico Digital en Software Libre v3.21Diseño Gráfico Digital en Software Libre v3.21
Diseño Gráfico Digital en Software Libre v3.21
 
Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment Analysis
 
Memorias descriptivas
Memorias descriptivasMemorias descriptivas
Memorias descriptivas
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By Matlab
 

Similaire à Presentacion

Identification and Characterization of Events in Social Media
Identification and Characterization of Events in Social MediaIdentification and Characterization of Events in Social Media
Identification and Characterization of Events in Social MediaHila Becker
 
DP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptxDP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptxDivyaPatel729457
 
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...Farida Vis
 
Changing trends in citation analysis and challenges in API measurement
Changing trends in citation analysis and challenges in API measurementChanging trends in citation analysis and challenges in API measurement
Changing trends in citation analysis and challenges in API measurementMunesh Kumar
 
Individual project 2.20
Individual project 2.20Individual project 2.20
Individual project 2.20Monisha100
 
Credibility, Identity Resolution, Privacy, and Policing in Online Social Media
Credibility, Identity Resolution, Privacy, and Policing in Online Social MediaCredibility, Identity Resolution, Privacy, and Policing in Online Social Media
Credibility, Identity Resolution, Privacy, and Policing in Online Social MediaIIIT Hyderabad
 
Web 2.0 Messaging Tools for Knowledge Management? Exploring the Potentials of...
Web 2.0 Messaging Tools for Knowledge Management? Exploring the Potentials of...Web 2.0 Messaging Tools for Knowledge Management? Exploring the Potentials of...
Web 2.0 Messaging Tools for Knowledge Management? Exploring the Potentials of...Sebastian Dennerlein
 
Automated Analysis of Journalists' and Politicians' Online Behavior on Social...
Automated Analysis of Journalists' and Politicians' Online Behavior on Social...Automated Analysis of Journalists' and Politicians' Online Behavior on Social...
Automated Analysis of Journalists' and Politicians' Online Behavior on Social...University of Groningen (The Netherlands)
 
Multivocal literature reviews in software engineering: preliminary findings f...
Multivocal literature reviews in software engineering: preliminary findings f...Multivocal literature reviews in software engineering: preliminary findings f...
Multivocal literature reviews in software engineering: preliminary findings f...Wylliams Santos
 
Spammer Detection and Fake User Identification on Social Networks
Spammer Detection and Fake User Identification on Social NetworksSpammer Detection and Fake User Identification on Social Networks
Spammer Detection and Fake User Identification on Social NetworksIRJET Journal
 
A Near-Real Time Application for Twitter Data Analysis
A Near-Real Time Application for Twitter Data AnalysisA Near-Real Time Application for Twitter Data Analysis
A Near-Real Time Application for Twitter Data AnalysisZina Petrushyna
 
Microblogging meets politics
Microblogging meets politicsMicroblogging meets politics
Microblogging meets politicsGabriela Grosseck
 
2010 Catalyst Conference - Trends in Social Network Analysis
2010 Catalyst Conference - Trends in Social Network Analysis2010 Catalyst Conference - Trends in Social Network Analysis
2010 Catalyst Conference - Trends in Social Network AnalysisMarc Smith
 
Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...
Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...
Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...Anthony Fisher Camilleri
 
Berlin 6 Open Access Conference: Wolfram Horstmann
Berlin 6 Open Access Conference: Wolfram HorstmannBerlin 6 Open Access Conference: Wolfram Horstmann
Berlin 6 Open Access Conference: Wolfram HorstmannCornelius Puschmann
 

Similaire à Presentacion (20)

Identification and Characterization of Events in Social Media
Identification and Characterization of Events in Social MediaIdentification and Characterization of Events in Social Media
Identification and Characterization of Events in Social Media
 
DP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptxDP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptx
 
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
 
sun2021.pdf
sun2021.pdfsun2021.pdf
sun2021.pdf
 
Changing trends in citation analysis and challenges in API measurement
Changing trends in citation analysis and challenges in API measurementChanging trends in citation analysis and challenges in API measurement
Changing trends in citation analysis and challenges in API measurement
 
Individual project 2.20
Individual project 2.20Individual project 2.20
Individual project 2.20
 
Credibility, Identity Resolution, Privacy, and Policing in Online Social Media
Credibility, Identity Resolution, Privacy, and Policing in Online Social MediaCredibility, Identity Resolution, Privacy, and Policing in Online Social Media
Credibility, Identity Resolution, Privacy, and Policing in Online Social Media
 
Web 2.0 Messaging Tools for Knowledge Management? Exploring the Potentials of...
Web 2.0 Messaging Tools for Knowledge Management? Exploring the Potentials of...Web 2.0 Messaging Tools for Knowledge Management? Exploring the Potentials of...
Web 2.0 Messaging Tools for Knowledge Management? Exploring the Potentials of...
 
Amia now! session one
Amia now! session oneAmia now! session one
Amia now! session one
 
Amia now! session one
Amia now! session oneAmia now! session one
Amia now! session one
 
Automated Analysis of Journalists' and Politicians' Online Behavior on Social...
Automated Analysis of Journalists' and Politicians' Online Behavior on Social...Automated Analysis of Journalists' and Politicians' Online Behavior on Social...
Automated Analysis of Journalists' and Politicians' Online Behavior on Social...
 
Multivocal literature reviews in software engineering: preliminary findings f...
Multivocal literature reviews in software engineering: preliminary findings f...Multivocal literature reviews in software engineering: preliminary findings f...
Multivocal literature reviews in software engineering: preliminary findings f...
 
Spammer Detection and Fake User Identification on Social Networks
Spammer Detection and Fake User Identification on Social NetworksSpammer Detection and Fake User Identification on Social Networks
Spammer Detection and Fake User Identification on Social Networks
 
A Near-Real Time Application for Twitter Data Analysis
A Near-Real Time Application for Twitter Data AnalysisA Near-Real Time Application for Twitter Data Analysis
A Near-Real Time Application for Twitter Data Analysis
 
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasuresExploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
 
Microblogging meets politics
Microblogging meets politicsMicroblogging meets politics
Microblogging meets politics
 
2010 Catalyst Conference - Trends in Social Network Analysis
2010 Catalyst Conference - Trends in Social Network Analysis2010 Catalyst Conference - Trends in Social Network Analysis
2010 Catalyst Conference - Trends in Social Network Analysis
 
Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...
Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...
Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...
 
Berlin 6 Open Access Conference: Wolfram Horstmann
Berlin 6 Open Access Conference: Wolfram HorstmannBerlin 6 Open Access Conference: Wolfram Horstmann
Berlin 6 Open Access Conference: Wolfram Horstmann
 
Anja Adler – Liquid Democracy-Norm, Code and Developers of Democracy beyond R...
Anja Adler – Liquid Democracy-Norm, Code and Developers of Democracy beyond R...Anja Adler – Liquid Democracy-Norm, Code and Developers of Democracy beyond R...
Anja Adler – Liquid Democracy-Norm, Code and Developers of Democracy beyond R...
 

Dernier

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 

Dernier (20)

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 

Presentacion

  • 1. Introduction Framework Sentiment analysis Case studies Conclusions A Descriptive Analysis of Twitter Activity Around Boston Terror Attacks Álvaro Cuesta David F. Barrero María D. R-Moreno Computer Engineering Department Universidad de Alcalá, Spain ICCCI 2013 Craiova, Romania September 11, 2013 ICCCI 2013, Craiova, Romania A Descriptive Analysis of Twitter Activity 1 / 25
  • 2. Introduction Framework Sentiment analysis Case studies Conclusions Summary 1 Introduction Motivation Objectives Case studies 2 Framework Framework overview Framework messaging Framework components 3 Sentiment analysis Overview Classifier 4 Case studies Boston Terror Attack Political analysis 5 Conclusions and future work ICCCI 2013, Craiova, Romania A Descriptive Analysis of Twitter Activity 2 / 25
  • 3. Introduction: Motivation. Social networks have expanded greatly in recent years. One of the most successful is Twitter: a microblogging platform built on short messages known as tweets, with an open nature. Twitter offers great research opportunities: open nature; a distributed human sensor network; easy data extraction but difficult data processing. Twitter + sentiment analysis: lack of tools for sentiment analysis in Spanish.
  • 4. Introduction: Objectives. Twitter offers an excellent API; however, some infrastructure is still needed (mainly storage and reporting). Objectives: (1) develop a framework for Twitter data extraction and analysis; (2) provide reporting tools; (3) lay a foundation for sentiment analysis in Spanish.
  • 5. Introduction: Case studies. To assess the framework, we included two case studies. Event driven: the Boston terror attack. Regular usage: political activity on Twitter in Spanish.
  • 6. Framework architecture: Overview. Requirements: easy to use, extensible, massive data processing. Design decisions: modular design (a collection of independent scripts that interchange data); focus on open data formats; built around the database (MongoDB).
  • 7. Framework architecture: Framework messaging. [Figure: framework messaging diagram.]
  • 8. Framework components: Miner. Extracts and stores tweets. Uses the Stream API. Several filters. Written in Python.
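The miner on slide 8 can be pictured as a loop that applies a filter to incoming tweets and stores the matches. The sketch below is only an illustration under stated assumptions, not the authors' code: a plain Python iterable stands in for the Stream API, an in-memory list stands in for MongoDB, the tweet shape is simplified to `user` and `text`, and the names `matches_track`, `matches_user` and `mine` are hypothetical.

```python
from typing import Callable, Iterable, List

Tweet = dict  # assumed simplified shape: {"user": str, "text": str}


def matches_track(phrase: str) -> Callable[[Tweet], bool]:
    """String filter: keep tweets whose text contains the phrase, ignoring case."""
    needle = phrase.casefold()
    return lambda tweet: needle in tweet.get("text", "").casefold()


def matches_user(accounts: Iterable[str]) -> Callable[[Tweet], bool]:
    """User filter: keep tweets written by, or mentioning, one of the accounts."""
    wanted = {a.casefold().lstrip("@") for a in accounts}

    def predicate(tweet: Tweet) -> bool:
        author = tweet.get("user", "").casefold().lstrip("@")
        words = tweet.get("text", "").casefold().split()
        mentioned = {w.strip(".,:;!?").lstrip("@") for w in words if w.startswith("@")}
        return author in wanted or bool(mentioned & wanted)

    return predicate


def mine(stream: Iterable[Tweet], keep: Callable[[Tweet], bool], store: List[Tweet]) -> int:
    """Append every tweet accepted by the filter to the store; return how many were stored."""
    stored = 0
    for tweet in stream:
        if keep(tweet):
            store.append(tweet)  # a real miner would insert into the database here
            stored += 1
    return stored


# Toy stream standing in for live data.
sample_stream = [
    {"user": "alice", "text": "Explosiones en la Maratón de Boston"},
    {"user": "bob", "text": "Buenos días"},
    {"user": "carol", "text": "Pregunta para @PSOE: ¿y ahora?"},
]

boston_store: List[Tweet] = []
boston_count = mine(sample_stream, matches_track("Maratón de Boston"), boston_store)

politics_store: List[Tweet] = []
politics_count = mine(sample_stream, matches_user(["@PSOE", "@PPopular"]), politics_store)
```

The two filter factories mirror the two streaming modes named in the case studies (string filter and user filter); keeping them as plain predicates makes it easy to add further filters without touching the loop.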
  • 9. Framework components: Database. Storage for further processing. MongoDB: NoSQL database, high performance.
  • 10. Framework components: Reporting. CSV export for further processing. R processing: extensibility, powerful libraries.
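The CSV export step on slide 10 could look roughly like the following. This is a minimal sketch using Python's standard `csv` module; the column names and the function name `tweets_to_csv` are assumptions made for illustration, since the slides do not list the exported fields.

```python
import csv
import io

# Hypothetical column selection; the slides do not say which fields are exported.
FIELDS = ["id", "user", "created_at", "text"]


def tweets_to_csv(tweets, fields=FIELDS):
    """Serialize tweet dictionaries to CSV text, ignoring keys not listed in fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for tweet in tweets:
        writer.writerow(tweet)  # missing keys are written as empty cells
    return buffer.getvalue()


csv_text = tweets_to_csv([
    {"id": 1, "user": "alice", "created_at": "2013-04-16T00:43:00Z",
     "text": "Hola, Boston", "lang": "es"},
    {"id": 2, "user": "bob", "text": "Sin fecha"},
])
```

A file produced this way can be loaded in R with `read.csv`, which matches the reporting split the slide describes: export in one script, analysis in another.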
  • 11. Framework components: Sentiment analysis. Supervised learning. Need for labeling: tools for labeling. Classifier building. Classifier testing.
  • 12. Sentiment analysis: Overview. Supervised learning with the Natural Language Toolkit (NLTK). Three classes: “positive”, “negative” and “neutral”. A labeled corpus is needed: several exist in English, none in Spanish. Thousands of manually classified tweets are needed. Collaborative labeling: a web application to label tweets.
  • 13. Sentiment analysis: Classifier. Naïve Bayes classifier. Stop words removed. Some parameters to set: the optimal parameter setting depends on the dataset. Classifier evaluation is needed: a tester using cross-validation.
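To make the classifier on slide 13 concrete, here is a small pure-Python Naïve Bayes sketch with stop-word removal. It is not the authors' implementation (the slides say they used NLTK); it is a generic multinomial variant with add-one smoothing, the stop-word list is a tiny illustrative sample, and the training sentences are invented toy data.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative Spanish stop-word list (an assumption, not the authors' list).
STOP_WORDS = {"el", "la", "los", "las", "de", "que", "y", "en", "un", "una", "es"}


def tokenize(text):
    """Lowercase, split on whitespace and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def __init__(self):
        self.label_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocabulary = set()

    def train(self, labeled_texts):
        for text, label in labeled_texts:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def classify(self, text):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary) + 1  # +1 reserves mass for unseen words
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            score = math.log(doc_count / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for word in tokens:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data, for illustration only.
classifier = NaiveBayes()
classifier.train([
    ("me encanta esta propuesta", "positive"),
    ("gran trabajo enhorabuena", "positive"),
    ("esto es una vergüenza", "negative"),
    ("gestión nefasta vergüenza total", "negative"),
])
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the smoothing keeps an unseen word from zeroing out a class.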
  • 14. Case study: Boston Terror Attack. Main objective: evaluate the platform. Secondary objective: describe activity around an event. Stream by string filter. The event: terror attack on 15 Apr 2013 14:49 (GMT-4) in Boston; Internet witch-hunt motivated by the release of some photos; shooting and manhunt. Data acquisition: begin Tue, 16 Apr 2013 00:43 (GMT); end Tue, 23 Apr 2013 00:43 (GMT); filter “Maratón de Boston” (Boston Marathon in Spanish).
  • 15. Case study: Boston Terror Attack: Dataset description.
                       Value        Relative    Average
      Tweets           28,892                   1.16/user
      Non-retweets     16,029       55.48 %
      Retweets         12,863       44.52 %
      Geolocalized     255          0.88 %
      Users            24,989
      Mentions         18,937       65.54 %
      Replies          849          2.94 %
      Non-replies      18,088       62.61 %
      Size             96.39 MB                 3.38 KB/tweet
      Index size       0.91 MB
      Disk             132.99 MB
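The Relative and Average columns of the Boston dataset description appear to be derived from the raw counts: each percentage is a count divided by the total number of tweets, and the per-user average is tweets divided by users. That reading is an inference from the numbers rather than something the slide states; the short check below reproduces the reported values under that assumption. The variable and function names are illustrative.

```python
# Raw counts as reported on the Boston dataset slide.
tweets = 28_892
users = 24_989
counts = {
    "non_retweets": 16_029,
    "retweets": 12_863,
    "geolocalized": 255,
    "mentions": 18_937,
    "replies": 849,
    "non_replies": 18_088,
}


def percent_of_tweets(count, total=tweets):
    """Share of all tweets, as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


relative = {name: percent_of_tweets(value) for name, value in counts.items()}
tweets_per_user = round(tweets / users, 2)

# Consistency checks on the raw counts themselves.
retweet_split_ok = counts["non_retweets"] + counts["retweets"] == tweets
mention_split_ok = counts["replies"] + counts["non_replies"] == counts["mentions"]
```

Under this reading, retweets and non-retweets partition the tweets, while replies and non-replies partition the mentions, which is why the last two percentages sum to the mentions percentage rather than to 100 %.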
  • 16. Case study: Boston Terror Attack: Activity. [Figure: three panels plotting tweets, tweets excluding RTs, and retweets against time, with the time axis labeled Apr 17 to Apr 23. Dashed line: bombing. Dotted line: photo release. Solid line: shooting. Gray background: manhunt.]
  • 17. Case study: Boston Terror Attack: Activity. [Figure: three panels plotting tweets, tweets excluding RTs, and retweets against time, from Thu 23:00 to Sat 00:00. Dotted line: photo release. Solid line: shooting. Gray background: manhunt.]
  • 18. Case study: Political analysis: Overview. Main objective: evaluate sentiment analysis. Secondary objective: describe regular Twitter activity. Stream by user filter. Selection of Spanish political actors, selected by activity and controversy.
      Account owner            Accounts
      Political party          @PPopular, @PSOE, @iunida, @UPyD
      Politician               @agarzon, @EduMadina, @ToniCanto1, @RevillaMiguelA, @ccifuentes, @_Rubalcaba_
      Journalist               @jordievole, @iescolar
      Activist organization    @LA_PAH
    Data acquisition: from Tue, 16 Apr 2013 00:00 (GMT) to 18 Apr 2013 04:00 (GMT); filter: account name (“@account”).
  • 19. Case study: Political analysis: Dataset description.
                       Value        Relative    Average
      Tweets           65,043                   1.9/user
      Non-retweets     28,175       43.32 %
      Retweets         36,868       56.68 %
      Geolocalized     528          0.81 %
      Users            34,195
      Mentions         56,713       87.19 %
      Non-replies      46,981       72.23 %
      Replies          9,732        14.96 %
      Size             227.51 MB                3.58 KB/tweet
      Index size       2.05 MB
      Disk             237.95 MB
  • 20. Case study: Political analysis: Activity. [Figure: three panels plotting tweets, tweets excluding RTs, and retweets against time, from Tue to Thu.]
  • 21. Case study: Political analysis: Sentiment analysis. 9,884 tweets were manually classified in a collaborative way: 4,739 non-neutral tweets (1,062 positive, 3,677 negative), an unbalanced dataset. We tried several parameters for the Naïve Bayes classifier. N-grams: {1}, {2}, {3}, {1, 2}, {1, 3} and {2, 3}. Minimum score: 0, 1, 2, 3, 4, 5, 6 and 10. 10-fold cross-validation.
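The parameter search on slide 21 combines six n-gram sets with eight minimum-score values and evaluates each combination with 10-fold cross-validation. The sketch below illustrates those three pieces in plain Python under some assumptions: the slides do not define what "minimum score" thresholds, so it is carried only as a label; the fold assignment (every k-th item) is a simple unshuffled scheme chosen for brevity; and the function names are hypothetical. The configuration naming follows the labels shown in the accuracy table, such as `NaiveBayes-1_2-min3`.

```python
from itertools import product

NGRAM_SETS = [(1,), (2,), (3,), (1, 2), (1, 3), (2, 3)]
MIN_SCORES = [0, 1, 2, 3, 4, 5, 6, 10]


def ngrams(tokens, sizes):
    """Return all n-grams of the requested sizes, joined with spaces."""
    features = []
    for n in sizes:
        for start in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[start:start + n]))
    return features


def config_name(sizes, min_score):
    """Build a label in the style used by the accuracy table."""
    return "NaiveBayes-" + "_".join(str(n) for n in sizes) + f"-min{min_score}"


def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists; fold i tests on items i, i + k, i + 2k, ..."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


# Every combination in the grid described on the slide.
configs = [config_name(sizes, score) for sizes, score in product(NGRAM_SETS, MIN_SCORES)]

# Fold layout for the 4,739 non-neutral tweets.
folds = list(k_fold_indices(4_739, k=10))
```

Because the dataset is unbalanced (3,677 negative against 1,062 positive), a stratified split that preserves the class ratio in each fold may be preferable in practice; the slides do not say which scheme was used.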
  • 22. Case study: Political analysis: Sentiment analysis.
      Classifier               Accuracy
      NaiveBayes-1_2-min3      0.8543
      NaiveBayes-1-min3        0.8510
      NaiveBayes-1_3-min3      0.8507
      NaiveBayes-1-min4        0.8476
      NaiveBayes-1_3-min5      0.8474
      NaiveBayes-1_2-min4      0.8469
      NaiveBayes-1_3-min4      0.8467
      NaiveBayes-1_3-min1      0.8459
      NaiveBayes-1-min6        0.8452
      NaiveBayes-1-min1        0.8448
      NaiveBayes-1_2-min5      0.8446
      NaiveBayes-1_3-min6      0.8438
      NaiveBayes-1_2-min6      0.8436
      NaiveBayes-1-min5        0.8406
      NaiveBayes-1_2-min1      0.8389
      NaiveBayes-2_3-min6      0.8385
  • 23. Case study: Political analysis: Normalized sentiment. [Figure: positive sentiment on a 0.0 to 1.0 scale against time, from Tue to Thu.]
  • 24. Conclusions and future work. We developed a framework that eases data extraction and analysis on Twitter: it is ready for production and will be released soon under a free licence. We briefly described two case studies: event-driven activity (the Boston terror attacks) and regular activity (political activity). Sentiment analysis is intrinsically difficult. Future work: lemmatization, natural language processing, time series analysis.
  • 25. Thanks for your attention! David F. Barrero david@aut.uah.es @dfbarrero