CS 541 : 2016 Presidential Candidate Tracker
Anwar Jameel (aj528)
Shahab Shekari (ss1817)
28 April 2016
1 Introduction
In this project we analyzed data from Twitter and Facebook about the US
presidential candidates. The data is collected from the Facebook pages of 20
different news channels and from tweets related to the current presidential
race. We used the Python APIs for Facebook (GraphAPI) and Twitter (Tweepy) to
retrieve the data. DynamoDB stores the statistics (e.g. polarity scores) for
each presidential candidate. We used the Natural Language Toolkit (NLTK) for
sentiment analysis of Facebook posts, their comments, and tweets. Cron jobs on
an EC2 node parse Facebook posts once a day at midnight, the previous day's
popular tweets every morning at 6am, and live tweets every 6 hours (at 3, 9,
15 and 21). Finally, we created a website that compares the polarity scores of
all the candidates across different sources over a period of time. The website
also shows a word cloud of adjectives and hashtags for each candidate, and each
candidate's page shows engagement figures such as likes, retweets and
favorites. The following sections discuss the implementation details.
2 Implementations
The sources of data in our project are Facebook and Twitter. This section
describes the features we implemented to analyse and integrate the data from
both. The website presents patterns from the analysed data stored in DynamoDB.
It offers an option to compare polarity scores between any subset of the 20
news outlets and Twitter. This feature lets the user compare polarity across
sources for every candidate without relying on any single source. The home
page displays the popular hashtags for every candidate. Each candidate's page
displays a word cloud of adjectives and hashtags, along with the candidate's
percentage of likes, retweets and favorites relative to the other candidates.
Figure 1 shows the top 10 hashtags used in conjunction with individual
candidates for the past day. The hashtags are rotated (faded in/out) every 5
seconds.
Figure 1: Website Home Page
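The engagement percentages on a candidate's page can be read as that
candidate's share of the total across all candidates. A minimal Python sketch
under that assumption; the function name engagement_share, the candidate keys
and the sample counts are hypothetical and not taken from the project:

```python
def engagement_share(counts):
    """Return each candidate's percentage of the total for one metric.

    `counts` maps a candidate id to a raw count (e.g. likes or retweets).
    """
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero when no engagement has been recorded yet.
        return {candidate: 0.0 for candidate in counts}
    return {candidate: 100.0 * n / total for candidate, n in counts.items()}


# Hypothetical retweet counts for three candidates.
retweets = {"clinton": 300, "sanders": 500, "trump": 200}
shares = engagement_share(retweets)
```

The same function could be applied separately to likes, retweets and
favorites, since each metric is normalized against its own total.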
2.1 Facebook
Data related to 6 US presidential candidates is collected from the Facebook
pages of 20 different news outlets (such as Fox News, CNN and MSNBC). We
defined target words (such as "trump" and "hillary") and look for them in a
post's title while parsing each news channel's posts. A post is relevant if it
contains one of these target words. For every relevant post, the top 20
comments by number of likes are retrieved and stored. Since Facebook does not
provide a direct API for getting the top k comments, we maintain a priority
queue over the earliest 500 comments of a post to find the top 20. Polarity
scores are calculated with the NLTK sentiment analyzer for every relevant post
and its top comments. The aggregate polarity scores for each post and its top
comments are also stored for every candidate. Adjectives are extracted (again
using NLTK) from the top comments of every post, stored for every candidate,
and aggregated per candidate to form a word cloud. Both the aggregated
polarity scores and the aggregated adjectives are stored in the aggregate
stats table of DynamoDB. The Facebook post itself, along with details such as
comments, number of likes, candidate ids and post id, is stored in the fb
posts table of DynamoDB.
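The top-20 selection over the earliest 500 comments can be sketched with
Python's heapq module. This is an illustration of the priority-queue idea, not
the project's actual code: the function name top_comments, the dictionary keys
"message" and "like_count", and the sample data are assumptions.

```python
import heapq
from itertools import islice


def top_comments(comments, k=20, window=500):
    """Return the k most-liked comments among the first `window` comments.

    `comments` is any iterable of dicts with "message" and "like_count" keys,
    in the order they were retrieved.
    """
    earliest = islice(comments, window)  # look only at the earliest comments
    # heapq.nlargest returns the k largest items, sorted from most to least
    # liked, without sorting the whole window.
    return heapq.nlargest(k, earliest, key=lambda c: c["like_count"])


# Hypothetical comments: comment i has i likes.
sample = [{"message": f"comment {i}", "like_count": i} for i in range(1000)]
best = top_comments(sample)
```

Bounding the window at 500 keeps the number of comments fetched per post
fixed, at the cost of missing highly liked comments that were posted later.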
2.2 Twitter
Data related to 6 US presidential candidates is collected from popular and
recent tweets on Twitter. We look for tweets containing target words and store
them with the candidate ids that match those words. The popular tweets for the
previous day are parsed and stored every day at 6am. Recent (live) tweets are
parsed and stored every 6 hours (at 3, 9, 15 and 21). For every relevant
tweet, we store the tweet itself along with its timestamp, hashtags, number of
favorites and number of retweets in the twt posts table of DynamoDB. Polarity
scores are calculated for every tweet using the NLTK sentiment analyzer. The
aggregate polarity scores and the aggregated hashtags for every candidate are
stored in the aggregate stats table of DynamoDB.
Figure 2: Candidate Page
Figure 2 shows a snapshot of a candidate page (Hillary Clinton) with a few
sources selected to project the polarity and word cloud.
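Tagging tweets with candidate ids and aggregating hashtags per candidate can
be sketched as follows. The TARGET_WORDS mapping, the candidate ids, the
regular expressions and the sample tweets are all hypothetical; the sketch
assumes whole-word, case-insensitive matching on last names, which the report
does not specify.

```python
import re
from collections import Counter

# Hypothetical mapping from target word (last name) to candidate id.
TARGET_WORDS = {"clinton": "hc", "sanders": "bs", "trump": "dt"}

WORD_RE = re.compile(r"[a-z]+")
HASHTAG_RE = re.compile(r"#(\w+)")


def candidates_in(text):
    """Return the set of candidate ids whose target word occurs in `text`."""
    words = set(WORD_RE.findall(text.lower()))
    return {cid for word, cid in TARGET_WORDS.items() if word in words}


def hashtag_counts(tweets):
    """Count hashtags per candidate over an iterable of tweet texts."""
    counts = {}
    for text in tweets:
        tags = [tag.lower() for tag in HASHTAG_RE.findall(text)]
        for cid in candidates_in(text):
            counts.setdefault(cid, Counter()).update(tags)
    return counts


tweets = [
    "Trump rally tonight #MAGA",
    "Sanders and Clinton debate #DemDebate",
    "Donald speaks again #MAGA",  # first name only: not matched
]
counts = hashtag_counts(tweets)
```

Calling counts["dt"].most_common(10) would then give a candidate's ten most
frequent hashtags, the kind of list a home-page hashtag display needs.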
3 Data Statistics
Table            Partition Key  Sort Key    Read  Write  Storage  # Items
fb posts         candidate id   post id     10    10     2.71 MB  6,589
twt posts        candidate id   tweet id    200   10     81 MB    270,562
aggregate stats  stat source    time stamp  5     5      1.04 MB  222
For each table we define the primary key as the combination (Partition Key,
Sort Key). The partition key determines the partition in which DynamoDB stores
an item. All items with the same partition key are stored together, sorted by
sort key value. The table above shows statistics for the Facebook, Twitter and
aggregate statistics tables that we maintain in DynamoDB. The Read and Write
columns show the provisioned read and write capacity units. The Storage column
gives the amount of data stored, and the # Items column gives the number of
items (Facebook posts/comments, tweets, etc.) present in each table for the
last 3 weeks.
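The effect of a (Partition Key, Sort Key) primary key can be pictured with a
small in-memory model. The sketch below is plain Python, not a DynamoDB client
call; the attribute names candidate_id and post_id (written with underscores),
the function build_partitions and the sample items are assumptions made for
illustration.

```python
from collections import defaultdict

# Hypothetical items for a table keyed by (candidate_id, post_id).
items = [
    {"candidate_id": "dt", "post_id": "p3", "likes": 40},
    {"candidate_id": "hc", "post_id": "p2", "likes": 15},
    {"candidate_id": "dt", "post_id": "p1", "likes": 7},
]


def build_partitions(items, partition_key, sort_key):
    """Group items by partition key and order each group by sort key."""
    partitions = defaultdict(list)
    for item in items:
        partitions[item[partition_key]].append(item)
    for group in partitions.values():
        group.sort(key=lambda item: item[sort_key])
    return dict(partitions)


partitions = build_partitions(items, "candidate_id", "post_id")
# All posts for one candidate, already ordered by post id.
dt_posts = [item["post_id"] for item in partitions["dt"]]
```

With this layout, fetching everything for one candidate touches a single
group, which is why the candidate id is a natural partition key for
per-candidate pages.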
4 Challenges
We faced several challenges during implementation and data retrieval. Because
of limited AWS credits, we are restricted in the number of items we can scan
from DynamoDB. As the amount of data grows over time, we need to scan more
data from DynamoDB to project the polarity and word cloud over time. To cope
with this, we implemented a caching solution: we scan the DynamoDB tables just
once every 3 hours, store the data in RAM, and then filter and project the
data from RAM as needed.
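The caching approach described above can be sketched as a small time-based
cache. This is an assumed design, not the project's code: the class name
ScanCache, the loader callable standing in for a DynamoDB scan, and the
injectable clock are all hypothetical.

```python
import time


class ScanCache:
    """Hold the result of an expensive scan in memory for `ttl` seconds."""

    def __init__(self, loader, ttl=3 * 60 * 60, clock=time.monotonic):
        self._loader = loader  # callable that performs the full scan
        self._ttl = ttl        # refresh interval in seconds (3 hours)
        self._clock = clock    # injectable so the cache can be tested
        self._items = None
        self._loaded_at = None

    def items(self):
        """Return cached items, reloading them if they are missing or stale."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._items = list(self._loader())
            self._loaded_at = now
        return self._items

    def filter(self, predicate):
        """Filter the cached items in memory without touching the database."""
        return [item for item in self.items() if predicate(item)]


# Hypothetical loader standing in for a full table scan.
scan_calls = []


def fake_scan():
    scan_calls.append(1)
    return [{"candidate_id": "dt", "polarity": 0.2},
            {"candidate_id": "hc", "polarity": -0.1}]


fake_now = [0.0]
cache = ScanCache(fake_scan, clock=lambda: fake_now[0])
dt_rows = cache.filter(lambda row: row["candidate_id"] == "dt")
cache.filter(lambda row: row["polarity"] < 0)  # served from memory
fake_now[0] = 3 * 60 * 60                      # three hours later
cache.items()                                  # triggers a second scan
```

The trade-off is staleness: data shown on the site can lag the tables by up to
one refresh interval.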
Another challenge was that Facebook does not provide a public API to search
for or retrieve public posts. We overcame this by scraping the posts from each
news channel's Facebook wall and then selecting the relevant posts based on
the defined target words. A further challenge was deciding on the target
words; we restricted ourselves to the last names of the candidates. Since the
amount of data was not an issue, we preferred to avoid false positives by
excluding posts that contained only a candidate's first name.
5 Conclusion and Future Work
We have successfully completed all the proposed goals of our project. We are
able to collect large amounts of data and to analyse, integrate and project
the patterns in the data using current data analysis techniques. Our
projections are unbiased in the sense that we have considered almost all the
major attributes of a Facebook post or a tweet when calculating polarity
scores. Our target words are just the last names of the candidates, and we
assume that any post or story related to a candidate that appears on Facebook
or Twitter will contain one of these target words. We believe this assumption
holds to a great extent. We also aggregate the adjectives and hashtags and
form a word cloud based on their counts over a period of time. The engagement
statistics are a good way to see a candidate's popularity with the public, and
we have projected them for every candidate. We can also observe that a few
news channels show higher variance in their polarity curves than others. This
variance can be used as a measure of how biased a certain news channel is.
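Comparing the variance of polarity curves across sources can be sketched with
the standard library. The outlet names and daily scores below are invented for
illustration, and population variance is an assumed choice; the report does
not say which variance estimate it has in mind.

```python
from statistics import pvariance

# Hypothetical daily polarity scores for one candidate, per news outlet.
daily_polarity = {
    "outlet_a": [0.10, 0.12, 0.09, 0.11],
    "outlet_b": [0.60, -0.40, 0.50, -0.30],
}


def polarity_variance(series_by_source):
    """Return the population variance of each source's polarity series."""
    return {source: pvariance(scores)
            for source, scores in series_by_source.items()}


variances = polarity_variance(daily_polarity)
# Sources ordered from most to least variable.
ranked = sorted(variances, key=variances.get, reverse=True)
```

Here outlet_b ranks first because its scores swing between strongly positive
and strongly negative, while outlet_a stays within a narrow band.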
Although our results provide useful analytics about the current presidential
candidates, we think there is still room for improvement. In the future, we
would like to include more sources (such as international media) to gather
data from and to project the polarity and word cloud. We would also like to
extrapolate the polarity graph for future events using machine learning
techniques.
Acknowledgement
We would like to thank Professor Amélie Marian for providing us with
constructive suggestions and guiding us throughout the project.
5

Contenu connexe

Tendances

Improving VIVO search through semantic ranking.
Improving VIVO search through semantic ranking.Improving VIVO search through semantic ranking.
Improving VIVO search through semantic ranking.Deepak K
 
Understanding Seo At A Glance
Understanding Seo At A GlanceUnderstanding Seo At A Glance
Understanding Seo At A Glancepoojagupta267
 
Text mining on Twitter information based on R platform
Text mining on Twitter information based on R platformText mining on Twitter information based on R platform
Text mining on Twitter information based on R platformFayan TAO
 
9 Tools you need as a journalist for the best newsgathering and monitoring an...
9 Tools you need as a journalist for the best newsgathering and monitoring an...9 Tools you need as a journalist for the best newsgathering and monitoring an...
9 Tools you need as a journalist for the best newsgathering and monitoring an...Angie Yasser
 
GeospatialDataAnalysis
GeospatialDataAnalysisGeospatialDataAnalysis
GeospatialDataAnalysisTaylor Graham
 
Journalists and the Social Web 1
Journalists and the Social Web 1Journalists and the Social Web 1
Journalists and the Social Web 1ardessie
 
Paper id 24201441
Paper id 24201441Paper id 24201441
Paper id 24201441IJRAT
 
Search Engine Powerpoint
Search Engine PowerpointSearch Engine Powerpoint
Search Engine Powerpoint201014161
 
Search Engine Optimization - Aykut Aslantaş
Search Engine Optimization - Aykut AslantaşSearch Engine Optimization - Aykut Aslantaş
Search Engine Optimization - Aykut AslantaşAykut Aslantaş
 
Search Engine working, Crawlers working, Search Engine mechanism
Search Engine working, Crawlers working, Search Engine mechanismSearch Engine working, Crawlers working, Search Engine mechanism
Search Engine working, Crawlers working, Search Engine mechanismUmang MIshra
 
Surfing the internet
Surfing the internetSurfing the internet
Surfing the internetEveferro
 
Conversation Practices and Network Structure in Twitter
Conversation Practices and Network Structure in TwitterConversation Practices and Network Structure in Twitter
Conversation Practices and Network Structure in TwitterLuca Rossi
 
Data Mining of Informational Stream in Social Networks
Data Mining of Informational Stream in Social Networks   Data Mining of Informational Stream in Social Networks
Data Mining of Informational Stream in Social Networks Bohdan Pavlyshenko
 
Groundhog day: near duplicate detection on twitter
Groundhog day: near duplicate detection on twitterGroundhog day: near duplicate detection on twitter
Groundhog day: near duplicate detection on twitterDan Nguyen
 
google search engine
google search enginegoogle search engine
google search engineway2go
 

Tendances (19)

Improving VIVO search through semantic ranking.
Improving VIVO search through semantic ranking.Improving VIVO search through semantic ranking.
Improving VIVO search through semantic ranking.
 
Understanding Seo At A Glance
Understanding Seo At A GlanceUnderstanding Seo At A Glance
Understanding Seo At A Glance
 
Text mining on Twitter information based on R platform
Text mining on Twitter information based on R platformText mining on Twitter information based on R platform
Text mining on Twitter information based on R platform
 
9 Tools you need as a journalist for the best newsgathering and monitoring an...
9 Tools you need as a journalist for the best newsgathering and monitoring an...9 Tools you need as a journalist for the best newsgathering and monitoring an...
9 Tools you need as a journalist for the best newsgathering and monitoring an...
 
GeospatialDataAnalysis
GeospatialDataAnalysisGeospatialDataAnalysis
GeospatialDataAnalysis
 
Journalists and the Social Web 1
Journalists and the Social Web 1Journalists and the Social Web 1
Journalists and the Social Web 1
 
Paper id 24201441
Paper id 24201441Paper id 24201441
Paper id 24201441
 
How to Search Twitter
How to Search TwitterHow to Search Twitter
How to Search Twitter
 
Search Engine Powerpoint
Search Engine PowerpointSearch Engine Powerpoint
Search Engine Powerpoint
 
Search Engine Optimization - Aykut Aslantaş
Search Engine Optimization - Aykut AslantaşSearch Engine Optimization - Aykut Aslantaş
Search Engine Optimization - Aykut Aslantaş
 
Surfing the web
Surfing the webSurfing the web
Surfing the web
 
Social media analysis project
Social media analysis projectSocial media analysis project
Social media analysis project
 
Search Engine working, Crawlers working, Search Engine mechanism
Search Engine working, Crawlers working, Search Engine mechanismSearch Engine working, Crawlers working, Search Engine mechanism
Search Engine working, Crawlers working, Search Engine mechanism
 
Surfing the internet
Surfing the internetSurfing the internet
Surfing the internet
 
Conversation Practices and Network Structure in Twitter
Conversation Practices and Network Structure in TwitterConversation Practices and Network Structure in Twitter
Conversation Practices and Network Structure in Twitter
 
Data Mining of Informational Stream in Social Networks
Data Mining of Informational Stream in Social Networks   Data Mining of Informational Stream in Social Networks
Data Mining of Informational Stream in Social Networks
 
Modulo Instruccional Internet Ppt
Modulo Instruccional Internet PptModulo Instruccional Internet Ppt
Modulo Instruccional Internet Ppt
 
Groundhog day: near duplicate detection on twitter
Groundhog day: near duplicate detection on twitterGroundhog day: near duplicate detection on twitter
Groundhog day: near duplicate detection on twitter
 
google search engine
google search enginegoogle search engine
google search engine
 

En vedette

Présentation
Présentation Présentation
Présentation IFIC (AUF)
 
Design of a Moto3 Racing Motorcycle
Design of a Moto3 Racing MotorcycleDesign of a Moto3 Racing Motorcycle
Design of a Moto3 Racing MotorcycleDavid Ojea Cerradelo
 
How to Bid bandwidth
How to Bid bandwidthHow to Bid bandwidth
How to Bid bandwidthAnwar Jameel
 
The French Revolution
The French RevolutionThe French Revolution
The French Revolutionmmcdonald2
 
Object Tracking using Artificial Neural Network
Object Tracking using Artificial Neural NetworkObject Tracking using Artificial Neural Network
Object Tracking using Artificial Neural NetworkAnwar Jameel
 
Studiodmerchandise
StudiodmerchandiseStudiodmerchandise
Studiodmerchandisealliedsuza
 
Clip 1: Arrival in Dampier
Clip 1: Arrival in DampierClip 1: Arrival in Dampier
Clip 1: Arrival in Dampiermmcdonald2
 
Year 8 History Research Essay Introduction
Year 8 History Research Essay IntroductionYear 8 History Research Essay Introduction
Year 8 History Research Essay Introductionmmcdonald2
 
الجبائر السنية التجميلية - الجزء الثاني
الجبائر السنية التجميلية - الجزء الثانيالجبائر السنية التجميلية - الجزء الثاني
الجبائر السنية التجميلية - الجزء الثانيBassem Abu Canon , DDS
 
LinkedIn ISMP Presentation
LinkedIn ISMP PresentationLinkedIn ISMP Presentation
LinkedIn ISMP PresentationNatasha Qabazard
 
Red dog death scene deconstruction
Red dog death scene deconstructionRed dog death scene deconstruction
Red dog death scene deconstructionmmcdonald2
 
Transition Class 3
Transition Class 3Transition Class 3
Transition Class 3mmcdonald2
 
Camera Angles and Shots
Camera Angles and ShotsCamera Angles and Shots
Camera Angles and Shotsmmcdonald2
 

En vedette (20)

Présentation
Présentation Présentation
Présentation
 
Tambola
TambolaTambola
Tambola
 
Design of a Moto3 Racing Motorcycle
Design of a Moto3 Racing MotorcycleDesign of a Moto3 Racing Motorcycle
Design of a Moto3 Racing Motorcycle
 
How to Bid bandwidth
How to Bid bandwidthHow to Bid bandwidth
How to Bid bandwidth
 
Un País Viral...
Un País Viral...Un País Viral...
Un País Viral...
 
GallupReport
GallupReportGallupReport
GallupReport
 
Chassis views
Chassis viewsChassis views
Chassis views
 
Jerusalem
JerusalemJerusalem
Jerusalem
 
The French Revolution
The French RevolutionThe French Revolution
The French Revolution
 
Object Tracking using Artificial Neural Network
Object Tracking using Artificial Neural NetworkObject Tracking using Artificial Neural Network
Object Tracking using Artificial Neural Network
 
Studiodmerchandise
StudiodmerchandiseStudiodmerchandise
Studiodmerchandise
 
Clip 1: Arrival in Dampier
Clip 1: Arrival in DampierClip 1: Arrival in Dampier
Clip 1: Arrival in Dampier
 
Year 8 History Research Essay Introduction
Year 8 History Research Essay IntroductionYear 8 History Research Essay Introduction
Year 8 History Research Essay Introduction
 
الجبائر السنية التجميلية - الجزء الثاني
الجبائر السنية التجميلية - الجزء الثانيالجبائر السنية التجميلية - الجزء الثاني
الجبائر السنية التجميلية - الجزء الثاني
 
LinkedIn ISMP Presentation
LinkedIn ISMP PresentationLinkedIn ISMP Presentation
LinkedIn ISMP Presentation
 
Imc report
Imc reportImc report
Imc report
 
Red dog death scene deconstruction
Red dog death scene deconstructionRed dog death scene deconstruction
Red dog death scene deconstruction
 
Transition Class 3
Transition Class 3Transition Class 3
Transition Class 3
 
Camera Angles and Shots
Camera Angles and ShotsCamera Angles and Shots
Camera Angles and Shots
 
Getting food
Getting foodGetting food
Getting food
 

Similaire à 2016 Presidential Candidate Tracker

Social data analysis using apache flume, hdfs, hive
Social data analysis using apache flume, hdfs, hiveSocial data analysis using apache flume, hdfs, hive
Social data analysis using apache flume, hdfs, hiveijctet
 
2023 Guide How To Scrape Social Media Data Using Python (1).pptx
2023 Guide How To Scrape Social Media Data Using Python (1).pptx2023 Guide How To Scrape Social Media Data Using Python (1).pptx
2023 Guide How To Scrape Social Media Data Using Python (1).pptxiwebdatascraping
 
FInal Project Intelligent Social Media Analytics
FInal Project Intelligent Social Media AnalyticsFInal Project Intelligent Social Media Analytics
FInal Project Intelligent Social Media AnalyticsAshwin Dinoriya
 
srd117.final.512Spring2016
srd117.final.512Spring2016srd117.final.512Spring2016
srd117.final.512Spring2016Saurabh Deochake
 
Sentiment Analysis on Twitter Data Using Apache Flume and Hive
Sentiment Analysis on Twitter Data Using Apache Flume and HiveSentiment Analysis on Twitter Data Using Apache Flume and Hive
Sentiment Analysis on Twitter Data Using Apache Flume and HiveIRJET Journal
 
Datawarehousing and Business Intelligence
Datawarehousing and Business IntelligenceDatawarehousing and Business Intelligence
Datawarehousing and Business IntelligenceNisheet Mahajan
 
Final Presentation
Final PresentationFinal Presentation
Final PresentationLove Tyagi
 
Twitter sentimentanalysis report
Twitter sentimentanalysis reportTwitter sentimentanalysis report
Twitter sentimentanalysis reportSavio Aberneithie
 
DeepSearch_Project_Report
DeepSearch_Project_ReportDeepSearch_Project_Report
DeepSearch_Project_ReportUrjit Patel
 
Brand Strategy and Super Bowl Twitter AnalyticsImage Sou.docx
Brand Strategy and Super Bowl Twitter AnalyticsImage Sou.docxBrand Strategy and Super Bowl Twitter AnalyticsImage Sou.docx
Brand Strategy and Super Bowl Twitter AnalyticsImage Sou.docxAASTHA76
 
Mining public opinion about economic issues
Mining public opinion about economic issuesMining public opinion about economic issues
Mining public opinion about economic issuesIvan Abboud
 
Twitter Based Sentiment Analysis of Each Presidential Candidate Using Long Sh...
Twitter Based Sentiment Analysis of Each Presidential Candidate Using Long Sh...Twitter Based Sentiment Analysis of Each Presidential Candidate Using Long Sh...
Twitter Based Sentiment Analysis of Each Presidential Candidate Using Long Sh...CSCJournals
 
Are Positive or Negative Tweets More "Retweetable" in Brazilian Politics?
Are Positive or Negative Tweets More "Retweetable" in Brazilian Politics?Are Positive or Negative Tweets More "Retweetable" in Brazilian Politics?
Are Positive or Negative Tweets More "Retweetable" in Brazilian Politics?Molly Gibbons (she/her)
 
Module 9: Natural Language Processing Part 2
Module 9:  Natural Language Processing Part 2Module 9:  Natural Language Processing Part 2
Module 9: Natural Language Processing Part 2Sara Hooker
 
591 Final Report - Team 7 - Political Issues
591 Final Report - Team 7 - Political Issues591 Final Report - Team 7 - Political Issues
591 Final Report - Team 7 - Political IssuesTim Sawicki
 
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSISMOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSISEditor Jacotech
 
Sentiment Analysis of Twitter tweets using supervised classification technique
Sentiment Analysis of Twitter tweets using supervised classification technique Sentiment Analysis of Twitter tweets using supervised classification technique
Sentiment Analysis of Twitter tweets using supervised classification technique IJERA Editor
 
Social Media Data Collection & Analysis
Social Media Data Collection & AnalysisSocial Media Data Collection & Analysis
Social Media Data Collection & AnalysisScott Sanders
 
Social Media Mining using R
Social Media Mining using RSocial Media Mining using R
Social Media Mining using RSubhankar Mishra
 

Similaire à 2016 Presidential Candidate Tracker (20)

Social data analysis using apache flume, hdfs, hive
Social data analysis using apache flume, hdfs, hiveSocial data analysis using apache flume, hdfs, hive
Social data analysis using apache flume, hdfs, hive
 
2023 Guide How To Scrape Social Media Data Using Python (1).pptx
2023 Guide How To Scrape Social Media Data Using Python (1).pptx2023 Guide How To Scrape Social Media Data Using Python (1).pptx
2023 Guide How To Scrape Social Media Data Using Python (1).pptx
 
STACK OVERFLOW DATASET ANALYSIS
STACK OVERFLOW DATASET ANALYSISSTACK OVERFLOW DATASET ANALYSIS
STACK OVERFLOW DATASET ANALYSIS
 
FInal Project Intelligent Social Media Analytics
FInal Project Intelligent Social Media AnalyticsFInal Project Intelligent Social Media Analytics
FInal Project Intelligent Social Media Analytics
 
srd117.final.512Spring2016
srd117.final.512Spring2016srd117.final.512Spring2016
srd117.final.512Spring2016
 
Sentiment Analysis on Twitter Data Using Apache Flume and Hive
Sentiment Analysis on Twitter Data Using Apache Flume and HiveSentiment Analysis on Twitter Data Using Apache Flume and Hive
Sentiment Analysis on Twitter Data Using Apache Flume and Hive
 
Datawarehousing and Business Intelligence
Datawarehousing and Business IntelligenceDatawarehousing and Business Intelligence
Datawarehousing and Business Intelligence
 
Final Presentation
Final PresentationFinal Presentation
Final Presentation
 
Twitter sentimentanalysis report
Twitter sentimentanalysis reportTwitter sentimentanalysis report
Twitter sentimentanalysis report
 
DeepSearch_Project_Report
DeepSearch_Project_ReportDeepSearch_Project_Report
DeepSearch_Project_Report
 
Brand Strategy and Super Bowl Twitter AnalyticsImage Sou.docx
Brand Strategy and Super Bowl Twitter AnalyticsImage Sou.docxBrand Strategy and Super Bowl Twitter AnalyticsImage Sou.docx
Brand Strategy and Super Bowl Twitter AnalyticsImage Sou.docx
 
Mining public opinion about economic issues
Mining public opinion about economic issuesMining public opinion about economic issues
Mining public opinion about economic issues
 
Twitter Based Sentiment Analysis of Each Presidential Candidate Using Long Sh...
Twitter Based Sentiment Analysis of Each Presidential Candidate Using Long Sh...Twitter Based Sentiment Analysis of Each Presidential Candidate Using Long Sh...
Twitter Based Sentiment Analysis of Each Presidential Candidate Using Long Sh...
 
Are Positive or Negative Tweets More "Retweetable" in Brazilian Politics?
Are Positive or Negative Tweets More "Retweetable" in Brazilian Politics?Are Positive or Negative Tweets More "Retweetable" in Brazilian Politics?
Are Positive or Negative Tweets More "Retweetable" in Brazilian Politics?
 
Module 9: Natural Language Processing Part 2
Module 9:  Natural Language Processing Part 2Module 9:  Natural Language Processing Part 2
Module 9: Natural Language Processing Part 2
 
591 Final Report - Team 7 - Political Issues
591 Final Report - Team 7 - Political Issues591 Final Report - Team 7 - Political Issues
591 Final Report - Team 7 - Political Issues
 
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSISMOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
 
Sentiment Analysis of Twitter tweets using supervised classification technique
Sentiment Analysis of Twitter tweets using supervised classification technique Sentiment Analysis of Twitter tweets using supervised classification technique
Sentiment Analysis of Twitter tweets using supervised classification technique
 
Social Media Data Collection & Analysis
Social Media Data Collection & AnalysisSocial Media Data Collection & Analysis
Social Media Data Collection & Analysis
 
Social Media Mining using R
Social Media Mining using RSocial Media Mining using R
Social Media Mining using R
 

Dernier

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 


2016 Presidential Candidate Tracker

CS 541: 2016 Presidential Candidate Tracker
Anwar Jameel (aj528)
Shahab Shekari (ss1817)
28 April 2016

1 Introduction
In this project, we analyzed data from Twitter and Facebook about the candidates in the 2016 US presidential race. The data is parsed from the Facebook pages of 20 different news channels and from tweets related to the race. We used the Python APIs of Facebook (GraphAPI) and Twitter (Tweepy) to retrieve the data. DynamoDB stores statistics (e.g. polarity scores) about each presidential candidate. We used the Natural Language Toolkit (NLTK) for sentiment analysis of Facebook posts, their comments, and tweets. Cron jobs on an EC2 node parse Facebook posts once a day at midnight, the previous day's popular tweets every morning at 6am, and live tweets every 6 hours (at 3, 9, 15 and 21). Finally, we created a website that compares the polarity scores of all the candidates across different sources over a period of time. The website also shows a word cloud of adjectives and hashtags for each candidate, and each candidate's page shows engagement figures such as likes, retweets and favorites. The following sections discuss the implementation in more detail.

2 Implementations
The sources of data in our project are Facebook and Twitter. In this section, we discuss the features implemented to analyse and integrate the data from both. We created the website to project different patterns from the analysed data stored in DynamoDB. The website offers an option to compare the polarity scores between any subset of the 20 news outlets and Twitter. This feature provides an unbiased polarity comparison of different sources for every candidate. We also display the popular hashtags for every candidate on the home page of our website. The word cloud containing adjectives and hashtags is also displayed for every candidate. On each candidate's page we also display the percentage of likes, retweets and favorites for that candidate compared to the other candidates. Figure 1 shows the top 10 hashtags used in conjunction with individual candidates for the past day. The hashtags are rotated (faded in and out) every 5 seconds.

Figure 1: Website Home Page

2.1 Facebook
The data related to the 6 US presidential candidates is collected from the Facebook pages of 20 different news outlets (such as Fox News, CNN, MSNBC, etc.). We defined target words (such as "trump", "hillary", etc.) which we look for in a post's title while parsing the posts of the news channels. The relevant posts are the ones that contain these target words. For every relevant post, the top 20 comments by number of likes are retrieved and stored. Since Facebook does not provide a direct API for getting the top k comments, we maintain a priority queue over the earliest 500 comments of a post to find the top 20 comments. Polarity scores are calculated with the NLTK sentiment analyzer for every relevant post and its top comments. The aggregate polarity scores for each post and its top comments are also stored for every candidate. Adjectives are extracted (again using NLTK) from the top comments of every post and stored for every candidate. The adjectives are also aggregated for every candidate to form a word cloud. Both the aggregated polarity scores and the aggregated adjectives are stored in the aggregate stats table of DynamoDB. The Facebook post itself, along with other details such as comments, number of likes, candidate ids, post id, etc., is stored in the fb posts table of DynamoDB.

2.2 Twitter
The data related to the 6 US presidential candidates is collected from popular and recent tweets on Twitter. We look for tweets containing target words and store them with the candidate ids that those target words map to. The popular tweets for the previous day are parsed and stored every day at 6am. Recent, live tweets are parsed and stored every 6 hours (at 3, 9, 15 and 21). For every relevant tweet, we also store its timestamp, hashtags, number of favorites, number of retweets and the tweet itself in the twt posts table of DynamoDB. Polarity scores are calculated for every tweet using the NLTK sentiment analyzer. We calculate the aggregate polarity scores for every candidate and store them in the aggregate stats table of DynamoDB. Aggregated hashtags for every candidate are also stored in the aggregate stats table. Figure 2 shows a snapshot of a candidate page (Hillary Clinton) with a few sources selected to project the polarity and word cloud.

Figure 2: Candidate Page

3 Data Statistics
Table            Partition Key  Sort Key    Read  Write  Storage  # Items
fb posts         candidate id   post id     10    10     2.71 MB  6,589
twt posts        candidate id   tweet id    200   10     81 MB    270,562
aggregate stats  stat source    time stamp  5     5      1.04 MB  222

For each table, we define the primary key as the combination (Partition Key, Sort Key). The partition key determines the partition in which DynamoDB stores an item. All items with the same partition key are stored together, sorted by sort key value. The table above shows statistics for the Facebook, Twitter and aggregated statistics tables that we maintain in DynamoDB. The Read and Write columns show the provisioned read and write capacity units. The Storage column gives the amount of data stored, and the # Items column gives the number of Facebook posts/comments, tweets, etc. present in each table for the last 3 weeks.

4 Challenges
We faced several challenges during the implementation as well as during the data retrieval process. Our AWS usage is limited by a lack of credits, which restricts how many items we can scan from DynamoDB. As the amount of data grows with time, we need to scan more data from DynamoDB to project the polarity and word cloud over time. To cope with this, we implemented a caching solution: we scan the DynamoDB tables just once every 3 hours, store the data in RAM, and then filter and project the data from RAM as needed.

Another challenge was that Facebook does not provide a public API to search or get public posts. We overcame this by scraping the posts from each news channel's Facebook wall and then finding the relevant posts based on the defined target words. A further challenge was how to choose the target words; we restricted ourselves to the last names of the candidates. Since the amount of data was not an issue, we preferred to avoid false positives by excluding posts that contained only the first name of a candidate.

5 Conclusion and Future Work
We have successfully completed all the proposed goals of our project. We are able to collect large amounts of data and to analyse, integrate and project the patterns in the data using current data analysis techniques. Our projections are unbiased in the sense that we have considered almost all the major attributes of a Facebook post or a tweet when calculating polarity scores.
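
As an illustration of the last-name matching described in Section 4, the following minimal Python sketch maps a post title or tweet to candidate ids. It is not the project's actual code: the `TARGET_WORDS` mapping, the id strings and the function name `match_candidates` are assumptions made for this example, and only two of the six candidates are listed.

```python
import re

# Hypothetical mapping from target word (a last name) to candidate id.
# The real project tracked six candidates; two are shown for illustration.
TARGET_WORDS = {
    "trump": "trump",
    "clinton": "clinton",
}


def match_candidates(text, target_words=TARGET_WORDS):
    """Return the sorted candidate ids whose target word occurs in text."""
    # Tokenise on letters only, so "trumpet" does not match "trump".
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return sorted({cid for word, cid in target_words.items() if word in tokens})
```

Matching whole tokens rather than substrings is one way to keep false positives low; a post that mentions only a first name yields an empty list and would be skipped.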
Our target words are just the last names of the candidates, and we assume that any post or story related to a candidate that appears on Facebook or Twitter will contain one of these target words. We believe that this assumption holds to a great extent. We also aggregate the adjectives and hashtags over time and form a word cloud based on their counts over a period of time. The engagement statistics are a useful way to gauge a candidate's popularity with the public, and we have projected them for every candidate. We can also observe that a few news channels show higher variance in their polarity curves than others. This variance can be used as a measure of how biased a certain news channel is.

Although our results provide useful analytics about the current presidential candidates, we think there is still scope for improvement. In future, we would like to include more sources (such as international media) to gather data from and project the polarity and word cloud. We would also like to extrapolate the polarity graph to future events using machine learning techniques.

Acknowledgement
We would like to thank Professor Amélie Marian for providing us with constructive suggestions and guiding us throughout the project.
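
The top-comment selection described in Section 2.1 (a priority queue over the earliest 500 comments, keeping the 20 most liked) can be sketched in Python as follows. This is a minimal sketch under assumptions, not the project's code: the comment representation (a dict with a `likes` field), the function name `top_comments` and the tie-breaking rule (earlier comments win ties) are all chosen for this example.

```python
import heapq
from itertools import islice


def top_comments(comments, k=20, scan_limit=500):
    """Return the k most-liked comments among the first scan_limit comments."""
    # Min-heap of (likes, -arrival_index, comment). The root is the weakest
    # kept comment: fewest likes and, among equals, the latest arrival.
    heap = []
    for index, comment in enumerate(islice(comments, scan_limit)):
        entry = (comment["likes"], -index, comment)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # Strictly more likes than the weakest kept comment: swap it in.
            heapq.heapreplace(heap, entry)
    # Most liked first; ties are ordered by arrival. The first two tuple
    # fields are unique together, so the dicts are never compared.
    return [comment for _, _, comment in sorted(heap, reverse=True)]
```

Keeping a bounded heap of size k means only k comments are held at once while the first 500 are streamed through, rather than sorting all of them.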
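
The caching scheme described in Section 4 (scan the tables once every 3 hours, keep the result in RAM, filter in memory) could look roughly like the sketch below. The class name `ScanCache`, the `scan_fn` callable standing in for a full DynamoDB table scan, and the injectable `clock` are assumptions introduced for illustration; the report does not describe how the cache was actually implemented.

```python
import time


class ScanCache:
    """Hold the result of an expensive scan in memory and refresh it after a TTL."""

    def __init__(self, scan_fn, ttl_seconds=3 * 60 * 60, clock=time.monotonic):
        self._scan_fn = scan_fn  # hypothetical callable performing the full scan
        self._ttl = ttl_seconds  # 3 hours, as stated in Section 4
        self._clock = clock      # injectable so expiry can be exercised without waiting
        self._items = None
        self._loaded_at = None

    def items(self):
        """Return the cached items, rescanning only if the cache is empty or stale."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._items = list(self._scan_fn())
            self._loaded_at = now
        return self._items

    def query(self, predicate):
        """Filter the cached items in memory instead of scanning the table again."""
        return [item for item in self.items() if predicate(item)]
```

With this shape, every page request calls `query` with a filter (for example on candidate id or source), and the table is read at most once per TTL window regardless of traffic.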