PIE CHART OR PIZZA:
IDENTIFYING CHART TYPES AND THEIR VIRALITY ON
TWITTER
Elena Simperl
@esimperl
University of Bristol
January 13, 2021
“One of the interpretations of the EU referendum result and
the rise of Donald Trump in the US is that we are now living in
a post-truth society - a world in which anecdotes shared on
social media and invented numbers thrown on the sides of
buses are more trusted and influential than official statistics,
extensive research, and proven expertise. In this world,
scientists, statisticians, analysts, and journalists must find new
ways to bring hard, factual data to citizens.”
“Data must entertain as well as inform, excite as well as
educate. It must be built with social media sharing in mind,
and become part of our everyday activities and digital
interactions with others.”
DATA STORIES
DATASTORIES.CO.UK
Data Stories developed frameworks
and technology to bring data closer
to people through art, games, and
storytelling.
We examined the impact of varying
levels of localisation, topicalisation,
participation, and shareability on
public engagement with factual
evidence.
We delivered tools and guidance to
help artists, designers, statisticians,
analysts, and journalists communicate
through data in inspiring, informative
ways.
Theme 1: Find, make sense of, and use data
HIGHLIGHTS
Theme 2: Entertain and inform with data
STORYTELLING THROUGH GAMES AND ART
VIRAL
CHARTS?
Data visualisations are widely used by experts
to communicate quantitative information to the
public.
News agencies have Twitter accounts that
specialise in the dissemination of information
using charts.
Brands use infographics and other visual means
in campaigns.
Research has looked at information diffusion in
social networks for text, images, and video, but not
for charts.
PIE CHART OR
PIZZA?
Data-driven approach that
• identifies whether an image
posted in a tweet displays a
chart.
• If yes, it predicts:
  • its exact chart type; and
  • its potential to go viral (i.e. like
    and retweet counts).
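
The steps on this slide can be sketched as a small control flow. This is only an illustration: `is_chart`, `chart_type`, and `virality` are hypothetical stand-ins for the trained models, and the sketch passes only the image to the virality predictor for brevity.

```python
from typing import Callable, Optional, Tuple


def analyse_image(
    image: object,
    is_chart: Callable[[object], bool],
    chart_type: Callable[[object], str],
    virality: Callable[[object], Tuple[float, float]],
) -> Optional[dict]:
    """Return the chart type and predicted (likes, retweets), or None for non-charts."""
    if not is_chart(image):
        return None  # e.g. a photo of a pizza: nothing further is predicted
    likes, retweets = virality(image)
    return {"type": chart_type(image), "likes": likes, "retweets": retweets}


# Toy stand-ins so the sketch runs; real models would replace these lambdas.
toy_models = dict(
    is_chart=lambda img: img.endswith(".png"),
    chart_type=lambda img: "pie chart",
    virality=lambda img: (12.0, 3.0),
)
chart_result = analyse_image("pie.png", **toy_models)
pizza_result = analyse_image("pizza.jpg", **toy_models)
print(chart_result, pizza_result)
```

The early return mirrors the "If yes" branch: chart type and virality are only computed for images that pass the chart check.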
REALITY VS BENCHMARK DATASETS
Top: benchmark data
Bottom: actual charts shared on Twitter
CONVNET FOR CHART IDENTIFICATION
• Adaptation of the
VGGNet system
(Simonyan and Zisserman,
2015) tuned to the
requirements of our task
• 2.4m parameters (excl. the final
fully-connected layer), around 129m
fewer than VGGNet’s “A”
configuration.
THE REVISION+ CORPUS
ReVision corpus: introduced in (Savva et al., 2011), 10 chart types, 2965 images
We extended it to ReVision+ (1 new chart type, 1 extended chart type, 3.6k images with no charts)

Chart type                 Samples
Area chart                 90
Bar (+column chart)        169 (362)
Box plot                   150
Line graph                 317
Map                        249
Pie chart                  210
Pareto chart               168
Radar plot                 137
Scatter plot               371
Table                      263
Venn diagram               108
No chart (ILSVRC-2012)     3636
Total                      6061
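
As a quick consistency check, the per-class counts can be summed against the stated total. The sketch below assumes that, for the bar class, the bracketed figure (362) is the extended class size that counts towards the total; the slide does not say so explicitly, but the sum only matches under that reading.

```python
# Per-class sample counts copied from the ReVision+ table.
# Assumption: for "Bar (+column chart) 169 (362)" the bracketed 362 is the
# extended class size that counts towards the total.
chart_samples = {
    "Area chart": 90,
    "Bar (+column chart)": 362,
    "Box plot": 150,
    "Line graph": 317,
    "Map": 249,
    "Pie chart": 210,
    "Pareto chart": 168,
    "Radar plot": 137,
    "Scatter plot": 371,
    "Table": 263,
    "Venn diagram": 108,
}
NO_CHART = 3636  # images without charts, taken from ILSVRC-2012

chart_total = sum(chart_samples.values())
corpus_total = chart_total + NO_CHART
print(chart_total, corpus_total)  # 2425 6061
```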
CHART IDENTIFICATION EVALUATION
[Results figure; panel labels: 10 chart classes; 11 chart classes + no-chart class]
BUILDING A REALISTIC
DATASET
We collected a set of
34491 images from
Twitter accounts
dedicated to data
journalism.
We split this corpus into
two parts: 3000
images for chart
identification
(DataTweet+) and
31491 images for
virality prediction
(DataTweet).
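
A minimal sketch of such a split is shown below. The slide does not say how the 3000 identification images were chosen, so the seeded random shuffle and the helper name `split_corpus` are assumptions for illustration only.

```python
import random


def split_corpus(image_ids, n_identification=3000, seed=0):
    """Shuffle the ids and split them into (identification, virality) subsets."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    return ids[:n_identification], ids[n_identification:]


# Integer ids stand in for the 34491 collected images.
datatweet_plus, datatweet = split_corpus(range(34491))
print(len(datatweet_plus), len(datatweet))  # 3000 31491
```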
THE DATATWEET+
CORPUS
We hand-labelled 3000
images using the
crowdsourcing platform
Figure Eight.
Quality assurance:
• 80%+ on gold standard
questions (50 images, manually
labelled by us);
• inter-annotator agreement
(Fleiss' kappa) 60%+ (0.8741).
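
For readers unfamiliar with the agreement measure, Fleiss' kappa can be computed from a table of per-item category counts. The sketch below is a plain-Python version of the standard formula; the toy ratings are invented for illustration and are not the DataTweet+ annotations.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for ratings[i][j] = number of annotators who assigned
    item i to category j. Every item must have the same number of annotators."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    # Share of all assignments that went to each category.
    p_cat = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Per-item agreement: fraction of annotator pairs that agree on the item.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data: 4 images, 3 annotators, 2 categories (chart / no chart).
perfect = [[3, 0], [0, 3], [3, 0], [0, 3]]
mixed = [[3, 0], [2, 1], [1, 2], [0, 3]]
print(fleiss_kappa(perfect))  # 1.0
print(fleiss_kappa(mixed))
```

Values near 1 indicate agreement well above chance; values near 0 indicate agreement no better than chance.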
DISTRIBUTION OF CHART TYPES IN
DATATWEET+
FINETUNING THE CONVNET
We ran two sets of experiments on DataTweet+, one with
the ConvNet trained on ReVision+ and one after fine-
tuning it on the new corpus DataTweet+.
We “froze” the parameters of the convolutional layers
and tuned only the fully connected layers.
We set the learning rate to half of its original value; the
other training details remain identical to the ones of the
original model.
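
The idea of freezing one part of a network and tuning the rest with a halved learning rate can be illustrated with a deliberately tiny model. This is a toy in plain Python, not the ConvNet from the talk: `conv_w` stands in for the frozen convolutional layers, `head_w` for the fully connected layers, and all values are invented.

```python
def features(x, conv_w):
    # Stand-in for the frozen convolutional stack: fixed linear map + ReLU.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in conv_w]


def fine_tune(data, conv_w, head_w, base_lr, epochs=200):
    """Update only head_w with squared-error SGD; conv_w is read, never written."""
    lr = base_lr / 2  # learning rate set to half of its original value
    for _ in range(epochs):
        for x, y in data:
            f = features(x, conv_w)
            pred = sum(w * fi for w, fi in zip(head_w, f))
            err = pred - y
            # The gradient step touches the head weights only.
            head_w = [w - lr * err * fi for w, fi in zip(head_w, f)]
    return head_w


conv_w = [[1.0, 0.0], [0.0, 1.0]]   # "pre-trained" and frozen
frozen_copy = [row[:] for row in conv_w]
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], 3.0), ([1.0, 1.0], 5.0)]
tuned_head = fine_tune(data, conv_w, head_w=[0.0, 0.0], base_lr=0.1)
print(tuned_head, conv_w == frozen_copy)
```

In a deep-learning framework the same effect is usually obtained by excluding the convolutional parameters from the optimiser or disabling their gradients.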
CHART IDENTIFICATION EVALUATION
[Results figure; panel labels: original, clean chart dataset, extended; 3000 charts from Twitter]
CLASSIFICATION EXAMPLES (1)
CLASSIFICATION EXAMPLES (2)
MULTI-MODAL NEURAL ARCHITECTURE
FOR VIRALITY PREDICTION
JOINTLY LEARNING TO PREDICT LIKES
AND RETWEETS
Modelled as a regression task. During training, our model tries to
minimise:
[Equation not recovered from the slide; its term labels were: retweets, target retweets, likes, target likes]
Target values are transformed to a logarithmic scale due to the large
variation of their expected values.
We evaluate using Root Mean Square Error (RMSE) and Spearman’s
rank correlation (ρ).
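
The evaluation set-up can be sketched in plain Python. Two points are assumptions rather than details from the slides: the log transform is taken as log(1 + x) so that zero counts stay finite, and `joint_loss` (the sum of the two mean squared errors) is only one plausible form of a joint objective, since the slide's own formula is not reproduced in the text.

```python
import math


def log_scale(counts):
    # Assumed transform: log(1 + x), which keeps zero counts finite.
    return [math.log1p(c) for c in counts]


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def rmse(pred, target):
    return math.sqrt(mse(pred, target))


def joint_loss(pred_rt, true_rt, pred_lk, true_lk):
    # Assumed joint objective: retweet error plus like error.
    return mse(pred_rt, true_rt) + mse(pred_lk, true_lk)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(pred, target):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rp, rt = ranks(pred), ranks(target)
    mp, mt = sum(rp) / len(rp), sum(rt) / len(rt)
    cov = sum((a - mp) * (b - mt) for a, b in zip(rp, rt))
    sp = math.sqrt(sum((a - mp) ** 2 for a in rp))
    st = math.sqrt(sum((b - mt) ** 2 for b in rt))
    return cov / (sp * st)


# Invented counts and log-scale predictions, for illustration only.
true_retweets = log_scale([0, 3, 10, 120])
true_likes = log_scale([1, 8, 25, 400])
pred_retweets = [0.1, 1.2, 2.5, 4.6]
pred_likes = [0.9, 2.0, 3.1, 5.8]
print(rmse(pred_retweets, true_retweets), spearman(pred_retweets, true_retweets))
print(joint_loss(pred_retweets, true_retweets, pred_likes, true_likes))
```

RMSE measures how far the predicted log counts are from the targets, while Spearman's ρ only checks whether posts are ranked in the right order.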
VIRALITY EVALUATION
FINDINGS
Best performance when all
features are included.
Despite much lower
computational complexity, the
systems equipped with the
mDataTweet+ features perform
better in both retweet and like
prediction than the ones
equipped with mILSVRC.
Using the fine-tuned mDataTweet+
features results in lower average
RMSE compared to the mReVision+
ones.
The most determinant prediction
features are author-related.
CONCLUSIONS AND
FUTURE WORK (1)
First attempt to estimate how much a chart-driven
Twitter post will be shared, by jointly learning to
predict the number of times a chart message will be
retweeted and liked.
Our system outperforms other competing systems on
ReVision, while it is additionally capable of excluding
images that do not contain charts.
Using crowdsourcing, we introduced a new dataset of
realistic data visualisations, available at:
https://github.com/pvougiou/Pie-Chart-or-Pizza.
The models trained on the DataTweet+ corpus are
relevant for ongoing research on chart ranking or
recommendation with neural networks, which
identified a series of quality metrics to create large
training datasets automatically.
CONCLUSIONS AND
FUTURE WORK (2)
Such metrics could be used to generate larger,
synthetic chart corpora where we can control for
various chart design elements to see if they make a
difference on social media.
We did not consider images with more than one
chart. The model did not do well on dashboards and
embellished charts.
We did not consider time in our shareability
predictions - 95% of the posts we analysed were
older than a year, so we predicted cumulative
retweets and likes rather than time-sensitive results.
The system was robust across chart types and author
profiles and could be extended to other tasks such as
visual question answering for charts, used e.g. in fact
checking.
PUBLICATIONS
Talking Datasets: understanding data sensemaking behaviours. L Koesten, K Gregory, P Groth, E Simperl. Currently under review at the International Journal of Human-Computer Studies, 2020.
Everything You Always Wanted to Know about a Dataset: Studies in Data Summarisation. L Koesten, E Simperl, E Kacprzak, T Blount, J Tennison. International Journal of Human-Computer Studies, 2019.
Collaborative Practices with Structured Data: Do Tools Support what Users Need? L Koesten, E Kacprzak, E Simperl, J Tennison. ACM CHI Conference on Human Factors in Computing Systems, CHI 2019.
Dataset search: a survey. A Chapman, E Simperl, L Koesten, G Konstantinidis, LD Ibáñez, E Kacprzak, P Groth. The International Journal on Very Large Data Bases, 2019.
Characterising dataset search: An analysis of search logs and data requests. E Kacprzak, L Koesten, LD Ibáñez, T Blount, J Tennison, E Simperl. Journal of Web Semantics, 2018.
The Trials and Tribulations of Working with Structured Data - a Study on Information Seeking Behaviour. L Koesten, E Kacprzak, J Tennison, E Simperl. Proceedings of ACM CHI Conference on Human Factors in Computing Systems, CHI 2017.
Dataset Reuse: Toward Translating Principles to Practice. L Koesten, P Vougiouklis, E Simperl, P Groth. Patterns, 2020.
Pie Chart or Pizza: Identifying Chart Types and Their Virality on Twitter. P Vougiouklis, L Carr, E Simperl. Proceedings of the International AAAI Conference on Web and Social Media, 2020.

Pie chart or pizza: identifying chart types and their virality on Twitter

  • 1. PIE CHART OR PIZZA: IDENTIFYING CHART TYPES AND THEIR VIRALITY ON TWITTER Elena Simperl @esimperl University of Bristol January 13, 2021
  • 2. “One of the interpretations of the EU referendum result and the rise of Donald Trump in the US is that we are now living in a post-truth society - a world in which anecdotes shared on social media and invented numbers thrown on the sides of buses are more trusted and influential than official statistics, extensive research, and proven expertise. In this world, scientists, statisticians, analysts, and journalists must find new ways to bring hard, factual data to citizens.”
  • 3. “Data must entertain as well as inform, excite as well as educate. It must be built with social media sharing in mind, and become part of our everyday activities and digital interactions with others.”
  • 4. DATA STORIES DATASTORIES.CO.UK Data Stories developed frameworks and technology to bring data closer to people through art, games, and storytelling. We examined the impact of varying levels of localisation, topicalization, participation, and shareability on public engagement with factual evidence. We delivered tools and guidance to help artists, designers, statisticians, analysts, and journalists communicate through data in inspiring, informative ways.
  • 5. Theme 1: Find, make sense, use data
  • 7. Theme 2: Entertain and inform with data
  • 9. VIRAL CHARTS? Data visualisations are widely used by experts to communicate quantitative information to the public. News agencies have Twitter accounts that specialise in the dissemination of information using charts. Brands use infographics and other visual means in campaigns. Research has looked at information diffusion in social networks for text, images, video, but not charts.
  • 10. PIE CHART OR PIZZA? Data-driven approach that identifies whether an image posted in a tweet displays a chart. If yes, it predicts its exact chart type and its potential to go viral (i.e. like and retweet counts).
  • 11. REALITY VS BENCHMARK DATASETS Top: benchmark data Bottom: actual charts shared on Twitter
  • 12. CONVNET FOR CHART IDENTIFICATION Adaptation of the VGGNet system (Simonyan and Zisserman, 2015), tuned to the requirements of our task. 2.4m parameters (excl. the final fully-connected layer), around 129m fewer than VGGNet’s “A” configuration.
  • 13. THE REVISION+ CORPUS ReVision corpus: introduced in (Savva et al., 2011), 10 chart types, 2965 images. We extended it to ReVision+ (1 new chart type, 1 extended chart type, 3.6k images with no charts).
      Chart type                Samples
      Area chart                90
      Bar (+ column chart)      169 (362)
      Box plot                  150
      Line graph                317
      Map                       249
      Pie chart                 210
      Pareto chart              168
      Radar plot                137
      Scatter plot              371
      Table                     263
      Venn diagram              108
      No chart (ILSVRC-2012)    3636
      Total                     6061
  • 14. CHART IDENTIFICATION EVALUATION 10 chart classes; 11 chart classes + no-chart class
  • 15. BUILDING A REALISTIC DATASET We collected a set of 34491 images from Twitter accounts dedicated to data journalism. We split this corpus into two parts: 3000 images for chart identification (DataTweet+) and 31491 images for virality prediction (DataTweet).
  • 16. THE DATATWEET+ CORPUS We hand-labelled 3000 images using the crowdsourcing platform Figure Eight. Quality assurance: 80%+ on gold standard questions (50 images, manually labelled by us); inter-annotator agreement (Fleiss' kappa) 60%+ (0.8741).
  • 17. DISTRIBUTION OF CHART TYPES IN DATATWEET+
  • 18. FINETUNING THE CONVNET We ran two sets of experiments on DataTweet+, one with the ConvNet trained on ReVision+ and one after fine-tuning it on the new corpus DataTweet+. We “froze” the parameters of the convolutional layers and tuned only the fully connected layers. We set the learning rate to half of its original value; the other training details remain identical to those of the original model.
  • 19. CHART IDENTIFICATION EVALUATION Original, clean chart dataset, extended; 3000 charts from Twitter
  • 22. MULTI-MODAL NEURAL ARCHITECTURE FOR VIRALITY PREDICTION
  • 23. JOINTLY LEARNING TO PREDICT LIKES AND RETWEETS Modelled as a regression task. During training, our model tries to minimise a joint loss over the predicted and target retweet counts and the predicted and target like counts. Target values are transformed to logarithmic scale due to the large variation of their expected values. We evaluate using Root Mean Square Error (RMSE) and Spearman’s rank correlation (ρ).
  • 25. FINDINGS Best performance when all features are included. Despite much lower computational complexity, the systems equipped with the mDataTweet+ features perform better in both retweet and like prediction than the ones equipped with mILSVRC. Using the fine-tuned mDataTweet+ features results in lower average RMSE compared to the mReVision+ ones. The most decisive prediction features are author-related.
  • 26. CONCLUSIONS AND FUTURE WORK (1) First attempt to estimate how much a chart-driven Twitter post will be shared by jointly learning to predict the number of times a chart message will be retweeted and liked. Our system outperforms other competing systems on ReVision, while it is additionally capable of excluding images that do not contain charts. Using crowdsourcing, we introduced a new dataset of realistic data visualisations, available at: https://github.com/pvougiou/Pie-Chart-or-Pizza. The models trained on the DataTweet+ corpus are relevant for ongoing research on chart ranking or recommendation with neural networks, which identified a series of quality metrics to create large training datasets automatically.
  • 27. CONCLUSIONS AND FUTURE WORK (2) Such metrics could be used to generate larger, synthetic chart corpora where we can control for various chart design elements to see if they make a difference on social media. We did not consider images with more than one chart. The model did not do well on dashboards and embellished charts. We did not consider time in our shareability predictions: 95% of the posts we analysed were older than a year, so we predicted cumulative retweets and likes rather than time-sensitive results. The system was robust across chart types and author profiles and could be extended to other tasks such as visual question answering for charts, used e.g. in fact checking.
  • 29. PUBLICATIONS
      Talking Datasets - understanding data sensemaking behaviours. L Koesten, K Gregory, P Groth, E Simperl. Currently under review at the International Journal of Human-Computer Studies, 2020.
      Everything You Always Wanted to Know about a Dataset: Studies in Data Summarisation. L Koesten, E Simperl, E Kacprzak, T Blount, J Tennison. International Journal of Human-Computer Studies, 2019.
      Collaborative Practices with Structured Data: Do Tools Support what Users Need? L Koesten, E Kacprzak, E Simperl, J Tennison. ACM CHI Conference on Human Factors in Computing Systems, CHI 2019.
      Dataset search: a survey. A Chapman, E Simperl, L Koesten, G Konstantinidis, LD Ibáñez, E Kacprzak, P Groth. The International Journal on Very Large Data Bases, 2019.
      Characterising dataset search - An analysis of search logs and data requests. E Kacprzak, L Koesten, LD Ibáñez, T Blount, J Tennison, E Simperl. Journal of Web Semantics, 2018.
      The Trials and Tribulations of Working with Structured Data - a Study on Information Seeking Behaviour. L Koesten, E Kacprzak, J Tennison, E Simperl. Proceedings of ACM CHI Conference on Human Factors in Computing Systems, CHI 2017.
      Dataset Reuse: Toward Translating Principles to Practice. L Koesten, P Vougiouklis, E Simperl, P Groth. Patterns, 2020.
      Pie Chart or Pizza: Identifying Chart Types and Their Virality on Twitter. P Vougiouklis, L Carr, E Simperl. Proceedings of the International AAAI Conference on Web and Social Media, 2020.
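
Slide 12 compares the size of the chart-identification ConvNet with VGGNet's "A" configuration. As a rough point of reference, the size of configuration "A" can be counted from its published layer layout (eight 3×3 convolutional layers followed by three fully connected layers, on 224×224 RGB input). The sketch below performs only that count. It does not reproduce the 2.4m-parameter network from the slides, whose layer layout is not given here, and all function and constant names are illustrative.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one convolutional layer."""
    return in_channels * out_channels * kernel * kernel + out_channels


def fc_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


# Output channels of the eight conv layers in VGGNet configuration "A".
VGG_A_CONV = [64, 128, 256, 256, 512, 512, 512, 512]
# Output sizes of the three fully connected layers (1000 ImageNet classes).
VGG_A_FC = [4096, 4096, 1000]


def vgg_a_param_count(in_channels=3, final_spatial=7):
    """Total parameters for 224x224 input (five 2x2 poolings leave a 7x7 map)."""
    total, channels = 0, in_channels
    for out_channels in VGG_A_CONV:
        total += conv_params(channels, out_channels)
        channels = out_channels
    n_in = channels * final_spatial * final_spatial
    for n_out in VGG_A_FC:
        total += fc_params(n_in, n_out)
        n_in = n_out
    return total


print(f"VGGNet 'A': {vgg_a_param_count() / 1e6:.1f}m parameters")
```

Most of the total sits in the first fully connected layer, which is one reason a network with a smaller classifier head can be far lighter.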
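
Slide 16 reports inter-annotator agreement as Fleiss' kappa. A minimal sketch of how that statistic is computed follows, assuming the labels are arranged as a table in which `table[i][j]` counts the annotators who assigned image `i` to category `j`, with the same number of annotators (at least two) per image. The toy ratings are invented for illustration and are not taken from DataTweet+.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for a ratings table.

    table[i][j] = number of raters who put item i into category j.
    Every item must have the same total number of ratings (at least 2).
    """
    n_items = len(table)
    n_raters = sum(table[0])
    if any(sum(row) != n_raters for row in table):
        raise ValueError("each item needs the same number of ratings")

    # Observed agreement: share of agreeing rater pairs per item, averaged.
    p_observed = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ) / n_items

    # Chance agreement, from the overall proportion of each category.
    total = n_items * n_raters
    n_categories = len(table[0])
    p_expected = sum(
        (sum(row[j] for row in table) / total) ** 2 for j in range(n_categories)
    )
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical example: 4 images, 3 annotators, categories (bar, pie, no chart).
toy_ratings = [[3, 0, 0], [0, 3, 0], [2, 1, 0], [0, 0, 3]]
print(round(fleiss_kappa(toy_ratings), 4))
```

A value of 1 means perfect agreement, and values near 0 mean agreement no better than chance.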
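
Slide 18 describes fine-tuning with the convolutional layers "frozen" and the learning rate halved. The toy sketch below illustrates that idea without any deep learning framework: parameter groups flagged as frozen are skipped during the update. Plain gradient descent, the group names, the gradients and the original learning rate of 0.01 are all assumptions made for the example, not details from the slides.

```python
def sgd_step(params, grads, lr, trainable):
    """One gradient-descent step that leaves frozen parameter groups untouched."""
    return {
        name: [w - lr * g for w, g in zip(weights, grads[name])]
        if trainable[name]
        else list(weights)
        for name, weights in params.items()
    }


ORIGINAL_LR = 0.01              # hypothetical value
FINE_TUNE_LR = ORIGINAL_LR / 2  # "half of its original value"

params = {"conv": [0.5, -0.2], "fc": [1.0, 2.0]}
grads = {"conv": [0.1, 0.1], "fc": [0.2, -0.4]}
trainable = {"conv": False, "fc": True}  # freeze conv, tune fully connected

updated = sgd_step(params, grads, FINE_TUNE_LR, trainable)
print(updated)
```

Only the `fc` group changes. The `conv` group keeps the values learned on the original corpus, so the features it extracts are reused unchanged.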
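
Slide 23 frames virality prediction as a joint regression over retweets and likes, with log-scaled targets, evaluated by RMSE and Spearman's ρ. The sketch below shows one way these pieces could fit together. The exact loss on the slide is not preserved in this transcript, so the equally weighted sum of two mean squared errors and the use of `log1p` (so that zero counts stay finite) are assumptions, and all function names are illustrative.

```python
import math


def log_transform(counts):
    """Map raw counts to log scale; log1p is an assumed choice."""
    return [math.log1p(c) for c in counts]


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def joint_loss(pred_retweets, target_retweets, pred_likes, target_likes):
    """Assumed joint objective: retweet error plus like error, equal weights."""
    return mse(pred_retweets, target_retweets) + mse(pred_likes, target_likes)


def rmse(pred, target):
    return math.sqrt(mse(pred, target))


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

RMSE measures how far the predicted log counts are from the targets, while Spearman's ρ only checks whether posts are ordered correctly by popularity, so the two metrics complement each other.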