Characterizing the Life Cycle
of Online News Stories
Using Social Media Reactions
Carlos Castillo, Mohammed El-Haddad, Matt Stempeck, Jürgen Pfeffer
Twitter: @ChaToX
Carlos Castillo – @chatox
http://www.chato.cl/research/
Outline
• Determining classes of news articles
• Predicting traffic using social media
Usage analysis in online news
• Aikat (1998)
– Short dwell times; more traffic on weekdays, less on weekends; bursty traffic
• Crane and Sornette (2008), Yang and Leskovec (2011), Lehmann et al. (2012)
– Behavioral classes of attention online
Analysis of social media responses
• SocialFlow whitepaper (Lotan, Gaffney, and Meyer 2011)
– Al Jazeera, BBC News, CNN, The Economist, Fox News and The New York Times
• Hu et al. (2011)
– Tweets during a speech by the US president
Predictive Web Analytics (references)
Data collection
• Three weeks in October 2012
• “Beacon” embedded in Al Jazeera pages
– Real-time data processing
– Apache S4 application for online processing
– Cassandra (NoSQL database) for storage
≈ 3M visits
≈ 200K social media reactions
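As a rough illustration of the kind of aggregation such a pipeline performs, the sketch below counts events per article, event type, and time bin. The `(url, kind, timestamp)` record layout, the function name `bin_events`, and the 5-minute bin width are all assumptions made for this example; the slides do not describe the beacon's actual format or the S4 application's logic.

```python
from collections import Counter


def bin_events(events, bin_seconds=300):
    """Count events per (url, kind, time bin).

    `events` is an iterable of (url, kind, unix_timestamp) tuples. This
    record layout is a hypothetical stand-in for the real beacon format.
    """
    counts = Counter()
    for url, kind, timestamp in events:
        # Integer division maps each timestamp to its bin index.
        counts[(url, kind, int(timestamp // bin_seconds))] += 1
    return counts


# Made-up events: three visits and one tweet for one article, one visit for another.
events = [
    ("/news/a", "visit", 0),
    ("/news/a", "visit", 120),
    ("/news/a", "tweet", 130),
    ("/news/a", "visit", 310),
    ("/indepth/b", "visit", 20),
]
counts = bin_events(events)
```

Each article then yields one time series per event type, which is the input assumed by the later analysis steps.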
Summary of dataset
News and In-Depth
News examples:
• US state of Maryland abolishes death penalty (May 2nd, 2013)
• Hundreds arrested in China over 'fake' meat (May 3rd, 2013)
In-Depth examples:
• Spirits of Japan shrine haunt Asian relations (May 2nd, 2013)
• Interactive: Powering the Gulf (May 2nd, 2013)
News (322) and In-Depth (139): tag clouds extracted from titles of articles
[Figure: average News profile]
[Figure: average In-Depth profile]
In-Depth items have slower growth
In-Depth items have a longer shelf-life
In-Depth items are shared on Facebook
News items are shared on Twitter
Typical visitation profiles (12 hours)
Decreasing (78%)
Steady (9%)
Increasing (3%)
Rebounding (10%)
Examples
Decreasing (78%):
● Almost all breaking news
● Sometimes delayed due to timezone differences, e.g. Hurricane Sandy
Steady or Increasing (12%):
● Ongoing news: Obama/Romney, Worker strikes in SA, Syrian unrest
● Articles updated with supporting content
Rebounding (10%):
● Articles picked up by external sources or social media (typically single source of traffic)
● Background articles to new developments
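The four profile labels can be illustrated with a simple heuristic over a binned visit series. This is a minimal sketch under assumptions, not the method used in the talk: the function name `classify_profile`, the 20% tolerance, and the 1.5x rebound factor are all made up for the example.

```python
def classify_profile(visits, tolerance=0.2, rebound_factor=1.5):
    """Label a visit series as decreasing, steady, increasing, or rebounding.

    Thresholds are illustrative assumptions, not values from the slides.
    """
    n = len(visits)
    if n < 4:
        raise ValueError("need at least four time bins")

    # Rebound: an interior trough with clearly higher traffic before and after it.
    trough = min(range(n), key=lambda i: visits[i])
    if 0 < trough < n - 1:
        floor = rebound_factor * visits[trough]
        if visits[0] > floor and max(visits[trough + 1:]) > floor:
            return "rebounding"

    # Otherwise compare average traffic in the first and second halves.
    half = n // 2
    first = sum(visits[:half]) / half
    second = sum(visits[half:]) / (n - half)
    if first == 0:
        return "increasing" if second > 0 else "steady"
    ratio = second / first
    if ratio < 1 - tolerance:
        return "decreasing"
    if ratio > 1 + tolerance:
        return "increasing"
    return "steady"
```

For example, a series such as `[100, 60, 30, 20, 70, 90]` dips and then recovers, so this heuristic labels it as rebounding, while `[100, 80, 60, 45, 30, 20]` is labeled as decreasing.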
Prediction of visits
• Short-term traffic is to a large extent correlated with long-term traffic
• Social media signals are correlated with traffic and shelf-life
– More reactions → more traffic
– More discussion → longer shelf-life
• Can we predict what happens over 7 days after observing only the first 30 minutes?
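One common way to exploit the correlation between short-term and long-term traffic is a log-linear fit on past articles: regress the logarithm of the 7-day total on the logarithm of the first 30 minutes of visits. The sketch below shows that baseline only. It is not claimed to be the model used in this work, it ignores social media variables, and the function names and sample numbers are invented for the example.

```python
import math


def fit_log_linear(early, final):
    """Least-squares fit of log(final) = intercept + slope * log(early).

    `early` and `final` are equal-length lists of positive visit counts
    for past articles (hypothetical training data).
    """
    xs = [math.log(e) for e in early]
    ys = [math.log(f) for f in final]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_total(early_visits, intercept, slope):
    """Extrapolate the long-term total from an early visit count."""
    return math.exp(intercept + slope * math.log(early_visits))


# Synthetic data in which the 7-day total is exactly 50x the 30-minute count.
intercept, slope = fit_log_linear([10, 20, 40, 80], [500, 1000, 2000, 4000])
prediction = predict_total(30, intercept, slope)
```

Working in log space keeps a few very popular articles from dominating the fit, which is the usual reason for choosing this form over a plain linear regression.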
Predicting traffic and shelf-life online has a long history
• Predicting long-term behavior and half-life from short-term observations
– Observations = comments, visits, votes, …
– Behavior = total comments, total visits, …
– 10+ papers specifically on web traffic
• Bit.ly (2011, 2012)
– Studies half-life per topic and platform
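A half-life style measure can be computed from a binned visit series as the time needed to accumulate a given fraction of all visits. The sketch below assumes hourly bins and defines half-life as the time to reach 50% of the total observed visits. That definition, the function name `time_to_fraction`, and its parameters are assumptions for illustration and may differ from the definitions used in the cited studies.

```python
def time_to_fraction(counts, fraction=0.5, bin_hours=1.0):
    """Return the hours elapsed until `fraction` of all visits has arrived.

    `counts` holds visits per time bin, oldest first. Time is measured to
    the end of the bin in which the target is reached.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("series has no visits")
    target = fraction * total
    running = 0
    for index, count in enumerate(counts):
        running += count
        if running >= target:
            return (index + 1) * bin_hours
    return len(counts) * bin_hours
```

Calling `time_to_fraction([50, 30, 10, 5, 5])` gives 1.0, since half of the 100 visits arrive in the first hour. A larger `fraction` yields a longer horizon from the same series.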
Results (traffic predictions)
• Extrapolating visits: News articles are more predictable than In-Depth articles
• Improved predictions using social media variables
Selected variables, traffic prediction
Results (shelf-life prediction)
• Larger improvements for In-Depth articles
• Still, this is a 12-hour error in predicting something that averages 48-72 hours
http://fast.qcri.org/
What did we learn?
• Three visitation profiles: decreasing, steady or increasing, and rebounding
– Roughly 80:10:10 ratio
• News vs In-Depth: different behavior
• Social media signals are useful to understand and predict visits
Invitation: ECML/PKDD Discovery Challenge 2014
• Open competition on predictive Web Analytics
• Data provided by Chartbeat Inc.
Thank you!
Carlos Castillo · chato@acm.org
http://www.chato.cl/research/