The plista Dataset

ACM RecSys 2013, Hong Kong

Authors:
Kille, Benjamin
and Hopfgartner, Frank
and Brodt, Torben
and Heintz, Tobias
Speaker:
Brodt, Torben
International News Recommender
Systems Workshop and Challenge
October 13th, 2013
Introduction and Motivation
● Context: News Article Recommendation
Introduction and Motivation
● Do we need another recommendation data
set?
we have
...
● What features are those data sets missing?
● What requirements do news articles impose on
recommendation?
Introduction and Motivation
● Features that had not been available in
existing data sets:
○ contextual features: device, operating system,
browser, etc.
○ cross-domain features: 13 different news providers
included
○ different interaction types: interactions with
recommendations (clicks), as well as news items
(impressions)
○ content features: headline, URL, images, text
snippets, etc.
Introduction and Motivation
● Additional requirements for recommending news articles
○ real-time → recommendations must be provided within a
short time interval (< 200ms)
○ changing relevancy → items’ relevancy decreases with
time
○ dynamics → new items are continuously
added
● Requirements inherent to existing recommender systems:
○ sparsity → users typically read only a few news articles
○ cold start → systems refrain from requesting users to
create profiles; this results in a majority of small user
profiles
Dataset characteristics
{ // json
  "type": "impression",
  "context": {
    "simple": {
      "27": 418, // publisher
      "14": 31721, // widget
      ...
    },
    "lists": {
      "10": [100, 101] // channel
    }
    ...
  }
}
API specs hosted at http://orp.plista.com
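A minimal Python sketch of how such a record might be read. The ID-to-name mapping (27 = publisher, 14 = widget, 10 = channel) comes from the comments on the slide; the helper name `describe_context` and the sample string are illustrative assumptions, and the full attribute list is defined in the API specs rather than here.

```python
import json

# ID-to-name mapping as annotated on the slide; other IDs are left unnamed.
SIMPLE_NAMES = {"27": "publisher", "14": "widget"}
LIST_NAMES = {"10": "channel"}


def describe_context(record):
    """Return the known context attributes of one record under readable names."""
    context = record.get("context", {})
    named = {}
    for key, value in context.get("simple", {}).items():
        if key in SIMPLE_NAMES:
            named[SIMPLE_NAMES[key]] = value
    for key, values in context.get("lists", {}).items():
        if key in LIST_NAMES:
            named[LIST_NAMES[key]] = list(values)
    return named


# Sample record: the slide's example with comments and elisions removed.
raw = (
    '{"type": "impression", "context": {"simple": {"27": 418, "14": 31721},'
    ' "lists": {"10": [100, 101]}}}'
)
record = json.loads(raw)
print(record["type"], describe_context(record))
```

Unknown IDs are skipped rather than guessed, so the sketch keeps working when a record carries attributes that are not named on the slide.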
Dataset characteristics
● object types
○ impressions → users reading news articles
○ clicks → users clicking recommendations
○ creates → news articles being created
○ updates → news articles being updated

API specs hosted at http://orp.plista.com
Dataset usage
● Evaluation based on
Click-Through-Rate
(CTR)
● ~ 84 million
impressions
● ~ 1 million clicks
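As a rough illustration, CTR is the number of clicks divided by the number of impressions. The sketch below plugs in the approximate totals from this slide, under the assumption that every impression counts as a recommendation opportunity; the function name `click_through_rate` is illustrative.

```python
def click_through_rate(clicks, impressions):
    """CTR = clicks / impressions; returns 0.0 when there are no impressions."""
    if impressions <= 0:
        return 0.0
    return clicks / impressions


# Approximate totals from the slide (~1 million clicks, ~84 million impressions).
ctr = click_through_rate(1_000_000, 84_000_000)
print(f"{ctr:.2%}")  # prints 1.19%
```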
Dataset usage
● evaluating cross-portal news
recommenders
● 10-36 % user overlap
between different news
portals
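The slide does not say how overlap is measured. One possible reading, sketched here as an assumption, is the share of the smaller portal's users who also appear on the other portal. The portal names and user IDs are toy values, not taken from the data set.

```python
from itertools import combinations


def user_overlap(users_a, users_b):
    """Share of the smaller user set that also appears in the other set."""
    a, b = set(users_a), set(users_b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0
    return len(a & b) / smaller


# Toy data: hypothetical portals and user IDs.
portal_users = {
    "portal_a": {1, 2, 3, 4},
    "portal_b": {3, 4, 5, 6, 7},
    "portal_c": {7, 8},
}

# Overlap for every unordered pair of portals.
overlaps = {
    (p, q): user_overlap(portal_users[p], portal_users[q])
    for p, q in combinations(sorted(portal_users), 2)
}
for pair, share in overlaps.items():
    print(pair, f"{share:.0%}")
```

Other definitions (for example, intersection over union) would give different percentages, so the measure should be stated alongside any reported overlap.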
Dataset usage
● news portal comparisons
● do we observe similar user
behaviour on news portals
offering similar content?
Dataset usage
● evaluating contextual
recommendation algorithms
● sensitive to
○ weekday
○ hour of day
○ ...
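A small sketch of how weekday and hour of day could be derived and used to break CTR down by time. It assumes each event carries a Unix timestamp in seconds and interprets it in UTC; the actual timestamp field, unit, and time zone are assumptions to be checked against the API specs. The event tuples are toy values.

```python
from collections import Counter
from datetime import datetime, timezone


def time_bucket(unix_seconds):
    """Map a Unix timestamp (seconds, UTC) to (weekday, hour); Monday is 0."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return moment.weekday(), moment.hour


def ctr_by_bucket(events):
    """events: iterable of (event_type, unix_seconds). Returns CTR per bucket."""
    impressions, clicks = Counter(), Counter()
    for event_type, ts in events:
        bucket = time_bucket(ts)
        if event_type == "impression":
            impressions[bucket] += 1
        elif event_type == "click":
            clicks[bucket] += 1
    # Only buckets with at least one impression get a CTR.
    return {bucket: clicks[bucket] / n for bucket, n in impressions.items()}


# Toy events starting on Sunday, 13 October 2013, 09:30 UTC.
start = datetime(2013, 10, 13, 9, 30, tzinfo=timezone.utc).timestamp()
events = [
    ("impression", start),
    ("impression", start + 60),
    ("click", start + 120),
    ("impression", start + 3600),
]
print(ctr_by_bucket(events))
```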
Dataset usage
When using the data set you may consider…
● … we identify users by session IDs
○ individual users may have several IDs
○ users sharing their device might be mapped to one ID
● … interactions (clicks, impressions) and content
dynamics (creates, updates) differ between news
portals
● … contents are restricted to German
● … preferences are represented on a binary scale (user
read article, user clicked recommendation)
● … clicking on recommendations might not reveal the
actual relevancy of an item
Conclusions
● we introduced a new data set intended to
support recommender systems research
● we outlined novel features which existing
data sets lacked
● we presented scenarios which can be
evaluated using the data set
● we pointed to critical aspects which ought
to be considered when working with the data
set
Summary
● news articles
○ of ~13 publishers

● transactional data
○ Impressions
○ Clicks

● contextual data
○ of ~50 attributes

● cross domain application
The plista Dataset
@inproceedings{Kille:2013,
title = {The plista Dataset},
author = {
Kille, Benjamin
and Hopfgartner, Frank
and Brodt, Torben
and Heintz, Tobias
},
booktitle = {
NRS'13: Proceedings of
the International Workshop and
Challenge on News Recommender Systems
},
year = {2013},
month = {10},
location = {Hong Kong, China},
publisher = {ACM},
pages={14--21}
}
