Mining Cross-Domain Rating Datasets
from Structured Data on Twitter
@sidooms
Simon Dooms
Rating Datasets
• What are ratings? Explicit user preference information
• Why ratings? Recommender systems
Intro - Social Sharing - Results - Cross-Domain - Conclusion
Apr. 08, 2014 Simon Dooms - Ghent University - MSM 2014 2
Ratings Scarcity in Research
• Ratings = private data
• Public datasets to the rescue?
– MovieLens 100K (1998)
– MovieLens 1M (2000)
– MovieLens 10M (2008)
– More on recsyswiki.com
Old, Synthetic Datasets
Social Sharing = Ratings Goldmine
• Previous research: MovieTweetings
– Movie rating dataset mined from IMDb ratings shared on Twitter
– https://github.com/sidooms/MovieTweetings
• What about other domains? Other websites?
Well, let’s try it out!
Target Websites - Goodreads
Tweet fields: Twitter user, rating, book title, book author, Goodreads URL, time
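As a rough illustration of how such fields could be pulled out of a shared tweet, here is a minimal Python sketch. The tweet shape ("N of 5 stars to TITLE by AUTHOR URL"), the `BookRating` record, the `parse_goodreads_tweet` helper, and the example URL are all assumptions made for this sketch; they are not taken from the slides or from the Twitter-ratings repository.

```python
import re
from typing import NamedTuple, Optional


class BookRating(NamedTuple):
    """One mined rating; field names mirror the fields listed on the slide."""
    user: str
    rating: int
    title: str
    author: str
    url: str
    time: str


# ASSUMPTION: shared tweets look like
#   "<n> of 5 stars to <title> by <author> <url>"
# The greedy title group makes the LAST " by " the title/author separator,
# so titles that themselves contain " by " are still split correctly.
PATTERN = re.compile(
    r"(?P<rating>[1-5]) of 5 stars to (?P<title>.+) by (?P<author>.+?)\s+"
    r"(?P<url>https?://\S+)"
)


def parse_goodreads_tweet(user: str, text: str, time: str) -> Optional[BookRating]:
    """Return a BookRating if the tweet matches the assumed shape, else None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return BookRating(
        user=user,
        rating=int(match["rating"]),
        title=match["title"].strip(),
        author=match["author"].strip(),
        url=match["url"],
        time=time,
    )
```

Tweets that do not match the assumed shape yield `None`, so a collector built this way would skip unstructured tweets instead of guessing at a rating.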
Target Websites - Pandora
Tweet fields: Twitter user, song, Pandora URL, time
Target Websites - YouTube
Tweet fields: Twitter user, (video uploader), YouTube URL, time
Mining Experiment
• But words are wind…
– 2-week experiment
– 4 online platforms
Python code + Task Scheduler = Dataset files
https://github.com/sidooms/Twitter-ratings
The Numbers
One more thing …
Cross-Domain Rating Dataset
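The slide does not spell out how the cross-domain dataset is assembled. One plausible reading, given that every target website's tweets carry the Twitter user, is that records from the different sites are linked on that user. The sketch below illustrates only that idea; the `(user, domain, item)` record layout and the `cross_domain_users` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# ASSUMPTION: each mined record is reduced to (twitter_user, domain, item_id).
Record = Tuple[str, str, str]


def cross_domain_users(records: Iterable[Record], min_domains: int = 2) -> Dict[str, List[str]]:
    """Map each user seen in at least `min_domains` domains to those domains (sorted)."""
    domains_by_user = defaultdict(set)
    for user, domain, _item in records:
        domains_by_user[user].add(domain)
    return {
        user: sorted(domains)
        for user, domains in domains_by_user.items()
        if len(domains) >= min_domains
    }
```

Users who appear on only one site are filtered out, leaving the subset whose preferences span several domains.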
Applications
• Collect ratings for recsys research / input
• Cross-domain recsys research
• Trend detection, analytics, ...
• Applicable to all social sharing websites
Conclusions
• Ratings scarcity in research
• Public datasets are old and synthetic
• Social sharing = ratings goldmine
• 2-week experiment, 4 major websites
• Python code & datasets on GitHub
• True cross-domain ratings dataset