Mendeley: crowdsourcing and recommending research on a large scale
Kris Jack, PhD
Data Mining Team Lead

Summary
➔ what is mendeley?
➔ crowdsourcing on a large scale
➔ recommendations on a large scale
➔ data for you

Mendeley is...
...a startup company
...going to change the way that we do research...

Mendeley provides tools to help users...
...organise their research
...collaborate with one another
...discover new research

Mendeley and Last.fm
Last.fm works like this:
1) Install “Audioscrobbler”
2) Listen to music
3) Last.fm builds your music profile and recommends music that you might also like
...and it’s the world’s largest open music database!

Last.fm            Mendeley
music libraries    research libraries
artists            researchers
songs              papers
genres             disciplines

Mendeley is the world’s largest crowdsourced research catalogue!
(Screenshot taken from www.mendeley.com on 04/09/11)

Catalogue Crowdsourcing: System Requirements
→ assimilate research artefacts into catalogue in real time (PDFs + citation metadata)
→ recognise duplicate and non-duplicate artefacts in noisy input

Main sources of input:
→ Mendeley Desktop
→ Mendeley Web Importer
→ External catalogue imports (e.g. arXiv)
→ External catalogue lookups (e.g. CrossRef)
Main types of input:
→ article PDFs
→ article metadata (e.g. reference)
articles → catalogue generator → catalogue

Catalogue generator
Aims:
→ Cluster documents together
→ Generate catalogue entries

Catalogue generator
Process:
→ Filehash check (SHA-1)
→ Identifier check (e.g. PubMed id)
→ Document fingerprint (full text)
→ Metadata similarity check
→ Update individual article page

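The process steps listed for the catalogue generator can be sketched as a cascade of progressively fuzzier checks. This is a minimal sketch under stated assumptions, not Mendeley's implementation: the `Artefact` and `Catalogue` classes, the `fingerprint` function (a hash of normalised text), the title-only metadata comparison and the 0.9 threshold are all hypothetical choices made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Artefact:
    """One incoming document: raw file bytes plus the metadata that came with it."""
    pdf_bytes: bytes
    title: str
    identifier: Optional[str] = None  # e.g. a PubMed id, when present
    full_text: str = ""


@dataclass
class Catalogue:
    """Toy in-memory catalogue (hypothetical); each index maps a key to a cluster id."""
    by_filehash: dict = field(default_factory=dict)
    by_identifier: dict = field(default_factory=dict)
    by_fingerprint: dict = field(default_factory=dict)
    titles: dict = field(default_factory=dict)  # cluster id -> representative title
    next_id: int = 0


def fingerprint(text: str) -> str:
    """Assumed fingerprint: SHA-1 of the lower-cased, whitespace-normalised text."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()


def assign_cluster(cat: Catalogue, art: Artefact, threshold: float = 0.9) -> int:
    """Return the cluster id for `art`, creating a new cluster if nothing matches."""
    filehash = hashlib.sha1(art.pdf_bytes).hexdigest()
    fp = fingerprint(art.full_text) if art.full_text else None

    cluster = cat.by_filehash.get(filehash)              # 1) filehash check
    if cluster is None and art.identifier:
        cluster = cat.by_identifier.get(art.identifier)  # 2) identifier check
    if cluster is None and fp:
        cluster = cat.by_fingerprint.get(fp)             # 3) document fingerprint
    if cluster is None:                                  # 4) metadata similarity
        for cid, title in cat.titles.items():
            ratio = SequenceMatcher(None, title.lower(), art.title.lower()).ratio()
            if ratio >= threshold:
                cluster = cid
                break
    if cluster is None:                                  # nothing matched: new entry
        cluster = cat.next_id
        cat.next_id += 1
        cat.titles[cluster] = art.title

    # 5) update the indexes so that later artefacts can match on any key
    cat.by_filehash[filehash] = cluster
    if art.identifier:
        cat.by_identifier[art.identifier] = cluster
    if fp:
        cat.by_fingerprint[fp] = cluster
    return cluster
```

Ordering the checks from exact (file hash) to fuzzy (metadata) keeps the cheap, unambiguous matches first; the fuzzy step here is a linear scan and would need an index at catalogue scale.
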
Catalogue with:
→ article metadata
→ aggregated statistics
→ support for recommendations, etc.

Article Recommendation: System Requirements
→ generate personal article recommendations for users (i.e. “here are some articles that may interest you”)
→ update recommendations every 24 hours

Input:
→ User libraries
Output:
→ Recommend 10 articles to each user

Recommendation through collaborative filtering (16 months ago)
→ Article is in library or not (i.e. binary input)
→ Various similarity metrics (e.g. cooccurrence, loglikelihood, tanimoto)
Test:
→ 10-fold cross validation
→ 50,000 user libraries
Results:
→ <0.025 precision at 10

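To make the binary-input setup concrete, here is a minimal sketch of item-based collaborative filtering over library membership, covering two of the named metrics (cooccurrence and tanimoto; loglikelihood is left out to keep it short). The function names, the summed-similarity scoring rule and the brute-force pairwise loop are assumptions for illustration, not the system described in the slides.

```python
from collections import defaultdict
from itertools import combinations


def item_similarities(libraries, metric="tanimoto"):
    """Pairwise article similarity from binary library membership.

    libraries: dict mapping user id -> set of article ids.
    Returns a dict mapping (article_a, article_b) -> similarity, with a < b.
    """
    if metric not in ("cooccurrence", "tanimoto"):
        raise ValueError(f"unknown metric: {metric}")

    readers = defaultdict(set)  # article -> users who hold it
    for user, articles in libraries.items():
        for article in articles:
            readers[article].add(user)

    sims = {}
    for a, b in combinations(sorted(readers), 2):
        both = len(readers[a] & readers[b])
        if both == 0:
            continue  # never seen together: no evidence of similarity
        if metric == "cooccurrence":
            sims[(a, b)] = float(both)
        else:  # tanimoto: shared readers over all readers of either article
            sims[(a, b)] = both / len(readers[a] | readers[b])
    return sims


def recommend(libraries, user, sims, n=10):
    """Score unseen articles by summed similarity to the user's library; return top n."""
    owned = libraries[user]
    scores = defaultdict(float)
    for (a, b), s in sims.items():
        if a in owned and b not in owned:
            scores[b] += s
        elif b in owned and a not in owned:
            scores[a] += s
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article for article, _ in ranked[:n]]
```

With four toy libraries such as `{"u1": {"a", "b", "c"}, "u2": {"a", "b"}, "u3": {"b", "c"}, "u4": {"a"}}`, calling `recommend(libs, "u4", item_similarities(libs))` ranks the articles that `u4` does not yet hold.
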
Recommendation through collaborative filtering (10 months ago, i.e. + 6 months)
→ Article is in library or not (i.e. binary input)
→ Various similarity metrics (e.g. cooccurrence, loglikelihood, tanimoto)
Test:
→ 10-fold cross validation
→ 50,000 user libraries
Results:
→ ~0.1 precision at 10

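For reference, "precision at 10" is the fraction of the 10 recommended articles that turn out to be relevant. The slides do not say how the folds were drawn, so the sketch below assumes one plausible protocol: split a single library's articles into folds, hold each fold out in turn, and check how many held-out articles the recommender recovers. The `recommender` callable and all function names are hypothetical.

```python
import random


def precision_at_k(recommended, relevant, k=10):
    """Fraction of the first k recommendations that appear in the relevant set.

    Divides by k even when fewer than k items were recommended.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def k_folds(items, n_folds=10, seed=0):
    """Shuffle the items deterministically and deal them into n_folds folds."""
    shuffled = sorted(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def cross_validate(library, recommender, n_folds=10, k=10):
    """Hold out each fold in turn and average precision at k over the folds.

    recommender: callable taking the set of training articles and returning
    a ranked list of recommended article ids (assumed interface).
    """
    scores = []
    for held_out in k_folds(library, n_folds):
        training = set(library) - set(held_out)
        scores.append(precision_at_k(recommender(training), set(held_out), k))
    return sum(scores) / len(scores)
```

Under this protocol a library with fewer than k articles per fold can never reach a precision of 1.0, which is one reason offline figures can understate how useful recommendations are to real users.
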
Recommendation through collaborative filtering (10 months ago, i.e. + 6 months)
→ Article is in library or not (i.e. binary input)
→ Various similarity metrics (e.g. cooccurrence, loglikelihood, tanimoto)
Test:
→ Release to a subset of users
Results:
→ ~0.4 precision at 10

Article Recommendation Acceptance Rates
[Chart: acceptance rate (i.e. accept/reject clicks) against number of months live]

Article Recommendation: System Requirements
→ generate personal article recommendations for users (i.e. “here are some articles that may interest you”)
→ update recommendations every 24 hours
1 million users!
days!
How to scale up?

Test:
→ 10-fold cross validation
→ 50,000 user libraries
So, results comparable to non-distributed recommender
Completely distributed, so can easily run on EC2 within 24 hours...

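The slides do not describe how the computation is distributed. One common pattern, shown here purely as an assumption, is map/reduce over co-occurrence counts: each worker counts article pairs within its own shard of user libraries, and the partial counts are then summed. The function names are hypothetical, and the shards are processed sequentially in this sketch rather than on separate machines.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def map_cooccurrences(libraries):
    """Map step: count article pairs within one shard of user libraries."""
    counts = Counter()
    for articles in libraries:
        for pair in combinations(sorted(articles), 2):
            counts[pair] += 1
    return counts


def reduce_counts(partials):
    """Reduce step: sum the per-shard counters into one global counter."""
    return reduce(lambda left, right: left + right, partials, Counter())


def shard(items, n):
    """Split a list into n round-robin shards."""
    return [items[i::n] for i in range(n)]


if __name__ == "__main__":
    libs = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a", "c"}]
    # Each map call is independent, so it could run on a separate worker.
    partials = [map_cooccurrences(part) for part in shard(libs, 3)]
    print(reduce_counts(partials))
```

Because each library contributes its pairs independently and counts add up, the sharded result matches a single-machine count, which is consistent with the slide's remark that results are comparable to the non-distributed recommender.
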
Article Recommendation Precision Across User Library Sizes (using cooccurrence)
[Chart: precision at 10 articles against number of articles in user library]
How will real users react?

Public Data
→ user libraries: 50,000 libraries, 4,848,724 articles, 3,652,285 unique articles
→ library readership
→ library stars
Obtain from: http://dev.mendeley.com/datachallenge

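A quick back-of-the-envelope reading of the public data figures, assuming that "articles" counts library entries with duplicates included and "unique articles" counts distinct documents (the slide does not define either term):

```python
libraries = 50_000
articles = 4_848_724        # assumed: library entries, duplicates included
unique_articles = 3_652_285

mean_library_size = articles / libraries
unique_share = unique_articles / articles
repeat_entries = articles - unique_articles

print(f"mean library size: {mean_library_size:.1f}")
print(f"unique share of entries: {unique_share:.1%}")
print(f"entries repeating an article held elsewhere: {repeat_entries:,}")
```

Under that reading, libraries hold roughly 97 articles on average, and about a quarter of all entries repeat an article that also appears in another library; those repeats are the overlap that collaborative filtering relies on.
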
Mendeley's API
www.mendeley.com
