Mendeley, putting data into the hands of researchers

I was invited to give a keynote presentation at the RecSysTEL Workshop (http://bit.ly/b2Bg2J) on 2010/09/30. It presents Mendeley's tools for researchers and the data sets that we made available for the dataTEL challenge, which is designed to provide new large-scale data for researchers in recommender systems. The event was really enjoyable and the participants were excited about Mendeley.

Technology, Education

Mendeley, putting data
into the hands of
researchers

Kris Jack, PhD
Data Mining Team Coordinator

“All the time we are very
conscious of the huge
challenges that human
society has now – curing
cancer, understanding the
brain for Alzheimer’s [...].
But a lot of the state of
knowledge of the human race
is sitting in the scientists’
computers, and is currently
not shared […] We need to
get it unlocked so we can
tackle those huge problems.”

Summary

➔ idea behind Mendeley
➔ our features
➔ our technical challenges and solutions
➔ what does this mean for you?

Last.fm             Mendeley

music libraries     research libraries
artists             researchers
songs               papers
genres              disciplines

Mendeley helps researchers work smarter

Install Mendeley Desktop

Mendeley extracts research data...

...and aggregates research data in the cloud

By doing this, Mendeley makes science more collaborative and transparent

500,000+ users; the 20 largest userbases:
University of Cambridge
Stanford University
MIT
University of Michigan
Harvard University
University of Oxford
Sao Paulo University
Imperial College London
University of Edinburgh
Cornell University
University of California at Berkeley
RWTH Aachen
Columbia University
Georgia Tech
University of Wisconsin
UC San Diego
University of California at LA
University of Florida
University of North Carolina

39,000,000+ articles

we can only use algorithms that scale up

readership statistics
search
most frequent tags
related research
+ dozens of other services

most frequent tags on our scale

most frequent tags

for each document                      (called 39,000,000 times)
    for each tag in document           (called ~3 times per document)
        increment count for tag        (called ~39,000,000 x 3 = ~117,000,000 times)
sort tags by frequency
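
The pseudocode above maps almost line for line onto a single-machine script. The following is a minimal sketch, assuming each document is a dictionary with a `tags` list; that data shape and the function name `most_frequent_tags` are illustrative assumptions, not Mendeley's actual code.

```python
from collections import Counter


def most_frequent_tags(documents, top_n=None):
    """Count every tag across all documents and sort by frequency."""
    counts = Counter()
    for doc in documents:            # outer loop: once per document
        for tag in doc["tags"]:      # inner loop: once per tag in the document
            counts[tag] += 1         # increment count for tag
    # most_common sorts by descending frequency
    return counts.most_common(top_n)


# Hypothetical toy catalogue with three documents
docs = [
    {"title": "a", "tags": ["recsys", "hadoop"]},
    {"title": "b", "tags": ["recsys"]},
    {"title": "c", "tags": ["crf", "recsys", "hadoop"]},
]
print(most_frequent_tags(docs))
```

On one machine the increment runs once per (document, tag) pair, which is why the count on the slide grows to roughly 117,000,000 operations at catalogue scale.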

solution: distributed computing
map reduce

for each document
    for each tag in document
        increment count for tag

sort tags by frequency
for each tag counted
    emit the tag and frequency

Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters.
In Proceedings of OSDI 2004, San Francisco, CA, 2004.
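
To make the map and reduce phases concrete, here is a small in-memory sketch of the tag count in MapReduce style. It only simulates the phases in one process; the function names (`map_tags`, `shuffle`, `reduce_counts`) are assumptions for illustration, and a real job would run on a cluster framework such as Hadoop rather than on this code.

```python
from collections import defaultdict
from itertools import chain


def map_tags(document):
    """Map phase: emit a (tag, 1) pair for every tag in one document."""
    for tag in document["tags"]:
        yield tag, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(tag, values):
    """Reduce phase: sum the partial counts for one tag."""
    return tag, sum(values)


def tag_frequencies(documents):
    mapped = chain.from_iterable(map_tags(d) for d in documents)
    reduced = [reduce_counts(t, v) for t, v in shuffle(mapped).items()]
    # Sort by descending frequency, then alphabetically for stable output
    return sorted(reduced, key=lambda kv: (-kv[1], kv[0]))


print(tag_frequencies([{"tags": ["a", "b"]}, {"tags": ["a"]}]))
```

Because each call to `map_tags` looks at a single document and each call to `reduce_counts` looks at a single tag, both phases can be spread across many machines.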

solution: distributed computing
hadoop

support vector machines
hidden markov models

conditional random fields
Isaac G. Councill, C. Lee Giles, Min-Yen Kan. (2008) ParsCit:
An open-source CRF reference string parsing package.
In Proceedings of LREC 2008, Marrakesh, Morocco.

deduplication

crowdsourcing new articles from users

collapse metadata and update canonical docs

file hash check

metadata comparison

document fingerprinting
39,000,000 canonical documents
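
The first two checks named above, file hash and metadata comparison, can be sketched as follows. This is a hedged illustration only: the `Catalogue` class, the choice of SHA-1, and the title-plus-year metadata key are assumptions made for the example, and document fingerprinting is not covered.

```python
import hashlib
import re


def file_hash(data: bytes) -> str:
    """Identical files always produce the same digest."""
    return hashlib.sha1(data).hexdigest()


def normalise(text: str) -> str:
    """Lower-case and collapse punctuation so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def metadata_key(meta: dict) -> tuple:
    # Assumption: title plus year is enough to call two records the same paper
    return normalise(meta["title"]), meta.get("year")


class Catalogue:
    """Hypothetical store that collapses uploads into canonical documents."""

    def __init__(self):
        self.by_hash = {}
        self.by_meta = {}
        self.canonical = []

    def add(self, data: bytes, meta: dict) -> int:
        digest = file_hash(data)
        key = metadata_key(meta)
        doc_id = self.by_hash.get(digest)       # 1) file hash check
        if doc_id is None:
            doc_id = self.by_meta.get(key)      # 2) metadata comparison
        if doc_id is None:                      # unseen: new canonical document
            doc_id = len(self.canonical)
            self.canonical.append({"meta": dict(meta), "copies": 0})
        self.by_hash[digest] = doc_id
        self.by_meta.setdefault(key, doc_id)
        self.canonical[doc_id]["copies"] += 1
        return doc_id
```

The cheap exact check runs first, and the fuzzier metadata check only runs when the file itself has not been seen before.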

currently tf-idf similarity between documents
developing collaborative filtering
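
As a rough illustration of tf-idf similarity between documents, the sketch below builds sparse tf-idf vectors and compares them with cosine similarity. The tokenised input, the plain `log(N / df)` weighting and the function names are assumptions for the example; the slides do not say which variant Mendeley uses.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs is a list of token lists; returns one sparse tf-idf dict per doc."""
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in docs for term in set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical toy corpus
corpus = [
    ["tag", "recommendation", "papers"],
    ["tag", "recommendation", "music"],
    ["protein", "folding"],
]
vecs = tfidf_vectors(corpus)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Documents that share rare terms score higher than documents that share nothing, which is the signal a related-research feature can rank on.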

contact recommendations

currently recommendations based on contact network
developing version based on interests
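
One common way to recommend contacts from a contact network is to rank people by the number of contacts they share with the user. The slides do not describe Mendeley's actual method, so the sketch below is an assumption throughout, including the function name `recommend_contacts` and the adjacency-set representation of the network.

```python
from collections import Counter


def recommend_contacts(network, user, top_n=5):
    """Rank non-contacts by how many mutual contacts they share with user."""
    direct = network.get(user, set())
    scores = Counter()
    for contact in direct:
        for candidate in network.get(contact, set()):
            # Skip the user and anyone who is already a contact
            if candidate != user and candidate not in direct:
                scores[candidate] += 1
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical contact network as adjacency sets
network = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}
print(recommend_contacts(network, "ann"))
```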

online catalog
dataTEL data set

online article view logs
article tags
library readership
library stars

*new* you can get all of the articles in a group
- data for you to test related research algos?

Mendeley's API
Mashups with data on:

Chemical compounds
Locations
Alzheimer’s research
Grant funding
Twitter streams

Mendeley, putting data into the hands of researchers

1. Mendeley, putting data into the hands of researchers Kris Jack, PhD Data Mining Team Coordinator

2. “All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer’s [...]. But a lot of the state of knowledge of the human race is sitting in the scientists’ computers, and is currently not shared […] We need to get it unlocked so we can tackle those huge problems.”

3. Summary ➔ idea behind mendeley ➔ our features ➔ our technical challenges and solutions ➔ what does this mean for you?

4. Mendeley Last.fm. Last.fm works like this: 1) Install “Audioscrobbler”; 2) Listen to music; 3) Last.fm builds your music profile and recommends music you might also like... and it’s the world’s biggest open music database

5. Last.fm / Mendeley: music libraries / research libraries; artists / researchers; songs / papers; genres / disciplines

6. Summary ➔ idea behind mendeley ➔ our features ➔ our technical challenges and solutions ➔ what does this mean for you?

7. Mendeley helps researchers work smarter

8. Mendeley helps researchers work smarter Install Mendeley Desktop Mendeley extracts research data..

9. Mendeley helps researchers work smarter ..and aggregates research data in the cloud Mendeley extracts research data..

10. By doing this, Mendeley makes science more collaborative and transparent

18. Summary ➔ idea behind mendeley ➔ our features ➔ our technical challenges and solutions ➔ what does this mean for you?

19. 500,000+ users; the 20 largest userbases: University of Cambridge, Stanford University, MIT, University of Michigan, Harvard University, University of Oxford, Sao Paulo University, Imperial College London, University of Edinburgh, Cornell University, University of California at Berkeley, RWTH Aachen, Columbia University, Georgia Tech, University of Wisconsin, UC San Diego, University of California at LA, University of Florida, University of North Carolina; 39,000,000+ articles

20. we can only use algorithms that scale up: readership statistics, search, most frequent tags, related research + dozens of other services

21. most frequent tags on our scale: readership statistics, search, most frequent tags, related research

22. most frequent tags on our scale: for each document (called 39,000,000 times), for each tag in document (called ~3 times per document), increment count for tag (called ~39,000,000 x 3 = ~117,000,000 times); sort tags by frequency

23. solution: distributed computing, map reduce: for each document, for each tag in document, increment count for tag; sort tags by frequency; for each tag counted, emit the tag and frequency. Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of OSDI 2004, San Francisco, CA, 2004.

24. solution: distributed computing, hadoop. Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of OSDI 2004, San Francisco, CA, 2004.

25. support vector machines hidden markov models

26. conditional random fields. Isaac G. Councill, C. Lee Giles, Min-Yen Kan. (2008) ParsCit: An open-source CRF reference string parsing package. In Proceedings of LREC 2008, Marrakesh, Morocco.

27. deduplication: crowdsourcing new articles from users; collapse metadata and update canonical docs; file hash check; metadata comparison; document fingerprinting; 39,000,000 canonical documents

28. statistics pig

29. readerrank

30. currently tf-idf similarity between documents; developing collaborative filtering

31. contact recommendations: currently recommendations based on contact network; developing version based on interests

32. Summary ➔ idea behind mendeley ➔ our features ➔ our technical challenges and solutions ➔ what does this mean for you?

33. access to data

34. online catalog; dataTEL data set; online article view logs; article tags; library readership; library stars

35. Mendeley's API

36. *new* you can get all of the articles in a group - data for you to test related research algos?

37. Mendeley's API: mashups with data on chemical compounds, locations, Alzheimer’s research, grant funding, Twitter streams

39. want more? let us know...

40. “All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer’s [...]. But a lot of the state of knowledge of the human race is sitting in the scientists’ computers, and is currently not shared […] We need to get it unlocked so we can tackle those huge problems.”

41. www.mendeley.com we're hiring!
