Mendeley, putting data into the hands of researchers

Kris Jack, PhD
Data Mining Team Coordinator

“All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer’s [...]. But a lot of the state of knowledge of the human race is sitting in the scientists’ computers, and is currently not shared [...] We need to get it unlocked so we can tackle those huge problems.”

Summary

➔ idea behind mendeley
➔ our features
➔ our technical challenges and solutions
➔ what does this mean for you?

Mendeley works like Last.fm

Last.fm works like this:

1) Install “Audioscrobbler”
2) Listen to music
3) Last.fm builds your music profile and recommends music you might also like...

...and it’s the world’s biggest open music database

Last.fm                     Mendeley

music libraries             research libraries
artists                     researchers
songs                       papers
genres                      disciplines

Summary

➔ idea behind mendeley
➔ our features
➔ our technical challenges and solutions
➔ what does this mean for you?

Mendeley helps researchers work smarter

Install Mendeley Desktop

Mendeley extracts research data...

...and aggregates research data in the cloud

By doing this, Mendeley makes science
more collaborative and transparent

Summary

➔ idea behind mendeley
➔ our features
➔ our technical challenges and solutions
➔ what does this mean for you?

500,000+ users
39,000,000+ articles

the 20 largest userbases:
    University of Cambridge
    Stanford University
    MIT
    University of Michigan
    Harvard University
    University of Oxford
    Sao Paulo University
    Imperial College London
    University of Edinburgh
    Cornell University
    University of California at Berkeley
    RWTH Aachen
    Columbia University
    Georgia Tech
    University of Wisconsin
    UC San Diego
    University of California at LA
    University of Florida
    University of North Carolina

we can only use algorithms that scale up

    readership statistics
    search
    most frequent tags
    related research
    + dozens of other services

most frequent tags on our scale

    for each document                  (called 39,000,000 times)
       for each tag in document        (called ~3 times per document)
          increment count for tag      (called ~39,000,000 x 3 = ~117,000,000 times)
    sort tags by frequency

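The loop above can be sketched on a single machine as follows. This is a minimal illustration, not Mendeley's code: the function name `most_frequent_tags` and the document shape (a dict with a `tags` list) are assumptions made for the example.

```python
from collections import Counter

def most_frequent_tags(documents, top_n=10):
    """Count every tag across all documents and return the top_n most common.

    Hypothetical helper: assumes each document is a dict with a "tags" list.
    """
    counts = Counter()
    for doc in documents:            # runs once per document
        for tag in doc["tags"]:      # runs once per tag in that document
            counts[tag] += 1         # increment count for tag
    return counts.most_common(top_n) # sort tags by frequency

docs = [
    {"tags": ["hadoop", "mapreduce"]},
    {"tags": ["hadoop", "crf"]},
    {"tags": ["hadoop"]},
]
top = most_frequent_tags(docs, top_n=2)
print(top)
```

With roughly 39,000,000 documents and about 3 tags each, the innermost line runs on the order of 117,000,000 times in one process, which is the cost the slide points at.
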
solution: distributed computing
   map reduce

       for each document
          for each tag in document
             increment count for tag

       sort tags by frequency
       for each tag counted
          emit the tag and frequency

       Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters.
       In Proceedings of OSDI 2004, San Francisco, CA, 2004.
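
To make the map and reduce phases concrete, here is a small sketch that imitates them inside one Python process. It does not use Hadoop or any cluster API; the names `map_tags`, `shuffle`, `reduce_tag` and `run_job` are hypothetical, and the document shape (a dict with a `tags` list) is an assumption.

```python
from collections import defaultdict
from itertools import chain

def map_tags(document):
    """Map phase: emit a (tag, 1) pair for every tag in one document."""
    for tag in document["tags"]:
        yield tag, 1

def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_tag(tag, counts):
    """Reduce phase: sum the partial counts for one tag."""
    return tag, sum(counts)

def run_job(documents):
    """Run map, shuffle and reduce in-process, then sort tags by frequency."""
    mapped = chain.from_iterable(map_tags(d) for d in documents)
    grouped = shuffle(mapped)
    reduced = [reduce_tag(tag, counts) for tag, counts in grouped.items()]
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(reduced, key=lambda kv: (-kv[1], kv[0]))

sample_docs = [
    {"tags": ["hadoop", "mapreduce"]},
    {"tags": ["hadoop", "crf"]},
    {"tags": ["hadoop"]},
]
result = run_job(sample_docs)
print(result)
```

Because `map_tags` looks at one document at a time and `reduce_tag` at one tag at a time, each phase could in principle be split across many machines, which is the property the slide relies on.
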
solution: distributed computing
   hadoop




support vector machines
hidden markov models
conditional random fields
              Isaac G. Councill, C. Lee Giles, Min-Yen Kan. (2008) ParsCit:
               An open-source CRF reference string parsing package.
              In Proceedings of LREC 08, Marrakesh, Morocco.
deduplication

    crowd sourcing new articles from users
       file hash check
       metadata comparison
       document fingerprinting
    collapse metadata and update canonical docs

    39,000,000 canonical documents
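
A rough sketch of the first two checks, file hash and metadata comparison, might look like the following. Everything here is an assumption for illustration: the upload fields (`id`, `data`, `title`, `year`), the use of SHA-1, and the title normalisation rule are not taken from the slides, and document fingerprinting is left out.

```python
import hashlib
import re

def file_hash(data: bytes) -> str:
    """Exact-duplicate check: identical bytes give an identical digest."""
    return hashlib.sha1(data).hexdigest()

def metadata_key(title: str, year: int) -> tuple:
    """Near-duplicate check: compare a normalised title plus the year."""
    normalized = re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()
    return normalized, year

def deduplicate(uploads):
    """Collapse uploads into canonical documents; returns {canonical_id: [upload ids]}."""
    canonical = {}
    by_hash = {}
    by_meta = {}
    for upload in uploads:
        h = file_hash(upload["data"])
        m = metadata_key(upload["title"], upload["year"])
        cid = by_hash.get(h)
        if cid is None:
            cid = by_meta.get(m)
        if cid is None:
            cid = len(canonical) + 1      # new canonical document
            canonical[cid] = []
        by_hash.setdefault(h, cid)
        by_meta.setdefault(m, cid)
        canonical[cid].append(upload["id"])
    return canonical

uploads = [
    {"id": "u1", "data": b"pdf-A", "title": "MapReduce: Simplified Data Processing", "year": 2004},
    {"id": "u2", "data": b"pdf-A", "title": "Untitled", "year": 2004},
    {"id": "u3", "data": b"pdf-B", "title": "mapreduce simplified data processing", "year": 2004},
    {"id": "u4", "data": b"pdf-C", "title": "ParsCit", "year": 2008},
]
clusters = deduplicate(uploads)
print(clusters)
```

Here `u2` joins `u1` through the file hash and `u3` joins through the normalised metadata, while `u4` starts a new canonical document.
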
statistics




        pig
readerrank
currently tf-idf similarity between documents
developing collaborative filtering
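
As an illustration of tf-idf similarity between documents, the sketch below weights terms by term frequency times inverse document frequency and compares documents by cosine similarity. The exact weighting and tokenisation are assumptions; the slides do not say which variant is used, and the function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build one sparse tf-idf vector (a dict) per tokenised document."""
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

papers = [
    ["hadoop", "cluster", "mapreduce"],
    ["hadoop", "mapreduce", "pig"],
    ["protein", "folding", "cancer"],
]
vecs = tfidf_vectors(papers)
sim_close = cosine(vecs[0], vecs[1])   # papers sharing two terms
sim_far = cosine(vecs[0], vecs[2])     # papers sharing no terms
print(sim_close, sim_far)
```

Related research for a paper would then be the other papers with the highest similarity to it; documents that share no terms score zero.
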
contact recommendations

    currently recommendations based on contact network
    developing version based on interests
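
One common way to recommend contacts from a contact network is to rank contacts-of-contacts by the number of mutual contacts. The sketch below shows that approach as an assumption; the slides do not describe the actual method, and `recommend_contacts` and the sample names are hypothetical.

```python
from collections import Counter

def recommend_contacts(network, user, top_n=3):
    """Suggest people the user is not yet connected to, ranked by mutual contacts.

    `network` maps each user name to the set of that user's contacts.
    """
    direct = network.get(user, set())
    mutual = Counter()
    for contact in direct:
        for candidate in network.get(contact, set()):
            if candidate != user and candidate not in direct:
                mutual[candidate] += 1
    # Most mutual contacts first; ties broken alphabetically.
    ranked = sorted(mutual.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]

network = {
    "ana": {"ben", "cara"},
    "ben": {"ana", "dev", "eli"},
    "cara": {"ana", "dev"},
    "dev": {"ben", "cara"},
    "eli": {"ben"},
}
suggestions = recommend_contacts(network, "ana")
print(suggestions)
```

An interest-based version would replace the mutual-contact count with a similarity score between the users' libraries.
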

Summary

➔ idea behind mendeley
➔ our features
➔ our technical challenges and solutions
➔ what does this mean for you?

access to data

    online catalog
    datatel data set
    online article view logs
    article tags
    library readership
    library stars

Mendeley's API

*new* you can get all of the articles in a group
 - data for you to test related research algos?

Mashups with data on:
    Chemical compounds
    Locations
    Alzheimer’s research
    Grant funding
    Twitter streams

want more?

    let us know...

“All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer’s [...]. But a lot of the state of knowledge of the human race is sitting in the scientists’ computers, and is currently not shared [...] We need to get it unlocked so we can tackle those huge problems.”

www.mendeley.com




   we're hiring!
