Mendeley’s Research Catalogue:
building it, opening it up and
making it even more useful for researchers
Kris Jack, PhD
Chief Data Scientist, @_krisjack
Outline
1. What's Mendeley?
2. Under the Bonnet
3. Opening up Data
4. Working with Academia
5. Conclusions
What's Mendeley?
Mendeley's not just a reference manager
→ Mendeley is a platform that connects researchers, research data and apps
Mendeley Open API
research catalogue
Mendeley provides tools to help users...
...organise their research
→ Reference management
→ Cite-as-you-write
→ Full-text article search
→ Digitalised annotations
...collaborate with one another
→ Professional research groups
→ Social network
→ Annotation sharing
...discover new research
→ Explore crowdsourced research catalogue
→ Document statistics
→ Personalised article recommendations
→ Related research
→ Research contact suggestions
Our community from a data perspective
Social network (>2.4M users)
Research catalogue (~85M unique articles)
Research groups (~240K groups)
Personal libraries (>425M articles)
Logging a massive set of usage data
Under the Bonnet
Lots of features to build & support
→ Reference management
→ Cite-as-you-write
→ Full-text article search
→ Digitalised annotations
→ Professional research groups
→ Social network
→ Annotation sharing
→ Explore crowdsourced research catalogue
→ Document statistics
→ Personalised article recommendations
→ Related research
→ Research contact suggestions
features
Research catalogue (~30M unique articles)
Personal libraries (>100M articles)
Crowdsourcing (deduplication, metadata aggregation, statistics)
The curse of success
•  More articles came
•  More users came
•  Keeping catalogue data fresh was a burden
•  Algorithms relied on global counts
•  Iterating over MySQL tables was slow
•  Needed to shard tables to grow catalogue
•  In short, our backend system didn’t scale
Please try again later
~0.5 million users; the 20 largest user bases:
University of Cambridge
Stanford University
MIT
University of Michigan
Harvard University
University of Oxford
Sao Paulo University
Imperial College London
University of Edinburgh
Cornell University
University of California at Berkeley
RWTH Aachen
Columbia University
Georgia Tech
University of Wisconsin
UC San Diego
University of California at LA
University of Florida
University of North Carolina
~30m research articles
The system started to become slow.
How long did it take to generate our daily readership statistics?
23 hours!
We had serious needs
•  Build a catalogue based on billions of articles
•  Support many features that rely on the catalogue
   •  Statistics
   •  Search
   •  Recommendations
   •  Sharing
•  Data
   •  Freshness
   •  Consistency
•  Business context
   •  Agile development (rapid prototyping)
   •  Cost effective
   •  Going viral
   •  Technical debt stacking up
Enter Hadoop
What is Hadoop?
The Apache Hadoop Project develops open-source software for reliable, scalable, distributed computing
www.hadoop.apache.org
Hadoop
•  Designed to operate on a cluster of computers
•  1…thousands
•  Commodity hardware (low cost units)
•  Each node offers local computation and storage
•  Provides framework for working with big data (beyond petabytes)
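To make the programming model concrete, here is a minimal sketch of the map, shuffle and reduce steps behind a readership count, written in plain Python rather than against the Hadoop API. The input pairs, the function names and the idea that one library entry equals one reader are assumptions made for illustration; the slides do not show how the actual job was written.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input: one (user_id, article_id) pair per personal-library entry.
library_entries = [
    ("u1", "a1"), ("u2", "a1"), ("u3", "a2"),
    ("u1", "a2"), ("u2", "a3"), ("u3", "a1"),
]


def map_phase(entry):
    """Emit (article_id, 1) for a single library entry."""
    _user_id, article_id = entry
    yield article_id, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(article_id, counts):
    """Sum the emitted counts for one article to get its reader count."""
    return article_id, sum(counts)


def readership(entries):
    """Run the three phases in-process over all entries."""
    mapped = chain.from_iterable(map_phase(entry) for entry in entries)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


readers_per_article = readership(library_entries)
```

On a cluster, the map and reduce calls would run in parallel on the nodes that hold the data; here they run sequentially in one process, which is enough to show the shape of the computation.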
New tech stack for backend
features
Research catalogue (~30M unique articles)
Personal libraries (>100M articles)
Crowdsourcing (deduplication, metadata aggregation, statistics)
23 hr computations now took 15 minutes
recommended reading
Mendeley Suggest
Generating recommendations through matrix multiplication
These are item-based recommendations, as similarity is computed between items, not users
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
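The idea of item-based recommendation through matrix multiplication can be sketched in a few lines of plain Python. This is not Mahout's code: the toy library matrix, the helper names and the choice of raw co-occurrence counts as the similarity measure are all assumptions made for illustration.

```python
def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    """Multiply two list-of-lists matrices."""
    b_columns = transpose(b)
    return [[sum(x * y for x, y in zip(row, column)) for column in b_columns]
            for row in a]


# Hypothetical data: rows are users, columns are articles, 1 = article in library.
libraries = [
    [1, 1, 0, 0],  # user 0 has articles 0 and 1
    [1, 1, 1, 0],  # user 1 has articles 0, 1 and 2
    [0, 1, 1, 1],  # user 2 has articles 1, 2 and 3
]

# Item-item co-occurrence: entry [i][j] counts the users who hold both i and j.
cooccurrence = matmul(transpose(libraries), libraries)
for i in range(len(cooccurrence)):
    cooccurrence[i][i] = 0  # ignore an article's co-occurrence with itself


def recommend(user_row, similarity, top_n=2):
    """Score every article as similarity x user vector, then rank unseen ones."""
    scores = matmul(similarity, [[value] for value in user_row])
    unseen = [i for i, owned in enumerate(user_row) if not owned]
    unseen.sort(key=lambda i: scores[i][0], reverse=True)
    return unseen[:top_n]
```

For user 0, `recommend(libraries[0], cooccurrence)` ranks article 2 ahead of article 3, because article 2 co-occurs more often with the articles already in that user's library.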
Running on Amazon's Elastic MapReduce
On-demand use and easy to cost
Mahout's Performance
[Figure: scatter plot of cost against quality. x-axis: No. Good Recommendations/10 (0 to 3); y-axis: Normalised Amazon Hours (0 to 7K). Quadrants: Costly & Bad, Costly & Good, Cheap & Bad, Cheap & Good.]
Points (Normalised Amazon Hours, No. Good Recommendations/10):
→ Orig. item-based: 6.5K, 1.5
→ Cust. item-based: 2.4K, 1.5
→ Orig. user-based: 1K, 2.5
→ Cust. user-based: 0.3K, 2.5
Changes annotated across the builds:
→ Orig. item-based to Cust. item-based: -4.1K hours (63%)
→ Cust. item-based to Orig. user-based: -1.4K hours (58%), +1 good recommendation (67%)
→ Orig. user-based to Cust. user-based: -0.7K hours (70%)
→ Orig. item-based to Cust. user-based: -6.2K hours (95%), +1 good recommendation (67%)
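The percentages annotated on the performance slides follow from the plotted points. The short check below recomputes them; the variable names are hypothetical, and reading "No. Good Recommendations/10" as good recommendations out of ten is an assumption.

```python
def relative_change(before, after):
    """Return (absolute change, rounded percentage change relative to `before`)."""
    delta = after - before
    return delta, round(100 * abs(delta) / before)


# (normalised Amazon hours, good recommendations out of 10), as read off the slides.
orig_item = (6500, 1.5)
cust_item = (2400, 1.5)
orig_user = (1000, 2.5)
cust_user = (300, 2.5)

item_tuning = relative_change(orig_item[0], cust_item[0])    # hours saved by customising item-based
switch_to_user = relative_change(cust_item[0], orig_user[0]) # hours saved by switching to user-based
user_tuning = relative_change(orig_user[0], cust_user[0])    # hours saved by customising user-based
overall_cost = relative_change(orig_item[0], cust_user[0])   # hours saved from first to last point
overall_quality = relative_change(orig_item[1], cust_user[1])  # gain in good recommendations

for label, (delta, percent) in [
    ("item-based tuning", item_tuning),
    ("item-based to user-based", switch_to_user),
    ("user-based tuning", user_tuning),
    ("overall cost", overall_cost),
    ("overall quality", overall_quality),
]:
    print(f"{label}: {delta:+} ({percent}%)")
```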
Disclaimer: these advantages have costs
•  Migrating to a new system (data consistency)
•  Setup costs
   •  Learn black magic to configure
   •  Hardware for cluster
•  Administrative costs
   •  Steep learning curve to administer Hadoop
•  Still an immature technology
   •  You may need to debug the source code
•  Developing against Mahout
   •  Still needs lots of love
Big data backend
features
Research catalogue (~30M unique articles)
Personal libraries (>100M articles)
Crowdsourcing (deduplication, metadata aggregation, statistics)
Opening up Data
Our community from a data perspective
Social network (>2.4M users)
Research catalogue (~85M unique articles)
Research groups (~240K groups)
Personal libraries (>425M articles)
Logging a massive set of usage data
Challenge: Build an application with our data, make science more open.
PLoS/Mendeley's Binary Battle
More details at http://dev.mendeley.com/api-binary-battle/
Challenge: Build an off-line system for scientific recommendations with our API and the DataTEL data set
ScienceRec Challenge 2012
More details at http://2012.recsyschallenge.com/tracks/sciencerec/
Challenge: Metadata Extraction Challenge
The Next Challenge…?
Working with Academia
We have a history of academic collaborations
Duration     Project
2009-2011    MAKIN’IT
2010-2014    TEAM
2010-2011    DURA
2012-2012    CSL Editor
2012-2014    CODE
2012-2014    ERASM
2013-2015    EEXCESS
Demo
CSL Editor
http://editor.citationstyles.org/
Demo
CODE Mendeley Desktop
http://code-research.eu/results
Demo
Mendeley Labs
http://labs.mendeley.com/
Want to collaborate?
Conclusions
→ Mendeley is far more than a reference manager – it's a platform that connects researchers, data and apps
→ Starting small is good, but be prepared for the cost of scaling up
→ We're opening up our data for you to build apps on our platform
→ We're always keen to collaborate with academic groups
Kris Jack, PhD
Chief Data Scientist, @_krisjack

From Syllables to Syntax: Investigating Staged Linguistic Development through...From Syllables to Syntax: Investigating Staged Linguistic Development through...
From Syllables to Syntax: Investigating Staged Linguistic Development through...Kris Jack
 
A Collaborative Tool for the Computational Modelling of Child Language Acquis...
A Collaborative Tool for the Computational Modelling of Child Language Acquis...A Collaborative Tool for the Computational Modelling of Child Language Acquis...
A Collaborative Tool for the Computational Modelling of Child Language Acquis...Kris Jack
 
Mendeley, putting data into the hands of researchers
Mendeley, putting data into the hands of researchersMendeley, putting data into the hands of researchers
Mendeley, putting data into the hands of researchersKris Jack
 
Mendeley: Recommendation Systems for Academic Literature
Mendeley: Recommendation Systems for Academic LiteratureMendeley: Recommendation Systems for Academic Literature
Mendeley: Recommendation Systems for Academic LiteratureKris Jack
 
Recommendation Engines for Scientific Literature
Recommendation Engines for Scientific LiteratureRecommendation Engines for Scientific Literature
Recommendation Engines for Scientific LiteratureKris Jack
 

Plus de Kris Jack (14)

Machine Learning @ Mendeley
Machine Learning @ MendeleyMachine Learning @ Mendeley
Machine Learning @ Mendeley
 
Mendeley Suggest: What will you read next?
Mendeley Suggest: What will you read next?Mendeley Suggest: What will you read next?
Mendeley Suggest: What will you read next?
 
Mendeley Suggest: Engineering a Personalised Article Recommender System
Mendeley Suggest: Engineering a Personalised Article Recommender SystemMendeley Suggest: Engineering a Personalised Article Recommender System
Mendeley Suggest: Engineering a Personalised Article Recommender System
 
Mendeley's Data and Perspectives on Data Challenges
Mendeley's Data and Perspectives on Data ChallengesMendeley's Data and Perspectives on Data Challenges
Mendeley's Data and Perspectives on Data Challenges
 
Scientific Article Recommendation with Mahout
Scientific Article Recommendation with MahoutScientific Article Recommendation with Mahout
Scientific Article Recommendation with Mahout
 
Mahout Becomes a Researcher: Large Scale Recommendations at Mendeley
Mahout Becomes a Researcher: Large Scale Recommendations at MendeleyMahout Becomes a Researcher: Large Scale Recommendations at Mendeley
Mahout Becomes a Researcher: Large Scale Recommendations at Mendeley
 
improving explicit preference entry by visualising data similarities
improving explicit preference entry by visualising data similaritiesimproving explicit preference entry by visualising data similarities
improving explicit preference entry by visualising data similarities
 
Etude de la pertinence de critères de recherche en recherche d'informations s...
Etude de la pertinence de critères de recherche en recherche d'informations s...Etude de la pertinence de critères de recherche en recherche d'informations s...
Etude de la pertinence de critères de recherche en recherche d'informations s...
 
A Computational Model of Staged Language Acquisition
A Computational Model of Staged Language AcquisitionA Computational Model of Staged Language Acquisition
A Computational Model of Staged Language Acquisition
 
From Syllables to Syntax: Investigating Staged Linguistic Development through...
From Syllables to Syntax: Investigating Staged Linguistic Development through...From Syllables to Syntax: Investigating Staged Linguistic Development through...
From Syllables to Syntax: Investigating Staged Linguistic Development through...
 
A Collaborative Tool for the Computational Modelling of Child Language Acquis...
A Collaborative Tool for the Computational Modelling of Child Language Acquis...A Collaborative Tool for the Computational Modelling of Child Language Acquis...
A Collaborative Tool for the Computational Modelling of Child Language Acquis...
 
Mendeley, putting data into the hands of researchers
Mendeley, putting data into the hands of researchersMendeley, putting data into the hands of researchers
Mendeley, putting data into the hands of researchers
 
Mendeley: Recommendation Systems for Academic Literature
Mendeley: Recommendation Systems for Academic LiteratureMendeley: Recommendation Systems for Academic Literature
Mendeley: Recommendation Systems for Academic Literature
 
Recommendation Engines for Scientific Literature
Recommendation Engines for Scientific LiteratureRecommendation Engines for Scientific Literature
Recommendation Engines for Scientific Literature
 

Dernier

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 

Dernier (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 

Making Mendeley's Research Catalogue More Useful

  • 1. Mendeley's Research Catalogue: building it, opening it up and making it even more useful for researchers. Kris Jack, PhD, Chief Data Scientist, @_krisjack
  • 2. Outline: 1. What's Mendeley? 2. Under the Bonnet 3. Opening up Data 4. Working with Academia 5. Conclusions
  • 4. Mendeley's not just a reference manager
  • 5. Mendeley is a platform that connects researchers, research data and apps (Mendeley Open API)
  • 6. Mendeley is a platform that connects researchers, research data and apps (Mendeley Open API, research catalogue)
  • 7. Mendeley provides tools to help users organise their research: reference management; cite-as-you-write; full-text article search; digitalised annotations
  • 8. Mendeley provides tools to help users collaborate with one another: professional research groups; social network; annotation sharing
  • 9. Mendeley provides tools to help users discover new research: explore crowdsourced research catalogue; document statistics; personalised article recommendations; related research; research contact suggestions
  • 10-11. Mendeley provides tools to help users organise their research, collaborate with one another and discover new research
  • 12. Our community from a data perspective: social network (>2.4M users); research catalogue (~85M unique articles); research groups (~240K groups); personal libraries (>425M articles). Logging massive set of usage data
  • 14-16. Lots of features to build & support: reference management; cite-as-you-write; full-text article search; digitalised annotations; professional research groups; social network; annotation sharing; explore crowdsourced research catalogue; document statistics; personalised article recommendations; related research; research contact suggestions
  • 17-19. Lots of features to build & support: research catalogue (~30M unique articles); personal libraries (>100M articles); crowdsourcing (deduplication, metadata aggregation, statistics)
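
The crowdsourcing step on slide 19 (deduplication, metadata aggregation, statistics) can be illustrated with a small sketch. The slides do not say how Mendeley does this, so everything here is an assumption: the `dedup_key` normalisation, the majority-vote aggregation and the entry fields (`user`, `title`, `year`) are hypothetical and chosen only to show the idea.

```python
import re
from collections import Counter, defaultdict


def dedup_key(title: str) -> str:
    """Normalise a title so trivially different copies share one key (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def build_catalogue(library_entries):
    """Cluster library entries by key, then aggregate metadata per cluster."""
    clusters = defaultdict(list)
    for entry in library_entries:
        clusters[dedup_key(entry["title"])].append(entry)

    catalogue = []
    for key, entries in clusters.items():
        # Metadata aggregation: keep the most frequently reported value.
        title = Counter(e["title"] for e in entries).most_common(1)[0][0]
        year = Counter(e["year"] for e in entries).most_common(1)[0][0]
        # Statistics: number of distinct users holding the article.
        readers = len({e["user"] for e in entries})
        catalogue.append({"key": key, "title": title, "year": year, "readers": readers})
    return catalogue


entries = [
    {"user": "u1", "title": "Deep Learning", "year": 2015},
    {"user": "u2", "title": "Deep learning.", "year": 2015},
    {"user": "u3", "title": "Deep Learning", "year": 2014},
    {"user": "u1", "title": "MapReduce: Simplified Data Processing", "year": 2004},
]
catalogue = build_catalogue(entries)
```

In this toy input the three "Deep Learning" entries collapse into one catalogue record with three readers. A production system would likely need fuzzier matching (identifiers, authors, near-duplicate titles) than this exact-key approach.
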
  • 24. The curse of success: more articles came; more users came; keeping catalogue data fresh was a burden; algorithms relied on global counts; iterating over MySQL tables was slow; we needed to shard tables to grow the catalogue. In short, our backend system didn't scale
  • 26-28. ~0.5 million users and ~30M research articles; the 20 largest user bases: University of Cambridge, Stanford University, MIT, University of Michigan, Harvard University, University of Oxford, Sao Paulo University, Imperial College London, University of Edinburgh, Cornell University, University of California at Berkeley, RWTH Aachen, Columbia University, Georgia Tech, University of Wisconsin, UC San Diego, University of California at LA, University of Florida, University of North Carolina. The system started to become slow. How long did it take to generate our daily readership statistics? 23 hours!
  • 29. We had serious needs. Build a catalogue based on billions of articles. Support many features that rely on the catalogue: statistics, search, recommendations, sharing. Data: freshness, consistency. Business context: agile development (rapid prototyping), cost effective, going viral, technical debt stacking up
  • 30. Enter Hadoop. What is Hadoop? The Apache Hadoop Project develops open-source software for reliable, scalable, distributed computing (www.hadoop.apache.org)
  • 31. Hadoop: designed to operate on a cluster of computers (1…thousands); runs on commodity hardware (low-cost units); each node offers local computation and storage; provides a framework for working with big data (beyond petabytes)
  • 32-34. New tech stack for backend features: research catalogue (~30M unique articles); personal libraries (>100M articles); crowdsourcing (deduplication, metadata aggregation, statistics). 23 hr computations now took 15 minutes. Added on slide 34: recommended reading
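
To make the move from iterating over MySQL tables to Hadoop more concrete, here is a minimal single-process sketch of how a readership count could be phrased as map, shuffle and reduce steps. It is not Mendeley's job and it does not use the Hadoop API; the function names and the `libraries` input are hypothetical. On a real cluster the framework would run the map and reduce functions in parallel across nodes and perform the shuffle itself.

```python
from collections import defaultdict
from itertools import chain


def map_library(user_id, article_ids):
    """Map step: emit (article_id, 1) once per distinct article in a library."""
    return [(article_id, 1) for article_id in set(article_ids)]


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(article_id, ones):
    """Reduce step: sum the ones to get the reader count for one article."""
    return article_id, sum(ones)


# Toy personal libraries: user id -> article ids (duplicates are possible).
libraries = {"u1": ["a", "b"], "u2": ["a"], "u3": ["a", "c", "c"]}

mapped = chain.from_iterable(map_library(u, docs) for u, docs in libraries.items())
readership = dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())
# readership == {"a": 3, "b": 1, "c": 1}
```

Because each library is mapped independently and each article is reduced independently, the work can be spread over many machines, which is the property the slides credit for the drop in computation time.
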
  • 37. Generating recommendations through matrix multiplication. These are item-based recommendations, as similarity is computed between items, not users. Mahout job: org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
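
As a rough illustration of item-based recommendation through matrix multiplication, the sketch below builds an item-by-item co-occurrence matrix from a user-by-item matrix and multiplies a user's row by it to score unseen items. It is a toy, in-memory version written for this transcript, not the code of Mahout's `RecommenderJob`; plain co-occurrence counts stand in for whichever similarity measure was actually used, and the `interactions` data is invented.

```python
def transpose(matrix):
    """Return the transpose of a list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    """Multiply two list-of-lists matrices."""
    b_columns = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_columns] for row in a]


def recommend(interactions, user, top_n=2):
    """Score items for one user as (user row) x (item co-occurrence matrix)."""
    cooccurrence = matmul(transpose(interactions), interactions)  # items x items
    for i in range(len(cooccurrence)):
        cooccurrence[i][i] = 0  # an item should not vote for itself
    scores = matmul([interactions[user]], cooccurrence)[0]
    # Keep only items the user does not already have, best score first.
    candidates = [
        (score, item)
        for item, score in enumerate(scores)
        if interactions[user][item] == 0 and score > 0
    ]
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in candidates[:top_n]]


# Rows are users, columns are articles; 1 means the article is in the library.
interactions = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
]
top_for_user_0 = recommend(interactions, user=0)
# top_for_user_0 == [2, 3]
```

User 0 holds articles 0 and 1; article 2 co-occurs with those most often, so it ranks first. The similarity lives in the item-by-item matrix, which is what makes the approach item-based rather than user-based.
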
  • 38. Running on Amazon's Elastic MapReduce: on-demand use and easy to cost
  • 39-47. [Chart: Mahout's Performance. x-axis: normalised Amazon hours (0 to 7K); y-axis: no. of good recommendations/10 (0 to 3); quadrants: Costly & Bad, Costly & Good, Cheap & Bad, Cheap & Good.] Orig. item-based: 6.5K hours, 1.5 good recommendations. Cust. item-based: 2.4K, 1.5 (-4.1K hours, 63%). Orig. user-based: 1K, 2.5 (-1.4K hours, 58%; +1 good recommendation, 67%). Cust. user-based: 0.3K, 2.5 (-0.7K hours, 70%). Overall, from orig. item-based to cust. user-based: -6.2K hours (95%) and +1 good recommendation (67%)
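
The percentages on the Mahout performance chart follow directly from the plotted points. The short check below recomputes them; the `runs` dictionary and its key names are just a convenient encoding of the (normalised Amazon hours in thousands, good recommendations out of 10) pairs read off the slides.

```python
def percent_change(before, after):
    """Relative change from before to after, rounded to a whole percent."""
    return round(100 * (after - before) / before)


# (normalised Amazon hours in thousands, good recommendations out of 10)
runs = {
    "orig_item": (6.5, 1.5),
    "cust_item": (2.4, 1.5),
    "orig_user": (1.0, 2.5),
    "cust_user": (0.3, 2.5),
}

cost_item_tuning = percent_change(runs["orig_item"][0], runs["cust_item"][0])  # -63
cost_switch_to_user = percent_change(runs["cust_item"][0], runs["orig_user"][0])  # -58
cost_user_tuning = percent_change(runs["orig_user"][0], runs["cust_user"][0])  # -70
cost_overall = percent_change(runs["orig_item"][0], runs["cust_user"][0])  # -95
quality_overall = percent_change(runs["orig_item"][1], runs["cust_user"][1])  # 67
```

Each step's saving is measured against the previous configuration, which is why the individual percentages (63%, 58%, 70%) do not add up to the overall 95% reduction.
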
  • 48. Disclaimer: these advantages have costs. Migrating to a new system (data consistency). Setup costs: learning the black magic needed to configure it; hardware for the cluster. Administrative costs: steep learning curve to administer Hadoop. Still an immature technology: you may need to debug the source code. Developing against Mahout: still needs lots of love
  • 49. Big data backend features: research catalogue (~30M unique articles); personal libraries (>100M articles); crowdsourcing (deduplication, metadata aggregation, statistics)
  • 51. Our community from a data perspective: social network (>2.4M users); research catalogue (~85M unique articles); research groups (~240K groups); personal libraries (>425M articles). Logging massive set of usage data
  • 55. PLoS/Mendeley's Binary Battle. Challenge: build an application with our data, make science more open. More details at http://dev.mendeley.com/api-binary-battle/
  • 57-58. ScienceRec Challenge 2012. Challenge: build an off-line system for scientific recommendations with our API and the DataTEL data set. More details at http://2012.recsyschallenge.com/tracks/sciencerec/
  • 59. The Next Challenge…? Challenge: Metadata Extraction Challenge
  • 61, 65. We have a history of academic collaborations (duration, project): 2009-2011 MAKIN'IT; 2010-2014 TEAM; 2010-2011 DURA; 2012-2012 CSL Editor; 2012-2014 CODE; 2012-2014 ERASM; 2013-2015 EEXCESS. Want to collaborate?
  • 67. Conclusions: Mendeley is far more than a reference manager: it's a platform that connects researchers, data and apps. Starting small is good, but be prepared for the cost of scaling up. We're opening up our data for you to build apps on our platform. We're always keen to collaborate with academic groups. Kris Jack, PhD, Chief Data Scientist, @_krisjack