SlideShare une entreprise Scribd logo
1  sur  22
DATA FOR SCIENCE
HOW ELSEVIER IS USING DATA SCIENCE TO EMPOWER RESEARCHERS
Paul Groth | @pgroth | pgroth.com
Disruptive Technology Director
Elsevier Labs | @elsevierlabs
European Data Forum 2016
12 million people
per month
40 million reactions
75 million compounds
500 million facts
3 EXAMPLES
• Personalized: what should I read?
• Actionable: who should I collaborate with?
• Consumable: how do I make my data available?
RECOMMENDATIONS AT MENDELEY
• Maya Hristakeva
• Data Scientist at Mendeley
• @mayahhf
• Spark Summit 2015
• http://www.slideshare.net/SparkSummit/sparkin
g-science-up-with-research-recommendations-
by-maya-hristakeva
Read
&
Organize
Search
&
Discover
Collaborate
&
Network
Experiment
&
Synthesize
MENDELEY BUILDS TOOLS TO HELP
RESEARCHERS …
BEING THE BEST RESEARCHER YOU CAN BE!
• Good researchers are on top of their game
• Large amount of research produced
• Takes time to get what you need
• Help researchers by recommending relevant research
PERSONALIZED ARTICLE RECOMMENDATION
Input:
User libraries
Output:
Suggested
articles to read
Algorithms:
• Collaborative Filtering
– Item-based
– User-Based
– Matrix Factorization
• Content-based
Costly & GoodCostly & Bad
Cheap & GoodCheap & Bad
Tuned IB Mahout
Tuned UB Mahout
Tuned UB Spark
Tuned IB Spark
UB DimSum
Spark MLlib
ALS Matrix Fact.
Spark MLlib
Performance
+100%
+150%
~$50
CALCULATING 75 TRILLION METRICS
• Benchmark 4600 institutions & 220 countries updated weekly
• 40 terabytes of data
• HPCC massively parallel compute system – 40 node system
ALL DATA ISN’T CURATED
60 % OF TIME IS SPENT ON DATA
PREPARATION
10 ASPECTS OF HIGHLY EFFECTIVE RESEARCH DATA
https://www.elsevier.com/con
nect/10-aspects-of-highly-
effective-research-data
http://data.mendeley.com/
Each dataset receives a versioned DOI,
so it can be cited
The citation for the
associated article is
displayed
ACADEMIC COLLABORATIONS
CONCLUSION
• Researchers are faced with an ever growing amount of data and content
• Data Science is key to making systems that help them
• I’ve shown three Elsevier examples. Many more!
• Antonio Gulli’s codingplayground.blogspot.nl
• labs.elsevier.com
• Of course, we’re hiring 
Contact: Paul Groth @pgroth

Contenu connexe

Tendances

NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
Susanna-Assunta Sansone
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management Ecosystem
John Kunze
 
Re tooling for data management-support
Re tooling for data management-supportRe tooling for data management-support
Re tooling for data management-support
Sherry Lake
 

Tendances (20)

Research Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkResearch Data Sharing: A Basic Framework
Research Data Sharing: A Basic Framework
 
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
 
THOR Workshop - Services PANGAEA
THOR Workshop - Services PANGAEATHOR Workshop - Services PANGAEA
THOR Workshop - Services PANGAEA
 
No more waiting! Tools that work Today to reveal dataset use
No more waiting!  Tools that work Today to reveal dataset useNo more waiting!  Tools that work Today to reveal dataset use
No more waiting! Tools that work Today to reveal dataset use
 
Why Data Science Matters - 2014 WDS Data Stewardship Award Lecture
Why Data Science Matters - 2014 WDS Data Stewardship Award LectureWhy Data Science Matters - 2014 WDS Data Stewardship Award Lecture
Why Data Science Matters - 2014 WDS Data Stewardship Award Lecture
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management Ecosystem
 
RDAP13 Elizabeth Moss: The impact of data reuse
RDAP13 Elizabeth Moss: The impact of data reuseRDAP13 Elizabeth Moss: The impact of data reuse
RDAP13 Elizabeth Moss: The impact of data reuse
 
Re tooling for data management-support
Re tooling for data management-supportRe tooling for data management-support
Re tooling for data management-support
 
Machines are people too
Machines are people tooMachines are people too
Machines are people too
 
BEng Product Design 1st years session 1 Oct 2021
BEng Product Design 1st years session 1 Oct 2021BEng Product Design 1st years session 1 Oct 2021
BEng Product Design 1st years session 1 Oct 2021
 
Open Science: Research Data Management
Open Science: Research Data ManagementOpen Science: Research Data Management
Open Science: Research Data Management
 
Research methodology
Research methodologyResearch methodology
Research methodology
 
THOR Workshop - Data Publishing Elsevier
THOR Workshop - Data Publishing ElsevierTHOR Workshop - Data Publishing Elsevier
THOR Workshop - Data Publishing Elsevier
 
PDE2440 Nov 2019
PDE2440 Nov 2019PDE2440 Nov 2019
PDE2440 Nov 2019
 
Sharing Sensitive Data With Confidence: The DataTags system
Sharing Sensitive Data With Confidence: The DataTags systemSharing Sensitive Data With Confidence: The DataTags system
Sharing Sensitive Data With Confidence: The DataTags system
 
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
 
Coping with Data for WHOI JP Students
Coping with Data for WHOI JP StudentsCoping with Data for WHOI JP Students
Coping with Data for WHOI JP Students
 
ESA Ignite talk on UC3 Dash platform for data sharing
ESA Ignite talk on UC3 Dash platform for data sharingESA Ignite talk on UC3 Dash platform for data sharing
ESA Ignite talk on UC3 Dash platform for data sharing
 
The Dataverse Commons
The Dataverse CommonsThe Dataverse Commons
The Dataverse Commons
 
Research data management workshop april12 2016
Research data management workshop april12 2016 Research data management workshop april12 2016
Research data management workshop april12 2016
 

Similaire à Data for Science: How Elsevier is using data science to empower researchers

Sci Tech Forum LA 2013: New Directions in Scholarly Communication
Sci Tech Forum LA 2013: New Directions in Scholarly CommunicationSci Tech Forum LA 2013: New Directions in Scholarly Communication
Sci Tech Forum LA 2013: New Directions in Scholarly Communication
William Gunn
 
AAAS 2014: How the Web Changes Collaboration
AAAS 2014: How the Web Changes CollaborationAAAS 2014: How the Web Changes Collaboration
AAAS 2014: How the Web Changes Collaboration
William Gunn
 
Destroying the silo: how breaking down barriers can lead to proactive and coo...
Destroying the silo: how breaking down barriers can lead to proactive and coo...Destroying the silo: how breaking down barriers can lead to proactive and coo...
Destroying the silo: how breaking down barriers can lead to proactive and coo...
UKSG: connecting the knowledge community
 

Similaire à Data for Science: How Elsevier is using data science to empower researchers (20)

Open Science for sustainability and inclusiveness: the SKA role model
 Open Science for sustainability and inclusiveness: the SKA role model Open Science for sustainability and inclusiveness: the SKA role model
Open Science for sustainability and inclusiveness: the SKA role model
 
Open Access and Research Communication: The Perspective of Force11
Open Access and Research Communication: The Perspective of Force11Open Access and Research Communication: The Perspective of Force11
Open Access and Research Communication: The Perspective of Force11
 
Teaching Data Science to Undergraduate Students
Teaching Data Science to Undergraduate StudentsTeaching Data Science to Undergraduate Students
Teaching Data Science to Undergraduate Students
 
Data Science and Urban Science @ UW
Data Science and Urban Science @ UWData Science and Urban Science @ UW
Data Science and Urban Science @ UW
 
Sci Tech Forum LA 2013: New Directions in Scholarly Communication
Sci Tech Forum LA 2013: New Directions in Scholarly CommunicationSci Tech Forum LA 2013: New Directions in Scholarly Communication
Sci Tech Forum LA 2013: New Directions in Scholarly Communication
 
Of Libraries and Labs: Effecting User-Driven Innovation - RLUK Members Mtg 2015
Of Libraries and Labs: Effecting User-Driven Innovation - RLUK Members Mtg 2015Of Libraries and Labs: Effecting User-Driven Innovation - RLUK Members Mtg 2015
Of Libraries and Labs: Effecting User-Driven Innovation - RLUK Members Mtg 2015
 
Five Ways to Use Social Media to Raise Awareness for Your Paper or Research
Five Ways to Use Social Media to Raise Awareness for Your Paper or ResearchFive Ways to Use Social Media to Raise Awareness for Your Paper or Research
Five Ways to Use Social Media to Raise Awareness for Your Paper or Research
 
Upgrading the Scholarly Infrastructure
Upgrading the Scholarly InfrastructureUpgrading the Scholarly Infrastructure
Upgrading the Scholarly Infrastructure
 
Lern, june 2016, digital media slides
Lern, june 2016, digital media slidesLern, june 2016, digital media slides
Lern, june 2016, digital media slides
 
AAAS 2014: How the Web Changes Collaboration
AAAS 2014: How the Web Changes CollaborationAAAS 2014: How the Web Changes Collaboration
AAAS 2014: How the Web Changes Collaboration
 
Ngsp
NgspNgsp
Ngsp
 
Is democracy the right system? Building an engaged RDM community - Marta Tepe...
Is democracy the right system? Building an engaged RDM community - Marta Tepe...Is democracy the right system? Building an engaged RDM community - Marta Tepe...
Is democracy the right system? Building an engaged RDM community - Marta Tepe...
 
Melissa Terras' Report on the #UKMHLiveLab
Melissa Terras' Report on the #UKMHLiveLabMelissa Terras' Report on the #UKMHLiveLab
Melissa Terras' Report on the #UKMHLiveLab
 
Destroying the silo: how breaking down barriers can lead to proactive and coo...
Destroying the silo: how breaking down barriers can lead to proactive and coo...Destroying the silo: how breaking down barriers can lead to proactive and coo...
Destroying the silo: how breaking down barriers can lead to proactive and coo...
 
Dataverse in the Universe of Data by Christine L. Borgman
Dataverse in the Universe of Data by Christine L. BorgmanDataverse in the Universe of Data by Christine L. Borgman
Dataverse in the Universe of Data by Christine L. Borgman
 
Advancing access to information - together
Advancing access to information - togetherAdvancing access to information - together
Advancing access to information - together
 
When Search becomes Research and Research becomes Search
When Search becomes Research and Research becomes SearchWhen Search becomes Research and Research becomes Search
When Search becomes Research and Research becomes Search
 
Data publication: Discover, Explore, Visualise
Data publication: Discover, Explore, VisualiseData publication: Discover, Explore, Visualise
Data publication: Discover, Explore, Visualise
 
Responsive and Responsible Use of Digital Resources for Research
Responsive and Responsible Use of Digital Resources  for Research Responsive and Responsible Use of Digital Resources  for Research
Responsive and Responsible Use of Digital Resources for Research
 
Plum analytics: Altmetrics in Practice - ALM workshop -- San Francisco - 201...
Plum analytics:  Altmetrics in Practice - ALM workshop -- San Francisco - 201...Plum analytics:  Altmetrics in Practice - ALM workshop -- San Francisco - 201...
Plum analytics: Altmetrics in Practice - ALM workshop -- San Francisco - 201...
 

Plus de Paul Groth

Plus de Paul Groth (20)

Data Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AIData Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AI
 
Content + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learningContent + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learning
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.
 
Minimal viable-datareuse-czi
Minimal viable-datareuse-cziMinimal viable-datareuse-czi
Minimal viable-datareuse-czi
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
 
Knowledge Graph Futures
Knowledge Graph FuturesKnowledge Graph Futures
Knowledge Graph Futures
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper Provenance
 
Thinking About the Making of Data
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of Data
 
End-to-End Learning for Answering Structured Queries Directly over Text
End-to-End Learning for  Answering Structured Queries Directly over Text End-to-End Learning for  Answering Structured Queries Directly over Text
End-to-End Learning for Answering Structured Queries Directly over Text
 
From Data Search to Data Showcasing
From Data Search to Data ShowcasingFrom Data Search to Data Showcasing
From Data Search to Data Showcasing
 
Elsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphElsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge Graph
 
The Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for ScienceThe Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for Science
 
More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?
 
Diversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsDiversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domains
 
Progressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computationProgressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computation
 
From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge Graphs
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
 
The need for a transparent data supply chain
The need for a transparent data supply chainThe need for a transparent data supply chain
The need for a transparent data supply chain
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture Data
 

Dernier

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Dernier (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Data for Science: How Elsevier is using data science to empower researchers

Notes de l'éditeur

  1. 1.8 million unique authors worldwide submitted 1.3 million manuscripts to Elsevier journals
  2. 40 million reactions 75 million compounds 500 million experimental facts ,
  3. 40 million reactions 75 million compounds 500 million experimental facts ,
  4. At Mendeley we build tools to help researchers organise and read research articles, collaborate and connect with other researchers, search and discover new research articles, etc. 
  5. 815 million articles
  6. “Mendeley Suggest” is our personalised article recommender. It is based on what users have in their libraries, and recommends other related articles. 
  7. Calculate for over 4 million users We are building a personalised article recommender based on what users read. Input is the users’ libraries and the output is a list of articles they may want to add to their library and read. There are a number of different algorithms we can use to generate the recommendations (content-based, collaborative filtering), and this talk we’ll focus on three types of collaborative filtering algorithms (user and item-based as well as matrix factorisation).
  8. To sum, we now have a Spark implementation of our production UB CF algorithm which performs well, and is a lot simpler to maintain and extend. There are still a few areas where we can tune and optimise further, so that could only make it faster and get bigger gains of using Spark. Depending on your data different algorithms might work better, so do experiment. 
  9. 40 million reactions 75 million compounds 500 million experimental facts ,
  10. http://www.tamr.com/piketty-revisited-improving-economics-data-science/
  11. NASA, A.40 Computational Modeling Algorithms and Cyberinfrastructure, tech. report, NASA, 19 Dec. 2011
  12. Data enginnering pipleines