SlideShare a Scribd company logo
1 of 34
Download to read offline
Making Content
Discoverable
February 2020 – WSDM, Houston
Dr. Daniel Kershaw
Automating Highlight Generation
Outline
• About Elsevier
• The Product Needs
• Models
• Validations
• Results
• Improvements
• Production Validation
To help customers meet those challenges:
Elsevier combines content with technology
to provide actionable knowledge
Operational Excellence
Content Technology
Chemistry database
500m published experimental facts
User queries
13m monthly users on ScienceDirect
Books
35,000 published books
Drug Database
100% of drug information from pharmaceutical
companies updated daily
Research
16% of the world’s research data and
articles published by Elsevier
1,000
technologists employed by Elsevier
Machine learning
Over 1,000 predictive models trained on 1.5
billion electronic health care events
Machine reading
475m facts extracted from ScienceDirect
Collaborative filtering:
1bn scientific articles added by 2.5m
researchers analyzed daily to generate over
250m article recommendations
Semantic Enhancement
Knowledge on 50m chemicals captured as 11B facts
Read Search Do
this this this
Cell
Fundamentals
Gray‘s Anatomy
ScienceDirect
Scopus
ClinicalKey
Reaxys
Sherpath
Mendeley
Knovel
Elsevier combines content with technology
to provide actionable knowledge
5
Product use cases: Identified in concept testing
6
UC1: Evaluate (Relevance)
• Is this paper relevant to me?
• What was their approach?
• What were the key novel findings?
UC3: Discover More
• What specifically was matched?
• What other papers share these findings?
• Are the results validated with other methods?
UC2: Understand (navigate)
• What is the context of this finding?
• Give me all the places where a specific finding is
being discussed.
Data Science Path
7
Extract
Extract key points from a
document e.g. main
findings, methods and
results
Connect
Connect these to core
locations within the
document
Relate
Find relations between
extracted sentences
across documents -
OpenIE
04.02.20
Can we use extractive
summarization to find the key
finding/points within a
document?
Our authors are
the best writers
Available Data
Full Text
Available Data
10
Title
Available Data
11
04.02.20
Available Data – Author Submitted Highlights
Focusing of text
13
Paper Abstract Author Highlights
04.02.20
Rouge Sampling
Rouge-l-f
In order to enhance the efficiency of the
discovery of natural active constituents
from plants, a bioactivity-guided cut CCC
separation strategy was developed and
used here to isolate LSD1 inhibitors from
S. baicalensis Georgi.
A bioactivity-guided cut CCC strategy was
designed
Highlight
Selected Sentence
04.02.20
Collins, E., Augenstein, I., & Riedel, S. (2017). A Supervised Approach to Extractive
Summarisation of Scientific Papers. CoRR, 195–205. http://doi.org/10.18653/v1/K17-1021
Initial Model – Sentence Classification
• Section Classification
• Content overlap
• Number of numbers
• Sentence length
04.02.20
Kedzie, C., McKeown, K. R., & Daumé, H., III. (2018). Content Selection in Deep Learning
Models of Summarization. Emnlp.
Sequential RNN
04.02.20
Accuracy Measures
Model Name Test Accuracy
LSTM 0.853
Abstractnet Classifier 0.718
Combined Linear Classifier 0.696
Combined MLP Classifier 0.730
Percceptron Features Abstract Vector 0.697
Single Layer NN 0.696 Accuracy does
not make a
good summary
04.02.20
Rouge Values of Extractions
04.02.20
“Human (Editor) in the loop”
Work with editors from Lancet &
Elsevier (SME)
1.Ask Editors to rate the output of
the machine learning model
2.Have multiple rates rate the same
output
Framework used with the Lancet editors
to evaluate computer generated
summaries/assertions
Selection
Ranking
Simplicity
Complete Set
20
04.02.20
How should we order sentences
Introduction
Methods
Results
Conclusion
Methods
Results
Conclusion
Methods
Results
or or
Basted off section classification
04.02.20
Ranking Preference Results
Ranker Ranker 1 Ranker 2 Ranker 3 Ranker 4 Grand Total
Ranker 1 9 7 4 20
Ranker 2 5 6 8 19
Ranker 3 9 8 7 24
Ranker 4 4 5 7 16
Grand Total 18 22 20 19 79
04.02.20
04.02.20
Anshelevich, E., Bhardwaj, O., Elkind, E., Postl, J. and Skowron, P., 2018. Approximating
optimal social choice under metric preferences. Artificial Intelligence, 264, pp.27-51.
Example Extractions
• We show how closely these rules, which only have access to ordinal preferences, approximate an optimal
alternative with respect to the metric costs.1 we consider two general objective functions to quantify the quality of
alternatives, and, for most of the rules we consider, we give distortion bounds with respect to both of these functions.
• Our other objective function defines the quality of an alternative as the median of agent costs for that alternative: focusing
on the median agent is quite common as well, as this reduces the impact of outliers, i.e., agents with very high or very low
costs.
• Our results show that, while some commonly used rules have high distortion, there are important voting rules for
which distortion is bounded by a small constant or grows slowly with the number of alternatives.
• We will now show upper bounds on the distortion of Plurality and Borda, which match the lower bounds for these rules
given by Theorem 9, and an upper bound on the distortion of the Harmonic rule, which almost matches the respective
lower bound.
• Although these rules know nothing about the metric costs other than the ordinal preferences induced by them,
and cannot possibly find the true optimal alternative, they nevertheless always select an alternative whose quality is
only a factor of 5 away from optimal!
• We prove that the distortion of every voting rule that always outputs a subset of the uncovered set [32] does not
exceed 5; this upper bound holds both for the sum objective and for the median objective, though different techniques are
required to prove it for the two cases.
Simplification
• Selected sentences are a tad to
long.
• Contain irrelevant openings e.g.
“Furthermore”
• Solution split sentences on first “,”
filter out common openings.
25
thus
however
in summary
finally
in this study
moreover
in this work
furthermore
in addition
in conclusion
in this section
then
to the best of our knowledge
hence
in particular
additionally
also
second
first
as a result
specifically
Simplification
In the following work, we design lightweight authentication protocol for three
tiers wireless body area network with wearable devices.
We design lightweight authentication protocol for three tiers wireless body
area network with wearable devices.
Simplified
04.02.20
Production
Getting user feedback
28
29
Click
30
31
04.02.20
Where the Data also goes
SEARCH EDITORIAL
SYSTEMS
04.02.20
Final Overview
• Data Science lead by User needs
• Multiple stages of evaluation
• Each evaluation can influence the whole model
• Value the original authors writing
• Relatively simple model have powerful results
WE ARE HIRING
(AMS & LDN)
Come Speak to Me
(Senior) Data Scientist (Research/ML)

More Related Content

What's hot

Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...
Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...
Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...Twitter Inc.
 
Prote-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationProte-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationDmitry Grapov
 
C05 Henslin9e
C05 Henslin9eC05 Henslin9e
C05 Henslin9eplisasm
 
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses Dmitry Grapov
 
OA Week 2012 Miami U: How Open Scholarship is Changing Research
OA Week 2012 Miami U: How Open Scholarship is Changing ResearchOA Week 2012 Miami U: How Open Scholarship is Changing Research
OA Week 2012 Miami U: How Open Scholarship is Changing ResearchWilliam Gunn
 
BioSHaRE: The DataSHIELD Legal Analysis Template - Susan Wallace - University...
BioSHaRE: The DataSHIELD Legal Analysis Template - Susan Wallace - University...BioSHaRE: The DataSHIELD Legal Analysis Template - Susan Wallace - University...
BioSHaRE: The DataSHIELD Legal Analysis Template - Susan Wallace - University...Lisette Giepmans
 
Creating structured biomedical knowledge networks via crowdsourcing
Creating structured biomedical knowledge networks via crowdsourcingCreating structured biomedical knowledge networks via crowdsourcing
Creating structured biomedical knowledge networks via crowdsourcingveleritas
 
OSFair2017 Training | Increasing Research Transparency using the Open Science...
OSFair2017 Training | Increasing Research Transparency using the Open Science...OSFair2017 Training | Increasing Research Transparency using the Open Science...
OSFair2017 Training | Increasing Research Transparency using the Open Science...Open Science Fair
 
Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Dmitry Grapov
 
Developing tools for high resolution mass spectrometry-based screening via th...
Developing tools for high resolution mass spectrometry-based screening via th...Developing tools for high resolution mass spectrometry-based screening via th...
Developing tools for high resolution mass spectrometry-based screening via th...Andrew McEachran
 
Advanced strategies for Metabolomics Data Analysis
Advanced strategies for Metabolomics Data AnalysisAdvanced strategies for Metabolomics Data Analysis
Advanced strategies for Metabolomics Data AnalysisDmitry Grapov
 
The Usability Perception Scale (UPscale): A Measure for Evaluating Feedback D...
The Usability Perception Scale (UPscale): A Measure for Evaluating Feedback D...The Usability Perception Scale (UPscale): A Measure for Evaluating Feedback D...
The Usability Perception Scale (UPscale): A Measure for Evaluating Feedback D...Beth Karlin
 
Tsrl modified dosage forms august_non-confidential
Tsrl modified dosage forms august_non-confidentialTsrl modified dosage forms august_non-confidential
Tsrl modified dosage forms august_non-confidentialDrew Hertig, MBA, CLP
 

What's hot (17)

Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...
Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...
Towards Privacy-Preserving Evaluation for Information Retrieval Models over I...
 
Research Proposal
Research ProposalResearch Proposal
Research Proposal
 
Prote-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationProte-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and Visualization
 
C05 Henslin9e
C05 Henslin9eC05 Henslin9e
C05 Henslin9e
 
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses
Metabolomics and Beyond Challenges and Strategies for Next-gen Omic Analyses
 
OA Week 2012 Miami U: How Open Scholarship is Changing Research
OA Week 2012 Miami U: How Open Scholarship is Changing ResearchOA Week 2012 Miami U: How Open Scholarship is Changing Research
OA Week 2012 Miami U: How Open Scholarship is Changing Research
 
BioSHaRE: The DataSHIELD Legal Analysis Template - Susan Wallace - University...
BioSHaRE: The DataSHIELD Legal Analysis Template - Susan Wallace - University...BioSHaRE: The DataSHIELD Legal Analysis Template - Susan Wallace - University...
BioSHaRE: The DataSHIELD Legal Analysis Template - Susan Wallace - University...
 
Creating structured biomedical knowledge networks via crowdsourcing
Creating structured biomedical knowledge networks via crowdsourcingCreating structured biomedical knowledge networks via crowdsourcing
Creating structured biomedical knowledge networks via crowdsourcing
 
OSFair2017 Training | Increasing Research Transparency using the Open Science...
OSFair2017 Training | Increasing Research Transparency using the Open Science...OSFair2017 Training | Increasing Research Transparency using the Open Science...
OSFair2017 Training | Increasing Research Transparency using the Open Science...
 
Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)
 
Developing tools for high resolution mass spectrometry-based screening via th...
Developing tools for high resolution mass spectrometry-based screening via th...Developing tools for high resolution mass spectrometry-based screening via th...
Developing tools for high resolution mass spectrometry-based screening via th...
 
C017510717
C017510717C017510717
C017510717
 
Advanced strategies for Metabolomics Data Analysis
Advanced strategies for Metabolomics Data AnalysisAdvanced strategies for Metabolomics Data Analysis
Advanced strategies for Metabolomics Data Analysis
 
Cser13.ppt
Cser13.pptCser13.ppt
Cser13.ppt
 
The Usability Perception Scale (UPscale): A Measure for Evaluating Feedback D...
The Usability Perception Scale (UPscale): A Measure for Evaluating Feedback D...The Usability Perception Scale (UPscale): A Measure for Evaluating Feedback D...
The Usability Perception Scale (UPscale): A Measure for Evaluating Feedback D...
 
Tsrl modified dosage forms august_non-confidential
Tsrl modified dosage forms august_non-confidentialTsrl modified dosage forms august_non-confidential
Tsrl modified dosage forms august_non-confidential
 
Slideshow ire
Slideshow ireSlideshow ire
Slideshow ire
 

Similar to Elsevier Industry Talk - WSDM 2020

Presentation s rs
Presentation s rsPresentation s rs
Presentation s rsjnmueller
 
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...Barry Smith
 
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...William Gunn
 
ODSC East 2017: Data Science Models For Good
ODSC East 2017: Data Science Models For GoodODSC East 2017: Data Science Models For Good
ODSC East 2017: Data Science Models For GoodKarry Lu
 
In metrics we trust?
In metrics we trust?In metrics we trust?
In metrics we trust?ORCID, Inc
 
Sharing and standards christopher hart - clinical innovation and partnering...
Sharing and standards   christopher hart - clinical innovation and partnering...Sharing and standards   christopher hart - clinical innovation and partnering...
Sharing and standards christopher hart - clinical innovation and partnering...Christopher Hart
 
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning ModelsMining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning ModelsSean Ekins
 
Search in Medical Text
Search in Medical TextSearch in Medical Text
Search in Medical TextSarvnaz Karimi
 
Standards for interoperable EHR Christopher G Chute MD DrPH Professor, Biomed...
Standards for interoperable EHR Christopher G Chute MD DrPH Professor, Biomed...Standards for interoperable EHR Christopher G Chute MD DrPH Professor, Biomed...
Standards for interoperable EHR Christopher G Chute MD DrPH Professor, Biomed...ChannelTV2
 
9. Systematic review _Journal Club.pdf
9. Systematic review _Journal Club.pdf9. Systematic review _Journal Club.pdf
9. Systematic review _Journal Club.pdfssuserca7d2c
 
Systematic review and meta analaysis course - part 1
Systematic review and meta analaysis course - part 1Systematic review and meta analaysis course - part 1
Systematic review and meta analaysis course - part 1Ahmed Negida
 
2010 CASCON - Towards a integrated network of data and services for the life ...
2010 CASCON - Towards a integrated network of data and services for the life ...2010 CASCON - Towards a integrated network of data and services for the life ...
2010 CASCON - Towards a integrated network of data and services for the life ...Michel Dumontier
 
Uses and misuses of quantitative indicators of impact
Uses and misuses of quantitative indicators of impactUses and misuses of quantitative indicators of impact
Uses and misuses of quantitative indicators of impactBerenika Webster
 
Emerging Challenges for Artificial Intelligence in Medicinal Chemistry
Emerging Challenges for Artificial Intelligence in Medicinal ChemistryEmerging Challenges for Artificial Intelligence in Medicinal Chemistry
Emerging Challenges for Artificial Intelligence in Medicinal ChemistryEd Griffen
 
UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Inform...
UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Inform...UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Inform...
UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Inform...Frederik van den Broek
 
Planning of experiment in industrial research
Planning of experiment in industrial researchPlanning of experiment in industrial research
Planning of experiment in industrial researchpbbharate
 
Journal Club - Best Practices for Scientific Computing
Journal Club - Best Practices for Scientific ComputingJournal Club - Best Practices for Scientific Computing
Journal Club - Best Practices for Scientific ComputingBram Zandbelt
 

Similar to Elsevier Industry Talk - WSDM 2020 (20)

The Importance of Open Data and Models for Energy Systems Analysis
The Importance of Open Data and Models for Energy Systems AnalysisThe Importance of Open Data and Models for Energy Systems Analysis
The Importance of Open Data and Models for Energy Systems Analysis
 
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Wor...
NISO/NFAIS Joint Virtual Conference:  Connecting the Library to the Wider Wor...NISO/NFAIS Joint Virtual Conference:  Connecting the Library to the Wider Wor...
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Wor...
 
Presentation s rs
Presentation s rsPresentation s rs
Presentation s rs
 
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...
 
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
 
meta analysis
meta analysis meta analysis
meta analysis
 
ODSC East 2017: Data Science Models For Good
ODSC East 2017: Data Science Models For GoodODSC East 2017: Data Science Models For Good
ODSC East 2017: Data Science Models For Good
 
In metrics we trust?
In metrics we trust?In metrics we trust?
In metrics we trust?
 
Sharing and standards christopher hart - clinical innovation and partnering...
Sharing and standards   christopher hart - clinical innovation and partnering...Sharing and standards   christopher hart - clinical innovation and partnering...
Sharing and standards christopher hart - clinical innovation and partnering...
 
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning ModelsMining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
 
Search in Medical Text
Search in Medical TextSearch in Medical Text
Search in Medical Text
 
Standards for interoperable EHR Christopher G Chute MD DrPH Professor, Biomed...
Standards for interoperable EHR Christopher G Chute MD DrPH Professor, Biomed...Standards for interoperable EHR Christopher G Chute MD DrPH Professor, Biomed...
Standards for interoperable EHR Christopher G Chute MD DrPH Professor, Biomed...
 
9. Systematic review _Journal Club.pdf
9. Systematic review _Journal Club.pdf9. Systematic review _Journal Club.pdf
9. Systematic review _Journal Club.pdf
 
Systematic review and meta analaysis course - part 1
Systematic review and meta analaysis course - part 1Systematic review and meta analaysis course - part 1
Systematic review and meta analaysis course - part 1
 
2010 CASCON - Towards a integrated network of data and services for the life ...
2010 CASCON - Towards a integrated network of data and services for the life ...2010 CASCON - Towards a integrated network of data and services for the life ...
2010 CASCON - Towards a integrated network of data and services for the life ...
 
Uses and misuses of quantitative indicators of impact
Uses and misuses of quantitative indicators of impactUses and misuses of quantitative indicators of impact
Uses and misuses of quantitative indicators of impact
 
Emerging Challenges for Artificial Intelligence in Medicinal Chemistry
Emerging Challenges for Artificial Intelligence in Medicinal ChemistryEmerging Challenges for Artificial Intelligence in Medicinal Chemistry
Emerging Challenges for Artificial Intelligence in Medicinal Chemistry
 
UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Inform...
UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Inform...UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Inform...
UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Inform...
 
Planning of experiment in industrial research
Planning of experiment in industrial researchPlanning of experiment in industrial research
Planning of experiment in industrial research
 
Journal Club - Best Practices for Scientific Computing
Journal Club - Best Practices for Scientific ComputingJournal Club - Best Practices for Scientific Computing
Journal Club - Best Practices for Scientific Computing
 

More from Daniel Kershaw

Building Recommender Systems - Mendeley and Science Direct
Building Recommender Systems - Mendeley and Science DirectBuilding Recommender Systems - Mendeley and Science Direct
Building Recommender Systems - Mendeley and Science DirectDaniel Kershaw
 
Lancaster UCREL Summer School 2017 - Big Data and NLP
Lancaster UCREL Summer School 2017 - Big Data and NLPLancaster UCREL Summer School 2017 - Big Data and NLP
Lancaster UCREL Summer School 2017 - Big Data and NLPDaniel Kershaw
 
Building Recommender Systems for Scholarly Information
Building Recommender Systems for Scholarly InformationBuilding Recommender Systems for Scholarly Information
Building Recommender Systems for Scholarly InformationDaniel Kershaw
 
BrightonSEO - Is this presentation going to be ‘fleek’?
BrightonSEO - Is this presentation going to be ‘fleek’?BrightonSEO - Is this presentation going to be ‘fleek’?
BrightonSEO - Is this presentation going to be ‘fleek’?Daniel Kershaw
 
Towards Modelling Language Innovation Acceptance in Online Social Networks
Towards Modelling Language Innovation Acceptance in Online Social NetworksTowards Modelling Language Innovation Acceptance in Online Social Networks
Towards Modelling Language Innovation Acceptance in Online Social NetworksDaniel Kershaw
 
Twitter and Alcohol - BrightonSEO Pressentation
Twitter and Alcohol - BrightonSEO PressentationTwitter and Alcohol - BrightonSEO Pressentation
Twitter and Alcohol - BrightonSEO PressentationDaniel Kershaw
 
Monitoring Regional Alcohol Consumption through Social Media
Monitoring Regional Alcohol Consumption through Social MediaMonitoring Regional Alcohol Consumption through Social Media
Monitoring Regional Alcohol Consumption through Social MediaDaniel Kershaw
 

More from Daniel Kershaw (7)

Building Recommender Systems - Mendeley and Science Direct
Building Recommender Systems - Mendeley and Science DirectBuilding Recommender Systems - Mendeley and Science Direct
Building Recommender Systems - Mendeley and Science Direct
 
Lancaster UCREL Summer School 2017 - Big Data and NLP
Lancaster UCREL Summer School 2017 - Big Data and NLPLancaster UCREL Summer School 2017 - Big Data and NLP
Lancaster UCREL Summer School 2017 - Big Data and NLP
 
Building Recommender Systems for Scholarly Information
Building Recommender Systems for Scholarly InformationBuilding Recommender Systems for Scholarly Information
Building Recommender Systems for Scholarly Information
 
BrightonSEO - Is this presentation going to be ‘fleek’?
BrightonSEO - Is this presentation going to be ‘fleek’?BrightonSEO - Is this presentation going to be ‘fleek’?
BrightonSEO - Is this presentation going to be ‘fleek’?
 
Towards Modelling Language Innovation Acceptance in Online Social Networks
Towards Modelling Language Innovation Acceptance in Online Social NetworksTowards Modelling Language Innovation Acceptance in Online Social Networks
Towards Modelling Language Innovation Acceptance in Online Social Networks
 
Twitter and Alcohol - BrightonSEO Pressentation
Twitter and Alcohol - BrightonSEO PressentationTwitter and Alcohol - BrightonSEO Pressentation
Twitter and Alcohol - BrightonSEO Pressentation
 
Monitoring Regional Alcohol Consumption through Social Media
Monitoring Regional Alcohol Consumption through Social MediaMonitoring Regional Alcohol Consumption through Social Media
Monitoring Regional Alcohol Consumption through Social Media
 

Recently uploaded

Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Ch10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfCh10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfChristianCDAM
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdfHafizMudaserAhmad
 
Risk Management in Engineering Construction Project
Risk Management in Engineering Construction ProjectRisk Management in Engineering Construction Project
Risk Management in Engineering Construction ProjectErbil Polytechnic University
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptNarmatha D
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the weldingMuhammadUzairLiaqat
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating SystemRashmi Bhat
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Configuration of IoT devices - Systems managament
Configuration of IoT devices - Systems managamentConfiguration of IoT devices - Systems managament
Configuration of IoT devices - Systems managamentBharaniDharan195623
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communicationpanditadesh123
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - GuideGOPINATHS437943
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgsaravananr517913
 
DM Pillar Training Manual.ppt will be useful in deploying TPM in project
DM Pillar Training Manual.ppt will be useful in deploying TPM in projectDM Pillar Training Manual.ppt will be useful in deploying TPM in project
DM Pillar Training Manual.ppt will be useful in deploying TPM in projectssuserb6619e
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingBootNeck1
 
Crystal Structure analysis and detailed information pptx
Crystal Structure analysis and detailed information pptxCrystal Structure analysis and detailed information pptx
Crystal Structure analysis and detailed information pptxachiever3003
 

Recently uploaded (20)

Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
Designing pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptxDesigning pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptx
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Ch10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfCh10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdf
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf
 
Risk Management in Engineering Construction Project
Risk Management in Engineering Construction ProjectRisk Management in Engineering Construction Project
Risk Management in Engineering Construction Project
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.ppt
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the welding
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Configuration of IoT devices - Systems managament
Configuration of IoT devices - Systems managamentConfiguration of IoT devices - Systems managament
Configuration of IoT devices - Systems managament
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communication
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - Guide
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
 
DM Pillar Training Manual.ppt will be useful in deploying TPM in project
DM Pillar Training Manual.ppt will be useful in deploying TPM in projectDM Pillar Training Manual.ppt will be useful in deploying TPM in project
DM Pillar Training Manual.ppt will be useful in deploying TPM in project
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event Scheduling
 
Crystal Structure analysis and detailed information pptx
Crystal Structure analysis and detailed information pptxCrystal Structure analysis and detailed information pptx
Crystal Structure analysis and detailed information pptx
 

Elsevier Industry Talk - WSDM 2020

  • 1. Making Content Discoverable February 2020 – WSDM, Houston Dr. Daniel Kershaw Automating Highlight Generation
  • 2. Outline • About Elsevier • The Product Needs • Models • Validations • Results • Improvements • Production Validation
  • 3. To help customers meet those challenges: Elsevier combines content with technology to provide actionable knowledge Operational Excellence Content Technology Chemistry database 500m published experimental facts User queries 13m monthly users on ScienceDirect Books 35,000 published books Drug Database 100% of drug information from pharmaceutical companies updated daily Research 16% of the world’s research data and articles published by Elsevier 1,000 technologists employed by Elsevier Machine learning Over 1,000 predictive models trained on 1.5 billion electronic health care events Machine reading 475m facts extracted from ScienceDirect Collaborative filtering: 1bn scientific articles added by 2.5m researchers analyzed daily to generate over 250m article recommendations Semantic Enhancement Knowledge on 50m chemicals captured as 11B facts
  • 4. Read Search Do this this this Cell Fundamentals Gray‘s Anatomy ScienceDirect Scopus ClinicalKey Reaxys Sherpath Mendeley Knovel Elsevier combines content with technology to provide actionable knowledge
  • 5. 5
  • 6. Product use cases: Identified in concept testing 6 UC1: Evaluate (Relevance) • Is this paper relevant to me? • What was their approach? • What were the key novel findings? UC3: Discover More • What specifically was matched? • What other papers share these findings? • Are the results validated with other methods? UC2: Understand (navigate) • What is the context of this finding? • Give me all the places where a specific finding is being discussed.
  • 7. Data Science Path 7 Extract Extract key points from a document e.g. main findings, methods and results Connect Connect these to core locations within the document Relate Find relations between extracted sentences across documents - OpenIE
  • 8. 04.02.20 Can we use extractive summarization to find the key finding/points within a document? Our authors are the best writers
  • 12. 04.02.20 Available Data – Author Submitted Highlights
  • 13. Focusing of text 13 Paper Abstract Author Highlights
  • 14. 04.02.20 Rouge Sampling Rouge-l-f In order to enhance the efficiency of the discovery of natural active constituents from plants, a bioactivity-guided cut CCC separation strategy was developed and used here to isolate LSD1 inhibitors from S. baicalensis Georgi. A bioactivity-guided cut CCC strategy was designed Highlight Selected Sentence
  • 15. 04.02.20 Collins, E., Augenstein, I., & Riedel, S. (2017). A Supervised Approach to Extractive Summarisation of Scientific Papers. CoRR, 195–205. http://doi.org/10.18653/v1/K17-1021 Initial Model – Sentence Classification • Section Classification • Content overlap • Number of numbers • Sentence length
  • 16. 04.02.20 Kedzie, C., McKeown, K. R., & Daumé, H., III. (2018). Content Selection in Deep Learning Models of Summarization. Emnlp. Sequential RNN
  • 17. 04.02.20 Accuracy Measures Model Name Test Accuracy LSTM 0.853 Abstractnet Classifier 0.718 Combined Linear Classifier 0.696 Combined MLP Classifier 0.730 Percceptron Features Abstract Vector 0.697 Single Layer NN 0.696 Accuracy does not make a good summary
  • 19. 04.02.20 “Human (Editor) in the loop” Work with editors from Lancet & Elsevier (SME) 1.Ask Editors to rate the output of the machine learning model 2.Have multiple rates rate the same output Framework used with the Lancet editors to evaluate computer generated summaries/assertions Selection Ranking Simplicity Complete Set
  • 20. 20
  • 21. 04.02.20 How should we order sentences Introduction Methods Results Conclusion Methods Results Conclusion Methods Results or or Basted off section classification
  • 22. 04.02.20 Ranking Preference Results Ranker Ranker 1 Ranker 2 Ranker 3 Ranker 4 Grand Total Ranker 1 9 7 4 20 Ranker 2 5 6 8 19 Ranker 3 9 8 7 24 Ranker 4 4 5 7 16 Grand Total 18 22 20 19 79
  • 24. 04.02.20 Anshelevich, E., Bhardwaj, O., Elkind, E., Postl, J. and Skowron, P., 2018. Approximating optimal social choice under metric preferences. Artificial Intelligence, 264, pp.27-51. Example Extractions • We show how closely these rules, which only have access to ordinal preferences, approximate an optimal alternative with respect to the metric costs.1 we consider two general objective functions to quantify the quality of alternatives, and, for most of the rules we consider, we give distortion bounds with respect to both of these functions. • Our other objective function defines the quality of an alternative as the median of agent costs for that alternative: focusing on the median agent is quite common as well, as this reduces the impact of outliers, i.e., agents with very high or very low costs. • Our results show that, while some commonly used rules have high distortion, there are important voting rules for which distortion is bounded by a small constant or grows slowly with the number of alternatives. • We will now show upper bounds on the distortion of Plurality and Borda, which match the lower bounds for these rules given by Theorem 9, and an upper bound on the distortion of the Harmonic rule, which almost matches the respective lower bound. • Although these rules know nothing about the metric costs other than the ordinal preferences induced by them, and cannot possibly find the true optimal alternative, they nevertheless always select an alternative whose quality is only a factor of 5 away from optimal! • We prove that the distortion of every voting rule that always outputs a subset of the uncovered set [32] does not exceed 5; this upper bound holds both for the sum objective and for the median objective, though different techniques are required to prove it for the two cases.
  • 25. Simplification • Selected sentences are a tad to long. • Contain irrelevant openings e.g. “Furthermore” • Solution split sentences on first “,” filter out common openings. 25 thus however in summary finally in this study moreover in this work furthermore in addition in conclusion in this section then to the best of our knowledge hence in particular additionally also second first as a result specifically
  • 26. Simplification In the following work, we design lightweight authentication protocol for three tiers wireless body area network with wearable devices. We design lightweight authentication protocol for three tiers wireless body area network with wearable devices. Simplified
  • 28. 28
  • 30. 30
  • 31. 31
  • 32. 04.02.20 Where the Data also goes SEARCH EDITORIAL SYSTEMS
  • 33. 04.02.20 Final Overview • Data Science lead by User needs • Multiple stages of evaluation • Each evaluation can influence the whole model • Value the original authors writing • Relatively simple model have powerful results
  • 34. WE ARE HIRING (AMS & LDN) Come Speak to Me (Senior) Data Scientist (Research/ML)