SlideShare une entreprise Scribd logo
1  sur  49
Smart Data and
Algorithms for the
Publishing
Industry
Antonio Gulli
PLATFORMS
• Founded over 130 years ago
• Over the last 50 years the majority of Noble
Laureates have published with Elsevier
• Employ over 7,000 employees in 24
countries
• Published over 330,000 articles
• Received over 1 million submissions
• Work with over 30 million Scientists, students,
health & information professionals
• Over 53 million items indexed by Scopus
• Over 400 millions articles known by Mendeley
• Usage data on ScienceDirect, Scopus,
Mendeley, EV, Scival, Pure and the
submission platforms
Elsevier – A data driven company
Smart Data is a central element for the Publishing
Industry: content, usage, transactions
3
READER EDITORAUTHOR REVIEWER TEACHERCOLLABORATORRESEARCHER
ProductsCapabilities
Data:
MiningthesignalsResearchers
One login, one cookie
Data creation &
Query
Event
notification
Content
creation
Identity &
Authentification
Search &
Discovery
A/B
Taxonomies
Research data
Editorial &
Review
Funding data
Annotations
…Docs
Usage Data
Appl. events
Search
queries
Profiles
www.elsevier.com
journals.elsevier.com
Platforms
ScienceDirect
Scopus
The largest abstract and citation database of peer-reviewed research literature. Its used by
academics, government researchers and corporate R&D professionals who need to search,
discover and analyze research.
21,900+ journals | 5,000 publishers | 56 million items | 3,000+ customers
Scival
• Every 2 weeks
the entire
SciVal dataset
is updated
• Combinations
34.4M Scopus
articles and
their 400M
citations
• roughly
90,000B metric
combinations
Mendeley
Application::Recommenders
“Related and relevant” areas for multidisciplinary
areas
Recommenders @ Mendeley
Experiments & Results
• Offline Evaluation
• Online A/B testing
– 2 email tests
– ~40-50k users
– 6-8 algorithms
– 40-50% Open Rate
– 8-12% CTR
EPFL/Recommend – Increase Click Through Rate and Article Accesses
12
Article page view in ScienceDirect
Recommendation
s
• When customers look at articles on ScienceDirect, they’re also
provided with links to other articles of interest
EPFL: École Polytechnique Fédérale de Lausanne
(Switzerland)
Approach:
• Analyze usage data & articles on 4 dimensions
• Pre-compute recommendations on HPCC
• Offer recommendations during article page view
• A/B test variants for scoring / ranking results
EPFL / Recommend
13
Recommendation Engine development
14
During the pilot…
• EPFL cycled through 6 variants while testing for best dimension combinations &
weights
• A/B testing was used on 6 variants, each measured during a fixed usage window
• Compare to existing related articles
Results
• The 6th variant provided ~65% increase in CTR (click-through-rate) over Related
Articles
• Conclusion: Recommendation based on co-usage outperforms (the current)
recommendation based on topical similarity... when properly tuned.
Application::Search
Search – Migration Fast to Solr
16
• Hot migration from Fast to Solr optimizing an online metric
(FTA – clicked) with proper A/B testing
• Identification of offline proxy-metric for exploring the feature
space
904Lab: Learning to Rank, interleaving & A/B/C
Evaluation method
18
LanguageTools: Autosuggest
Application:: Entity Fingerprinting
What, again, is a Fingerprint?
Entities: structured representation of unstructured data
• Compute semantic content representation of scientific text
• Support of multiple vocabularies / ontologies is important
Engine’s value – agile adaptation to new thesauri
Indexing Scientific Content with the Fingerprint
Engine
Journal Finder
Application::Research Trends
Scival: Summary tab
1. The Summary page gives you a quick
impression of the nature of and trends within the
Research Area.
3. The bottom of page provides a
quick overview of the institutions,
countries, authors and journals that
make up the field.
2. The word cloud visualizes the
most important keyphrases for the
field. (based on Elsevier’s
Fingerprinting technology).
Scival : Institutions tab – map
1. The map view gives you a view
on the world where you can see all
citation and usage activity within
your Research Area
2. View the geographical
location of the top
performing institutions
and rising stars of this
Research Area
3. ‘Top performers’ and ‘rising
stars’ can be visualized using
citation, output and usage metrics.
Scival: Institution tab - chart
1. The chart view allows you to
analyze top performers and rising
stars within the selected Research
Area over time.
3. The selected items can be
compared against one another
using both previously available
metrics and the new usage metrics.
2. Here we have
selected the top
institutions based
usage (views count)
Application::Stream of News
Platform capabilities
Reporting Researcher profiles
Impact metricsBenchmarking & analytics
Clustering and tagging
• Process takes 5 minutes for typical institution
New
coverage
Clustering
coverage
Merge
clusters
Tag
clusters
Entity identification
Articles Tags(Researchers, institutions, papers)
500,000 per day
~5,000+ per institution
~4,000 institutions
×
Tagging accuracy
Imperial Leicester
Stories per day 13.2 7.6
Mentions per story 5.4 4.5
Stories with DOIs 17% 12%
Mentions with DOIs 2% 3%
Newsflo approachStandard approach
Application:: Entity Disambiguation
34
Scopus Author Identifier algorithm based on:
• Affiliation
• Address
• Subject area
• Source title
• Dates of publication
• Citations
• Co-authors
Searching in Scopus relies on disambiguating
authors and their affiliations
Authors like “Michael Smith” can be problematic
35
All of these
records appear to
be for the same
“Michael Smith,”
but they have
separate Author
IDs in Scopus
Authors like “Wei Wu” are also a challenge
36HPCC In Elsevier Update
This “Wei Wu”
appears to have
1642
documents, but
they span all of
these subject
areas:
Entities in the Social Network of Science
37
The Researcher Disambiguation Program is the place where those entities are given a true identity, with the
aim to reduce the errors in attribution
Researchers
Institutions
Articles
Journals
Patents
Funding
bodies
Grants
Research domains
Countries
Labs
Projects
Research data sets
Publishing
cluster
Usage
cluster
Editors
Reviewers
Author
s
Inventor
s
Funding
cluster
Opportunities
Corporations
Publishers
Conferences
Societies
Res Eval
Agencies
Counter
Typical needs of
the entities:
Develop research
strategy
Obtain funding
Acquire talent
Prioritize activities
Working efficiently
Measure/Demonstrat
e societal impact
Develop career
Publish
Find reviewers
Build reputation
Ranking
…
…/204
2.5k/20k
70m/70m
…/16k
…/11m
…/10k
12m/65m
90k/millio
ns
Press clippings
1/3.5k
7k/30k
300k/1.2
m
Application::APIs
Anyone can get an API key and build an
application with freely available content from
ScienceDirect and Scopus
http://dev.elsevier.com
Scopus metrics are freely available for use
Citation
counts
Journal
metrics
Platform::HPCC
42
• High Performance Computing Cluster Platform (HPCC) enables data integration on a scale not previously available and real-time
answers to millions of users. Built for big data and proven for 10 years with enterprise customers.
• Offers a single architecture, two data platforms (query and refinery) and a consistent data-intensive programming language (ECL)
• ECL Parallel Programming Language optimized for business differentiating data intensive applications
HPCC Systems: Proven with Enterprise Customers
HPCC Architecture – Open source
43
Recommendation Engine process at a glance
44HPCC In Elsevier Update
5 years of SD usage data/events
All SD XML Articles
Journal Rankings
Thor
Co-
download
matrix
Similarity
Attribute
Ranking
6 billion
events
~12M
articles
pii-739156
Twice weekly, move to
daily
Roxie
pii-684259, pii_585346,
pii_491635
ELSEVIER LABS - INTRO
ELSEVIER LABS - AREAS OF INTEREST
• Information extraction for scholarly text
• Knowledge Graphs
• Entity linking / disambiguation
• Machine Reading
• Information retrieval / Search
• Question answering
• Large Scale Data Management
• Deep Multimodal Networks for Image and Text Analysis
• Future of research communication
ELSEVIER LABS - INTRO
RECENT PUBLICATIONS
• Sara Magliacane, Philip Stutz, Paul Groth, Abraham Bernstein, foxPSL: A Fast, Optimized and
eXtended PSL implementation, International Journal of Approximate Reasoning (2015).
• Ron Daniel, "Large Scale Text Analytics with Apache Spark"; Presented at Text Analytics World
2015, San Francisco.
• L. Moreau, P. Groth, J. Cheney, T. Lebo, S. Miles, The rationale of PROV, Web Semantics:
Science, Services and Agents on the World Wide Web (2015).
• Marcin Wylot, Philippe Cudré-Mauroux, and Paul Groth. “Executing Provenance-Enabled Queries
over Web Data”, Proceedings of the 24th International World Wide Web Conference (WWW
2015), 2015.
• Elshaimaa Ali, Michael Lauruhn. Wikipedia-based Extraction of Lightweight Ontologies for
Concept Level Annotation, International Conference on Dublin Core and Metadata Applications
(DC 2014), Austin, Tx
• Ron Daniel, "Predicting Citation Counts"; Research Trends; June 2014.
WDSM – CHALLENGE – MICROSOFT - ELSEVIER
Join Elsevier if
you are
serious about
Bigdata and
Machine
Learning
Q&A

Contenu connexe

Tendances

Finding statistics2
Finding statistics2Finding statistics2
Finding statistics2lmk7
 
Article Impact (ALM and altmetrics)
Article Impact (ALM and altmetrics)Article Impact (ALM and altmetrics)
Article Impact (ALM and altmetrics)Richard Cave
 
Structured Data & the Future of Educational Material
Structured Data & the Future of Educational MaterialStructured Data & the Future of Educational Material
Structured Data & the Future of Educational MaterialPaul Groth
 
Information architecture at Elsevier
Information architecture at ElsevierInformation architecture at Elsevier
Information architecture at ElsevierPaul Groth
 
Integrating research indicators for use in the repositories infrastructure
Integrating research indicators for use in the repositories infrastructure Integrating research indicators for use in the repositories infrastructure
Integrating research indicators for use in the repositories infrastructure petrknoth
 
Pde 1st year Session 3 Feb 2018
Pde 1st year Session 3 Feb 2018Pde 1st year Session 3 Feb 2018
Pde 1st year Session 3 Feb 2018EISLibrarian
 
Using ImpactStory: an Introduction
Using ImpactStory: an IntroductionUsing ImpactStory: an Introduction
Using ImpactStory: an IntroductionSarahG_SS
 
Overview of the altmetrics landscape
Overview of the altmetrics landscapeOverview of the altmetrics landscape
Overview of the altmetrics landscapeRichard Cave
 
The Right Metrics for Generation Open [Open Access Week 2014]
The Right Metrics for Generation Open [Open Access Week 2014]The Right Metrics for Generation Open [Open Access Week 2014]
The Right Metrics for Generation Open [Open Access Week 2014]Impactstory Team
 
PDE1st year session 1 Oct 2017
PDE1st year session 1 Oct 2017PDE1st year session 1 Oct 2017
PDE1st year session 1 Oct 2017EISLibrarian
 
Sa discover text webinar
Sa discover text webinarSa discover text webinar
Sa discover text webinarQuestionPro
 
UKSG Conference 2016 Breakout Session - The new research data environment: im...
UKSG Conference 2016 Breakout Session - The new research data environment: im...UKSG Conference 2016 Breakout Session - The new research data environment: im...
UKSG Conference 2016 Breakout Session - The new research data environment: im...UKSG: connecting the knowledge community
 
محاضرة برنامج Endnote لتبويب المراجع العلمية د.غادة باوزير
محاضرة برنامج Endnote لتبويب المراجع العلمية د.غادة باوزيرمحاضرة برنامج Endnote لتبويب المراجع العلمية د.غادة باوزير
محاضرة برنامج Endnote لتبويب المراجع العلمية د.غادة باوزيرمركز البحوث الأقسام العلمية
 
BEng Product Design 1st year Session 2 Oct 2021
BEng Product Design 1st year Session 2 Oct 2021BEng Product Design 1st year Session 2 Oct 2021
BEng Product Design 1st year Session 2 Oct 2021EISLibrarian
 
Open PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future ChallengesOpen PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future ChallengesSciBite Limited
 
Altmetrics 2014-4-15-slideshare
Altmetrics 2014-4-15-slideshareAltmetrics 2014-4-15-slideshare
Altmetrics 2014-4-15-slideshareCMHSL
 
The Rocky Road to Reuse
The Rocky Road to ReuseThe Rocky Road to Reuse
The Rocky Road to ReuseAnita de Waard
 

Tendances (20)

Finding statistics2
Finding statistics2Finding statistics2
Finding statistics2
 
Article Impact (ALM and altmetrics)
Article Impact (ALM and altmetrics)Article Impact (ALM and altmetrics)
Article Impact (ALM and altmetrics)
 
Structured Data & the Future of Educational Material
Structured Data & the Future of Educational MaterialStructured Data & the Future of Educational Material
Structured Data & the Future of Educational Material
 
Information architecture at Elsevier
Information architecture at ElsevierInformation architecture at Elsevier
Information architecture at Elsevier
 
Integrating research indicators for use in the repositories infrastructure
Integrating research indicators for use in the repositories infrastructure Integrating research indicators for use in the repositories infrastructure
Integrating research indicators for use in the repositories infrastructure
 
Pde 1st year Session 3 Feb 2018
Pde 1st year Session 3 Feb 2018Pde 1st year Session 3 Feb 2018
Pde 1st year Session 3 Feb 2018
 
NISO Webinar: Beyond Publish or Perish: Alternative Metrics for Scholarship
NISO Webinar: Beyond Publish or Perish: Alternative Metrics for ScholarshipNISO Webinar: Beyond Publish or Perish: Alternative Metrics for Scholarship
NISO Webinar: Beyond Publish or Perish: Alternative Metrics for Scholarship
 
Using ImpactStory: an Introduction
Using ImpactStory: an IntroductionUsing ImpactStory: an Introduction
Using ImpactStory: an Introduction
 
Overview of the altmetrics landscape
Overview of the altmetrics landscapeOverview of the altmetrics landscape
Overview of the altmetrics landscape
 
CCE2060 Oct 2017
CCE2060 Oct 2017CCE2060 Oct 2017
CCE2060 Oct 2017
 
The Right Metrics for Generation Open [Open Access Week 2014]
The Right Metrics for Generation Open [Open Access Week 2014]The Right Metrics for Generation Open [Open Access Week 2014]
The Right Metrics for Generation Open [Open Access Week 2014]
 
PDE1st year session 1 Oct 2017
PDE1st year session 1 Oct 2017PDE1st year session 1 Oct 2017
PDE1st year session 1 Oct 2017
 
Sa discover text webinar
Sa discover text webinarSa discover text webinar
Sa discover text webinar
 
UKSG Conference 2016 Breakout Session - The new research data environment: im...
UKSG Conference 2016 Breakout Session - The new research data environment: im...UKSG Conference 2016 Breakout Session - The new research data environment: im...
UKSG Conference 2016 Breakout Session - The new research data environment: im...
 
محاضرة برنامج Endnote لتبويب المراجع العلمية د.غادة باوزير
محاضرة برنامج Endnote لتبويب المراجع العلمية د.غادة باوزيرمحاضرة برنامج Endnote لتبويب المراجع العلمية د.غادة باوزير
محاضرة برنامج Endnote لتبويب المراجع العلمية د.غادة باوزير
 
BEng Product Design 1st year Session 2 Oct 2021
BEng Product Design 1st year Session 2 Oct 2021BEng Product Design 1st year Session 2 Oct 2021
BEng Product Design 1st year Session 2 Oct 2021
 
Open PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future ChallengesOpen PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future Challenges
 
BioSharing - Update - Feb2016
BioSharing - Update - Feb2016BioSharing - Update - Feb2016
BioSharing - Update - Feb2016
 
Altmetrics 2014-4-15-slideshare
Altmetrics 2014-4-15-slideshareAltmetrics 2014-4-15-slideshare
Altmetrics 2014-4-15-slideshare
 
The Rocky Road to Reuse
The Rocky Road to ReuseThe Rocky Road to Reuse
The Rocky Road to Reuse
 

Similaire à Elsevier - Smart Data and Algorithms for the Publishing Industry

Why would a publisher care about open data?
Why would a publisher care about open data?Why would a publisher care about open data?
Why would a publisher care about open data?Anita de Waard
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Dataopenminted_eu
 
Novinky u Elsevier: Citace, metriky, spolupráce
Novinky u Elsevier: Citace, metriky, spolupráceNovinky u Elsevier: Citace, metriky, spolupráce
Novinky u Elsevier: Citace, metriky, spolupráceKnihovnaUTB
 
The New Dimensions in Scholcomm: How a global scholarly community collaborati...
The New Dimensions in Scholcomm: How a global scholarly community collaborati...The New Dimensions in Scholcomm: How a global scholarly community collaborati...
The New Dimensions in Scholcomm: How a global scholarly community collaborati...NASIG
 
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...William Gunn
 
Upgrading the Scholarly Infrastructure
Upgrading the Scholarly InfrastructureUpgrading the Scholarly Infrastructure
Upgrading the Scholarly InfrastructureBjörn Brembs
 
Discovery study detailed results 20140728
Discovery study detailed results 20140728Discovery study detailed results 20140728
Discovery study detailed results 20140728Michael Levine-Clark
 
ICIC 2014 New Product Introduction ProQuest
ICIC 2014 New Product Introduction ProQuestICIC 2014 New Product Introduction ProQuest
ICIC 2014 New Product Introduction ProQuestDr. Haxel Consult
 
AReS and Altmetrics: How we use them at ILRI
AReS and Altmetrics: How we use them at ILRIAReS and Altmetrics: How we use them at ILRI
AReS and Altmetrics: How we use them at ILRIILRI
 
Federating Research Profiling Data
Federating Research Profiling DataFederating Research Profiling Data
Federating Research Profiling Dataericmeeks
 
A comparative study between commercial and open source discovery tools
A comparative study between commercial and open source discovery toolsA comparative study between commercial and open source discovery tools
A comparative study between commercial and open source discovery toolsSusantaSethi3
 
Biomedical Research as an Open Digital Enterprise
Biomedical Research as an Open Digital EnterpriseBiomedical Research as an Open Digital Enterprise
Biomedical Research as an Open Digital EnterprisePhilip Bourne
 
What does open science mean? A stakeholder perspective
What does open science mean? A stakeholder perspectiveWhat does open science mean? A stakeholder perspective
What does open science mean? A stakeholder perspectiveLIBER Europe
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer SchoolCarole Goble
 
Linked Data Experiences at Springer Nature
Linked Data Experiences at Springer NatureLinked Data Experiences at Springer Nature
Linked Data Experiences at Springer NatureMichele Pasin
 

Similaire à Elsevier - Smart Data and Algorithms for the Publishing Industry (20)

Why would a publisher care about open data?
Why would a publisher care about open data?Why would a publisher care about open data?
Why would a publisher care about open data?
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Data
 
Novinky u Elsevier: Citace, metriky, spolupráce
Novinky u Elsevier: Citace, metriky, spolupráceNovinky u Elsevier: Citace, metriky, spolupráce
Novinky u Elsevier: Citace, metriky, spolupráce
 
Jonathan Breeze, Symplectic
Jonathan Breeze, SymplecticJonathan Breeze, Symplectic
Jonathan Breeze, Symplectic
 
BLC & Digital Science: Jonathan Breeze, Symplectic
BLC & Digital Science: Jonathan Breeze, SymplecticBLC & Digital Science: Jonathan Breeze, Symplectic
BLC & Digital Science: Jonathan Breeze, Symplectic
 
The New Dimensions in Scholcomm: How a global scholarly community collaborati...
The New Dimensions in Scholcomm: How a global scholarly community collaborati...The New Dimensions in Scholcomm: How a global scholarly community collaborati...
The New Dimensions in Scholcomm: How a global scholarly community collaborati...
 
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...
 
Upgrading the Scholarly Infrastructure
Upgrading the Scholarly InfrastructureUpgrading the Scholarly Infrastructure
Upgrading the Scholarly Infrastructure
 
Discovery study detailed results 20140728
Discovery study detailed results 20140728Discovery study detailed results 20140728
Discovery study detailed results 20140728
 
ICIC 2014 New Product Introduction ProQuest
ICIC 2014 New Product Introduction ProQuestICIC 2014 New Product Introduction ProQuest
ICIC 2014 New Product Introduction ProQuest
 
AReS and Altmetrics: How we use them at ILRI
AReS and Altmetrics: How we use them at ILRIAReS and Altmetrics: How we use them at ILRI
AReS and Altmetrics: How we use them at ILRI
 
Bosman-Kramer Changing Research Workflows
Bosman-Kramer Changing Research WorkflowsBosman-Kramer Changing Research Workflows
Bosman-Kramer Changing Research Workflows
 
Federating Research Profiling Data
Federating Research Profiling DataFederating Research Profiling Data
Federating Research Profiling Data
 
A comparative study between commercial and open source discovery tools
A comparative study between commercial and open source discovery toolsA comparative study between commercial and open source discovery tools
A comparative study between commercial and open source discovery tools
 
The Future of Research Communications and e-Scholarship: Are we there yet?
The Future of Research Communications and e-Scholarship: Are we there yet?The Future of Research Communications and e-Scholarship: Are we there yet?
The Future of Research Communications and e-Scholarship: Are we there yet?
 
Biomedical Research as an Open Digital Enterprise
Biomedical Research as an Open Digital EnterpriseBiomedical Research as an Open Digital Enterprise
Biomedical Research as an Open Digital Enterprise
 
What does open science mean? A stakeholder perspective
What does open science mean? A stakeholder perspectiveWhat does open science mean? A stakeholder perspective
What does open science mean? A stakeholder perspective
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
 
Linked Data Experiences at Springer Nature
Linked Data Experiences at Springer NatureLinked Data Experiences at Springer Nature
Linked Data Experiences at Springer Nature
 
Data and Research Infrastructures and Open Science
Data and Research Infrastructures and Open ScienceData and Research Infrastructures and Open Science
Data and Research Infrastructures and Open Science
 

Dernier

办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一z xss
 
Magic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMagic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMartaLoveguard
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Dana Luther
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Paul Calvano
 
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一Fs
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Excelmac1
 
PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationLinaWolf1
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)Christopher H Felton
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作ys8omjxb
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012rehmti665
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一Fs
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxDyna Gilbert
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书zdzoqco
 
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Lucknow
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Sonam Pathan
 
Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITMgdsc13
 

Dernier (20)

办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
 
Magic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMagic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptx
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24
 
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 
Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...
 
PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 Documentation
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptx
 
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
 
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
 
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
 
Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITM
 

Elsevier - Smart Data and Algorithms for the Publishing Industry

  • 1. Smart Data and Algorithms for the Publishing Industry Antonio Gulli
  • 2. PLATFORMS • Founded over 130 years ago • Over the last 50 years the majority of Noble Laureates have published with Elsevier • Employ over 7,000 employees in 24 countries • Published over 330,000 articles • Received over 1 million submissions • Work with over 30 million Scientists, students, health & information professionals • Over 53 million items indexed by Scopus • Over 400 millions articles known by Mendeley • Usage data on ScienceDirect, Scopus, Mendeley, EV, Scival, Pure and the submission platforms Elsevier – A data driven company
  • 3. Smart Data is a central element for the Publishing Industry: content, usage, transactions 3 READER EDITORAUTHOR REVIEWER TEACHERCOLLABORATORRESEARCHER ProductsCapabilities Data: MiningthesignalsResearchers One login, one cookie Data creation & Query Event notification Content creation Identity & Authentification Search & Discovery A/B Taxonomies Research data Editorial & Review Funding data Annotations …Docs Usage Data Appl. events Search queries Profiles www.elsevier.com journals.elsevier.com
  • 6. Scopus The largest abstract and citation database of peer-reviewed research literature. Its used by academics, government researchers and corporate R&D professionals who need to search, discover and analyze research. 21,900+ journals | 5,000 publishers | 56 million items | 3,000+ customers
  • 7. Scival • Every 2 weeks the entire SciVal dataset is updated • Combinations 34.4M Scopus articles and their 400M citations • roughly 90,000B metric combinations
  • 10. “Related and relevant” areas for multidisciplinary areas
  • 11. Recommenders @ Mendeley Experiments & Results • Offline Evaluation • Online A/B testing – 2 email tests – ~40-50k users – 6-8 algorithms – 40-50% Open Rate – 8-12% CTR
  • 12. EPFL/Recommend – Increase Click Through Rate and Article Accesses 12 Article page view in ScienceDirect Recommendation s • When customers look at articles on ScienceDirect, they’re also provided with links to other articles of interest
  • 13. EPFL: École Polytechnique Fédérale de Lausanne (Switzerland) Approach: • Analyze usage data & articles on 4 dimensions • Pre-compute recommendations on HPCC • Offer recommendations during article page view • A/B test variants for scoring / ranking results EPFL / Recommend 13
  • 14. Recommendation Engine development 14 During the pilot… • EPFL cycled through 6 variants while testing for best dimension combinations & weights • A/B testing was used on 6 variants, each measured during a fixed usage window • Compare to existing related articles Results • The 6th variant provided ~65% increase in CTR (click-through-rate) over Related Articles • Conclusion: Recommendation based on co-usage outperforms (the current) recommendation based on topical similarity... when properly tuned.
  • 16. Search – Migration Fast to Solr 16 • Hot migration from Fast to Solr optimizing an online metric (FTA – clicked) with proper A/B testing • Identification of offline proxy-metric for exploring the feature space
  • 17. 904Lab: Learning to Rank, interleaving & A/B/C
  • 21. What, again, is a Fingerprint? Entities: structured representation of unstructured data
  • 22. • Compute semantic content representation of scientific text • Support of multiple vocabularies / ontologies is important Engine’s value – agile adaptation to new thesauri Indexing Scientific Content with the Fingerprint Engine
  • 25. Scival: Summary tab 1. The Summary page gives you a quick impression of the nature of and trends within the Research Area. 3. The bottom of page provides a quick overview of the institutions, countries, authors and journals that make up the field. 2. The word cloud visualizes the most important keyphrases for the field. (based on Elsevier’s Fingerprinting technology).
  • 26. Scival : Institutions tab – map 1. The map view gives you a view on the world where you can see all citation and usage activity within your Research Area 2. View the geographical location of the top performing institutions and rising stars of this Research Area 3. ‘Top performers’ and ‘rising stars’ can be visualized using citation, output and usage metrics.
  • 27. Scival: Institution tab - chart 1. The chart view allows you to analyze top performers and rising stars within the selected Research Area over time. 3. The selected items can be compared against one another using both previously available metrics and the new usage metrics. 2. Here we have selected the top institutions based usage (views count)
  • 29. Platform capabilities Reporting Researcher profiles Impact metricsBenchmarking & analytics
  • 30. Clustering and tagging • Process takes 5 minutes for typical institution New coverage Clustering coverage Merge clusters Tag clusters
  • 31. Entity identification Articles Tags(Researchers, institutions, papers) 500,000 per day ~5,000+ per institution ~4,000 institutions ×
  • 32. Tagging accuracy Imperial Leicester Stories per day 13.2 7.6 Mentions per story 5.4 4.5 Stories with DOIs 17% 12% Mentions with DOIs 2% 3% Newsflo approachStandard approach
  • 34. 34 Scopus Author Identifier algorithm based on: • Affiliation • Address • Subject area • Source title • Dates of publication • Citations • Co-authors Searching in Scopus relies on disambiguating authors and their affiliations
  • 35. Authors like “Michael Smith” can be problematic 35 All of these records appear to be for the same “Michael Smith,” but they have separate Author IDs in Scopus
  • 36. Authors like “Wei Wu” are also a challenge 36HPCC In Elsevier Update This “Wei Wu” appears to have 1642 documents, but they span all of these subject areas:
  • 37. Entities in the Social Network of Science 37 The Researcher Disambiguation Program is the place where those entities are given a true identity, with the aim to reduce the errors in attribution Researchers Institutions Articles Journals Patents Funding bodies Grants Research domains Countries Labs Projects Research data sets Publishing cluster Usage cluster Editors Reviewers Author s Inventor s Funding cluster Opportunities Corporations Publishers Conferences Societies Res Eval Agencies Counter Typical needs of the entities: Develop research strategy Obtain funding Acquire talent Prioritize activities Working efficiently Measure/Demonstrat e societal impact Develop career Publish Find reviewers Build reputation Ranking … …/204 2.5k/20k 70m/70m …/16k …/11m …/10k 12m/65m 90k/millio ns Press clippings 1/3.5k 7k/30k 300k/1.2 m
  • 39. Anyone can get an API key and build an application with freely available content from ScienceDirect and Scopus http://dev.elsevier.com
  • 40. Scopus metrics are freely available for use Citation counts Journal metrics
  • 42. 42 • High Performance Computing Cluster Platform (HPCC) enables data integration on a scale not previously available and real-time answers to millions of users. Built for big data and proven for 10 years with enterprise customers. • Offers a single architecture, two data platforms (query and refinery) and a consistent data-intensive programming language (ECL) • ECL Parallel Programming Language optimized for business differentiating data intensive applications HPCC Systems: Proven with Enterprise Customers
  • 43. HPCC Architecture – Open source 43
  • 44. Recommendation Engine process at a glance 44HPCC In Elsevier Update 5 years of SD usage data/events All SD XML Articles Journal Rankings Thor Co- download matrix Similarity Attribute Ranking 6 billion events ~12M articles pii-739156 Twice weekly, move to daily Roxie pii-684259, pii_585346, pii_491635
  • 45. ELSEVIER LABS - INTRO ELSEVIER LABS - AREAS OF INTEREST • Information extraction for scholarly text • Knowledge Graphs • Entity linking / disambiguation • Machine Reading • Information retrieval / Search • Question answering • Large Scale Data Management • Deep Multimodal Networks for Image and Text Analysis • Future of research communication
  • 46. ELSEVIER LABS - INTRO RECENT PUBLICATIONS • Sara Magliacane, Philip Stutz, Paul Groth, Abraham Bernstein, foxPSL: A Fast, Optimized and eXtended PSL implementation, International Journal of Approximate Reasoning (2015). • Ron Daniel, "Large Scale Text Analytics with Apache Spark"; Presented at Text Analytics World 2015, San Francisco. • L. Moreau, P. Groth, J. Cheney, T. Lebo, S. Miles, The rationale of PROV, Web Semantics: Science, Services and Agents on the World Wide Web (2015). • Marcin Wylot, Philippe Cudré-Mauroux, and Paul Groth. “Executing Provenance-Enabled Queries over Web Data”, Proceedings of the 24th International World Wide Web Conference (WWW 2015), 2015. • Elshaimaa Ali, Michael Lauruhn. Wikipedia-based Extraction of Lightweight Ontologies for Concept Level Annotation, International Conference on Dublin Core and Metadata Applications (DC 2014), Austin, Tx • Ron Daniel, "Predicting Citation Counts"; Research Trends; June 2014.
  • 47. WDSM – CHALLENGE – MICROSOFT - ELSEVIER
  • 48. Join Elsevier if you are serious about Bigdata and Machine Learning
  • 49. Q&A