Rachael Lammey
Product Manager, CrossRef
UKSG 2015
CrossRef Text and Data Mining Services:
one year in
Not-for-profit association of scholarly publishers
All subjects, all business models
5,000+ organizations from all over the world
83 non-publisher affiliates, 2000 library affiliates
72 million + DOIs assigned to content items
10.1098/rstl.1665.0001
User clicks on CrossRef DOI reference link in Journal A:
Tani, N., N. Tomaru, M. Araki, and K. Ohba. 1996. Genetic diversity and differentiation in populations of Japanese stone pine (Pinus pumila) in Japan. Canadian Journal of Forest Research 26: 1454–1462. [CrossRef]
DOI directory returns URL
User accesses cited article in Journal B
100,000,000
A Text and Data Mining Hub for Researchers
What is Text and Data Mining
(TDM)?
Text Mining is an interdisciplinary field combining techniques
from linguistics, computer science and statistics to build
tools that can efficiently retrieve and extract information
from digital text.
http://blogs.plos.org/everyone/2013/04/17/announcing-the-plos-text-mining-collection/
It uses powerful computers to find links between drugs
and side effects, or genes and diseases, that are hidden
within the vast scientific literature. These are discoveries
that a person scouring through papers one by one may
never notice.
http://www.theguardian.com/science/2012/may/23/text-mining-research-tool-forbidden
Why?
• Researchers find it impractical to
negotiate multiple bilateral agreements
with hundreds of subscription-based
publishers in order to authorise TDM of
subscribed content.
• Subscription-based publishers find it
impractical to negotiate multiple bilateral
agreements with thousands of
researchers and institutions in order to
authorise TDM of subscribed content.
• All parties would benefit from support of
standard APIs and data representations in
order to enable TDM across both open
access and subscription-based publishers.
Build Cross-Publisher
API for TDM
Access To Full Text
Problem: Researchers want to get full text
content from publishers’ sites for OA or
subscribed content.
Solution: Common API (protocol) for requesting
machine readable full text from many different
publishers
Negotiating Permissions
Problem: Researchers want to know whether text
and data mining is allowed, and if not, get
permission.
Solution: Licensing information embedded in article
metadata and a registry for supplemental text and
data mining terms and conditions (licenses).
Text and Data Mining Steps
• Define problem
• Identify potential corpus to mine
• Discovery (full text links)
• Identification of subset which can be
accessed (license information)
• Download identified corpus
• Text and data mine corpus
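The discovery and license-identification steps above can be sketched as a simple filter. This is a minimal illustration, not CrossRef's implementation: the record shape (`link` and `license` lists with a `URL` key) is an assumption modeled on CrossRef REST API work records, and the DOIs, URLs and the `ACCEPTED_LICENSES` allow-list are made up for the example.

```python
from typing import Dict, List

# Hypothetical allow-list: license URIs this researcher has decided are acceptable.
ACCEPTED_LICENSES = {
    "http://creativecommons.org/licenses/by/4.0/",
}


def minable_links(record: Dict) -> List[str]:
    """Return the record's full-text URLs if at least one of its licenses is accepted."""
    licenses = {lic["URL"] for lic in record.get("license", [])}
    if not licenses & ACCEPTED_LICENSES:
        return []
    return [link["URL"] for link in record.get("link", [])]


# Made-up records carrying the two pieces of metadata: full-text links and license URIs.
corpus = [
    {
        "DOI": "10.5555/example.1",
        "license": [{"URL": "http://creativecommons.org/licenses/by/4.0/"}],
        "link": [{"URL": "http://publisher.example/fulltext/1.xml"}],
    },
    {
        "DOI": "10.5555/example.2",
        "license": [{"URL": "http://publisher.example/tdm-license"}],
        "link": [{"URL": "http://publisher.example/fulltext/2.xml"}],
    },
]

# Keep only the DOIs whose full text may be downloaded under an accepted license.
to_download = {r["DOI"]: minable_links(r) for r in corpus if minable_links(r)}
print(to_download)
```

Records with an unrecognized license are skipped rather than downloaded, which leaves them for the permission-negotiation step.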
The Basic Workflow
Publisher Participation
To enable their content for use by the service, publishers have
to provide CrossRef with two additional pieces of metadata:
• Full text URIs (to show where the full-text is located)
• License URIs (to show the Terms & Conditions under
which researchers can use it)
Publishers can also implement rate limiting.
CrossRef doesn’t charge publishers for participating in this
service.
Researcher Use
• The CrossRef REST API is the main aspect of this service
• It is designed to allow researchers to easily harvest full text
documents from all participating publishers regardless of their
business model (e.g. open access, subscription).
• It makes use of CrossRef DOI content negotiation to provide
researchers with links to the full text of content located on the
publisher’s site.
• The publisher remains responsible for actually delivering the full
text of the content requested
• CrossRef does not charge researchers for using the service
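A rough sketch of the researcher side of content negotiation follows. It builds a request without sending it, then parses a Link header of the kind a resolver might return. The resolver URL `https://doi.org/`, the `application/vnd.crossref.unixsd+xml` media type and the shape of the sample header are assumptions for illustration; the DOI and publisher URL are made up, and the parser handles only simple quoted parameters.

```python
import re
import urllib.request
from typing import Dict, List, Tuple


def build_cn_request(doi: str) -> urllib.request.Request:
    """Build (but do not send) a content-negotiation request for a DOI."""
    return urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": "application/vnd.crossref.unixsd+xml"},
    )


def parse_link_header(value: str) -> List[Tuple[str, Dict[str, str]]]:
    """Split an HTTP Link header into (url, parameters) pairs."""
    links = []
    for match in re.finditer(r'<([^>]+)>((?:\s*;\s*[\w-]+="[^"]*")*)', value):
        params = dict(re.findall(r'([\w-]+)="([^"]*)"', match.group(2)))
        links.append((match.group(1), params))
    return links


# Made-up header: one full-text item and one license link.
sample = (
    '<http://publisher.example/a.pdf>; version="vor"; type="application/pdf"; rel="item", '
    '<http://creativecommons.org/licenses/by/4.0/>; version="vor"; rel="license"'
)

request = build_cn_request("10.5555/12345678")
full_text = [url for url, p in parse_link_header(sample) if p.get("rel") == "item"]
licenses = [url for url, p in parse_link_header(sample) if p.get("rel") == "license"]
```

The full text itself would then be fetched from the publisher's site, since the publisher remains responsible for delivery.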
Publisher Metadata for CrossRef TDM:
Hindawi
Publisher Metadata for CrossRef TDM:
Elsevier
CrossRef TDM Demo
Click-Through
Service
Extended Workflow
Researcher
View
Publisher
View
Researcher queries DOI using CN + API
token
Publisher verifies API token
If token verified AND access control allows,
publisher returns full text
(frequency at publisher discretion)
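The publisher-side decision in this workflow can be sketched as below. Everything here is hypothetical: the header name `CR-Clickthrough-Client-Token` is an assumption, the two sets stand in for token verification and for the publisher's own access control, and the status codes are illustrative choices rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Dict, Set

TOKEN_HEADER = "CR-Clickthrough-Client-Token"  # assumed header name


@dataclass
class Publisher:
    valid_tokens: Set[str]     # stand-in for verifying a token with the click-through service
    entitled_tokens: Set[str]  # stand-in for the publisher's own access control

    def handle(self, headers: Dict[str, str]) -> int:
        """Return an HTTP-style status for a full-text request."""
        token = headers.get(TOKEN_HEADER)
        if token is None or token not in self.valid_tokens:
            return 401  # token missing or not verified
        if token not in self.entitled_tokens:
            return 403  # token verified, but access control refuses
        return 200      # token verified AND access control allows: return full text


publisher = Publisher(valid_tokens={"tok-a", "tok-b"}, entitled_tokens={"tok-a"})
status_ok = publisher.handle({TOKEN_HEADER: "tok-a"})
status_forbidden = publisher.handle({TOKEN_HEADER: "tok-b"})
status_unverified = publisher.handle({})
```

Rate limiting, which is at the publisher's discretion, would sit in front of this check and is left out of the sketch.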
Benefits
• Streamlines researcher access to distributed full text for
TDM
• Enables machine-to-machine, automated access for
recognized TDM (i.e. researchers won’t be locked out of
publisher sites)
• Enables article-level licensing info and easy mechanism
for supplemental T&Cs for text and data mining
(publishers discussing model license via STM)
Publishers
Over 14 million articles with full-text links and license
information deposited
Usable as is:
https://blogs.nd.edu/emorgan/
http://tdmsupport.crossref.org/
www.crossref.org
http://www.crossref.org/tdm/index.html
tdm@crossref.org
How can researchers use
the service?
• Modify TDM tools to make use of the API token
• Modify TDM tools to look for <lic_ref> elements
• Register with the click-through service and
accept/decline licenses (if applicable)
• Details at: http://tdmsupport.crossref.org/researchers/
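As a hedged illustration of the second point, a TDM tool could scan a metadata record for license elements as follows. The element name `lic_ref` is taken from the slide; the surrounding XML is invented for the example, and matching on the local name (ignoring any namespace) is an assumption made because the namespace is not given here.

```python
import xml.etree.ElementTree as ET
from typing import List


def license_refs(xml_text: str, local_name: str = "lic_ref") -> List[str]:
    """Collect the text of every element whose local name matches, ignoring namespaces."""
    found = []
    for element in ET.fromstring(xml_text).iter():
        if not isinstance(element.tag, str):
            continue  # skip anything that is not a regular element
        if element.tag.rsplit("}", 1)[-1] == local_name and element.text:
            found.append(element.text.strip())
    return found


# Invented record, used only to exercise the function.
sample_record = """
<record xmlns:x="http://example.org/ns">
  <title>An example article</title>
  <x:lic_ref>http://creativecommons.org/licenses/by/4.0/</x:lic_ref>
</record>
"""

refs = license_refs(sample_record)
```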
Why use the DOI?
Using the DOI as the basis for a common text and data mining
API provides several benefits. For example, the DOI provides:
• An easy way to de-duplicate documents that may be found on
several sites.
• Persistent provenance information.
• An easy way to document, share and compare corpora without
having to exchange the actual documents.
• A mechanism to ensure the reproducibility of TDM results using
the source documents.
• A mechanism to track the impact of updates, corrections,
retractions and withdrawals on corpora.
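The de-duplication benefit can be shown with a short sketch. It relies on DOIs being case-insensitive; the list of prefixes to strip is an assumption about how DOIs tend to appear in harvested data, and the site names are placeholders.

```python
from typing import Dict, Iterable, List, Tuple

# Common ways a DOI is written in the wild (assumed list, not exhaustive).
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Lower-case a DOI and strip a leading resolver URL or 'doi:' label."""
    doi = raw.strip().lower()
    for prefix in _PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def dedupe(found: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group the locations where each document was found under one normalized DOI."""
    by_doi: Dict[str, List[str]] = {}
    for raw, location in found:
        by_doi.setdefault(normalize_doi(raw), []).append(location)
    return by_doi


harvested = [
    ("10.1098/rstl.1665.0001", "site-a"),
    ("http://dx.doi.org/10.1098/RSTL.1665.0001", "site-b"),
    ("doi:10.5555/example.1", "site-c"),
]
unique = dedupe(harvested)
```

The sorted list of normalized DOIs is also enough to share or compare a corpus without exchanging the documents themselves.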

Contenu connexe

Tendances

Reference linking and Cited-by
Reference linking and Cited-byReference linking and Cited-by
Reference linking and Cited-byCrossref
 
The Global reach of Crossref metadata
The Global reach of Crossref metadataThe Global reach of Crossref metadata
The Global reach of Crossref metadataCrossref
 
Working with Crossref and registering content
Working with Crossref and registering contentWorking with Crossref and registering content
Working with Crossref and registering contentCrossref
 
Introduction to Crossref: History, Mission, Members
Introduction to Crossref: History, Mission, MembersIntroduction to Crossref: History, Mission, Members
Introduction to Crossref: History, Mission, MembersCrossref
 
Managing errata and retractions with CrossMark
Managing errata and retractions with CrossMarkManaging errata and retractions with CrossMark
Managing errata and retractions with CrossMarkCrossref
 
Checking for originality: Crossref Similarity Check
Checking for originality: Crossref Similarity CheckChecking for originality: Crossref Similarity Check
Checking for originality: Crossref Similarity CheckCrossref
 
Barcelona 2014: CrossRef System and Support Update by Chuck Koscher
Barcelona 2014: CrossRef System and Support Update by Chuck KoscherBarcelona 2014: CrossRef System and Support Update by Chuck Koscher
Barcelona 2014: CrossRef System and Support Update by Chuck KoscherCrossref
 
Collecting and using funding data in your publications
Collecting and using funding data in your publicationsCollecting and using funding data in your publications
Collecting and using funding data in your publicationsCrossref
 
Managing changes to content: Crossmark
Managing changes to content: CrossmarkManaging changes to content: Crossmark
Managing changes to content: CrossmarkCrossref
 
Access the world’s research outputs through the CORE API
Access the world’s research outputs through the CORE API Access the world’s research outputs through the CORE API
Access the world’s research outputs through the CORE API Matteo Cancellieri
 
2013 CrossRef Workshops Text Data Mining Geoffrey Bilder
2013 CrossRef Workshops Text Data Mining Geoffrey Bilder2013 CrossRef Workshops Text Data Mining Geoffrey Bilder
2013 CrossRef Workshops Text Data Mining Geoffrey BilderCrossref
 
4. Crossref and Atypon
4. Crossref and Atypon4. Crossref and Atypon
4. Crossref and AtyponCrossref
 
Springer LAB: Implementing a discovery tool
Springer LAB: Implementing a discovery toolSpringer LAB: Implementing a discovery tool
Springer LAB: Implementing a discovery toolJason Price, PhD
 
Citation Analysis for the Free, Online Literature
Citation Analysis for the Free, Online LiteratureCitation Analysis for the Free, Online Literature
Citation Analysis for the Free, Online LiteratureBalachandar Radhakrishnan
 
Understanding Crossref Metadata
Understanding Crossref MetadataUnderstanding Crossref Metadata
Understanding Crossref MetadataCrossref
 
PoolParty SKOS and Linked Data
PoolParty SKOS and Linked DataPoolParty SKOS and Linked Data
PoolParty SKOS and Linked DataAndreas Blumauer
 
Cited-by Linking
Cited-by Linking Cited-by Linking
Cited-by Linking Crossref
 
Multiple Resolution and handling content available in multiple places
Multiple Resolution and handling content available in multiple placesMultiple Resolution and handling content available in multiple places
Multiple Resolution and handling content available in multiple placesCrossref
 

Tendances (20)

Reference linking and Cited-by
Reference linking and Cited-byReference linking and Cited-by
Reference linking and Cited-by
 
The Global reach of Crossref metadata
The Global reach of Crossref metadataThe Global reach of Crossref metadata
The Global reach of Crossref metadata
 
Working with Crossref and registering content
Working with Crossref and registering contentWorking with Crossref and registering content
Working with Crossref and registering content
 
Introduction to Crossref: History, Mission, Members
Introduction to Crossref: History, Mission, MembersIntroduction to Crossref: History, Mission, Members
Introduction to Crossref: History, Mission, Members
 
Managing errata and retractions with CrossMark
Managing errata and retractions with CrossMarkManaging errata and retractions with CrossMark
Managing errata and retractions with CrossMark
 
Checking for originality: Crossref Similarity Check
Checking for originality: Crossref Similarity CheckChecking for originality: Crossref Similarity Check
Checking for originality: Crossref Similarity Check
 
Barcelona 2014: CrossRef System and Support Update by Chuck Koscher
Barcelona 2014: CrossRef System and Support Update by Chuck KoscherBarcelona 2014: CrossRef System and Support Update by Chuck Koscher
Barcelona 2014: CrossRef System and Support Update by Chuck Koscher
 
Collecting and using funding data in your publications
Collecting and using funding data in your publicationsCollecting and using funding data in your publications
Collecting and using funding data in your publications
 
MENGGUNAKAN METADATA PADA CROSSREF
MENGGUNAKAN METADATA PADA CROSSREFMENGGUNAKAN METADATA PADA CROSSREF
MENGGUNAKAN METADATA PADA CROSSREF
 
CARA MENGELOLA PERUBAHAN PADA NASKAH
CARA MENGELOLA PERUBAHAN PADA NASKAHCARA MENGELOLA PERUBAHAN PADA NASKAH
CARA MENGELOLA PERUBAHAN PADA NASKAH
 
Managing changes to content: Crossmark
Managing changes to content: CrossmarkManaging changes to content: Crossmark
Managing changes to content: Crossmark
 
Access the world’s research outputs through the CORE API
Access the world’s research outputs through the CORE API Access the world’s research outputs through the CORE API
Access the world’s research outputs through the CORE API
 
2013 CrossRef Workshops Text Data Mining Geoffrey Bilder
2013 CrossRef Workshops Text Data Mining Geoffrey Bilder2013 CrossRef Workshops Text Data Mining Geoffrey Bilder
2013 CrossRef Workshops Text Data Mining Geoffrey Bilder
 
4. Crossref and Atypon
4. Crossref and Atypon4. Crossref and Atypon
4. Crossref and Atypon
 
Springer LAB: Implementing a discovery tool
Springer LAB: Implementing a discovery toolSpringer LAB: Implementing a discovery tool
Springer LAB: Implementing a discovery tool
 
Citation Analysis for the Free, Online Literature
Citation Analysis for the Free, Online LiteratureCitation Analysis for the Free, Online Literature
Citation Analysis for the Free, Online Literature
 
Understanding Crossref Metadata
Understanding Crossref MetadataUnderstanding Crossref Metadata
Understanding Crossref Metadata
 
PoolParty SKOS and Linked Data
PoolParty SKOS and Linked DataPoolParty SKOS and Linked Data
PoolParty SKOS and Linked Data
 
Cited-by Linking
Cited-by Linking Cited-by Linking
Cited-by Linking
 
Multiple Resolution and handling content available in multiple places
Multiple Resolution and handling content available in multiple placesMultiple Resolution and handling content available in multiple places
Multiple Resolution and handling content available in multiple places
 

En vedette

Open Research Data Pilot in Horizon 2020
Open Research Data Pilot in Horizon 2020Open Research Data Pilot in Horizon 2020
Open Research Data Pilot in Horizon 2020Elena Simukovic
 
The Horizon 2020 Open Data Pilot - OpenAIRE webinar (Oct. 21 2014) by Sarah J...
The Horizon 2020 Open Data Pilot - OpenAIRE webinar (Oct. 21 2014) by Sarah J...The Horizon 2020 Open Data Pilot - OpenAIRE webinar (Oct. 21 2014) by Sarah J...
The Horizon 2020 Open Data Pilot - OpenAIRE webinar (Oct. 21 2014) by Sarah J...OpenAIRE
 
OpenAIRE services and tools - presentation at #DI4R2016
OpenAIRE services and tools - presentation at #DI4R2016OpenAIRE services and tools - presentation at #DI4R2016
OpenAIRE services and tools - presentation at #DI4R2016OpenAIRE
 
Horizon 2020 Open Access to Publications Mandate: OpenAIRE Webinar (Oct. 22, ...
Horizon 2020 Open Access to Publications Mandate: OpenAIRE Webinar (Oct. 22, ...Horizon 2020 Open Access to Publications Mandate: OpenAIRE Webinar (Oct. 22, ...
Horizon 2020 Open Access to Publications Mandate: OpenAIRE Webinar (Oct. 22, ...OpenAIRE
 
Horizon 2020 Open Access mandate - OpenAIRE webinar by Inge Van Nieuwerburgh
Horizon 2020 Open Access mandate - OpenAIRE webinar by Inge Van NieuwerburghHorizon 2020 Open Access mandate - OpenAIRE webinar by Inge Van Nieuwerburgh
Horizon 2020 Open Access mandate - OpenAIRE webinar by Inge Van NieuwerburghOpenAIRE
 
Horizon 2020 and the open research data pilot
Horizon 2020 and the open research data pilotHorizon 2020 and the open research data pilot
Horizon 2020 and the open research data pilotSarah Jones
 

En vedette (6)

Open Research Data Pilot in Horizon 2020
Open Research Data Pilot in Horizon 2020Open Research Data Pilot in Horizon 2020
Open Research Data Pilot in Horizon 2020
 
The Horizon 2020 Open Data Pilot - OpenAIRE webinar (Oct. 21 2014) by Sarah J...
The Horizon 2020 Open Data Pilot - OpenAIRE webinar (Oct. 21 2014) by Sarah J...The Horizon 2020 Open Data Pilot - OpenAIRE webinar (Oct. 21 2014) by Sarah J...
The Horizon 2020 Open Data Pilot - OpenAIRE webinar (Oct. 21 2014) by Sarah J...
 
OpenAIRE services and tools - presentation at #DI4R2016
OpenAIRE services and tools - presentation at #DI4R2016OpenAIRE services and tools - presentation at #DI4R2016
OpenAIRE services and tools - presentation at #DI4R2016
 
Horizon 2020 Open Access to Publications Mandate: OpenAIRE Webinar (Oct. 22, ...
Horizon 2020 Open Access to Publications Mandate: OpenAIRE Webinar (Oct. 22, ...Horizon 2020 Open Access to Publications Mandate: OpenAIRE Webinar (Oct. 22, ...
Horizon 2020 Open Access to Publications Mandate: OpenAIRE Webinar (Oct. 22, ...
 
Horizon 2020 Open Access mandate - OpenAIRE webinar by Inge Van Nieuwerburgh
Horizon 2020 Open Access mandate - OpenAIRE webinar by Inge Van NieuwerburghHorizon 2020 Open Access mandate - OpenAIRE webinar by Inge Van Nieuwerburgh
Horizon 2020 Open Access mandate - OpenAIRE webinar by Inge Van Nieuwerburgh
 
Horizon 2020 and the open research data pilot
Horizon 2020 and the open research data pilotHorizon 2020 and the open research data pilot
Horizon 2020 and the open research data pilot
 

Similaire à CrossRef Text & Data Mining - UKSG 2015

Introduction to CrossRef Text and Data Mining Webinar
Introduction to CrossRef Text and Data Mining WebinarIntroduction to CrossRef Text and Data Mining Webinar
Introduction to CrossRef Text and Data Mining WebinarCrossref
 
The Reach of Crossref metadata - Crossref LIVE South Africa
The Reach of Crossref metadata - Crossref LIVE South AfricaThe Reach of Crossref metadata - Crossref LIVE South Africa
The Reach of Crossref metadata - Crossref LIVE South AfricaCrossref
 
Registering content to enable connections - Rachael Lammey
Registering content to enable connections - Rachael LammeyRegistering content to enable connections - Rachael Lammey
Registering content to enable connections - Rachael LammeyCrossref
 
Introduction to Crossref - Crossref LIVE Kuala Lumpur
Introduction to Crossref - Crossref LIVE Kuala LumpurIntroduction to Crossref - Crossref LIVE Kuala Lumpur
Introduction to Crossref - Crossref LIVE Kuala LumpurCrossref
 
Who is using your content?
Who is using your content? Who is using your content?
Who is using your content? Crossref
 
Crossref for Ambassadors - Introductory webinar
Crossref for Ambassadors - Introductory webinarCrossref for Ambassadors - Introductory webinar
Crossref for Ambassadors - Introductory webinarCrossref
 
Crossref for Ambassadors - Introductory webinar
Crossref for Ambassadors - Introductory webinarCrossref for Ambassadors - Introductory webinar
Crossref for Ambassadors - Introductory webinarVanessa Fairhurst
 
Crossref Services - LIVE Mumbai
Crossref Services - LIVE MumbaiCrossref Services - LIVE Mumbai
Crossref Services - LIVE MumbaiCrossref
 
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...Chris Shillum
 
Introduction to Crossref - Crossref LIVE Bangkok
Introduction to Crossref - Crossref LIVE BangkokIntroduction to Crossref - Crossref LIVE Bangkok
Introduction to Crossref - Crossref LIVE BangkokCrossref
 
Crossref Content Registration - LIVE Mumbai
Crossref Content Registration - LIVE MumbaiCrossref Content Registration - LIVE Mumbai
Crossref Content Registration - LIVE MumbaiCrossref
 
A comparative study between commercial and open source discovery tools
A comparative study between commercial and open source discovery toolsA comparative study between commercial and open source discovery tools
A comparative study between commercial and open source discovery toolsSusantaSethi3
 
Orcid works metadata working group recommendations
Orcid works metadata working group recommendationsOrcid works metadata working group recommendations
Orcid works metadata working group recommendationsORCID, Inc
 
OSFair2017 training | Machine accessibility of Open Access scientific publica...
OSFair2017 training | Machine accessibility of Open Access scientific publica...OSFair2017 training | Machine accessibility of Open Access scientific publica...
OSFair2017 training | Machine accessibility of Open Access scientific publica...Open Science Fair
 
A Pragmatic Approach to Facilitating Text and Data Mining
A Pragmatic Approach to Facilitating Text and Data Mining A Pragmatic Approach to Facilitating Text and Data Mining
A Pragmatic Approach to Facilitating Text and Data Mining Chris Shillum
 
Crossref Overview - Russian webinar
Crossref Overview - Russian webinar Crossref Overview - Russian webinar
Crossref Overview - Russian webinar Crossref
 

Similaire à CrossRef Text & Data Mining - UKSG 2015 (20)

Introduction to CrossRef Text and Data Mining Webinar
Introduction to CrossRef Text and Data Mining WebinarIntroduction to CrossRef Text and Data Mining Webinar
Introduction to CrossRef Text and Data Mining Webinar
 
CORE APIv3
CORE APIv3CORE APIv3
CORE APIv3
 
The Reach of Crossref metadata - Crossref LIVE South Africa
The Reach of Crossref metadata - Crossref LIVE South AfricaThe Reach of Crossref metadata - Crossref LIVE South Africa
The Reach of Crossref metadata - Crossref LIVE South Africa
 
Registering content to enable connections - Rachael Lammey
Registering content to enable connections - Rachael LammeyRegistering content to enable connections - Rachael Lammey
Registering content to enable connections - Rachael Lammey
 
Patham "NISO-ODI (Open Discovery Initiative) Standards Update"
Patham "NISO-ODI (Open Discovery Initiative) Standards Update"Patham "NISO-ODI (Open Discovery Initiative) Standards Update"
Patham "NISO-ODI (Open Discovery Initiative) Standards Update"
 
Introduction to Crossref - Crossref LIVE Kuala Lumpur
Introduction to Crossref - Crossref LIVE Kuala LumpurIntroduction to Crossref - Crossref LIVE Kuala Lumpur
Introduction to Crossref - Crossref LIVE Kuala Lumpur
 
Who is using your content?
Who is using your content? Who is using your content?
Who is using your content?
 
Crossref for Ambassadors - Introductory webinar
Crossref for Ambassadors - Introductory webinarCrossref for Ambassadors - Introductory webinar
Crossref for Ambassadors - Introductory webinar
 
Crossref for Ambassadors - Introductory webinar
Crossref for Ambassadors - Introductory webinarCrossref for Ambassadors - Introductory webinar
Crossref for Ambassadors - Introductory webinar
 
Crossref Services - LIVE Mumbai
Crossref Services - LIVE MumbaiCrossref Services - LIVE Mumbai
Crossref Services - LIVE Mumbai
 
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...
 
Introduction to Crossref - Crossref LIVE Bangkok
Introduction to Crossref - Crossref LIVE BangkokIntroduction to Crossref - Crossref LIVE Bangkok
Introduction to Crossref - Crossref LIVE Bangkok
 
Crossref Content Registration - LIVE Mumbai
Crossref Content Registration - LIVE MumbaiCrossref Content Registration - LIVE Mumbai
Crossref Content Registration - LIVE Mumbai
 
A comparative study between commercial and open source discovery tools
A comparative study between commercial and open source discovery toolsA comparative study between commercial and open source discovery tools
A comparative study between commercial and open source discovery tools
 
Orcid works metadata working group recommendations
Orcid works metadata working group recommendationsOrcid works metadata working group recommendations
Orcid works metadata working group recommendations
 
OSFair2017 training | Machine accessibility of Open Access scientific publica...
OSFair2017 training | Machine accessibility of Open Access scientific publica...OSFair2017 training | Machine accessibility of Open Access scientific publica...
OSFair2017 training | Machine accessibility of Open Access scientific publica...
 
Expanding the Possible: What’s New and Upcoming in Standards and Technologies...
Expanding the Possible: What’s New and Upcoming in Standards and Technologies...Expanding the Possible: What’s New and Upcoming in Standards and Technologies...
Expanding the Possible: What’s New and Upcoming in Standards and Technologies...
 
NISO Open Discovery Initiative, ALA Midwinter
NISO Open Discovery Initiative, ALA MidwinterNISO Open Discovery Initiative, ALA Midwinter
NISO Open Discovery Initiative, ALA Midwinter
 
A Pragmatic Approach to Facilitating Text and Data Mining
A Pragmatic Approach to Facilitating Text and Data Mining A Pragmatic Approach to Facilitating Text and Data Mining
A Pragmatic Approach to Facilitating Text and Data Mining
 
Crossref Overview - Russian webinar
Crossref Overview - Russian webinar Crossref Overview - Russian webinar
Crossref Overview - Russian webinar
 

Plus de Crossref

Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...
Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...
Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...Crossref
 
Crossref LIVE Chinese网络研讨会——Crossref简介 – 14 Oct 2021
Crossref LIVE Chinese网络研讨会——Crossref简介 – 14 Oct 2021  Crossref LIVE Chinese网络研讨会——Crossref简介 – 14 Oct 2021
Crossref LIVE Chinese网络研讨会——Crossref简介 – 14 Oct 2021 Crossref
 
Seminario web ‘Crossmark’, en español
Seminario web ‘Crossmark’, en español Seminario web ‘Crossmark’, en español
Seminario web ‘Crossmark’, en español Crossref
 
Working with ROR as a Crossref member: what you need to know
Working with ROR as a Crossref member: what you need to knowWorking with ROR as a Crossref member: what you need to know
Working with ROR as a Crossref member: what you need to knowCrossref
 
Преимущества и варианты использования метаданных в Crossref / The Value and ...
Преимущества и варианты использования метаданных в Crossref /  The Value and ...Преимущества и варианты использования метаданных в Crossref /  The Value and ...
Преимущества и варианты использования метаданных в Crossref / The Value and ...Crossref
 
Seminario web ‘Similarity Check’, en español
Seminario web ‘Similarity Check’, en españolSeminario web ‘Similarity Check’, en español
Seminario web ‘Similarity Check’, en españolCrossref
 
Crossref LIVE Indonesia: One Search Platform (Drs. Muhammad Syarif Bando pres...
Crossref LIVE Indonesia: One Search Platform (Drs. Muhammad Syarif Bando pres...Crossref LIVE Indonesia: One Search Platform (Drs. Muhammad Syarif Bando pres...
Crossref LIVE Indonesia: One Search Platform (Drs. Muhammad Syarif Bando pres...Crossref
 
Crossref LIVE Indonesia: The Future of Indonesian Journal Policy (with Dr. Lu...
Crossref LIVE Indonesia: The Future of Indonesian Journal Policy (with Dr. Lu...Crossref LIVE Indonesia: The Future of Indonesian Journal Policy (with Dr. Lu...
Crossref LIVE Indonesia: The Future of Indonesian Journal Policy (with Dr. Lu...Crossref
 
Crossref LIVE Indonesia: The Value and Use of Crossref Metadata, CRLIVE-ID 15...
Crossref LIVE Indonesia: The Value and Use of Crossref Metadata, CRLIVE-ID 15...Crossref LIVE Indonesia: The Value and Use of Crossref Metadata, CRLIVE-ID 15...
Crossref LIVE Indonesia: The Value and Use of Crossref Metadata, CRLIVE-ID 15...Crossref
 
Crossref LIVE Indonesia: Content Registration at Crossref, CRLIVE-ID 14 July ...
Crossref LIVE Indonesia: Content Registration at Crossref, CRLIVE-ID 14 July ...Crossref LIVE Indonesia: Content Registration at Crossref, CRLIVE-ID 14 July ...
Crossref LIVE Indonesia: Content Registration at Crossref, CRLIVE-ID 14 July ...Crossref
 
Crossref LIVE Indonesia: An Introduction to Crossref, CRLIVE-ID 13 July 2021
Crossref LIVE Indonesia: An Introduction to Crossref, CRLIVE-ID 13 July 2021Crossref LIVE Indonesia: An Introduction to Crossref, CRLIVE-ID 13 July 2021
Crossref LIVE Indonesia: An Introduction to Crossref, CRLIVE-ID 13 July 2021Crossref
 
Crossref İçerik Kaydı Webinarı, Türkçe | Content Registration at Crossref , ...
 Crossref İçerik Kaydı Webinarı, Türkçe | Content Registration at Crossref , ... Crossref İçerik Kaydı Webinarı, Türkçe | Content Registration at Crossref , ...
Crossref İçerik Kaydı Webinarı, Türkçe | Content Registration at Crossref , ...Crossref
 
Los Metadatos Para la Comunidad de Investigacion
Los Metadatos Para la Comunidad de InvestigacionLos Metadatos Para la Comunidad de Investigacion
Los Metadatos Para la Comunidad de InvestigacionCrossref
 
تسجيل المحتوي مع كروس رف – ندوة عبر الانترنت باللغة العربية | Content Registr...
تسجيل المحتوي مع كروس رف – ندوة عبر الانترنت باللغة العربية | Content Registr...تسجيل المحتوي مع كروس رف – ندوة عبر الانترنت باللغة العربية | Content Registr...
تسجيل المحتوي مع كروس رف – ندوة عبر الانترنت باللغة العربية | Content Registr...Crossref
 
Content Registration, Crossref ALJEBI, Indonesia
Content Registration, Crossref ALJEBI, IndonesiaContent Registration, Crossref ALJEBI, Indonesia
Content Registration, Crossref ALJEBI, IndonesiaCrossref
 
crossmark update
crossmark updatecrossmark update
crossmark updateCrossref
 
Participation reports webinar December 2020
Participation reports webinar December 2020Participation reports webinar December 2020
Participation reports webinar December 2020Crossref
 
Participation reports webinar November 2020
Participation reports webinar November 2020Participation reports webinar November 2020
Participation reports webinar November 2020Crossref
 
Introduction to Crossmark/Crossmark: O que é e como usar
Introduction to Crossmark/Crossmark: O que é e como usarIntroduction to Crossmark/Crossmark: O que é e como usar
Introduction to Crossmark/Crossmark: O que é e como usarCrossref
 
Crossref LIVE UK Online
Crossref LIVE UK OnlineCrossref LIVE UK Online
Crossref LIVE UK OnlineCrossref
 

Plus de Crossref (20)

Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...
Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...
Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...
 
Crossref LIVE Chinese网络研讨会——Crossref简介 – 14 Oct 2021
Crossref LIVE Chinese网络研讨会——Crossref简介 – 14 Oct 2021  Crossref LIVE Chinese网络研讨会——Crossref简介 – 14 Oct 2021
Crossref LIVE Chinese网络研讨会——Crossref简介 – 14 Oct 2021
 
Seminario web ‘Crossmark’, en español
Seminario web ‘Crossmark’, en español Seminario web ‘Crossmark’, en español
Seminario web ‘Crossmark’, en español
 
Working with ROR as a Crossref member: what you need to know
Working with ROR as a Crossref member: what you need to knowWorking with ROR as a Crossref member: what you need to know
Working with ROR as a Crossref member: what you need to know
 
Преимущества и варианты использования метаданных в Crossref / The Value and ...
Преимущества и варианты использования метаданных в Crossref /  The Value and ...Преимущества и варианты использования метаданных в Crossref /  The Value and ...
Преимущества и варианты использования метаданных в Crossref / The Value and ...
 
Seminario web ‘Similarity Check’, en español



CrossRef Text & Data Mining - UKSG 2015

  • 1. Rachael Lammey Product Manager, CrossRef UKSG 2015 CrossRef Text and Data Mining Services: one year in
  • 2. Not-for-profit association of scholarly publishers All subjects, all business models 5,000+ organizations from all over the world 83 non-publisher affiliates, 2000 library affiliates 72 million + DOIs assigned to content items
  • 4. User clicks on CrossRef DOI reference link in Journal A Tani, N., N. Tomaru, M. Araki, AND K. Ohba. 1996. Genetic diversity and differentiation in populations of Japanese stone pine (Pinus pumila) in Japan. Canadian Journal of Forest Research 26: 1454–1462.[CrossRef] DOI directory returns URL User accesses cited article in Journal B
  • 6. A Text and Data Mining Hub for Researchers
  • 7. What is Text and Data Mining (TDM)? Text Mining is an interdisciplinary field combining techniques from linguistics, computer science and statistics to build tools that can efficiently retrieve and extract information from digital text. http://blogs.plos.org/everyone/2013/04/17/announcing-the-plos-text-mining-collection/ It uses powerful computers to find links between drugs and side effects, or genes and diseases, that are hidden within the vast scientific literature. These are discoveries that a person scouring through papers one by one may never notice. http://www.theguardian.com/science/2012/may/23/text-mining-research-tool-forbidden
  • 8. Why? • Researchers find it impractical to negotiate multiple bilateral agreements with hundreds of subscription-based publishers in order to authorise TDM of subscribed content. • Subscription-based publishers find it impractical to negotiate multiple bilateral agreements with thousands of researchers and institutions in order to authorise TDM of subscribed content. • All parties would benefit from support of standard APIs and data representations in order to enable TDM across both open access and subscription-based publishers.
  • 11. Access To Full Text Problem: Researchers want to get full text content from publishers’ sites for OA or subscribed content. Solution: Common API (protocol) for requesting machine readable full text from many different publishers
  • 12. Negotiating Permissions Problem: Researchers want to know whether text and data mining is allowed, and if not, get permission. Solution: Licensing information embedded in article metadata and a registry for supplemental text and data mining terms and conditions (licenses).
  • 13. Text and Data Mining Steps • Define problem • Identify potential corpus to mine • Discovery (full text links) • Identification of subset which can be accessed (license information) • Download identified corpus • Text and data mine corpus
  • 15. Publisher Participation To enable their content for use by the service, publishers have to provide CrossRef with two additional pieces of metadata: • Full text URIs (to show where the full-text is located) • License URIs (to show the Terms & Conditions under which they can use it) • Can implement rate limiting CrossRef doesn’t charge publishers for participating in this service.
  • 16. Researcher Use • The CrossRef REST API is the main aspect of this service • It is designed to allow researchers to easily harvest full text documents from all participating publishers regardless of their business model (e.g. open access, subscription). • It makes use of CrossRef DOI content negotiation to provide researchers with links to the full text of content located on the publisher’s site. • The publisher remains responsible for actually delivering the full text of the content requested • CrossRef does not charge researchers for using the service
  • 17. Publisher Metadata for CrossRef TDM: Hindawi
  • 18. Publisher Metadata for CrossRef TDM: Elsevier
  • 24. Researcher queries DOI using CN + API token Publisher verifies API token If token verified AND access control allows, publisher returns full text (frequency at publisher discretion)
  • 25. Benefits • Streamlines researcher access to distributed full text for TDM • Enables machine-to-machine, automated access for recognized TDM (i.e. researchers won’t be locked out of publisher sites) • Enables article-level licensing info and easy mechanism for supplemental T&Cs for text and data mining (publishers discussing model license via STM)
  • 26. Publishers Over 14 million articles with full-text links and license information deposited
  • 30. How can researchers use the service? • Modify TDM tools to make use of the API token • Modify TDM tools to look for <lic_ref> elements • Register with the click-through service and accept/decline licenses (if applicable) • Details at: http://tdmsupport.crossref.org/researchers/
  • 31. Why use the DOI? Using the DOI as the basis for a common text and data mining API provides several benefits. For example, the DOI provides: • An easy way to de-duplicate documents that may be found on several sites. • Persistent provenance information. • An easy way to document, share and compare corpora without having to exchange the actual documents. • A mechanism to ensure the reproducibility of TDM results using the source documents. • A mechanism to track the impact of updates, corrections, retractions and withdrawals on corpora.
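The researcher workflow outlined in slides 13, 16 and 24 (discovery, selecting the subset whose license allows mining, then requesting full text with an API token) might be sketched as below. This is a minimal illustration, not CrossRef's own client. The record shape (`DOI`, `link`, `license` with `URL` keys) is modelled on the JSON returned by the CrossRef REST API, the filter names and the `CR-Clickthrough-Client-Token` header name are taken from CrossRef's public documentation rather than from these slides, and the `example-tdm-tool/0.1` user agent is hypothetical.

```python
from urllib.parse import urlencode

API_BASE = "https://api.crossref.org/works"  # CrossRef REST API works endpoint
# Header name assumed from CrossRef's TDM documentation; not stated in the slides.
TOKEN_HEADER = "CR-Clickthrough-Client-Token"


def discovery_url(query, rows=20):
    """Discovery step: ask only for records with full-text links and a license."""
    params = {
        "query": query,
        "filter": "has-full-text:true,has-license:true",
        "rows": rows,
    }
    return f"{API_BASE}?{urlencode(params)}"


def select_corpus(records, accepted_licenses):
    """Identification step: keep records the researcher is allowed to mine.

    Returns {doi: [full-text URLs]} for records that carry at least one
    accepted license URI and at least one full-text link.
    """
    accepted = set(accepted_licenses)
    corpus = {}
    for rec in records:
        license_urls = {lic["URL"] for lic in rec.get("license", [])}
        links = [ln["URL"] for ln in rec.get("link", [])]
        if links and license_urls & accepted:
            corpus[rec["DOI"]] = links
    return corpus


def download_headers(api_token=None):
    """Download step: headers sent to the publisher's full-text URL.

    The publisher, not CrossRef, verifies the token and delivers the content.
    """
    headers = {"User-Agent": "example-tdm-tool/0.1"}  # hypothetical tool name
    if api_token:
        headers[TOKEN_HEADER] = api_token
    return headers
```

A TDM tool would fetch `discovery_url(...)`, pass the returned records to `select_corpus` with the license URIs it has accepted, and then request each full-text URL using `download_headers(token)`. Keeping the three steps as separate pure functions makes each one easy to test without touching the network.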

Editor's notes

  1. Questions at end. Talk a little bit about what CrossRef is then move on to talk about our text and data mining service.
  2. First just a few words about CrossRef for anyone who isn’t a member or might not be familiar with us as an organisation. CrossRef is a not-for-profit membership organisation of international scholarly publishers. We have 4000 member publishers, representing all disciplines - not just STM, and comprising commercial publishers, academic societies, open access publishers, university presses. We also have 83 affiliate members and 2000 library affiliates - these libraries and other organisations make use of the CrossRef database to look up DOIs and metadata. We are the largest DOI registration agency and have assigned nearly 63 million DOIs to date.
  3. Publishers were finding that web sites changed, content moved, and links that they had put into their articles stopped working. So they started a multi-publisher initiative to solve this problem of broken links. This is done using the DOI - the Digital Object Identifier, which I’m sure many of you are familiar with. A CrossRef DOI is simply a unique identifier for a piece of content. Once assigned, it doesn’t change. It is to all intents and purposes a meaningless number, but it allows that piece of content to be located on the web.
  4. And it works like this: publishers use CrossRef DOIs to link to content, usually from the references at the end of articles. Users click on those DOI-based links and are referred via the CrossRef database to the cited article at its correct location on the web. If content moves, the publisher only has to update the CrossRef database once, and all of the publishers that are linking to their content using CrossRef DOIs will be redirected to the content in its new location.
  5. Every month there are around 90 million clicks on CrossRef DOI links, so 100 million citations resolved to content.
  6. The issue of Text and Data Mining has become very important and CrossRef is in a unique position to expand its current infrastructure (a registry of unique identifiers and metadata for scholarly content and thousands of members) to make TDM easier for researchers and their institutions and publishers. Technical solution - we aren’t addressing the issue of licensing. CrossRef services are based around collaboration – achieving things across the industry that it wouldn’t make sense for each publisher to implement individually.
  7. Why did CrossRef develop this service? Applies to OA content too. Let’s just illustrate these issues.
  8. Bilateral agreements aspect: in the past, researchers who wished to text and data mine published literature had no common or simple way of accessing the full text of the content they wished to mine. This was true of both subscription-based and open access content. Consequently, TDM users accessed the content in one of two ways: (1) negotiating with publishers to have the content delivered to them, either via physical media or bulk data transfer (e.g. FTP); or (2) “screen-scraping” the publisher’s website. The first option doesn’t scale well across multiple publishers and researchers. It also presents synchronisation problems if the researchers want an ongoing feed of refreshed content. The issue with the second option is that “screen scraping” is an inefficient, fragile and error-prone mechanism for identifying and downloading full text. Screen scrapers put a large performance burden on web sites and, at the same time, any slight change to the web site can break the tool that is doing the screen scraping. CrossRef Text and Data Mining provides a common solution which works across open access and subscription-based publishers and is free for anyone to use.
  9. Application programming interface. Protocol for requesting the information.
  10. Needs publishers to deposit full text links
  11. And links to license information
  12. CrossRef service trying to deal with these three steps. Discovery of where the full text is located, finding out if you have permission to mine it, and then pulling back that corpus of content in order to work on it.
  13. This needs to be added to the publisher XML – license information at the article-level. Examples on our support site.
  14. This needs to be added to the publisher XML – license information at the article-level. Examples on our support site.
  15. Publishers who require researchers to agree to a specific set of Terms and Conditions (T&Cs) before they are allowed to text and data mine content that they otherwise have access to (e.g. through an existing subscription) will need to make use of the click-through service. The click-through service is a registry for supplemental text and data mining terms and conditions (licenses).
  16. So to put it all together…
  17. A working group will migrate to a full CrossRef Committee when the service is officially launched. We have seen over 100,000 deposits of full text links and license information, mainly from Hindawi, Elsevier & KAMJE.
  18. Eric Lease Morgan
  19. Support site with info. Info on rate limiting on there too.
  20. Publishers and researchers in pilot. Launch in May
  21. Rate limiting too
  22. Processing the same document on multiple sites could easily skew text and data mining results, and traditional techniques for eliminating duplicates (e.g. hashes) will not work reliably if the document in question exists in several representations (e.g. PDF, HTML, ePub) and/or versions (e.g. accepted manuscript, version of record). Using the DOI as a key will allow researchers to retrieve and verify the provenance of the items in the TDM corpus many years into the future, when traditional HTTP URLs will have already broken.
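The de-duplication point in note 22 can be illustrated with a small sketch. It compares content hashing with DOI keying for a corpus in which the same article appears as both PDF and HTML. The function names and the `(doi, payload)` item shape are hypothetical, chosen only for illustration. The normalisation relies on DOIs being case-insensitive.

```python
import hashlib

# Common ways a DOI is written as a link or label; stripped before comparison.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(doi):
    """Lower-case a DOI and strip resolver prefixes so it can serve as a key."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def dedupe_by_hash(items):
    """Key each (doi, payload) item by a SHA-256 hash of its bytes."""
    seen = {}
    for _doi, payload in items:
        seen.setdefault(hashlib.sha256(payload).hexdigest(), payload)
    return seen


def dedupe_by_doi(items):
    """Key each (doi, payload) item by its normalised DOI, keeping the first seen."""
    seen = {}
    for doi, payload in items:
        seen.setdefault(normalise_doi(doi), payload)
    return seen


# The same article in two representations: the bytes differ, the DOI does not.
items = [
    ("10.1098/rstl.1665.0001", b"%PDF-1.4 (pdf bytes)"),
    ("https://doi.org/10.1098/RSTL.1665.0001", b"<html>(html bytes)</html>"),
]
print(len(dedupe_by_hash(items)), len(dedupe_by_doi(items)))
```

Hashing treats the two representations as different documents, while keying on the DOI collapses them into a single corpus entry.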