Crossref for Text &
Data Mining
Rachael Lammey
Product Manager, Crossref
December 2015
About Crossref
Not-for-profit association of scholarly publishers
All subjects, all business models
5,000+ organizations from all over the world
83 non-publisher affiliates, 2,000 library affiliates
76 million content items
10.1098/rstl.1665.0001
User clicks on Crossref
DOI reference link in
Journal A
Tani, N., N. Tomaru, M. Araki, AND K. Ohba. 1996. Genetic diversity and
differentiation in populations of Japanese stone pine (Pinus pumila) in Japan.
Canadian Journal of Forest Research 26: 1454–1462.[CrossRef]
Crossref DOI directory
returns URL
User accesses cited
article in Journal B
100,000,000
Crossref Services
• Cross-publisher reference linking
• Cross-publisher Cited-by linking
• Cross-publisher metadata feeds
• Cross-publisher plagiarism screening
• Cross-publisher update identification
• Cross-publisher funder identification
• Cross-publisher text and data mining
Using Crossref
for text mining
What is text and data mining?
Text Mining is an interdisciplinary field combining
techniques from linguistics, computer science and
statistics to build tools that can efficiently retrieve and
extract information from digital text.
http://blogs.plos.org/everyone/2013/04/17/announcing-the-plos-text-mining-collection/
It uses powerful computers to find links between
drugs and side effects, or genes and diseases,
that are hidden within the vast scientific literature.
These are discoveries that a person scouring
through papers one by one may never notice.
http://www.theguardian.com/science/2012/may/23/text-mining-research-tool-forbidden
http://www.jisc.ac.uk/media/documents/publications/textminingbp_rtf.rtf
Marc Weeber and colleagues used automated text mining tools to infer that the drug
thalidomide could treat several diseases it had not been associated with before. Thalidomide was
taken off the market 40 years ago, but is still the subject of research because it seems to benefit
leprosy patients via their immune systems. Weeber and Grietje Molema, an immunologist, used
text mining tools to search the literature for papers on thalidomide and then pick out those
containing concepts related to immunology. One concept, concerning thalidomide’s ability to
inhibit Interleukin-12 (IL-12), a chemical involved in the launch of an immune response, struck
Molema as particularly interesting. A second automated search for diseases that improve when
the action of IL-12 is blocked, revealed several not previously linked with thalidomide, including
chronic hepatitis, myasthenia gravis and a type of gastritis.
“Type in thalidomide and you get 2-3000 hits. Type in disease and you get 40,000 hits. With
automated text mining tools we only had to read 100-200 abstracts and 20 or 30 full papers.
We’ve created hypotheses for others to follow up” says Weeber.
Weeber et al. J Am Med Inform Assoc. 2003 10 252-259
http://www.forbes.com/sites/stevensalzberg/2014/03/23/why-google-flu-is-a-failure/
Why?
• Researchers find it impractical to negotiate multiple bilateral
agreements with hundreds of subscription-based publishers in
order to authorize TDM of subscribed content.
• Subscription-based publishers find it impractical to negotiate
multiple bilateral agreements with thousands of researchers and
institutions in order to authorize TDM of subscribed content.
• All parties would benefit from support of standard APIs and data
representations in order to enable TDM across both open access and
subscription-based publishers.
Botanical Publishing Board * Fisheries Sciences.Com * Florida
Entomological Society * Fondazione Annali Die Matematica Pura Ed
Applicata * Fondazione Eni Enrico Mattei (Feem) * Fondazione Pro
Herbario Mediterraneo * Food And Agriculture Organization Of
The United Nations (Fao) * Food Safety Commission, Cabinet
Office * Foot And Ankle Online Journal * Fordham University Press
* Forest Products Society * Forschungsinstitut Freie Berufe *
Forum: Carbohydrates Coming Of Age * Foundation Compositio
Mathematica * Foundation For Cellular And Molecular Medicine *
Foundation For Sickle Cell Disease Research * Foundation Of
Computer Science * Franco Angeli * Fraunhofer-Institut Fur
Materialfluss Und Logistik * French Chemistry Society * French
Physical Society * French-Vietnamese Association Of Pulmonology
Why use the DOI?
Using the DOI as the basis for a common text and data mining API provides several
benefits. For example, the DOI provides:
• An easy way to de-duplicate documents that may be found on several sites.
• Persistent provenance information.
• An easy way to document, share and compare corpora without having to exchange
the actual documents.
• A mechanism to ensure the reproducibility of TDM results using the source
documents.
• A mechanism to track the impact of updates, corrections, retractions and
withdrawals on corpora.
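The de-duplication benefit listed above can be illustrated with a minimal Python sketch. It relies on DOIs being case-insensitive; the list of resolver prefixes, the function names and the (doi, location) record shape are assumptions made for illustration, not part of any Crossref tool.

```python
from urllib.parse import unquote

# Prefixes people commonly paste in front of a DOI (an illustrative list).
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Return a lower-case DOI with any known resolver prefix stripped."""
    doi = unquote(raw.strip())
    lowered = doi.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    # DOIs are case-insensitive, so lower-casing gives a stable key.
    return doi.lower()


def dedupe_by_doi(records):
    """Keep the first location seen for each DOI.

    `records` is an iterable of (doi, location) pairs (hypothetical shape).
    """
    seen = {}
    for raw_doi, location in records:
        seen.setdefault(normalize_doi(raw_doi), location)
    return seen
```

Two copies of the same article found on different sites collapse to a single entry as long as both carry the same DOI, whatever prefix or letter case each site used.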
The TDM Workflow
1. A researcher identifies the articles they are interested in:
The search engines they use bring back results from lots of different publishers. They can also
use Crossref to search.
The searches they run bring back results showing publications from a range of publishers, in
different locations and using different business models.
The challenge is to harvest all these articles in order to be able to mine them, without
engaging in individual transactions with each publisher.
How to do that?
Each of those articles has a DOI, or digital object identifier. Each DOI is unique and identifies the
paper. Researchers are familiar with DOIs and are used to working with them.
2. The researcher takes the DOIs that correspond to the articles they are interested in.
Search engines will allow them to download this as a list, so the researcher does not need to go to
each paper to extract the DOI from it:
10.5555/12345678
10.5556/12345679
10.1016/12345680
10.8080/12345681
10.1155/12345682
10.1100/12345683
10.5555/12345684
10.1007/12345685
10.1111/12345686
10.2406/12345687
10.3994/12345688
10.5006/12345689
Click to download
3. The researcher gives this list to the Crossref REST API:
And that tells them:
• Where the full text is located
• What they are allowed to do with it
What are they allowed to do with it?
This is communicated by license information that publishers give to Crossref.
Some publishers ask researchers to agree to an additional license to be able to use their content
for mining. Crossref TDM lets researchers log in with their ORCID iD and view and accept
publisher licenses all in one place:
Again, this saves multiple transactions on the part of the researcher.
The publishers do not charge researchers for this, and Crossref does not charge researchers
for the service.
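Step 3 and the license lookup above can be sketched in Python. This is a minimal sketch, not the official client: it assumes the REST API serves one record per DOI at https://api.crossref.org/works/{doi}, with full-text links under a `link` list and licenses under a `license` list. The `SAMPLE` record is hand-written from the Hindawi example in this deck, not a captured API response.

```python
from urllib.parse import quote

API_BASE = "https://api.crossref.org/works/"  # assumed REST API route


def works_url(doi: str) -> str:
    """Build the metadata URL for one DOI."""
    return API_BASE + quote(doi, safe="/")


def extract_tdm_info(work: dict) -> dict:
    """Pull full-text links and license URLs out of one work record."""
    links = [
        {
            "url": link["URL"],
            "content_type": link.get("content-type", "unspecified"),
        }
        for link in work.get("link", [])
    ]
    licenses = [lic["URL"] for lic in work.get("license", [])]
    return {"doi": work.get("DOI"), "links": links, "licenses": licenses}


# Hand-written illustration of the assumed record shape.
SAMPLE = {
    "DOI": "10.1155/2014/969265",
    "link": [
        {
            "URL": "http://downloads.hindawi.com/journals/aaa/2014/969265.pdf",
            "content-type": "application/pdf",
        },
        {
            "URL": "http://downloads.hindawi.com/journals/aaa/2014/969265.xml",
            "content-type": "application/xml",
        },
    ],
    "license": [{"URL": "http://creativecommons.org/licenses/by/3.0/"}],
}

info = extract_tdm_info(SAMPLE)
```

A mining tool would call `works_url` for each DOI in the list, fetch the JSON, and pass the record under its `message` key to `extract_tdm_info` (again assuming that response layout).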
4. The researcher uses that information to go directly to each publisher via Crossref. It is a central
channel for them to visit thousands of publishers via one request or transaction.
There they will be identified in one of the following ways:
• No identification (Open Access content)
• IP recognition/log in credentials
• IP recognition/log in credentials + Crossref token (API key) from the TDM service
5. The full-text is then returned to the researcher, and they can use their tools to mine it
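The three identification modes in step 4 can be sketched as a small Python helper that decides which headers a mining client sends. The header name `CR-Clickthrough-Client-Token` is an assumption (it does not appear in these slides), as are the `AccessMode` names and the user-agent string. IP recognition happens at the network level, so it needs no header here.

```python
from enum import Enum
from typing import Dict, Optional

# Assumed header name for the click-through token; check the TDM docs.
TOKEN_HEADER = "CR-Clickthrough-Client-Token"


class AccessMode(Enum):
    OPEN = "no identification (open access content)"
    INSTITUTIONAL = "IP recognition or log in credentials"
    INSTITUTIONAL_WITH_TOKEN = "IP recognition or log in credentials plus token"


def request_headers(mode: AccessMode, token: Optional[str] = None) -> Dict[str, str]:
    """Return the HTTP headers to send for a full-text request."""
    headers = {"User-Agent": "example-tdm-client/0.1"}  # hypothetical client name
    if mode is AccessMode.INSTITUTIONAL_WITH_TOKEN:
        if not token:
            raise ValueError("this access mode needs a click-through token")
        headers[TOKEN_HEADER] = token
    return headers
```

Keeping the mode explicit means the token is only ever sent to publishers that ask for it.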
Researchers: Common API
DOI Content Negotiation
http://dx.doi.org/10.5555/12345678
(Accept: text/html)
http://dx.doi.org/10.5555/12345678
(Accept: application/bibjson+json)
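Content negotiation means the same DOI URL returns different representations depending on the Accept header. A minimal Python sketch using only the standard library follows; the function name is hypothetical, the resolver address and media types are the ones shown on the slide, and the request is built but not sent.

```python
import urllib.request

RESOLVER = "http://dx.doi.org/"  # resolver address as shown on the slide


def negotiated_request(doi: str, media_type: str) -> urllib.request.Request:
    """Build (but do not send) a DOI request asking for one representation."""
    return urllib.request.Request(
        RESOLVER + doi,
        headers={"Accept": media_type},
    )


# Same DOI, two representations: a landing page and machine-readable metadata.
html_request = negotiated_request("10.5555/12345678", "text/html")
json_request = negotiated_request("10.5555/12345678", "application/bibjson+json")
```

Passing either object to `urllib.request.urlopen` would perform the lookup; which media types a given DOI actually supports depends on the registration agency and publisher.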
Rate Limiting (optional)
Crossref TDM HTTP Headers
CR-TDM-Rate-Limit: 1500
(the rate limit ceiling per window on requests)
CR-TDM-Rate-Limit-Remaining: 1387
(number of requests left for the current window)
CR-TDM-Rate-Limit-Reset: 1378072800
(the time, in UTC epoch seconds, at which the rate limit resets
and a new window is started)
*this is a technique used by many APIs, including Twitter’s
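A client can use these headers to pace itself. The Python sketch below assumes the reset value is an absolute UTC epoch timestamp (as the example value suggests) and that the headers arrive with exactly the names shown; the function name is hypothetical.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to pause before the next request (0.0 if no pause is needed)."""
    remaining = headers.get("CR-TDM-Rate-Limit-Remaining")
    reset = headers.get("CR-TDM-Rate-Limit-Reset")
    if remaining is None or reset is None:
        # The headers are optional, so their absence means no known limit.
        return 0.0
    if int(remaining) > 0:
        return 0.0
    current = time.time() if now is None else now
    # Wait until the window resets; never return a negative delay.
    return max(0.0, float(reset) - current)
```

Calling `time.sleep(seconds_to_wait(response_headers))` between requests keeps a harvesting loop inside the publisher's window without hard-coding any limit.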
Common API Summary
• Content Negotiation (Required)
• New Metadata (Required)
• Full text URIs
• License URIs
• Rate Limiting Headers (optional)
New metadata
1. Full-text links
https://apps.crossref.org/docs/tdm/full-text-uris-technical-details/
2. License information
https://apps.crossref.org/docs/tdm/license-uris-technical-details/
Example: Hindawi
<ai:program name="AccessIndicators">
  <ai:license_ref>http://creativecommons.org/licenses/by/3.0/</ai:license_ref>
</ai:program>
<doi_data>
  <doi>10.1155/2014/969265</doi>
  <timestamp>20140401090031</timestamp>
  <resource>http://www.hindawi.com/journals/aaa/2014/969265/</resource>
  <collection property="text-mining">
    <item>
      <resource mime_type="application/pdf">
        http://downloads.hindawi.com/journals/aaa/2014/969265.pdf
      </resource>
    </item>
    <item>
      <resource mime_type="application/xml">
        http://downloads.hindawi.com/journals/aaa/2014/969265.xml
      </resource>
    </item>
  </collection>
</doi_data>
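To show how a tool might read the text-mining collection from a deposit like the one above, here is a minimal Python sketch using the standard library. It parses a simplified, un-namespaced copy of the `doi_data` element; real deposits sit inside a namespaced schema, so element lookups would need adjusting. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the doi_data element from the Hindawi example.
DOI_DATA = """
<doi_data>
  <doi>10.1155/2014/969265</doi>
  <resource>http://www.hindawi.com/journals/aaa/2014/969265/</resource>
  <collection property="text-mining">
    <item>
      <resource mime_type="application/pdf">
        http://downloads.hindawi.com/journals/aaa/2014/969265.pdf
      </resource>
    </item>
    <item>
      <resource mime_type="application/xml">
        http://downloads.hindawi.com/journals/aaa/2014/969265.xml
      </resource>
    </item>
  </collection>
</doi_data>
"""


def text_mining_links(doi_data_xml: str) -> dict:
    """Map each MIME type to its full-text URL in the text-mining collection."""
    root = ET.fromstring(doi_data_xml.strip())
    links = {}
    for collection in root.findall("collection"):
        if collection.get("property") != "text-mining":
            continue  # skip collections meant for other purposes
        for resource in collection.findall("item/resource"):
            # URLs are wrapped in whitespace in the deposit, so strip them.
            links[resource.get("mime_type")] = (resource.text or "").strip()
    return links


links = text_mining_links(DOI_DATA)
```

The landing-page `resource` directly under `doi_data` is deliberately ignored; only the items inside the text-mining collection are returned.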
Stop here if
• You are an open access publisher
• You include TDM as a part of your
subscription license/T&Cs.
Click-through service (optional)
Researcher View
Publisher View
Researcher queries DOI using CN + API token
Publisher verifies API token
If token verified AND access control allows,
publisher returns full text
(frequency at publisher discretion)
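The publisher-side decision in the flow above can be sketched in Python. How a publisher verifies a token or checks access is not described in these slides, so both checks are passed in as hypothetical callables, and the status codes are illustrative choices rather than anything the service specifies.

```python
from typing import Callable, Tuple


def handle_fulltext_request(
    doi: str,
    token: str,
    verify_token: Callable[[str], bool],
    access_allowed: Callable[[str], bool],
) -> Tuple[int, str]:
    """Return (status, body) for one full-text request."""
    if not verify_token(token):
        return 401, "token not recognised"
    if not access_allowed(doi):
        return 403, "no access to this content"
    # Token verified AND access control allows: return the full text.
    return 200, f"full text for {doi}"


# Stand-in checks for illustration only.
known_tokens = {"example-token"}
subscribed = {"10.5555/12345678"}

status, body = handle_fulltext_request(
    "10.5555/12345678",
    "example-token",
    verify_token=lambda t: t in known_tokens,
    access_allowed=lambda d: d in subscribed,
)
```

Both conditions must hold before any content is returned, which matches the "token verified AND access control allows" rule on the slide.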
Benefits
• Streamlines researcher access to distributed full text for
TDM
• Enables machine-to-machine, automated access for
recognized TDM (i.e. researchers won’t be locked out of publisher
sites)
• Enables article-level licensing info and easy mechanism
for supplemental T&Cs for text and data mining
(publishers discussing model license via STM)
Implementation
Publishers
There are two additional metadata elements that publishers will need
to deposit to support TDM via Crossref. These are:
• Full Text URIs: One or more URIs that point to full text
representations of the content identified by your Crossref DOIs.
• License URIs: One or more URIs pointing at licenses that govern
how the full text content can be used.
• A .csv upload option is available to populate backfiles.
• OPTIONAL: Add publisher TDM terms and conditions to the click-through
service.
Researchers
• Modify TDM tools to make use of the API token
• Modify TDM tools to look for <ai:license_ref> elements
• Register with the click-through service and accept/decline
licenses (if applicable)
http://tdmsupport.crossref.org/
Publishers
Articles with full-text links and license information deposited: 15
million from over 200 DOI prefixes
Cost? Free to researchers and the public
No cost for publishers for 2015
Register interest at: http://www.crossref.org/tdm/contact_form.html
Usable as is:
https://blogs.nd.edu/emorgan/
https://github.com/ropensci/rcrossref
www.crossref.org
http://www.crossref.org/tdm/index.html
tdm@crossref.org
Thank you!

 
Crossref İçerik Kaydı Webinarı, Türkçe | Content Registration at Crossref , ...
 Crossref İçerik Kaydı Webinarı, Türkçe | Content Registration at Crossref , ... Crossref İçerik Kaydı Webinarı, Türkçe | Content Registration at Crossref , ...
Crossref İçerik Kaydı Webinarı, Türkçe | Content Registration at Crossref , ...
 
Los Metadatos Para la Comunidad de Investigacion
Los Metadatos Para la Comunidad de InvestigacionLos Metadatos Para la Comunidad de Investigacion
Los Metadatos Para la Comunidad de Investigacion
 
تسجيل المحتوي مع كروس رف – ندوة عبر الانترنت باللغة العربية | Content Registr...
تسجيل المحتوي مع كروس رف – ندوة عبر الانترنت باللغة العربية | Content Registr...تسجيل المحتوي مع كروس رف – ندوة عبر الانترنت باللغة العربية | Content Registr...
تسجيل المحتوي مع كروس رف – ندوة عبر الانترنت باللغة العربية | Content Registr...
 
Content Registration, Crossref ALJEBI, Indonesia
Content Registration, Crossref ALJEBI, IndonesiaContent Registration, Crossref ALJEBI, Indonesia
Content Registration, Crossref ALJEBI, Indonesia
 
Participation reports webinar December 2020
Participation reports webinar December 2020Participation reports webinar December 2020
Participation reports webinar December 2020
 
Participation reports webinar November 2020
Participation reports webinar November 2020Participation reports webinar November 2020
Participation reports webinar November 2020
 
Introduction to Crossmark/Crossmark: O que é e como usar
Introduction to Crossmark/Crossmark: O que é e como usarIntroduction to Crossmark/Crossmark: O que é e como usar
Introduction to Crossmark/Crossmark: O que é e como usar
 
Crossref LIVE UK Online
Crossref LIVE UK OnlineCrossref LIVE UK Online
Crossref LIVE UK Online
 
Registro y actualización de contenido en Crossref | Content Registration at C...
Registro y actualización de contenido en Crossref | Content Registration at C...Registro y actualización de contenido en Crossref | Content Registration at C...
Registro y actualización de contenido en Crossref | Content Registration at C...
 

Dernier

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 

Dernier (20)

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 

Introduction to CrossRef Text and Data Mining Webinar

  • 1. Crossref for Text & Data Mining Rachael Lammey Product Manager, CrossRef December 2015
  • 2. Not-for-profit association of scholarly publishers All subjects, all business models 5,000+ organizations from all over the world 83 non-publisher affiliates, 2000 library affiliates 76 million content items About Crossref
  • 5. User clicks on Crossref DOI reference link in Journal A Tani, N., N. Tomaru, M. Araki, AND K. Ohba. 1996. Genetic diversity and differentiation in populations of Japanese stone pine (Pinus pumila) in Japan. Canadian Journal of Forest Research 26: 1454–1462.[CrossRef] Crossref DOI directory returns URL User accesses cited article in Journal B
  • 7. Crossref Services • Cross-publisher reference linking • Cross-publisher Cited-by linking • Cross-publisher metadata feeds • Cross-publisher plagiarism screening • Cross-publisher update identification • Cross-publisher funder identification • Cross-publisher text and data mining
  • 9. What is text and data mining? Text Mining is an interdisciplinary field combining techniques from linguistics, computer science and statistics to build tools that can efficiently retrieve and extract information from digital text. http://blogs.plos.org/everyone/2013/04/17/announcing-the-plos-text-mining-collection/ It uses powerful computers to find links between drugs and side effects, or genes and diseases, that are hidden within the vast scientific literature. These are discoveries that a person scouring through papers one by one may never notice. http://www.theguardian.com/science/2012/may/23/text-mining-research-tool-forbidden
  • 10. http://www.jisc.ac.uk/media/documents/publications/textminingbp_rtf.rtf Marc Weeber and colleagues used automated text mining tools to infer that the drug thalidomide could treat several diseases it had not been associated with before. Thalidomide was taken off the market 40 years ago, but is still the subject of research because it seems to benefit leprosy patients via their immune systems. Weeber and Grietje Molema, an immunologist, used text mining tools to search the literature for papers on thalidomide and then pick out those containing concepts related to immunology. One concept, concerning thalidomide’s ability to inhibit Interleukin-12 (IL-12), a chemical involved in the launch of an immune response, struck Molema as particularly interesting. A second automated search for diseases that improve when the action of IL-12 is blocked, revealed several not previously linked with thalidomide, including chronic hepatitis, myasthenia gravis and a type of gastritis. “Type in thalidomide and you get 2-3000 hits. Type in disease and you get 40,000 hits. With automated text mining tools we only had to read 100-200 abstracts and 20 or 30 full papers. We’ve created hypotheses for others to follow up” says Weeber. Weeber et al. J Am Med Inform Assoc. 2003 10 252-259
  • 12. Why? • Researchers find it impractical to negotiate multiple bilateral agreements with hundreds of subscription-based publishers in order to authorize TDM of subscribed content. • Subscription-based publishers find it impractical to negotiate multiple bilateral agreements with thousands of researchers and institutions in order to authorize TDM of subscribed content. • All parties would benefit from support of standard APIs and data representations in order to enable TDM across both open access and subscription-based publishers.
  • 13. Botanical Publishing Board * Fisheries Sciences.Com * Florida Entomological Society * Fondazione Annali Die Matematica Pura Ed Applicata * Fondazione Eni Enrico Mattei (Feem) * Fondazione Pro Herbario Mediterraneo * Food And Agriculture Organization Of The United Nations (Fao) * Food Safety Commission, Cabinet Office * Foot And Ankle Online Journal * Fordham University Press * Forest Products Society * Forschungsinstitut Freie Berufe * Forum: Carbohydrates Coming Of Age * Foundation Compositio Mathematica * Foundation For Cellular And Molecular Medicine * Foundation For Sickle Cell Disease Research * Foundation Of Computer Science * Franco Angeli * Fraunhofer-Institut Fur Materialfluss Und Logistik * French Chemistry Society * French Physical Society * French-Vietnamese Association Of Pulmonology
  • 15. Why use the DOI? Using the DOI as the basis for a common text and data mining API provides several benefits. For example, the DOI provides: • An easy way to de-duplicate documents that may be found on several sites. • Persistent provenance information. • An easy way to document, share and compare corpora without having to exchange the actual documents. • A mechanism to ensure the reproducibility of TDM results using the source documents. • A mechanism to track the impact of updates, corrections, retractions and withdrawals on corpora.
  • 17. Step 1: A researcher identifies the articles they are interested in: The search engines they use bring back results from lots of different publishers. They can also use Crossref to search.
  • 18. The searches they run bring back results showing publications from a range of publishers, in different locations and using different business models. The challenge is to harvest all these articles in order to be able to mine them, without engaging in individual transactions with each publisher.
  • 19. How to do that? Each of those articles has a DOI, or digital object identifier. Each DOI is unique and identifies the paper. Researchers are familiar with DOIs and are used to working with them.
  • 20. Step 2: The researcher takes the DOIs that correspond to the articles they are interested in. Search engines will allow them to download these as a list, so the researcher does not need to go to each paper to extract the DOI from it: 10.5555/12345678 10.5556/12345679 10.1016/12345680 10.8080/12345681 10.1155/12345682 10.1100/12345683 10.5555/12345684 10.1007/12345685 10.1111/12345686 10.2406/12345687 10.3994/12345688 10.5006/12345689 Click to download
  • 21. Step 3: The researcher gives this list to the Crossref REST API, which tells them: • Where the full text is located. • What they are allowed to do with it.
  • 22. What are they allowed to do with it? This is communicated by license information that publishers give to Crossref. Some publishers ask researchers to agree to an additional license to be able to use their content for mining. Crossref TDM allows researchers to log in with their ORCID iD, then view and accept publisher licenses all in one place. Again, this saves multiple transactions on the part of the researcher. The publishers do not charge researchers for this, and Crossref does not charge researchers for the service.
  • 23. Step 4: The researcher uses that information to go directly to each publisher via Crossref. It is a central channel for them to visit thousands of publishers via one request or transaction. There they will be identified in one of a number of ways: • No identification (Open Access content). • IP recognition/log in credentials. • IP recognition/log in credentials + Crossref token (API key) from the TDM service.
  • 24. Step 5: The full text is then returned to the researcher, and they can use their tools to mine it.
  • 30. Crossref TDM HTTP Headers CR-TDM-Rate-Limit: 1500 (the rate limit ceiling per window on requests) CR-TDM-Rate-Limit-Remaining: 1387 (number of requests left for the current window) CR-TDM-Rate-Limit-Reset: 1378072800 (the remaining time in UTC epoch seconds before the rate limit resets and a new window is started) *this is a technique used by many APIs, including Twitter’s
  • 31. Common API Summary • Content Negotiation (Required) • New Metadata (Required) • Full text URIs • License URIs • Rate Limiting Headers (optional)
  • 35. Example: Hindawi
      <ai:program name="AccessIndicators">
        <ai:license_ref>http://creativecommons.org/licenses/by/3.0/</ai:license_ref>
      </ai:program>
      <doi_data>
        <doi>10.1155/2014/969265</doi>
        <timestamp>20140401090031</timestamp>
        <resource>http://www.hindawi.com/journals/aaa/2014/969265/</resource>
        <collection property="text-mining">
          <item>
            <resource mime_type="application/pdf">http://downloads.hindawi.com/journals/aaa/2014/969265.pdf</resource>
          </item>
          <item>
            <resource mime_type="application/xml">http://downloads.hindawi.com/journals/aaa/2014/969265.xml</resource>
          </item>
  • 36. Stop here if • You are an open access publisher • You include TDM as a part of your subscription license/T&Cs.
  • 40. Researcher queries DOI using CN (content negotiation) + API token. Publisher verifies API token. If the token is verified AND access control allows, publisher returns full text (frequency at publisher discretion).
  • 41. Benefits • Streamlines researcher access to distributed full text for TDM • Enables machine-to-machine, automated access for recognized TDM (i.e. researchers won’t be locked out of publisher sites) • Enables article-level licensing info and easy mechanism for supplemental T&Cs for text and data mining (publishers discussing model license via STM)
  • 43. Publishers: There are two additional metadata elements that publishers will need to deposit to support TDM via CrossRef: • Full Text URIs: One or more URIs that point to full text representations of the content identified by your CrossRef DOIs. • License URIs: One or more URIs pointing at licenses that govern how the full text content can be used. In addition: • A .csv upload option is available to populate backfiles. • OPTIONAL: Add publisher TDM terms and conditions to the click-through service.
  • 44. Researchers • Modify TDM tools to make use of the API token • Modify TDM tools to look for <license_ref> elements • Register with the click-through service and accept/decline licenses (if applicable)
  • 46. Publishers: Articles with full-text links and license information deposited: 15 million from over 200 DOI prefixes. Cost? Free to researchers and the public. No cost for publishers for 2015. Register interest at: http://www.crossref.org/tdm/contact_form.html
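
As a rough illustration of the Hindawi example on slide 35, a TDM tool could read the license URI and the full-text links out of a deposited record along the following lines. This is a minimal sketch, not Crossref's own tooling: the `record` wrapper element, the namespace URI bound to the `ai:` prefix, and the function name `extract_tdm_links` are assumptions added so that the fragment parses on its own. The element names and URLs are the ones shown on the slide.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "ai:" prefix; the slide does not show its declaration.
AI_NS = "http://www.crossref.org/AccessIndicators.xsd"

# The slide's fragment, wrapped in a hypothetical <record> root and closed off
# so that it is well-formed XML.
SAMPLE = f"""
<record xmlns:ai="{AI_NS}">
  <ai:program name="AccessIndicators">
    <ai:license_ref>http://creativecommons.org/licenses/by/3.0/</ai:license_ref>
  </ai:program>
  <doi_data>
    <doi>10.1155/2014/969265</doi>
    <resource>http://www.hindawi.com/journals/aaa/2014/969265/</resource>
    <collection property="text-mining">
      <item>
        <resource mime_type="application/pdf">
          http://downloads.hindawi.com/journals/aaa/2014/969265.pdf
        </resource>
      </item>
      <item>
        <resource mime_type="application/xml">
          http://downloads.hindawi.com/journals/aaa/2014/969265.xml
        </resource>
      </item>
    </collection>
  </doi_data>
</record>
"""


def extract_tdm_links(xml_text):
    """Return the DOI, license URIs and full-text URIs (keyed by MIME type)."""
    root = ET.fromstring(xml_text.strip())

    # License URIs live in the namespaced <ai:license_ref> elements.
    licenses = [el.text.strip() for el in root.iter(f"{{{AI_NS}}}license_ref")]

    # Full-text URIs live only in the collection marked property="text-mining";
    # the landing-page <resource> directly under <doi_data> is ignored.
    full_text = {}
    for collection in root.iter("collection"):
        if collection.get("property") != "text-mining":
            continue
        for resource in collection.iter("resource"):
            full_text[resource.get("mime_type")] = resource.text.strip()

    return {
        "doi": root.findtext(".//doi"),
        "licenses": licenses,
        "full_text": full_text,
    }


if __name__ == "__main__":
    print(extract_tdm_links(SAMPLE))
```

Keying the links by MIME type lets a mining tool prefer XML where a publisher offers it and fall back to PDF otherwise.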

Editor's notes

  1. Questions at end. Talk a little bit about what CrossRef is then move on to talk about our text and data mining service.
  2. First just a few words about CrossRef for anyone who isn’t a member or might not be familiar with us as an organisation. CrossRef is a not-for-profit membership organisation of international scholarly publishers. We have 4000 member publishers, representing all disciplines - not just STM, and comprising commercial publishers, academic societies, open access publishers, university presses. We also have 83 affiliate members and 2000 library affiliates - these libraries and other organisations make use of the CrossRef database to look up DOIs and metadata. We are the largest DOI registration agency and have assigned nearly 63 million DOIs to date.
  3. CrossRef was founded 15 years ago to solve the problem of broken links. The web is all about links, but links break. This is annoying if you’re browsing the web and want to follow an interesting link, but in the context of scholarly publishing it becomes more than annoying - if you can’t follow a citation from one paper to another you’re being hampered in your research. Citation linking is one of the greatest benefits of online publishing, but it really does need to be reliable.
  4. and publishers were finding that web sites changed, content moved, and links that they had put into their articles stopped working. So they started a multi-publisher initiative to solve this problem of broken links. This is done using the DOI - the Digital Object Identifier, which I’m sure many of you are familiar with. A CrossRef DOI is simply a unique identifier for a piece of content. Once assigned, it doesn’t change. It is to all intents and purposes a meaningless number, but it allows that piece of content to be located on the web.
  5. And it works like this: publishers use CrossRef DOIs to link to content, usually from the references at the end of articles. Users click on those DOI-based links and are referred via the CrossRef database to the cited article at its correct location on the web. If content moves, the publisher only has to update the CrossRef database once, and all of the publishers that are linking to their content using CrossRef DOIs will be redirected to the content in its new location.
  6. Every month there are around 100 million clicks on CrossRef DOI links, so 100 million citations resolved to content.
  7. The issue of Text and Data Mining has become very important and we feel that CrossRef is in a unique position to expand its current infrastructure (a registry of unique identifiers and metadata for scholarly content and thousands of members) to make TDM easier for researchers and their institutions and publishers. Technical solution - we aren’t addressing the issue of licencing.
  8. Looking at positives. Finding treatments to diseases that may not have been found before.
  9. But urge caution – Google Flu!
  10. Why did CrossRef develop this service? Applies to OA content too. Let’s just illustrate these issues.
  11. Researcher to illustrate that plus some of the publishers we represent. TDM is about scale.
  12. Bilateral agreements aspect - In the past, researchers who wished to text and data mine published literature had no common or simple way of accessing the full text of the content they wished to mine. This is true both of subscription-based content and of open access content. Consequently, TDM users access the content in one of two ways: (1) negotiating with publishers to have the content delivered to them, either via physical media or bulk data transfer (e.g. FTP); (2) “screen-scraping” the publisher’s website. The first option doesn’t scale well across multiple publishers and researchers. It also presents synchronisation problems if the researchers want an ongoing feed of refreshed content. The issue with the second option is that “screen scraping” is an inefficient, fragile and error-prone mechanism for identifying and downloading full text. Screen scrapers put a large performance burden on web sites and, at the same time, any slight change to the web site can break the tool that is doing the screen scraping. CrossRef Text and Data Mining provides a common solution which works across Open Access and subscription-based publishers and is free for anyone to use.
  13. Processing the same document on multiple sites could easily skew text and data mining results, and traditional techniques for eliminating duplicates (e.g. hashes) will not work reliably if the document in question exists in several representations (e.g. PDF, HTML, ePub) and/or versions (e.g. accepted manuscript, version of record). Using the DOI as a key will allow researchers to retrieve and verify the provenance of the items in the TDM corpus many years into the future, when traditional HTTP URLs will have already broken.
  14. Wide range of papers from a wide range of publishers – spread of business models and geographical locations.
  15. Explain API = basically an interface that software uses to interact with other software.
  16. I should be able to show a text extraction tool or a clip of an extraction tool working to convert PDF to XML for the purposes of mining.
  17. The CrossRef Common API is the main aspect of this service and is designed to allow researchers to easily harvest full text documents from all participating publishers regardless of their business model (e.g. open access, subscription). It makes use of CrossRef DOI content negotiation to provide researchers with links to the full text of content located on the publisher’s site. The publisher remains responsible for actually delivering the full text of the content requested. Thus, open access publishers can simply deliver the requested content while subscription based publishers continue to support subscriptions using their existing access control systems.
  18. API works with content negotiation – what is content negotiation
  19. Content negotiation allows a user to request a particular representation of a web resource. DOI resolvers use content negotiation to provide different representations of metadata associated with DOIs. A content-negotiated request to a DOI resolver is much like a standard HTTP request, except that server-driven negotiation will take place based on the list of acceptable content types a client provides. Here, they’re asking for text.
  20. Here they’re asking for XML – and can also request PDF, as we know a lot of publishers may only have back content in PDF and that’s fine.
  21. Set of standard HTTP headers that can be used by servers to convey rate-limiting information to automated TDM tools. Well-behaved TDM tools can simply look for these headers when they query publisher sites in order to understand how best to adjust their behaviour so as not to affect the performance of the site. The headers allow a publisher to define a “rate limit window”, which is basically a time span (e.g. a minute, an hour, a day).
  22. In order for researchers to use the CrossRef API, Publishers need to add new metadata to their CrossRef DOI deposits.
  23. One or more URIs pointing at licenses that govern how the full text content can be used.
  24. This needs to be added to the publisher XML – license information at the article-level. Examples on our support site.
  25. Publishers who require researchers to agree to a specific set of Terms and Conditions (T&Cs) before they are allowed to text and data mine content that they otherwise have access to (e.g. through an existing subscription) will need to make use of the click-through service.
  26. So to put it all together…
  27. If you are an open access publisher or if your existing subscription licenses already allow TDM of subscribed full text, then the registration of the above metadata deposit is the ONLY thing you need to do in order to enable TDM of your content via the CrossRef Metadata API. Rate limiting.
  28. Rate limiting too
  29. Support site with info. Info on rate limiting on there too.
  30. Working group, which will migrate to a full CrossRef Committee when the service is officially launched. We have seen over 100,000 deposits of full text links and license information, mainly from Hindawi but some from AIP and IEEE as well.
  31. Eric Lease Morgan
  32. Publishers and researchers in pilot. Launch in May
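
The rate-limiting headers described in note 21 could be honoured by a TDM client roughly as follows. This is a sketch under stated assumptions rather than part of any Crossref library: the header names are taken from slide 30, the reset value is treated as an absolute UTC epoch timestamp (as the example value 1378072800 suggests), and `seconds_to_wait` is a hypothetical helper name.

```python
import time


def seconds_to_wait(headers, now=None):
    """Return how many seconds a client should pause before its next request.

    `headers` is a mapping of HTTP response headers from a publisher site.
    `now` is the current time in UTC epoch seconds (defaults to time.time()).
    """
    now = time.time() if now is None else now

    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    remaining = lowered.get("cr-tdm-rate-limit-remaining")
    reset = lowered.get("cr-tdm-rate-limit-reset")

    # The headers are optional; if a publisher omits them, do not wait.
    if remaining is None or reset is None:
        return 0.0

    # Requests are still available in the current window.
    if int(remaining) > 0:
        return 0.0

    # The window is exhausted: wait until it resets (never a negative time).
    return max(0.0, float(reset) - now)


if __name__ == "__main__":
    example = {
        "CR-TDM-Rate-Limit": "1500",
        "CR-TDM-Rate-Limit-Remaining": "0",
        "CR-TDM-Rate-Limit-Reset": "1378072800",
    }
    print(seconds_to_wait(example, now=1378072740))
```

A harvesting loop would call this after each response and sleep for the returned number of seconds before requesting the next DOI's full text.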