How can repositories support the
text-mining of their content and
why?
@openminted_eu
Dr. Petr Knoth and Dr. Nancy Pontika
Knowledge Media Institute, The Open University
United Kingdom
Twitter: @oacore
Why should repositories
support TDM?
In the UK
Repositories and TDM
Diagram labels: Institutional Repositories; Subject Repositories;
Publishers / OA journals; Other sources (Research Networking Services,
Primary Research Data...); Text Mining Services
TDM & Repository Managers
• Establish and maintain a close collaboration with
researchers
• Extensive experience in advocacy, e.g. open access
• Knowledgeable about the repository’s collection
• Participate in the academic institution’s research
committees
• Familiarity with copyright issues and Creative Commons
licenses
How can repositories support
TDM?
TDM is all about processing text and data at
scale. The role of repositories is to facilitate the
aggregation of research papers at a full-text level
(and beyond) effectively enabling TDM services
to operate seamlessly on all available research
content.
What is the problem?
• A small study (Knoth, 2013)
• 83 repositories - mainly EPrints with PDF research
outputs
• 1,461,016 metadata records

                     Metadata linked   Content        Content
                     to content        downloadable   machine readable
Mean                 54.1%             34.4%          27.6%
Median               39.5%             16.7%          13.0%
Standard deviation   39.2%             34.2%          31.0%
How is content aggregated
today?
• DC over OAI-PMH: used by the vast majority of repositories, but never
intended to support content harvesting. The main problem:
linking metadata with content.
“The nature of a resource identifier is outside the scope of the
OAI-PMH. To facilitate access to the resource associated with harvested
metadata, repositories should use an element in metadata records
to establish a linkage between the record (and the identifier of its
item) and the identifier (URL, URN, DOI, etc.) of the associated
resource. The mandatory Dublin Core format provides the identifier
element that should be used for this purpose.”
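As a rough illustration of what a harvester sees, the sketch below pulls every dc:identifier value out of a Dublin Core (oai_dc) record. The record is a hand-written, hypothetical example rather than output from a real repository, and the function name is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core elements used in oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record written by hand for this example.
SAMPLE_RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example paper</dc:title>
  <dc:identifier>https://repository.example.org/id/eprint/123/1/paper.pdf</dc:identifier>
  <dc:identifier>doi:10.1234/example</dc:identifier>
</oai_dc:dc>
"""


def extract_identifiers(record_xml: str) -> list[str]:
    """Return the text of every dc:identifier element, in document order."""
    root = ET.fromstring(record_xml)
    return [
        element.text.strip()
        for element in root.iter(f"{{{DC_NS}}}identifier")
        if element.text
    ]


identifiers = extract_identifiers(SAMPLE_RECORD)
```

Nothing in the record says which identifier, if any, resolves to the full text, so the harvester is left to guess. That is the linking problem described above.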
How is content aggregated
today?
• RIOXX: just one identifier; recommends that the identifier
point to the actual resource being described.
• OpenAIRE Guidelines: the identifier links to either the
resource or a jump-off page. Multiple identifiers are
allowed.
• ResourceSync
• CrossRef: commercial publishers/journals
The content referencing
problem
Principle 1: content
referencing
Repositories should always establish a link from
the metadata record to the item the metadata
record describes using a dereferenceable identifier
pointing to the version held locally in the
repository. The dereferenceable identifier should
be provided in the appropriate metadata element
of the metadata format used (e.g. dc:identifier in
the case of Dublin Core). If multiple identifiers are
used, it is recommended to list the local
dereferenceable identifier first.
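A minimal sketch of how a repository might check its records against Principle 1 follows. It makes a simplifying assumption: "local and dereferenceable" is approximated as an http(s) URL whose host matches the repository's own host. The function names and the example host are hypothetical.

```python
from urllib.parse import urlparse


def is_local_dereferenceable(identifier: str, repository_host: str) -> bool:
    """Approximation: an http(s) URL served from the repository's own host."""
    parsed = urlparse(identifier)
    return parsed.scheme in ("http", "https") and parsed.hostname == repository_host


def order_identifiers(identifiers: list[str], repository_host: str) -> list[str]:
    """Move local dereferenceable identifiers to the front, keeping relative order."""
    # sorted() is stable and False sorts before True, so local links come first.
    return sorted(
        identifiers,
        key=lambda identifier: not is_local_dereferenceable(identifier, repository_host),
    )


def follows_principle_1(identifiers: list[str], repository_host: str) -> bool:
    """True if the first listed identifier is a local dereferenceable link."""
    return bool(identifiers) and is_local_dereferenceable(identifiers[0], repository_host)


# Hypothetical identifiers for a single record.
record_identifiers = [
    "doi:10.1234/example",
    "https://repository.example.org/id/eprint/123/1/paper.pdf",
]
reordered = order_identifiers(record_identifiers, "repository.example.org")
```

A host match does not prove that the URL resolves to the full text. A fuller check would also request the URL and inspect the response.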
The accessibility of
repositories to harvesting
systems
Principle 2: Content
accessibility to machines
Repositories must provide universal access to
machines, at the same level of access that
humans have. It is the role of repositories to
allow aggregators to harvest the entire content of
the repository in a reasonable time, so that they
can acquire and maintain up-to-date information
about the repository content.
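One place where machine access is commonly restricted is the repository's robots.txt file. The sketch below uses Python's standard urllib.robotparser to test whether a given harvester may fetch a full-text URL. The robots.txt rules, the user-agent names and the URL are all hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts one harvester out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi/

User-agent: ExampleHarvester
Disallow: /
"""

# Hypothetical full-text URL.
FULL_TEXT_URL = "https://repository.example.org/id/eprint/123/1/paper.pdf"


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt content and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


harvester_allowed = can_fetch(ROBOTS_TXT, "ExampleHarvester", FULL_TEXT_URL)
other_agent_allowed = can_fetch(ROBOTS_TXT, "SomeOtherAgent", FULL_TEXT_URL)
```

With these rules the harvester is blocked from the PDF while other agents are not, which is the kind of asymmetry Principle 2 asks repositories to avoid.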
What can repositories do?
• Ensure correct referencing of content from metadata:
  • Dereferenceable link which resolves to content
  • Locally held (content under the repository's control)
• Using a standard repository platform can help
• Check robots.txt
• Register your repository
• Advocate for good PDF (media) quality of deposited content
• Use monitoring tools
  • CORE Repository Dashboard
  • OpenAIRE Repository Manager Dashboard
• Machine readable licensing
beyond Open Access
MAKING SENSE OF
LARGE VOLUMES OF
SCIENTIFIC CONTENT
Interested in how to TDM
research papers?
We have 3 more
talks tomorrow!
Developer track 1, 11:00
Mining Open Access
publications with CORE
Developer track 1, 11:20
Oxford vs Cambridge
Contest: Collecting Open
Research Evaluation
Metrics for University
Ranking
Papers 4, 4:00
Exploring
Semantometrics:
full text-based
research
evaluation for
open repositories
Thank you
Dr. Petr Knoth, Research Fellow
petr.knoth@open.ac.uk
Dr. Nancy Pontika, Open Access Aggregation Officer
nancy.pontika@open.ac.uk

How can repositories support the text-mining of their content and why?

  • 1. How can repositories support the text-mining of their content and why? @openminted_eu Dr. Petr Knoth and Dr. Nancy Pontika Knowledge Media institute, The Open University United Kingdom Twitter: @oacore
  • 2. Why should repositories support TDM? @openminted_eu
  • 4. Repositories and TDM @openminted_eu Institutional Repositories Subject Repositories Publishers/ OA journals Other sources: Research Networking Services Primary Research Data... Text Mining Services
  • 5.
  • 6. TDM & Repository Managers @openminted_eu • Establish and maintain a close collaboration with researchers • Extensive experience in advocacy, e.g. for open access • Knowledgeable about the repository’s collection • Participate in the Academic Institution’s Research Committees • Familiarity with Copyright issues and Creative Commons Licenses
  • 7. How can repositories support TDM? TDM is all about processing text and data at scale. The role of repositories is to facilitate the aggregation of research papers at a full-text level (and beyond) effectively enabling TDM services to operate seamlessly on all available research content.
  • 8. What is the problem? @openminted_eu • A small study (Knoth, 2013) • 83 repositories - mainly EPrints with PDF research outputs • 1,461,016 metadata records

                            Metadata linked   Content        Content
                            to content        downloadable   machine readable
      Mean                  54.1%             34.4%          27.6%
      Median                39.5%             16.7%          13.0%
      Standard deviation    39.2%             34.2%          31.0%

  • 9. How is content aggregated today? @openminted_eu • DC over OAI-PMH: vast majority of repositories, never intended to support content harvesting. The main problem: linking metadata with content. “The nature of a resource identifier is outside the scope of the OAI-PMH. To facilitate access to the resource associated with harvested metadata, repositories should use an element in metadata records to establish a linkage between the record (and the identifier of its item) and the identifier (URL, URN, DOI, etc.) of the associated resource. The mandatory Dublin Core format provides the identifier element that should be used for this purpose.”
  • 10. How is content aggregated today? @openminted_eu • RIOXX: Just one identifier, recommends the identifier points to the actual resource being described. • OpenAIRE Guidelines: identifier links to either the resource or a jump-off page. Does allow multiple identifiers. • ResourceSync • CrossRef: commercial publishers/journals
  • 12. Principle 1: content referencing Repositories should always establish a link from the metadata record to the item the metadata record describes using a dereferenceable identifier pointing to the version held locally in the repository. The dereferenceable identifier should be provided in the appropriate metadata element of the metadata format in use (e.g. dc:identifier in the case of Dublin Core). If multiple identifiers are used, it is recommended to list the local dereferenceable identifier first.
  • 13. The accessibility of repositories to harvesting systems @openminted_eu
  • 14. Principle 2: Content accessibility to machines Repositories must provide universal access to machines with the same level of access as humans have. It is the role of repositories to allow aggregators to harvest the entire content of the repository in a reasonable time, so that they can acquire and maintain up-to-date information about the repository content.
  • 15. What can repositories do? @openminted_eu • Ensure correct referencing of content from metadata: • Dereferenceable link which resolves to content • Locally held (content under its control) • Using a standard repository platform can help • Check robots.txt • Register your repository • Advocate for good PDF (media) quality of deposited content • Use monitoring tools • CORE Repository Dashboard • OpenAIRE Repository Manager Dashboard • Machine-readable licensing
  • 16. beyond Open Access MAKING SENSE OF LARGE VOLUMES OF SCIENTIFIC CONTENT
  • 17. Interested in how to TDM research papers? @openminted_eu We have 3 more talks tomorrow! Developer track 1, 11:00 Mining Open Access publications with CORE
  • 18. Interested in how to TDM research papers? @openminted_eu We have 3 more talks tomorrow! Developer track 1, 11:20 Oxford vs Cambridge Contest: Collecting Open Research Evaluation Metrics for University Ranking
  • 19. Interested in how to TDM research papers? @openminted_eu We have 3 more talks tomorrow! Papers 4, 4:00 Exploring Semantometrics: full text-based research evaluation for open repositories
  • 20. Thank you Dr. Petr Knoth, Research Fellow petr.knoth@open.ac.uk Dr. Nancy Pontika, Open Access Aggregation Officer nancy.pontika@open.ac.uk
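
Principle 1 and the "What can repositories do?" slide can be illustrated with a small check on a Dublin Core record. The following is only a sketch under stated assumptions, not the way CORE or any other aggregator actually validates records: the sample record, the host name `repository.example.org` and all function names are hypothetical. It tests whether the first `dc:identifier` is an absolute URL, ends in `.pdf`, and is served from the repository's own domain, as the editor's notes on `<dc:identifier>` suggest.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace of the Dublin Core element set used in oai_dc records.
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical record: the local full-text link is listed first, the DOI second.
SAMPLE_RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example paper</dc:title>
  <dc:identifier>https://repository.example.org/id/eprint/123/1/paper.pdf</dc:identifier>
  <dc:identifier>https://doi.org/10.1234/example</dc:identifier>
</oai_dc:dc>
"""


def identifiers(record_xml):
    """Return all dc:identifier values in document order."""
    root = ET.fromstring(record_xml)
    return [el.text.strip() for el in root.iter(DC_NS + "identifier") if el.text]


def check_identifier(url, repository_host):
    """Apply the three checks to a single identifier value."""
    parsed = urlparse(url)
    return {
        "absolute_url": parsed.scheme in ("http", "https") and bool(parsed.netloc),
        "has_pdf_extension": parsed.path.lower().endswith(".pdf"),
        "same_domain": parsed.hostname == repository_host,
    }


def first_identifier_is_local_full_text(record_xml, repository_host):
    """True if the first identifier passes all three checks."""
    ids = identifiers(record_xml)
    return bool(ids) and all(check_identifier(ids[0], repository_host).values())


if __name__ == "__main__":
    for value in identifiers(SAMPLE_RECORD):
        print(value, check_identifier(value, "repository.example.org"))
```

A repository manager could run a check of this kind over a sample of harvested records to estimate how many of them link directly to locally held full text. It does not verify that the link actually resolves; that would require an HTTP request per record.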

Editor's notes

  1. Mining individual repositories is not interesting. TDM is about processing at scale. The role of repositories is: …
  2. So why am I talking about what the role of the repositories is? Well I think we have a slight problem here … We have done a study to …
  3. The main problem: linking metadata with content.
  4. OpenAIRE guidelines: https://guidelines.openaire.eu/en/latest/literature/field_resourceidentifier.html The ideal use of this element is to use a direct link or a link to a jump-off page (persistent URL) from dc:identifier in the metadata record to the digital resource or a jump-off page.
  5. <dc:identifier> field: The aim of the Dublin Core Metadata tags is to ensure online interoperability of metadata standards. The importance of the <dc:identifier> tag is that it describes the resource of the harvested output. CORE expects in this field to find the direct URL of the PDF. When the information in this field is not presented properly, the CORE crawler needs to crawl for the PDF and the success of finding it cannot be guaranteed. This also causes additional server processing time and bandwidth both for the harvester and the hosting institution. There are also three additional points that need to be considered with regards to the <dc:identifier>; a) this field should describe an absolute path to the file, b) it should contain an appropriate file name extension, for example “.pdf” and c) the full-text items should be stored under the same repository domain.
  6. The problem is not multiple metadata formats, but the fact that none of them is good enough! Thinking that by supporting the guidelines you allow content aggregation is an issue. Locally means within the repository's control.
  7. arXiv now has a slightly nicer robots.txt where anyone is allowed access with a 15s delay. Still not doable …
  8. Platform: For those who haven’t deployed a repository yet, it is highly advised that the repository platform is not built in house, but that one of the industry-standard platforms is chosen. The benefit of choosing one of the existing platforms is that they provide frequent updates, constant support and extended repository functionality through plug-ins.
  9. Our ultimate goal is to put in place infrastructure that will enable anyone to make sense of large volumes of scientific data. The infrastructure is open and transparent.
  10. If you are interested in how we make sense of the large volumes of scientific content.
  11. If you are interested in how we make sense of the large volumes of scientific content.
  12. If you are interested in how we make sense of the large volumes of scientific content.
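
Note 7 observes that a 15-second crawl delay makes full harvesting impractical, and the arithmetic behind that can be sketched with Python's standard `urllib.robotparser`. The robots.txt content below is a hypothetical example, not a copy of any real repository's file, and the figure of one million documents is an assumption chosen purely for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everyone is allowed, but only one request per 15 s.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 15
Disallow: /private/
"""

# Assumed collection size, for illustration only.
ASSUMED_DOCUMENTS = 1_000_000

SECONDS_PER_DAY = 86_400


def parse_robots(text):
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def days_to_harvest(documents, delay_seconds):
    """Days needed to fetch every document once, one request per delay."""
    return documents * delay_seconds / SECONDS_PER_DAY


parser = parse_robots(ROBOTS_TXT)
delay = parser.crawl_delay("example-harvester")
days = days_to_harvest(ASSUMED_DOCUMENTS, delay)

if __name__ == "__main__":
    print(f"crawl delay: {delay} s")
    print(f"full harvest of {ASSUMED_DOCUMENTS:,} documents: {days:.1f} days")
```

Under these assumptions a single pass takes roughly 174 days, before any re-harvesting to keep the aggregated copy up to date, which is the concern behind Principle 2's requirement that the entire content be harvestable in a reasonable time.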