www.eudat.eu
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
Data Discoverability and
Persistent Identifiers
Christine Staiger and Sofiane Bendoukha
SURFsara and DKRZ
EUDAT Summer School, Heraklion, 2017
PIDs in EUDAT
PID Training
PIDs in EUDAT – Why?
• Managing increasing numbers of data objects
• Sharing data from different sources amongst
researchers
• Data needs to be (globally) identifiable and
addressable -> reuse of data
• Data citation
• Linking data from different sources
-> Pooling datasets, B2SAFE
• Challenges
• Object locations change over time
• Object migration between repositories
What do we want from data?
• Findable – Easy to find by both humans and computer
systems -> Metadata
• Accessible – Stored for long term, accessed and/or
downloaded with well-defined license and access conditions
• Interoperable – Ready to be combined with other datasets by
humans as well as computer systems
• Reusable – Ready to be used for future research and to be
processed further using computational methods
• The FAIR Guiding Principles for scientific data management
and stewardship, doi:10.1038/sdata.2016.18
What do we need?
[Diagram: Data, PID, Metadata]
• Persistent Identifier: reference and identify object, either
metadata or data object
• Synchronise PID, Data and Metadata during creation,
maintenance, update and deletion of a digital object!
[Diagram: Data, PID and Metadata repeated for raw data, analysed data and published data]
What do we know about Persistent Identifiers?
• A Persistent Identifier (PID) is an identifier that is effectively
permanently assigned to a resource.
[Diagram: identifier (a permanent name or identity), resource, location (although it may change over time)]
• Pointers to data resources
• Globally unique
• Exist indefinitely (the PID, not necessarily the data)
Simple data life cycle, linearised
Publish data online, data is accessed by others
Published online: http://www.test.com/test.html
Other users may cite, access and re-use this URL
Relocate the resource to http://www.example.com/
Other users are not informed -> 404
Publish online -> Move to another location -> Used by another researcher
Data Life Cycle with PID system
Publish online (Register PID) -> Move to another location (Update PID) -> Used by another researcher (Resolve PID, Get PID Details)
PIDs are subject to the same life cycle
[Diagram: Handle resolution service, PID system]
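The difference between the two life cycles can be sketched in a few lines of Python. This is a toy in-memory model, not a real PID service: the class `ToyPidSystem`, its method names and the PID `prefix1/suffix1` are illustrative assumptions; the two URLs are the ones used in the example above.

```python
class ToyPidSystem:
    """Toy stand-in for a PID system: maps a PID to the current location."""

    def __init__(self):
        self._records = {}

    def register(self, pid, location):
        """Register a new PID; refuses to overwrite an existing one."""
        if pid in self._records:
            raise ValueError(f"PID already registered: {pid}")
        self._records[pid] = location

    def update(self, pid, new_location):
        """Point an existing PID at a new location."""
        if pid not in self._records:
            raise KeyError(pid)
        self._records[pid] = new_location

    def resolve(self, pid):
        """Return the location currently stored for the PID."""
        return self._records[pid]


# Publish online: register a PID for the first location.
pids = ToyPidSystem()
pids.register("prefix1/suffix1", "http://www.test.com/test.html")
cited_url = pids.resolve("prefix1/suffix1")  # what a plain URL citation freezes

# Move to another location: only the PID record is updated.
pids.update("prefix1/suffix1", "http://www.example.com/")

# Used by another researcher: the PID still leads to the data,
# while the URL cited earlier no longer matches the current location.
current = pids.resolve("prefix1/suffix1")
print(current)
print(cited_url == current)
```

The point of the sketch is the indirection: a citation that stores the PID stays valid as long as someone updates the record, which is the organisational discipline the next slide lists as a cost.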
Advantages and Disadvantages
Pro:
• Static reference,
even if data moves or
changes
• Network of persistent links
Data – metadata relations
Provenance chains
Con:
• Extra effort
• What to identify?
• Coordination across
organisations and people
• Organisational discipline to
ensure persistence
Use cases
Use Case 1: Data publication
• PIDs point to landing page of the digital repository
showing metadata
• “Real” data can be downloaded from this page with another link
• E.g. B2SHARE, FigShare, Zenodo, …
• PID
http://hdl.handle.net/11304/3265434c-4b34-11e4-81ac-dcbd1b51435e
resolves to landing page
https://b2share.eudat.eu/records/feafb12e810c489b9e878949c6c35345
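A minimal sketch of how a script could find the landing page behind such a PID. It assumes the JSON layout returned by the public Handle proxy under `https://hdl.handle.net/api/handles/<handle>`; the sample record below is hand-written from the PID and landing page on this slide, not a live response, and the helper names `api_url` and `landing_page` are hypothetical.

```python
import json

HANDLE_PROXY_API = "https://hdl.handle.net/api/handles/"


def api_url(handle):
    """Build the proxy REST URL for a handle of the form 'prefix/suffix'."""
    return HANDLE_PROXY_API + handle


def landing_page(record):
    """Return the value of the first URL-typed entry in a record, or None."""
    for value in record.get("values", []):
        if value.get("type") == "URL":
            return value["data"]["value"]
    return None


# Hand-written sample shaped like a proxy response (assumption, not fetched).
sample = json.loads("""
{
  "responseCode": 1,
  "handle": "11304/3265434c-4b34-11e4-81ac-dcbd1b51435e",
  "values": [
    {"index": 1, "type": "URL",
     "data": {"format": "string",
              "value": "https://b2share.eudat.eu/records/feafb12e810c489b9e878949c6c35345"}}
  ]
}
""")

print(api_url(sample["handle"]))
print(landing_page(sample))
```

To query the proxy for real, the URL from `api_url` could be fetched with `urllib.request.urlopen` and the body passed through `json.load`; that step is left out here so the sketch runs without network access.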
B2SHARE: the persistent identifier for the collection
B2SHARE: the persistent identifier for files
Use case 2: Modeling Relationships
• Use tightly coupled metadata
• Part of/has part relationships
• (B2SAFE) Link replicas
• Model cohort-patient
relationship
• Model patient-samples
relationship
Which metadata to store with the PID
and which in an extra catalogue?
PID: prefix1/suffix1
Metadata:
key1: Location1
key2: prefix2/suffix2
key3: prefix3/suffix3
PID: prefix2/suffix2
Metadata:
key1: Location2
key2: prefix1/suffix1
PID: prefix3/suffix3
Metadata:
key1: Location3
key2: prefix1/suffix1
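The diagram above can be read as three PID records that point at each other through their metadata. The following sketch encodes one reading of it (prefix1/suffix1 links to the other two, and each of them links back) and follows the links; the function names `linked_pids` and `reachable` are illustrative assumptions.

```python
# One reading of the diagram: each PID carries a location plus links to other PIDs.
records = {
    "prefix1/suffix1": {"key1": "Location1",
                        "key2": "prefix2/suffix2",
                        "key3": "prefix3/suffix3"},
    "prefix2/suffix2": {"key1": "Location2", "key2": "prefix1/suffix1"},
    "prefix3/suffix3": {"key1": "Location3", "key2": "prefix1/suffix1"},
}


def linked_pids(pid):
    """PIDs referenced from a record: every metadata value that is a known PID."""
    return [value for value in records[pid].values() if value in records]


def reachable(start):
    """All PIDs reachable from `start` by following links, in discovery order."""
    seen = [start]
    queue = [start]
    while queue:
        for nxt in linked_pids(queue.pop(0)):
            if nxt not in seen:
                seen.append(nxt)
                queue.append(nxt)
    return seen


print(linked_pids("prefix1/suffix1"))
print(reachable("prefix2/suffix2"))
```

Keeping links like these in the PID record makes part-of or replica relations resolvable without a separate catalogue; descriptive metadata that needs searching is usually the better candidate for the extra catalogue.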
Use case 3: Enabling data workflows
Mapping data models
TraIT data ontology
Chao (Cico) Zhang, VU
Sanne Abeln, VU
Jochem Bijlard, VU
Christine Staiger, SURFsara
Use Case 4: Enabling workflows
• Execute program hidden behind a PID
• Way to refer to workflows -> reproducibility
PIDs in EUDAT – Why?
• Create PIDs (ID1/25, ID1/26, ID2/41, ID2/42)
• Link PIDs and metadata in a human-readable way
• Resolve PIDs to the real location of data
The data managers' workflow
• Data manager: Create PID (ID1/25) -> Location in B2SAFE
• User: ID1/25 -> Fetch data
PID systems
Resolution and the PID pattern
Exercise
- Resolve the PIDs
- What happens if you resolve a PID with a
foreign resolver?
http://hdl.handle.net/21.T12995/PID-training
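For the exercise it helps to see the PID pattern itself: a handle is a prefix and a suffix separated by the first slash, and a resolvable link is a resolver address followed by the handle. The sketch below only manipulates strings and makes no network request; the helper names are hypothetical, and the handle is the one from the exercise URL.

```python
DEFAULT_RESOLVER = "http://hdl.handle.net/"


def split_handle(handle):
    """Split 'prefix/suffix' at the first slash; the suffix may contain slashes."""
    prefix, sep, suffix = handle.partition("/")
    if not (sep and prefix and suffix):
        raise ValueError(f"not of the form prefix/suffix: {handle!r}")
    return prefix, suffix


def resolver_url(handle, resolver=DEFAULT_RESOLVER):
    """Prepend a resolver address to a syntactically valid handle."""
    split_handle(handle)  # validation only
    return resolver.rstrip("/") + "/" + handle


def handle_from_url(url, resolver=DEFAULT_RESOLVER):
    """Strip the resolver address from a resolvable link to recover the handle."""
    base = resolver.rstrip("/") + "/"
    if not url.startswith(base):
        raise ValueError(f"{url!r} does not start with {base!r}")
    handle = url[len(base):]
    split_handle(handle)
    return handle


handle = handle_from_url("http://hdl.handle.net/21.T12995/PID-training")
print(split_handle(handle))
print(resolver_url(handle))
```

Passing a different `resolver` argument to `resolver_url` produces the link to try for the second exercise question; what that resolver answers is for the exercise to find out.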
Resolving PIDs
Global Registry (e.g. Handle system)
1. Client sends request to Global to resolve 0.NA/123 (prefix handle for 123/456)
2. Global responds with Service Information for 123
3. Client gets request to resolve hdl:123/456
4. Server responds with handle data
[Diagram: Global Registry, Service Information, Local Handle Service with Primary Site and Secondary Sites A and B]
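The two-stage look-up in the steps above can be imitated with two dictionaries. This is a toy simulation, not the Handle protocol: the service name `service-123` and the stored URL are made-up values, and only the identifiers `0.NA/123` and `hdl:123/456` come from the slide.

```python
# Global registry: prefix handle -> service information (here just a service name).
GLOBAL_REGISTRY = {"0.NA/123": "service-123"}

# Local handle services: service name -> handle records (made-up content).
LOCAL_SERVICES = {
    "service-123": {"123/456": {"URL": "http://www.example.com/"}},
}


def resolve(identifier):
    """Resolve a handle in two stages: global registry first, then local service."""
    handle = identifier[4:] if identifier.startswith("hdl:") else identifier
    prefix = handle.split("/", 1)[0]

    # Steps 1-2: ask the global registry who is responsible for the prefix.
    prefix_handle = "0.NA/" + prefix
    if prefix_handle not in GLOBAL_REGISTRY:
        raise LookupError(f"unknown prefix: {prefix}")
    service = GLOBAL_REGISTRY[prefix_handle]

    # Steps 3-4: ask that local service for the handle data.
    records = LOCAL_SERVICES[service]
    if handle not in records:
        raise LookupError(f"unknown handle: {handle}")
    return records[handle]


print(resolve("hdl:123/456"))
```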
PID systems and issuing authorities
• ePIC: GWDG, SURFsara, CSC, GRNET
• DOI: DataCite, CrossRef
• Handle: DONA Foundation
PID systems and issuing authorities
• URN:NBN
• Policies: both the PID and the data it dereferences to are persistent
• Wants to be independent of transfer protocols
-> Currently all identifiers start with http
Might change in the future
• DOI
• Policies: PID is persistent, data is not
• Based on the Handle system
• DataCite and Crossref are prefix-issuing authorities
• Requires extra metadata, stored in another database
• Both:
• PIDs point to a landing page, not the file itself
-> Tailored towards data citation
• User needs to provide a minimum set of metadata (Dublin Core)
PID systems and issuing authorities
• ePIC (European PID consortium)
• Policies: PID is persistent, data is not
• PIDs can point to anything
• Based on the Handle system
• Tailored towards data identification
and resolving
• DONA Foundation (www.dona.net)
• Maintains global handle registry
• Partners:
• CNRI (developer of the Handle system)
• GWDG (main partner in ePIC)
• International DOI Foundation (IDF)
The Handle system
• Metadata: You can create your own keyword-value pairs and store
them with the PID
• EUDAT Policies:
• Handles to be maintained beyond project life time
• Enforce stability of PIDs to justify trust in them
• Handles can point to anything
• Handles can also be removed, they are not per se persistent
…
Great flexibility for adjusting the system towards your own
needs
EUDAT provides implementations for replica tracking
You have to think even more carefully about how you want to
facilitate data management
For whom?
• PIDs make it possible to distinguish between data users and data
managers
• Data users get a PID and can directly access the data, or the
metadata stored with the PID
• Pipelines can programmatically access the metadata and start
specific applications
• Requires some serious thoughts about data organisation and
developing the code to put data policies into practice, including
code maintenance
For bigger research groups or consortia working in a distributed
data environment
For repositories that need a host for their PIDs
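As a rough illustration of a pipeline that reads the metadata stored with a PID and starts a matching application, the sketch below dispatches on one key of a record. The key name `FORMAT`, the two stand-in applications and the record contents are assumptions made up for this example.

```python
def run_pipeline(record, applications):
    """Start the application registered for the record's FORMAT value."""
    fmt = record.get("FORMAT")
    app = applications.get(fmt)
    if app is None:
        raise ValueError(f"no application registered for FORMAT={fmt!r}")
    return app(record["URL"])


# Stand-in applications: real ones would open and process the data.
applications = {
    "csv": lambda url: f"table loader started for {url}",
    "netcdf": lambda url: f"array reader started for {url}",
}

# A PID record as a pipeline might receive it after resolution (made-up content).
record = {"URL": "http://www.example.com/", "FORMAT": "csv"}
print(run_pipeline(record, applications))
```

Code like this is part of what has to be maintained once data policies are put into practice, which is the cost the slide points at.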
Step by Step:
Using the B2HANDLE python library
• Register data with a Handle
• GET the details of a Handle
• Modify a Handle record
• Link two files on PID level
• Reverse look-up (not possible via normal Handle API)
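The five steps can be rehearsed without a Handle server using a small in-memory stand-in. This is not the B2HANDLE API: the class `InMemoryHandleStore`, its method names and the keys `TITLE` and `RELATED` are hypothetical, chosen only to mirror the steps listed above.

```python
import uuid


class InMemoryHandleStore:
    """Toy handle store: each handle maps to a dict of key-value pairs."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._records = {}

    def register(self, location, **extra):
        """Register data with a handle; the suffix is a random UUID."""
        handle = f"{self.prefix}/{uuid.uuid4()}"
        self._records[handle] = {"URL": location, **extra}
        return handle

    def get(self, handle):
        """GET the details of a handle (a copy of its record)."""
        return dict(self._records[handle])

    def modify(self, handle, **changes):
        """Modify a handle record by setting or overwriting keys."""
        self._records[handle].update(changes)

    def link(self, handle_a, handle_b, key="RELATED"):
        """Link two files on PID level: each record stores the other's handle."""
        self._records[handle_a][key] = handle_b
        self._records[handle_b][key] = handle_a

    def reverse_lookup(self, **criteria):
        """Find handles whose records match all given key-value pairs."""
        return [handle for handle, record in self._records.items()
                if all(record.get(k) == v for k, v in criteria.items())]


store = InMemoryHandleStore("prefix1")
first = store.register("http://www.test.com/test.html")
second = store.register("http://www.example.com/")
store.modify(first, TITLE="Raw data")
store.link(first, second)

print(store.get(first))
print(store.reverse_lookup(URL="http://www.example.com/") == [second])
```

In the toy store a reverse look-up is a scan over all records; with a real Handle server it needs a separate search facility, which is why the slide notes that it is not possible via the normal Handle API.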
www.eudat.eu
Authors Contributors
This work is licensed under the Creative Commons CC-BY 4.0 licence
Themis Zamani, GRNET
Willem Elbers, CLARIN
Christine Staiger, SURFsara
Sofiane Bendoukha, DKRZ
Ellen Leenarts, DANS
Kostas Kavoussanakis, EPCC
Thank you

More Related Content

What's hot

B2STAGE- how to shift large amounts of data| www.eudat.eu |
B2STAGE- how to shift large amounts of data| www.eudat.eu | B2STAGE- how to shift large amounts of data| www.eudat.eu |
B2STAGE- how to shift large amounts of data| www.eudat.eu | EUDAT
 
Modeling Data Life Cycles with PROV
Modeling Data Life Cycles with PROVModeling Data Life Cycles with PROV
Modeling Data Life Cycles with PROVEUDAT
 
Cambridge University Geospatial Metadata Workshop 20110524
Cambridge University Geospatial Metadata Workshop 20110524Cambridge University Geospatial Metadata Workshop 20110524
Cambridge University Geospatial Metadata Workshop 20110524EDINA, University of Edinburgh
 
Repository Fringe 2016 - Survey Documentation and Analysis
Repository Fringe 2016 - Survey Documentation and AnalysisRepository Fringe 2016 - Survey Documentation and Analysis
Repository Fringe 2016 - Survey Documentation and AnalysisEDINA, University of Edinburgh
 
Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!EDINA, University of Edinburgh
 
Research Data Management (RDM) Initiatives at the University of Edinburgh
Research Data Management (RDM) Initiatives at the University of EdinburghResearch Data Management (RDM) Initiatives at the University of Edinburgh
Research Data Management (RDM) Initiatives at the University of EdinburghEDINA, University of Edinburgh
 
Shibboleth Access Management Federations and Secure SDI: ESDIN Experience
Shibboleth Access Management Federations and Secure SDI: ESDIN Experience Shibboleth Access Management Federations and Secure SDI: ESDIN Experience
Shibboleth Access Management Federations and Secure SDI: ESDIN Experience EDINA, University of Edinburgh
 
The EU INSPIRE Directive and what it might mean for UK academia
The EU INSPIRE Directive and what it might mean for UK academiaThe EU INSPIRE Directive and what it might mean for UK academia
The EU INSPIRE Directive and what it might mean for UK academiaEDINA, University of Edinburgh
 
The Go-Geo! Spatial Data Portal: A Data Discovery and Research Tool for UK Ac...
The Go-Geo! Spatial Data Portal: A Data Discovery and Research Tool for UK Ac...The Go-Geo! Spatial Data Portal: A Data Discovery and Research Tool for UK Ac...
The Go-Geo! Spatial Data Portal: A Data Discovery and Research Tool for UK Ac...EDINA, University of Edinburgh
 
Geospatial metadata and spatial data workshop: 19 June 2014
Geospatial metadata and spatial data workshop: 19 June 2014Geospatial metadata and spatial data workshop: 19 June 2014
Geospatial metadata and spatial data workshop: 19 June 2014EDINA, University of Edinburgh
 
Northumbria University Geospatial Metadata Workshop 20110505
Northumbria University Geospatial Metadata Workshop 20110505Northumbria University Geospatial Metadata Workshop 20110505
Northumbria University Geospatial Metadata Workshop 20110505EDINA, University of Edinburgh
 
Leeds University Geospatial Metadata Workshop 20110617
Leeds University Geospatial Metadata Workshop 20110617Leeds University Geospatial Metadata Workshop 20110617
Leeds University Geospatial Metadata Workshop 20110617EDINA, University of Edinburgh
 
Now we are six: Integrating Edinburgh DataShare into local and internet in...
Now we are six: Integrating Edinburgh DataShare into local and internet in...Now we are six: Integrating Edinburgh DataShare into local and internet in...
Now we are six: Integrating Edinburgh DataShare into local and internet in...Robin Rice
 
Scottish Digital Library Consortium Meeting: Edinburgh DataShare
Scottish Digital Library Consortium Meeting: Edinburgh DataShareScottish Digital Library Consortium Meeting: Edinburgh DataShare
Scottish Digital Library Consortium Meeting: Edinburgh DataShareRobin Rice
 
Oxford University Geospatial Metadata Workshop 20110415
Oxford University Geospatial Metadata Workshop 20110415Oxford University Geospatial Metadata Workshop 20110415
Oxford University Geospatial Metadata Workshop 20110415EDINA, University of Edinburgh
 
Introduction to Edinburgh University Data Library and national data services
Introduction to Edinburgh University Data Library and national data servicesIntroduction to Edinburgh University Data Library and national data services
Introduction to Edinburgh University Data Library and national data servicesEDINA, University of Edinburgh
 
Introduction to data and support services for Political Data Analysis
Introduction to data and support services for Political Data AnalysisIntroduction to data and support services for Political Data Analysis
Introduction to data and support services for Political Data AnalysisEDINA, University of Edinburgh
 
Data Library Services at the University of Edinburgh
Data Library Services at the University of EdinburghData Library Services at the University of Edinburgh
Data Library Services at the University of EdinburghRobin Rice
 

What's hot (20)

B2STAGE- how to shift large amounts of data| www.eudat.eu |
B2STAGE- how to shift large amounts of data| www.eudat.eu | B2STAGE- how to shift large amounts of data| www.eudat.eu |
B2STAGE- how to shift large amounts of data| www.eudat.eu |
 
Modeling Data Life Cycles with PROV
Modeling Data Life Cycles with PROVModeling Data Life Cycles with PROV
Modeling Data Life Cycles with PROV
 
Cambridge University Geospatial Metadata Workshop 20110524
Cambridge University Geospatial Metadata Workshop 20110524Cambridge University Geospatial Metadata Workshop 20110524
Cambridge University Geospatial Metadata Workshop 20110524
 
Repository Fringe 2016 - Survey Documentation and Analysis
Repository Fringe 2016 - Survey Documentation and AnalysisRepository Fringe 2016 - Survey Documentation and Analysis
Repository Fringe 2016 - Survey Documentation and Analysis
 
Geospatial Metadata Workshop
Geospatial Metadata WorkshopGeospatial Metadata Workshop
Geospatial Metadata Workshop
 
Glasgow University Geo Metadata Workshop
Glasgow University Geo Metadata WorkshopGlasgow University Geo Metadata Workshop
Glasgow University Geo Metadata Workshop
 
Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!
 
Research Data Management (RDM) Initiatives at the University of Edinburgh
Research Data Management (RDM) Initiatives at the University of EdinburghResearch Data Management (RDM) Initiatives at the University of Edinburgh
Research Data Management (RDM) Initiatives at the University of Edinburgh
 
Shibboleth Access Management Federations and Secure SDI: ESDIN Experience
Shibboleth Access Management Federations and Secure SDI: ESDIN Experience Shibboleth Access Management Federations and Secure SDI: ESDIN Experience
Shibboleth Access Management Federations and Secure SDI: ESDIN Experience
 
The EU INSPIRE Directive and what it might mean for UK academia
The EU INSPIRE Directive and what it might mean for UK academiaThe EU INSPIRE Directive and what it might mean for UK academia
The EU INSPIRE Directive and what it might mean for UK academia
 
The Go-Geo! Spatial Data Portal: A Data Discovery and Research Tool for UK Ac...
The Go-Geo! Spatial Data Portal: A Data Discovery and Research Tool for UK Ac...The Go-Geo! Spatial Data Portal: A Data Discovery and Research Tool for UK Ac...
The Go-Geo! Spatial Data Portal: A Data Discovery and Research Tool for UK Ac...
 
Geospatial metadata and spatial data workshop: 19 June 2014
Geospatial metadata and spatial data workshop: 19 June 2014Geospatial metadata and spatial data workshop: 19 June 2014
Geospatial metadata and spatial data workshop: 19 June 2014
 
Northumbria University Geospatial Metadata Workshop 20110505
Northumbria University Geospatial Metadata Workshop 20110505Northumbria University Geospatial Metadata Workshop 20110505
Northumbria University Geospatial Metadata Workshop 20110505
 
Leeds University Geospatial Metadata Workshop 20110617
Leeds University Geospatial Metadata Workshop 20110617Leeds University Geospatial Metadata Workshop 20110617
Leeds University Geospatial Metadata Workshop 20110617
 
Now we are six: Integrating Edinburgh DataShare into local and internet in...
Now we are six: Integrating Edinburgh DataShare into local and internet in...Now we are six: Integrating Edinburgh DataShare into local and internet in...
Now we are six: Integrating Edinburgh DataShare into local and internet in...
 
Scottish Digital Library Consortium Meeting: Edinburgh DataShare
Scottish Digital Library Consortium Meeting: Edinburgh DataShareScottish Digital Library Consortium Meeting: Edinburgh DataShare
Scottish Digital Library Consortium Meeting: Edinburgh DataShare
 
Oxford University Geospatial Metadata Workshop 20110415
Oxford University Geospatial Metadata Workshop 20110415Oxford University Geospatial Metadata Workshop 20110415
Oxford University Geospatial Metadata Workshop 20110415
 
Introduction to Edinburgh University Data Library and national data services
Introduction to Edinburgh University Data Library and national data servicesIntroduction to Edinburgh University Data Library and national data services
Introduction to Edinburgh University Data Library and national data services
 
Introduction to data and support services for Political Data Analysis
Introduction to data and support services for Political Data AnalysisIntroduction to data and support services for Political Data Analysis
Introduction to data and support services for Political Data Analysis
 
Data Library Services at the University of Edinburgh
Data Library Services at the University of EdinburghData Library Services at the University of Edinburgh
Data Library Services at the University of Edinburgh
 

Similar to Data Discoverability and Persistent Identifiers - EUDAT Summer School (Christine Staiger, SURFsara, Sofiane Bendoukha, DKRZ)

Introduction to Persistent Identifiers| www.eudat.eu |
Introduction to Persistent Identifiers| www.eudat.eu | Introduction to Persistent Identifiers| www.eudat.eu |
Introduction to Persistent Identifiers| www.eudat.eu | EUDAT
 
Persistent Identifiers in EUDAT Services. B2HANDLE Python library | www.eudat...
Persistent Identifiers in EUDAT Services. B2HANDLE Python library | www.eudat...Persistent Identifiers in EUDAT Services. B2HANDLE Python library | www.eudat...
Persistent Identifiers in EUDAT Services. B2HANDLE Python library | www.eudat...EUDAT
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationDenodo
 
Towards Generating Policy-compliant Datasets (poster)
Towards GeneratingPolicy-compliant Datasets (poster)Towards GeneratingPolicy-compliant Datasets (poster)
Towards Generating Policy-compliant Datasets (poster)Christophe Debruyne
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)Denodo
 
GDPR Compliance Made Easy with Data Virtualization
GDPR Compliance Made Easy with Data VirtualizationGDPR Compliance Made Easy with Data Virtualization
GDPR Compliance Made Easy with Data VirtualizationDenodo
 
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management: An Introductory Webinar from OpenAIRE and EUDATResearch Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management: An Introductory Webinar from OpenAIRE and EUDATTony Ross-Hellauer
 
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management: An Introductory Webinar from OpenAIRE and EUDATResearch Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management: An Introductory Webinar from OpenAIRE and EUDATOpenAIRE
 
Research Data Management Introduction: EUDAT/Open AIRE Webinar| www.eudat.eu |
Research Data Management Introduction: EUDAT/Open AIRE Webinar| www.eudat.eu | Research Data Management Introduction: EUDAT/Open AIRE Webinar| www.eudat.eu |
Research Data Management Introduction: EUDAT/Open AIRE Webinar| www.eudat.eu | EUDAT
 
GDPR Noncompliance: Avoid the Risk with Data Virtualization
GDPR Noncompliance: Avoid the Risk with Data VirtualizationGDPR Noncompliance: Avoid the Risk with Data Virtualization
GDPR Noncompliance: Avoid the Risk with Data VirtualizationDenodo
 
Data Preservation Service Area
Data Preservation Service AreaData Preservation Service Area
Data Preservation Service AreaEUDAT
 
OpenAIRE webinar on Open Research Data in H2020 (OAW2016)
OpenAIRE webinar on Open Research Data in H2020 (OAW2016)OpenAIRE webinar on Open Research Data in H2020 (OAW2016)
OpenAIRE webinar on Open Research Data in H2020 (OAW2016)OpenAIRE
 
2010 CLARA Nijmegen - Data Seal of Approval tutorial
2010 CLARA Nijmegen - Data Seal of Approval tutorial2010 CLARA Nijmegen - Data Seal of Approval tutorial
2010 CLARA Nijmegen - Data Seal of Approval tutorialDirk Roorda
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization Denodo
 
Demystifying Data Virtualization (ASEAN)
Demystifying Data Virtualization (ASEAN)Demystifying Data Virtualization (ASEAN)
Demystifying Data Virtualization (ASEAN)Denodo
 
Denodo DataFest 2017: Conquering the Edge with Data Virtualization
Denodo DataFest 2017: Conquering the Edge with Data VirtualizationDenodo DataFest 2017: Conquering the Edge with Data Virtualization
Denodo DataFest 2017: Conquering the Edge with Data VirtualizationDenodo
 
Research Data Management, Challenges and Tools - Per Öster
Research Data Management, Challenges and Tools - Per Öster Research Data Management, Challenges and Tools - Per Öster
Research Data Management, Challenges and Tools - Per Öster LEARN Project
 
ERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management WebinarERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management WebinarFAIRDOM
 
EUDAT Brochure - B2HANDLE.pdf
EUDAT Brochure - B2HANDLE.pdfEUDAT Brochure - B2HANDLE.pdf
EUDAT Brochure - B2HANDLE.pdfEUDAT
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsDenodo
 

Similar to Data Discoverability and Persistent Identifiers - EUDAT Summer School (Christine Staiger, SURFsara, Sofiane Bendoukha, DKRZ) (20)

Introduction to Persistent Identifiers| www.eudat.eu |
Introduction to Persistent Identifiers| www.eudat.eu | Introduction to Persistent Identifiers| www.eudat.eu |
Introduction to Persistent Identifiers| www.eudat.eu |
 
Persistent Identifiers in EUDAT Services. B2HANDLE Python library | www.eudat...
Persistent Identifiers in EUDAT Services. B2HANDLE Python library | www.eudat...Persistent Identifiers in EUDAT Services. B2HANDLE Python library | www.eudat...
Persistent Identifiers in EUDAT Services. B2HANDLE Python library | www.eudat...
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
 
Towards Generating Policy-compliant Datasets (poster)
Towards GeneratingPolicy-compliant Datasets (poster)Towards GeneratingPolicy-compliant Datasets (poster)
Towards Generating Policy-compliant Datasets (poster)
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)
 
GDPR Compliance Made Easy with Data Virtualization
GDPR Compliance Made Easy with Data VirtualizationGDPR Compliance Made Easy with Data Virtualization
GDPR Compliance Made Easy with Data Virtualization
 
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management: An Introductory Webinar from OpenAIRE and EUDATResearch Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
 
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management: An Introductory Webinar from OpenAIRE and EUDATResearch Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
 
Research Data Management Introduction: EUDAT/Open AIRE Webinar| www.eudat.eu |
Research Data Management Introduction: EUDAT/Open AIRE Webinar| www.eudat.eu | Research Data Management Introduction: EUDAT/Open AIRE Webinar| www.eudat.eu |
Research Data Management Introduction: EUDAT/Open AIRE Webinar| www.eudat.eu |
 
GDPR Noncompliance: Avoid the Risk with Data Virtualization
GDPR Noncompliance: Avoid the Risk with Data VirtualizationGDPR Noncompliance: Avoid the Risk with Data Virtualization
GDPR Noncompliance: Avoid the Risk with Data Virtualization
 
Data Preservation Service Area
Data Preservation Service AreaData Preservation Service Area
Data Preservation Service Area
 
OpenAIRE webinar on Open Research Data in H2020 (OAW2016)
OpenAIRE webinar on Open Research Data in H2020 (OAW2016)OpenAIRE webinar on Open Research Data in H2020 (OAW2016)
OpenAIRE webinar on Open Research Data in H2020 (OAW2016)
 
2010 CLARA Nijmegen - Data Seal of Approval tutorial
2010 CLARA Nijmegen - Data Seal of Approval tutorial2010 CLARA Nijmegen - Data Seal of Approval tutorial
2010 CLARA Nijmegen - Data Seal of Approval tutorial
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
Demystifying Data Virtualization (ASEAN)
Demystifying Data Virtualization (ASEAN)Demystifying Data Virtualization (ASEAN)
Demystifying Data Virtualization (ASEAN)
 
Denodo DataFest 2017: Conquering the Edge with Data Virtualization
Denodo DataFest 2017: Conquering the Edge with Data VirtualizationDenodo DataFest 2017: Conquering the Edge with Data Virtualization
Denodo DataFest 2017: Conquering the Edge with Data Virtualization
 
Research Data Management, Challenges and Tools - Per Öster
Research Data Management, Challenges and Tools - Per Öster Research Data Management, Challenges and Tools - Per Öster
Research Data Management, Challenges and Tools - Per Öster
 
ERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management WebinarERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management Webinar
 
EUDAT Brochure - B2HANDLE.pdf
EUDAT Brochure - B2HANDLE.pdfEUDAT Brochure - B2HANDLE.pdf
EUDAT Brochure - B2HANDLE.pdf
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard Rails
 

More from EUDAT

EUDAT_Brochure_Generica_Jan_UPDATED(5).pdf
EUDAT_Brochure_Generica_Jan_UPDATED(5).pdfEUDAT_Brochure_Generica_Jan_UPDATED(5).pdf
EUDAT_Brochure_Generica_Jan_UPDATED(5).pdfEUDAT
 
EUDAT Booklet Mar22 (2).pdf
EUDAT Booklet Mar22 (2).pdfEUDAT Booklet Mar22 (2).pdf
EUDAT Booklet Mar22 (2).pdfEUDAT
 
EUDAT_Brochure_Generica_Jan_UPDATED (1).pdf
EUDAT_Brochure_Generica_Jan_UPDATED (1).pdfEUDAT_Brochure_Generica_Jan_UPDATED (1).pdf
EUDAT_Brochure_Generica_Jan_UPDATED (1).pdfEUDAT
 
EUDAT Brochure - B2DROP.pdf
EUDAT Brochure - B2DROP.pdfEUDAT Brochure - B2DROP.pdf
EUDAT Brochure - B2DROP.pdfEUDAT
 
EUDAT Brochure - B2SHARE.pdf
EUDAT Brochure - B2SHARE.pdfEUDAT Brochure - B2SHARE.pdf
EUDAT Brochure - B2SHARE.pdfEUDAT
 
EUDAT Brochure - B2SAFE.pdf
EUDAT Brochure - B2SAFE.pdfEUDAT Brochure - B2SAFE.pdf
EUDAT Brochure - B2SAFE.pdfEUDAT
 
EUDAT Brochure - B2FIND(1).pdf
EUDAT Brochure - B2FIND(1).pdfEUDAT Brochure - B2FIND(1).pdf
EUDAT Brochure - B2FIND(1).pdfEUDAT
 
EUDAT Brochure - B2ACCESS.pdf
EUDAT Brochure - B2ACCESS.pdfEUDAT Brochure - B2ACCESS.pdf
EUDAT Brochure - B2ACCESS.pdfEUDAT
 
Rob Carrillo - Writing effective service documentation for EUDAT services
Rob Carrillo - Writing effective service documentation for EUDAT servicesRob Carrillo - Writing effective service documentation for EUDAT services
Rob Carrillo - Writing effective service documentation for EUDAT servicesEUDAT
 
Ariyo - EUDAT CDI B2 services documentation
Ariyo - EUDAT CDI B2 services documentationAriyo - EUDAT CDI B2 services documentation
Ariyo - EUDAT CDI B2 services documentationEUDAT
 
Using B2NOTE: The U.Porto Pilot
Using B2NOTE: The U.Porto PilotUsing B2NOTE: The U.Porto Pilot
Using B2NOTE: The U.Porto PilotEUDAT
 
OpenAIRE Advance - Kick off last week
OpenAIRE Advance - Kick off last weekOpenAIRE Advance - Kick off last week
OpenAIRE Advance - Kick off last weekEUDAT
 
European Open Science Cloud - Skills workshop
European Open Science Cloud - Skills workshopEuropean Open Science Cloud - Skills workshop
European Open Science Cloud - Skills workshopEUDAT
 
Linking service capabilities to data stweardship competences for professional...
Linking service capabilities to data stweardship competences for professional...Linking service capabilities to data stweardship competences for professional...
Linking service capabilities to data stweardship competences for professional...EUDAT
 
FAIRness of training materials
FAIRness of training materialsFAIRness of training materials
FAIRness of training materialsEUDAT
 
Training by EOSC-hub - Integrating and Managing services for the European Ope...
Training by EOSC-hub - Integrating and Managing services for the European Ope...Training by EOSC-hub - Integrating and Managing services for the European Ope...
Training by EOSC-hub - Integrating and Managing services for the European Ope...EUDAT
 
Draft Governance Framework for the EOSC
Draft Governance Framework for the EOSCDraft Governance Framework for the EOSC
Draft Governance Framework for the EOSCEUDAT
 
Building Interoperable AAI for Researchers
Building Interoperable AAI for ResearchersBuilding Interoperable AAI for Researchers
Building Interoperable AAI for ResearchersEUDAT
 
ENVRIPLUS Data for Science Theme
ENVRIPLUS Data for Science ThemeENVRIPLUS Data for Science Theme
ENVRIPLUS Data for Science ThemeEUDAT
 
Data for Science Service Portfolio
Data for Science Service PortfolioData for Science Service Portfolio
Data for Science Service PortfolioEUDAT
 

Data Discoverability and Persistent Identifiers - EUDAT Summer School (Christine Staiger, SURFsara, Sofiane Bendoukha, DKRZ)

  • 1. www.eudat.eu EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065 Data Discoverability and Persistent Identifiers Christine Staiger and Sofiane Bendoukha SURFsara and DKRZ EUDAT Summer School, Heraklion, 2017
  • 2. PIDs in EUDAT PID Training
  • 3. PIDs in EUDAT – Why? PID Training • Managing increasing numbers of data objects • Sharing data from different sources amongst researchers • Data needs to be (globally) identifiable and addressable → reuse of data • Data citation • Linking data from different sources → Pooling datasets, B2SAFE • Challenges • Object locations change over time • Object migration between repositories
  • 4. What do we want from data? • Findable – Easy to find by both humans and computer systems → Metadata • Accessible – Stored for long term, accessed and/or downloaded with well-defined license and access • Interoperable – Ready to be combined with other datasets by humans as well as computer systems • Reusable – Ready to be used for future research and to be processed further using computational methods. • The FAIR Guiding Principles for scientific data management and stewardship, doi:10.1038/sdata.2016.18 PID Training
  • 5. What do we need? PID Training Data PID Metadata • Persistent Identifier: reference and identify object, either metadata or data object • Synchronise PID, Data and Metadata during creation, maintenance, update and deletion of a digital object!
  • 6. What do we need? PID Training Data PID Metadata Data PID Metadata Data PID Metadata Data PID Metadata Data PID Metadata Raw data Published data Analysed data
  • 7. What do we know about Persistent Identifiers? • A Persistent Identifier (PID) is an identifier that is effectively permanently assigned to a resource. PID Training identifier resource location a permanent name or identity although it may change over time • Pointers to data resources • Globally unique • Exist infinitely long (the PID, not necessarily the data)
  • 8. Simple data life cycle, linearised PID Training Publish data online, data is accessed by others Published online: http://www.test.com/test.html Other users may cite, access, re-use this url Relocate the resource at http://www.example.com/ Other users are not informed -> 404 Publish online Move to another location used by another researcher
  • 9. Data Life Cycle with PID system Publish online Move to another location Used by another researcher PIDs are subject to the same life cycle Resolve PID Resolve PID Resolve PID Register PID Update PID Get PID Details Handle Resolution service PID system Publish data online, data is accessed by others
  • 10. Advantages and Disadvantages PID Training Pro: • Static reference, even if data moves or changes • Network of persistent links Data – metadata relations Provenance chains Con: • Extra effort • What to identify? • Coordination across organisations and people • Organisational discipline to ensure persistence
  • 12. Use Case 1: Data publication PID Training • PIDs point to landing page of the digital repository showing metadata • “Real” data can be downloaded from this page with another link • E.g. B2SHARE, FigShare, Zenodo, … • PID http://hdl.handle.net/11304/3265434c-4b34-11e4-81ac-dcbd1b51435e resolves to landing page https://b2share.eudat.eu/records/feafb12e810c489b9e878949c6c35345
  • 15. Use case 2: Modeling Relationships • Use tightly coupled metadata • Part of/has part relationships • (B2SAFE) Link replicas • Model cohort-patient relationship • Model patient-samples relationship Which metadata to store with the PID and which in an extra catalogue? PID: prefix1/suffix1 Metadata: key1: Location1 key2: prefix2/suffix2 key3: prefix3/suffix3 PID: prefix2/suffix2 Metadata: key1: Location2 key2: prefix1/suffix1 PID: prefix3/suffix3 Metadata: key1: Location3 key2: prefix1/suffix1
  • 16. PID Training Use case 3: Enabling data workflows
  • 19. TraIT data ontology PID Training Chao (Cico) Zhang, VU Sanne Ablen, VU Jochem Bijlard, VU Christine Staiger, SURFsara
  • 20. PID Training Use Case 4: Enabling workflows • Execute program hidden behind a PID • Way to refer to workflows → reproducibility
  • 21. PIDs in EUDAT – Why? PID Training Create PIDs ID1/25 ID1/26 ID2/41 ID2/42 ID1/25 ID1/26 ID2/41 ID2/42 Link PIDs and metadata in a human-readable way Resolve PIDs to the real location of the data
  • 22. The data managers’ workflow Data manager User Create PID ID1/25 ID1/25 ID1/25 ID1/25 Location in B2SAFE Fetch data
  • 24. Resolution and the PID pattern PID Training Exercise - Resolve the PIDs - What happens if you resolve a PID with a foreign resolver? http://hdl.handle.net/21.T12995/PID-training
  • 25. Resolving PIDs Global Registry, e.g. Handle system 1. Client sends request to Global to resolve 0.NA/123 (prefix handle for 123/456) 2. Global responds with Service Information for 123 3. Client gets request to resolve hdl:123/456 4. Server responds with handle data [Diagram: Global Registry and a Local Handle Service with a Primary Site and Secondary Sites A and B; the Service Information lists the IP addresses of the local service]
  • 26. PID systems and issuing authorities PID Training EPIC DOI HANDLE GDWG SURFsara CSC grnet DataCite CrossRef DONA Foundation
  • 27. PID systems and issuing authorities PID Training • URN:NBN • Policies: both the PID and the data it resolves to are persistent • Wants to be independent of transfer protocols → Currently all identifiers start with http; this might change in the future • DOI • Policies: PID is persistent, data is not • Based on the Handle system • DataCite and Crossref are prefix issuing authorities • Requires extra metadata, stored in another database • Both: • PIDs point to a landing page, not the file itself → Tailored towards data citation • User needs to provide a minimum set of metadata (Dublin Core)
  • 28. PID systems and issuing authorities PID Training • ePIC (European PID consortium) • Policies: PID is persistent, data is not • PIDs can point to anything • Based on the Handle system • Tailored towards data identification and resolving • DONA Foundation (www.dona.net) • Maintains the global Handle registry • Partners: • CNRI (developer of the Handle system) • GDWG (main partner in ePIC) • International DOI Foundation (IDF)
  • 29. The Handle system • Metadata: You can create your own keyword-value pairs and store them with the PID • EUDAT Policies: • Handles to be maintained beyond project life time • Enforce stability of PIDs to justify trust in them • Handles can point to anything • Handles can also be removed, they are not per se persistent … Great flexibility for adjusting the system towards your own needs EUDAT provides implementations for replica tracking You have to think even more carefully about how you want to facilitate data management PID Training
  • 30. For whom? • PIDs make it possible to distinguish between data users and data managers • Data users get a PID and can directly access the data, or the metadata stored with the PID • Pipelines can programmatically access the metadata and start specific applications • Requires serious thought about data organisation and about developing the code that puts data policies into practice, including code maintenance For bigger research groups or consortia working in a distributed data environment For repositories that need a host for their PIDs PID Training
  • 31. Step by Step: Using the B2HANDLE python library • Register data with a Handle • GET the details of a Handle • Modify a Handle record • Link two files on PID level • Reverse look-up (not possible via normal Handle API) PID Training
  • 32. www.eudat.eu Authors Contributors This work is licensed under the Creative Commons CC-BY 4.0 licence EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065 Themis Zamani, GRNET Willem Elbers, CLARIN Christine Staiger, SURFsara Sofiane Bendoukha, DKRZ Ellen Leenarts, DANS Kostas Kavoussanakis, EPCC Thank you
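
The life cycle on the slides above (register a PID, resolve it, update it when the data moves, link two objects at PID level, reverse look-up) can be sketched with a toy in-memory registry. This is an illustration only: the `ToyPidRegistry` class, its method names, the `PARENT` key, the `21.T99999` prefix and the replica URL are hypothetical, and none of this is the B2HANDLE or Handle System API.

```python
import hashlib
import uuid


class ToyPidRegistry:
    """Hypothetical in-memory stand-in for a PID service (not the B2HANDLE API)."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.records = {}  # PID -> dict of key-value pairs

    def register(self, location, data=b""):
        """Mint a new PID that points to `location` and stores an MD5 checksum."""
        pid = f"{self.prefix}/{uuid.uuid4()}"
        self.records[pid] = {
            "URL": location,
            "CHECKSUM": hashlib.md5(data).hexdigest(),
        }
        return pid

    def resolve(self, pid):
        """Return the current location; the PID itself never changes."""
        return self.records[pid]["URL"]

    def update(self, pid, **pairs):
        """Modify key-value pairs, e.g. the URL after the data has moved."""
        self.records[pid].update(pairs)

    def reverse_lookup(self, **pairs):
        """Find all PIDs whose record contains the given key-value pairs."""
        return [
            pid for pid, rec in self.records.items()
            if all(rec.get(k) == v for k, v in pairs.items())
        ]


registry = ToyPidRegistry("21.T99999")  # made-up prefix

raw = registry.register("http://www.test.com/test.html", data=b"raw data")
# The data moves: only the record is updated, citations keep using the PID.
registry.update(raw, URL="http://www.example.com/test.html")

# Link two objects at PID level: the replica record points back to the original.
replica = registry.register("http://replica.example.org/test.html", data=b"raw data")
registry.update(replica, PARENT=raw)

print(registry.resolve(raw))
print(registry.reverse_lookup(PARENT=raw) == [replica])
```

Because users only ever hold the PID, the relocation is invisible to them, and the matching checksums on both records give a simple integrity check between the original and its replica.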

Editor's Notes

  1. Data generation is getting easier and cheaper, so we produce more and more data that can potentially be reused. How do we keep an overview of the data, its location and whether it is up to date? This matters especially in consortia, where there is a more or less clear distinction between data users, data producers and data admins. Users need to be provided with the current location of the data and maybe also of previous versions, which may lie on a different server. Data producers upload data, but data managers, who have more insight into storage, might move the data around and migrate it to new storage facilities. Data might be linked to other data; how do we maintain those links? At the same time there is a shift from data generation to data processing and analysis: a new way to do science. As a result, the amount of data output is increasing. To make this new data world a better place for science, we must keep some data principles in mind: Findable – easy to find by both humans and computer systems → metadata. Accessible – stored for the long term, accessed and/or downloaded with a well-defined license and access conditions. Interoperable – ready to be combined with other datasets by humans as well as computer systems. Reusable – ready to be used for future research and to be processed further using computational methods.
  2. We need infrastructure to keep data, metadata and their PID synchronised throughout the data and research life cycle. This becomes even more complicated when we want to keep relations between data files. At what level do we need the linking?
  3. We need infrastructure to keep data, metadata and their PID synchronised throughout the data and research life cycle. This becomes even more complicated when we want to keep relations between data files. At what level do we need the linking? PIDs are part of data management. The technology can also be used to support the FAIR principles. This tutorial investigates to what extent and at which level.
  4. An identifier is a unique name or identity applied to something so that it can be easily referenced. - Points to a resource: the resource is a black box. The type of resource behind the URL doesn't matter; it may be a file, a metadata record or a code collection. - Globally unique: once it is created, the resource is globally addressable. You won't find an identifier with the same name that points to a different resource. - Persistent over time: there is a persistent relationship between the PID and the same resource over time. The infrastructure can support access to resources as they move from one repository to another.
  5. To understand PIDs and the PID service, let's discuss a real example: the simplest data life cycle. You have a research output and you want to share it with other researchers. The traditional way to store your data is to upload it to a site, a repository or a directory. To access it, you bookmark or share a URL. A) Published online: http://www.test.com/test.html. B) Other users may cite, access and re-use this URL. As long as nothing changes about the way the data is accessed, this works fine. But one day you decide to move the resource to another location. C) Relocate the resource at http://www.example.com/. Other users are not informed about this relocation, and when they try to access the resource at the first location they always get a Page Not Found response. So this arrangement has proven to be fragile. Let's see what happens when you decide to connect to a PID service.
  6. From the moment you decide to use PIDs, data and PIDs are strictly connected. As we have already discussed, managing data online includes managing the persistent identifier for the data, so that it continues to provide information about whatever it identifies, no matter where it is stored online. Data has its own life cycle, and PIDs are subject to the same life cycle. When you decide to publish something online, you must register a PID at the same time. When you decide to move the data to another location, you must inform the service and update the PID at the same time. Once a PID is created, it is always resolvable by the Handle resolution service.
  7. Safely archive data, cite published data from others
  8. What actually happened? The B2SHARE service made a request for two new PIDs. It created a DOI for the collection. This DOI is resolvable with the DOI resolver and will always resolve to the landing page in B2SHARE for the collection. B2SHARE also created a Handle for the collection, which also resolves to the landing page.
  9. Moreover, B2SHARE creates a PID (Handle) for each data file in the collection. These PIDs resolve directly to the file and can be used to download the files automatically and unambiguously. The PID entry also contains the MD5 checksum. This field can easily be queried with, for example, the B2HANDLE Python library and be used for integrity checks.
  10. You can use PIDs to store extra, system-specific metadata. We have already seen the use of checksums in B2SHARE. But we can also put links to other files and folders in the metadata and so introduce relationships between single data objects and collections.
  11. Metadata for users resides in tranSMART, which holds metadata and highly processed data. Users select data and a workflow there. The raw or less processed data lies in various repositories and may also be moved between repositories. Rather than linking to a location in the repository, tranSMART uses PIDs, which can be updated when data moves. Via this PID, data can be transferred from the repository to a workflow engine. Galaxy is the workflow engine: it starts programs on the data and pushes results back to tranSMART.
  12. Metadata catalogues and repositories like EGA have different underlying data models. Storage for raw data has no data model at all; it just keeps bitstreams safe. → Identify which data products you need in the pipelines. If data is only addressable by location, decouple location and addressing by introducing PIDs.
  13. CED – collected experiment data. TraIT developed a machine-readable metadata template for fetching data, incorporating local IDs from repositories and globally resolvable PIDs.
  14. Data is not only data. We can refer to code with a PID, e.g. a Python file here. In this way we can refer to previously executed workflows and ensure reproducibility.
  15. B2SAFE and B2SHARE generate PIDs (for different purposes). B2STAGE uses PIDs to retrieve data from B2SAFE. B2FIND exposes metadata to the user and refers to the data PID.
  16. Example workflow for the afternoon.
  17. PIDs consist of a prefix and a suffix. The prefix is unique in the PID system and is provided by the PID system administration. The prefix denotes the administrative domain and comes with a password. Only with the proper authorization are you allowed to change a PID.
  18. PID systems do not only differ technically, but also policy-wise. The Handle system is the underlying technical system for DOIs and EPIC persistent identifiers. EPIC employs the Handle system and overlays it with its own policies, like “Never delete a PID”. The EPIC consortium and its members are prefix issuing authorities. The EPIC partners make sure that EPIC PIDs are always available (mirroring). DOI also employs the Handle system but uses different policies: for example, it asks for a specific set of metadata (author, year, title, … Dublin Core) that is stored in an extra database. DataCite and Crossref are partners and prefix issuing authorities.
  19. PID systems do not only differ technically, but also policy-wise.
  20. PID systems do not only differ technically, but also policy-wise.
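
The prefix/suffix pattern and the resolution step described in the notes above can be sketched as follows. The sketch splits a Handle into prefix and suffix, builds the lookup URL for the public Handle proxy REST interface (`https://hdl.handle.net/api/handles/<handle>`), and extracts the location from a response. The function names are made up for this example, and `sample_response` is a hand-written payload in the shape that interface returns, not a recorded reply; it reuses the Handle and landing page quoted on the data publication slide.

```python
import json
from urllib.parse import quote

PROXY = "https://hdl.handle.net/api/handles/"


def split_handle(handle):
    """Split 'prefix/suffix' at the first slash; the suffix may contain slashes."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a valid Handle: {handle!r}")
    return prefix, suffix


def lookup_url(handle):
    """Build the REST lookup URL for a Handle on the global proxy."""
    prefix, suffix = split_handle(handle)
    return f"{PROXY}{prefix}/{quote(suffix, safe='/')}"


def values_by_type(response_text):
    """Map each value type in the record (e.g. URL) to its data value."""
    body = json.loads(response_text)
    if body.get("responseCode") != 1:  # 1 signals a successful lookup
        raise LookupError(f"Handle not resolved: {body.get('responseCode')}")
    return {v["type"]: v["data"]["value"] for v in body["values"]}


handle = "11304/3265434c-4b34-11e4-81ac-dcbd1b51435e"

# Illustrative payload only (assumed shape, hand-written for this sketch).
sample_response = json.dumps({
    "responseCode": 1,
    "handle": handle,
    "values": [
        {
            "index": 1,
            "type": "URL",
            "data": {
                "format": "string",
                "value": "https://b2share.eudat.eu/records/"
                         "feafb12e810c489b9e878949c6c35345",
            },
        }
    ],
})

print(split_handle(handle))
print(lookup_url(handle))
print(values_by_type(sample_response)["URL"])
```

To query the live service instead of the sample, the text returned by `urllib.request.urlopen(lookup_url(handle)).read()` could be passed to `values_by_type`; whether a given Handle still resolves depends on the issuing authority's policies.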