DOIs, provenance & vocabularies
Nicholas Car
Data Architect
nicholas.car@ga.gov.au
DOIs, Provenance & Vocabs
Outline
Three different extensions to regular GeoNetwork (GN) use:
1. DOI and other identifier use
2. Provenance formulation and recording
3. Vocabulary use
DOIs, other identifiers and GA
64af9ff3-71dd-431a-bc94-9d2280acef79
DOIs and other identifiers
• GN uses UUIDs for records
• Strengths:
• Universally unique, so they are:
• Able to be generated by GN or outside it
• Transferable
• Indefinitely stable
→ Alex can generate catalogue records using custom code and post them into GA’s eCat. He can generate the UUIDs himself rather than have eCat do it, so he knows what they are before submission.
→ Jingbo can move records between catalogues at the NCI and still use the same UUIDs for them.
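Here is a minimal Python sketch of the two scenarios above, assuming records can be represented as plain dicts. A UUID is minted client-side before submission (Alex's case) and stays unchanged when the record is copied to another catalogue (Jingbo's case). The helpers `make_record` and `copy_to_catalogue` are hypothetical and do not model the GeoNetwork or eCat API.

```python
import uuid

def make_record(title):
    """Create a minimal, hypothetical catalogue record with a client-generated UUID."""
    # uuid4() is random, so it can be minted anywhere without asking GN for one
    return {"uuid": str(uuid.uuid4()), "title": title}

def copy_to_catalogue(record, catalogue):
    """Copy a record into another catalogue (here just a list); the UUID is kept as-is."""
    copied = dict(record)
    catalogue.append(copied)
    return copied

record = make_record("Radiometric Thorium Equivalent grid of Warrachie, SA")
print("UUID known before submission:", record["uuid"])

nci_catalogue = []
moved = copy_to_catalogue(record, nci_catalogue)
print(moved["uuid"] == record["uuid"])  # True: same identifier in both catalogues
```

The only requirement on the client is a correct UUID generator. No round trip to the catalogue is needed to learn the identifier.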
• Weaknesses:
• Not meaningful
• Not part of an identifier scheme
• Not resolvable by themselves
→ data.gov.au, which does not use GN, provides both UUIDs and meaningful aliases for datasets, e.g.
→ “Offshore reconnaissance geophysical techniques”
→ http://data.gov.au/dataset/cdecf261-84a7-4911-a645-2d7113e97d0b
→ http://data.gov.au/dataset/offshore-reconnaissance-geophysical-techniques
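The UUID/alias pairing can be sketched as follows. The `slugify` rule lower-cases the title and collapses runs of non-alphanumeric characters to single hyphens. That rule is an assumption: it reproduces the alias above, but data.gov.au's actual naming rules may differ. `dataset_uris` is a hypothetical helper.

```python
import re
import uuid

BASE = "http://data.gov.au/dataset/"

def slugify(title):
    """Turn a title into a lowercase, hyphen-separated alias (assumed rule)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

def dataset_uris(dataset_id, title):
    """Return the opaque UUID-based URI and the human-readable alias URI for one dataset."""
    uuid.UUID(dataset_id)  # raises ValueError if the ID is not a well-formed UUID
    return BASE + dataset_id, BASE + slugify(title)

by_id, by_alias = dataset_uris(
    "cdecf261-84a7-4911-a645-2d7113e97d0b",
    "Offshore reconnaissance geophysical techniques",
)
print(by_id)     # http://data.gov.au/dataset/cdecf261-84a7-4911-a645-2d7113e97d0b
print(by_alias)  # http://data.gov.au/dataset/offshore-reconnaissance-geophysical-techniques
```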
DOIs and other identifiers
• What are DOIs?
• A persistent identifier used to uniquely identify digital objects, standardised by ISO (ISO 26324)
• Uses the Handle network: highly persistent
• Popular and widely understood
• Has many convenience resolver systems, e.g.
https://doi.org/{DOI}
(https://doi.org/10.4225/25/58a3ff6e07d21)
• IGSNs are another DOI-like identifier
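The resolver pattern above amounts to prefixing the DOI with `https://doi.org/`. The sketch below assumes a loose syntax check is enough: `10.`, a numeric registrant code, `/`, then a suffix. That check is a heuristic, not the full DOI grammar. `doi_to_url` and `split_doi` are illustrative names.

```python
import re
from urllib.parse import quote

# Loose DOI check: "10." + 4-9 digit registrant code + "/" + non-empty suffix (heuristic only)
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def doi_to_url(doi, resolver="https://doi.org/"):
    """Build a resolvable URL for a DOI using a convenience resolver."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode anything unsafe but keep the "/" separators
    return resolver + quote(doi, safe="/")

def split_doi(doi):
    """Split a DOI into its prefix (registrant part) and suffix."""
    prefix, _, suffix = doi.partition("/")
    return prefix, suffix

print(doi_to_url("10.4225/25/58a3ff6e07d21"))  # https://doi.org/10.4225/25/58a3ff6e07d21
print(split_doi("10.4225/25/58a3ff6e07d21"))   # ('10.4225', '25/58a3ff6e07d21')
```

A bare UUID fails the check, which reflects the weakness noted earlier: on its own it is not part of a resolvable scheme.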
DOIs and other identifiers
• GA uses DOIs for important datasets and our own eCat IDs
for all datasets, e.g.:
• “Radiometric Thorium Equivalent grid of Warrachie, SA”
• UUID: 64af9ff3-71dd-431a-bc94-9d2280acef79
• eCatID: 106850
• Our landing page: http://www.ga.gov.au/metadata-gateway/metadata/record/106850
• DOI: https://doi.org/10.4225/25/58a3ff6e07d21
GA’s DOI directions
• Our eCat ID will remain our authoritative ID
• Due to their embedded presence & simplicity
• GN configured to mint them
• We will promote eCat IDs & other IDs like DOIs, not UUIDs
• GN landing page’s “Permalink” button will reveal a DOI
• If it exists for a record
• If not, an eCat-based URI including the eCat ID
• UUIDs only used under the hood
• For GN functions like crosslinks
• We may support other ID schemes in the future, like IGSNs
• We require architecture outside GN for URI ID redirection
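The Permalink rule can be sketched as a simple fallback: show the DOI if the record has one, otherwise show an eCat-based URI. This sketch assumes the eCat-based URI follows the landing-page pattern from the Warrachie example. The `CatalogueRecord` class is hypothetical and not part of GN.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Assumed form of the "eCat-based URI", taken from the landing-page example above
ECAT_LANDING = "http://www.ga.gov.au/metadata-gateway/metadata/record/{ecat_id}"

@dataclass
class CatalogueRecord:
    uuid: str                  # internal GN identifier, used only under the hood
    ecat_id: int               # GA's authoritative identifier
    doi: Optional[str] = None  # minted only for some datasets

    def permalink(self) -> str:
        """Prefer the DOI if one exists, otherwise fall back to the eCat-based URI."""
        if self.doi:
            return "https://doi.org/" + self.doi
        return ECAT_LANDING.format(ecat_id=self.ecat_id)

warrachie = CatalogueRecord(
    uuid="64af9ff3-71dd-431a-bc94-9d2280acef79",
    ecat_id=106850,
    doi="10.4225/25/58a3ff6e07d21",
)
print(warrachie.permalink())                       # https://doi.org/10.4225/25/58a3ff6e07d21
print(replace(warrachie, doi=None).permalink())    # http://www.ga.gov.au/metadata-gateway/metadata/record/106850
```

The UUID is stored but never surfaced, which matches the "under the hood" role described above.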
Provenance
GA’s provenance model
• We use PROV
• We do not use ISO19115 Lineage
• Designed for satellite data processing
• Limited to history of the catalogued item only
• Not database/graph-based (de-normalised with respect to the many objects involved)
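To make the graph point concrete, here is a minimal sketch. Each derivation is stored once as an edge between separately identified datasets, and the edges are then walked transitively. A per-record lineage statement cannot do this without repeating information. All dataset names are hypothetical.

```python
from collections import deque

# Each edge is stored once: derived dataset -> datasets it prov:wasDerivedFrom (hypothetical names)
WAS_DERIVED_FROM = {
    "grid_v2": ["grid_v1"],
    "grid_v1": ["survey_points"],
    "thorium_map": ["grid_v2", "elevation_model"],
}

def ancestors(dataset, edges=WAS_DERIVED_FROM):
    """Return every upstream dataset reachable via wasDerivedFrom, in breadth-first order."""
    seen, queue, order = set(), deque([dataset]), []
    while queue:
        current = queue.popleft()
        for source in edges.get(current, []):
            if source not in seen:
                seen.add(source)
                order.append(source)
                queue.append(source)
    return order

print(ancestors("thorium_map"))  # ['grid_v2', 'elevation_model', 'grid_v1', 'survey_points']
```

The history of `grid_v1` is recorded once and reused by everything derived from it. It is not copied into each downstream record.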
• Some provenance stored in our GN eCat
• We also link across multiple systems
• Example: GN ↔ ARGUS
• Datasets ↔ Surveys’ metadata online
• We have had to define our dataset ↔ dataset provenance relationships in ISO19115:
• PROV: wasDerivedFrom
• ISO 19115-1: AssociationTypeCode dependency
• PROV: wasRevisionOf
• ISO 19115-1: AssociationTypeCode revisionOf
• PROV: hadPrimarySource
• ISO 19115-1: AssociationTypeCode source
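The three mappings above can be held as a lookup table and applied in either direction. This minimal sketch assumes PROV properties are identified by their full PROV-O URIs, and the ISO side by the plain code values listed above.

```python
PROV = "http://www.w3.org/ns/prov#"

# Dataset-to-dataset mappings as listed above: PROV property -> ISO 19115-1 association type
PROV_TO_ISO = {
    PROV + "wasDerivedFrom": "dependency",
    PROV + "wasRevisionOf": "revisionOf",
    PROV + "hadPrimarySource": "source",
}
# The mapping is one-to-one, so it can be inverted for the ISO -> PROV direction
ISO_TO_PROV = {code: prop for prop, code in PROV_TO_ISO.items()}

def to_iso(prov_property):
    """Return the association type code for a PROV property, or raise if unmapped."""
    try:
        return PROV_TO_ISO[prov_property]
    except KeyError:
        raise ValueError(f"no ISO 19115-1 association type for {prov_property}") from None

def to_prov(association_type):
    """Return the PROV property URI for an association type code, or raise if unmapped."""
    try:
        return ISO_TO_PROV[association_type]
    except KeyError:
        raise ValueError(f"no PROV property for {association_type!r}") from None

print(to_iso(PROV + "wasRevisionOf"))  # revisionOf
print(to_prov("source"))               # http://www.w3.org/ns/prov#hadPrimarySource
```

Properties outside the table, such as prov:wasGeneratedBy, raise `ValueError` rather than being silently dropped.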
• We can also have Dataset ↔ non-dataset relationships
• ARGUS example:
• PROV: Dataset prov:wasGeneratedBy Activity
• ISO 19115-1: Dataset ? Activity (not in GN)
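The ARGUS relationship can be expressed directly in PROV-O, even though ISO 19115-1 has no place for it in GN. This sketch emits it as Turtle using plain string formatting. The URIs are hypothetical placeholders, and a production system would more likely use an RDF library.

```python
PROV_PREFIX = "@prefix prov: <http://www.w3.org/ns/prov#> .\n"

def generation_turtle(dataset_uri, activity_uri):
    """Serialise 'Dataset prov:wasGeneratedBy Activity' as a small Turtle document."""
    return (
        PROV_PREFIX
        + "\n"
        + f"<{dataset_uri}> a prov:Entity ;\n"
        + f"    prov:wasGeneratedBy <{activity_uri}> .\n"
        + "\n"
        + f"<{activity_uri}> a prov:Activity .\n"
    )

# Both URIs are hypothetical placeholders for a catalogued dataset and an ARGUS survey
ttl = generation_turtle("http://example.org/dataset/1", "http://example.org/survey/1")
print(ttl)
```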
Vocabularies
Vocabularies
• Items in GN are stored with keywords and the thesaurus they come from:
<mri:descriptiveKeywords>
  <mri:MD_Keywords>
    <mri:keyword>
      <gco:CharacterString>Offshore Areas</gco:CharacterString>
    </mri:keyword>
    <mri:type>
      <mri:MD_KeywordTypeCode
          codeList="http://asdd.ga.gov.au/asdd/profileinfo/gmxCodelists.xml#MD_KeywordTypeCode"
          codeListValue="theme">theme</mri:MD_KeywordTypeCode>
    </mri:type>
  </mri:MD_Keywords>
</mri:descriptiveKeywords>
<mri:descriptiveKeywords>
  <mri:MD_Keywords>
    <mri:keyword>
      <gco:CharacterString>Earth Sciences</gco:CharacterString>
    </mri:keyword>
    <mri:thesaurusName>
      <cit:CI_Citation>
        <cit:title>
          <gco:CharacterString>Australian and New Zealand Standard Research Classification (ANZSRC)</gco:CharacterString>
        </cit:title>
        ...
        ...
        <cit:CI_OnlineResource>
          <cit:linkage>
            <gco:CharacterString>http://www.abs.gov.au/ausstats/abs@.nsf/mf/1297.0</gco:CharacterString>
          </cit:linkage>
        </cit:CI_OnlineResource>
        ...
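Reading the keyword/thesaurus pairs back out of such records takes a short path walk with Python's standard `xml.etree.ElementTree`. This sketch assumes the ISO 19115-3 namespace URIs shown in `NS`; verify them against your own records. It runs on a trimmed-down sample record.

```python
import xml.etree.ElementTree as ET

# Assumed ISO 19115-3 namespace URIs; check them against the namespaces in your own records
NS = {
    "mri": "http://standards.iso.org/iso/19115/-3/mri/1.0",
    "cit": "http://standards.iso.org/iso/19115/-3/cit/2.0",
    "gco": "http://standards.iso.org/iso/19115/-3/gco/1.0",
}

# Trimmed-down sample mirroring the two keyword blocks above
SAMPLE = f"""<record xmlns:mri="{NS['mri']}" xmlns:cit="{NS['cit']}" xmlns:gco="{NS['gco']}">
  <mri:descriptiveKeywords><mri:MD_Keywords>
    <mri:keyword><gco:CharacterString>Offshore Areas</gco:CharacterString></mri:keyword>
  </mri:MD_Keywords></mri:descriptiveKeywords>
  <mri:descriptiveKeywords><mri:MD_Keywords>
    <mri:keyword><gco:CharacterString>Earth Sciences</gco:CharacterString></mri:keyword>
    <mri:thesaurusName><cit:CI_Citation><cit:title>
      <gco:CharacterString>
        Australian and New Zealand Standard Research Classification
        (ANZSRC)
      </gco:CharacterString>
    </cit:title></cit:CI_Citation></mri:thesaurusName>
  </mri:MD_Keywords></mri:descriptiveKeywords>
</record>"""

def keywords_with_thesaurus(xml_text):
    """Return (keyword, thesaurus title or None) pairs from descriptiveKeywords blocks."""
    root = ET.fromstring(xml_text)
    pairs = []
    for block in root.iterfind(".//mri:MD_Keywords", NS):
        title = block.findtext(
            "mri:thesaurusName/cit:CI_Citation/cit:title/gco:CharacterString",
            namespaces=NS,
        )
        # Collapse line breaks and indentation left in text by pretty-printed XML
        title = " ".join(title.split()) if title else None
        for kw in block.iterfind("mri:keyword/gco:CharacterString", NS):
            pairs.append((" ".join(kw.text.split()), title))
    return pairs

for keyword, thesaurus in keywords_with_thesaurus(SAMPLE):
    print(keyword, "->", thesaurus or "(no thesaurus)")
```

Keywords with no thesaurus come back paired with `None`. That makes them easy candidates for the remediation work described later.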
• GA is moving to using online SKOS-based vocabs for all code
lists
• E.g. “GA Data Classification”
• Broad GA categorisation for all data
• Will be compulsory, as ANZSRC is, and enforced by GN
• Can use specialised terms in other vocabs
• GN will offer term selection
• Live from the online vocab, not from stored XML
• We are keen to work with others testing GN/SPARQL
service integration
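A live term picker of this kind would typically query the vocabulary service over the SPARQL 1.1 Protocol and read back the standard JSON results format. This minimal sketch assumes a hypothetical endpoint and concept-scheme URI. The request is built but not sent, so the parsing step is shown on a canned response.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and concept-scheme URIs; substitute the real vocabulary service
ENDPOINT = "http://vocabs.example.org/sparql"
SCHEME = "http://vocabs.example.org/ga-data-classification"

def term_query(scheme_uri):
    """SPARQL query listing each concept in a SKOS concept scheme with its preferred label."""
    return f"""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {{
  ?concept skos:inScheme <{scheme_uri}> ;
           skos:prefLabel ?label .
}}
ORDER BY ?label"""

def build_request(endpoint, query):
    """Prepare, but do not send, a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

def parse_terms(results_json):
    """Turn a SPARQL JSON results document into (concept URI, label) pairs for a picker."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return [(b["concept"]["value"], b["label"]["value"]) for b in bindings]

req = build_request(ENDPOINT, term_query(SCHEME))
# Against a live service: terms = parse_terms(urllib.request.urlopen(req).read())

sample = ('{"head": {"vars": ["concept", "label"]}, "results": {"bindings": ['
          '{"concept": {"type": "uri", "value": "http://vocabs.example.org/c/1"},'
          ' "label": {"type": "literal", "value": "Geophysics"}}]}}')
print(parse_terms(sample))  # [('http://vocabs.example.org/c/1', 'Geophysics')]
```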
• Remediation of existing keywords anticipated
• Automated keyword (KW) testing for term tidy-up
• Text mining of abstracts with Natural Language Processing (NLP) to add KWs
• Bulk addition, based on business knowledge of record
data
• E.g. thematic tagging based on GA section
• Reverse vocab application
• Existing free-text terms → vocabs
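Automated tidy-up and reverse vocab application both come down to matching existing free-text keywords against vocabulary labels. This minimal sketch assumes a hypothetical in-memory vocabulary of preferred and alternative labels and a simple normalisation rule. Anything unmatched is returned for manual review.

```python
import re

# Hypothetical vocabulary: concept URI -> preferred label followed by alternative labels
VOCAB = {
    "http://vocabs.example.org/c/offshore": ["Offshore areas", "Offshore"],
    "http://vocabs.example.org/c/earth-sci": ["Earth sciences", "Geoscience"],
}

def normalise(term):
    """Lower-case and collapse punctuation/whitespace so near-identical terms compare equal."""
    return " ".join(re.sub(r"[^\w\s]", " ", term.lower()).split())

# Index every label under its normalised form for constant-time lookup
LABEL_INDEX = {normalise(label): uri for uri, labels in VOCAB.items() for label in labels}

def map_free_text(terms):
    """Split free-text keywords into (term, concept URI) matches and unmatched leftovers."""
    matched, unmatched = [], []
    for term in terms:
        uri = LABEL_INDEX.get(normalise(term))
        if uri:
            matched.append((term, uri))
        else:
            unmatched.append(term)
    return matched, unmatched

matched, unmatched = map_free_text(["Offshore Areas", "geoscience", "Warrachie"])
print(matched)
print(unmatched)  # ['Warrachie']
```

Exact matching on normalised labels is deliberately conservative. Fuzzier approaches, such as NLP over abstracts, would add candidates but also need review.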
• We will be registering our vocabs themselves as datasets in
eCat!
Afterword
• Lots of extension work at GA using GN
• Inter-system linking growing
• Semantic richness beyond ISO19115 growing
• GN still the only catalogue system for the foreseeable future
• Other GN initiatives at GA, for another CoP meeting!
DOIs, provenance & vocabularies - Nicholas Car (CSIRO)

  • 1. DOIs, provenance & vocabularies Nicholas Car Data Architect nicholas.car@ga.gov.au DOIs, Provenance & Vocabs
  • 2. Outline Three different extensions to regular GeoNetwork (GN) use: 1. DOI and other identifier use 2. Provenance formulation and recording 3. Vocabulary use
  • 3. DOIs, other identifiers and GA 64af9ff3-71dd-431a-bc94-9d2280acef79
  • 4. DOIs and other identifiers • GN uses UUIDs for records • Strengths: • Universally unique so: • Able to be generated by or outside GN • Transferable • Indefinitely stable
  • 5. DOIs and other identifiers • GN uses UUIDs for records • Strengths: • Universally unique so: • Able to be generated by or outside GN • Transferable • Indefinitely stable → Alex can generate catalogue records using custom code and post them into GA’s eCat. He can generate the UUIDs himself rather than have eCat do it, so he knows what they are before submission.
  • 6. DOIs and other identifiers • GN uses UUIDs for records • Strengths: • Universally unique so: • Able to be generated by or outside GN • Transferable • Indefinitely stable → Alex can generate catalogue records using custom code and post them into GA’s eCat. He can generate the UUIDs himself rather than have eCat do it, so he knows what they are before submission. → Jingbo can move records between catalogues at the NCI and still use the same UUIDs for them.
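Because UUIDs can be minted outside GN, a client like Alex's can fix a record's identifier before posting it. A minimal Python sketch, assuming the client only needs a random (version 4) UUID plus a check that a supplied ID is in canonical form; the function names and the `uuid` key in the record dictionary are hypothetical, not part of eCat's API.

```python
import uuid

def new_record_id() -> str:
    """Mint a random (version 4) UUID for a new catalogue record."""
    return str(uuid.uuid4())

def is_canonical_uuid(value: str) -> bool:
    """True if `value` parses as a UUID and is already in lower-case hyphenated form."""
    try:
        return str(uuid.UUID(value)) == value
    except ValueError:
        return False

# Hypothetical record skeleton: the identifier is known locally before submission to eCat.
record = {"uuid": new_record_id(), "title": "Example dataset"}
print(record["uuid"], is_canonical_uuid(record["uuid"]))
```

Version 4 UUIDs are random, so uniqueness is probabilistic, but collisions are negligible in practice; that is what lets the same ID travel unchanged between catalogues, as in Jingbo's case.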
  • 7. DOIs and other identifiers • GN uses UUIDs for records • Strengths: • Universally unique so: • Able to be generated by or outside GN • Transferable • Indefinitely stable • Weaknesses: • Not meaningful • Not part of an identifier scheme • Not resolvable by themselves
  • 8. DOIs and other identifiers • GN uses UUIDs for records • Weaknesses: • Not meaningful • Not part of an identifier scheme • Not resolvable by themselves → data.gov.au, not using GN, provides UUIDs and meaningful aliases for datasets, e.g. → “Offshore reconnaissance geophysical techniques” → http://data.gov.au/dataset/cdecf261-84a7-4911-a645-2d7113e97d0b → http://data.gov.au/dataset/offshore-reconnaissance-geophysical-techniques
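The data.gov.au example pairs an opaque UUID URI with a human-readable alias. The rule data.gov.au actually uses to derive aliases isn't given here; this sketch assumes a simple slug (lower-case words joined by hyphens), which happens to reproduce the alias shown. `slugify` and `dataset_uris` are illustrative names only.

```python
import re

BASE = "http://data.gov.au/dataset/"

def slugify(title: str) -> str:
    """Lower-case the title and join each run of letters/digits with single hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))

def dataset_uris(uuid_str: str, title: str) -> tuple[str, str]:
    """Return (opaque UUID-based URI, meaningful alias URI) for one dataset."""
    return BASE + uuid_str, BASE + slugify(title)

uuid_uri, alias_uri = dataset_uris(
    "cdecf261-84a7-4911-a645-2d7113e97d0b",
    "Offshore reconnaissance geophysical techniques",
)
print(uuid_uri)   # http://data.gov.au/dataset/cdecf261-84a7-4911-a645-2d7113e97d0b
print(alias_uri)  # http://data.gov.au/dataset/offshore-reconnaissance-geophysical-techniques
```

A slug alone is not guaranteed unique (two datasets can share a title), which is one reason to keep the UUID URI alongside it.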
  • 9. DOIs and other identifiers • What are DOIs? • A persistent identifier used to uniquely identify digital objects, standardized by ISO (ISO 26324) • Built on the Handle System, so highly persistent • Popular and widely understood • Has many convenience resolver systems, e.g. https://doi.org/{DOI} (https://doi.org/10.4225/25/58a3ff6e07d21) • IGSNs are another DOI-like identifier
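Resolving a DOI through the convenience resolver amounts to prefixing it with `https://doi.org/`. A sketch, assuming input may arrive bare, with a `doi:` prefix, or already as a doi.org URL; the regular expression is a loose sanity check (`10.`, a numeric registrant code, `/`, a suffix), not a full validation of DOI syntax.

```python
import re
from urllib.parse import quote

# Loose check: "10." + numeric registrant code (optionally subdivided) + "/" + suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}(\.\d+)*/\S+$")
RESOLVER = "https://doi.org/"

def doi_to_url(doi: str) -> str:
    """Normalise a DOI (bare, 'doi:'-prefixed or a doi.org URL) to a resolver URL."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters unsafe in a URL path, keeping the '/' separators.
    return RESOLVER + quote(doi, safe="/")

print(doi_to_url("doi:10.4225/25/58a3ff6e07d21"))  # https://doi.org/10.4225/25/58a3ff6e07d21
```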
  • 10. DOIs and other identifiers • GA uses DOIs for important datasets and our own eCat IDs for all datasets, e.g.: • “Radiometric Thorium Equivalent grid of Warrachie, SA” • UUID: 64af9ff3-71dd-431a-bc94-9d2280acef79 • eCatID: 106850 • Our landing page: http://www.ga.gov.au/metadata-gateway/metadata/record/106850 • DOI: https://doi.org/10.4225/25/58a3ff6e07d21
  • 11. GA’s DOI directions • Our eCat ID will remain our authoritative ID • Due to their embedded presence & simplicity • GN configured to mint them • We will promote eCat IDs & other IDs like DOIs, not UUIDs • GN landing page’s “Permalink” button will reveal a DOI • If it exists for a record • If not, an eCat-based URI including the eCat ID • UUIDs only used under the hood • For GN functions like crosslinks • We may support other ID schema in the future, like IGSNs • We require architecture outside GN for URI ID redirection
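The Permalink rule above (a DOI if one exists, otherwise an eCat-based URI, never the UUID) fits in a few lines. This is a sketch only: the `Record` class and `permalink` function are hypothetical, and because the slides don't give the exact form of the eCat-based URI, it assumes the landing-page pattern shown for record 106850.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: eCat-based URIs follow the landing-page pattern shown for record 106850.
ECAT_BASE = "http://www.ga.gov.au/metadata-gateway/metadata/record/"

@dataclass
class Record:
    uuid: str                  # kept under the hood, e.g. for GN crosslinks
    ecat_id: int               # GA's authoritative identifier
    doi: Optional[str] = None  # only important datasets have one

def permalink(record: Record) -> str:
    """Prefer the DOI resolver URL; otherwise fall back to an eCat-ID-based URI."""
    if record.doi:
        return "https://doi.org/" + record.doi
    return ECAT_BASE + str(record.ecat_id)

warrachie = Record("64af9ff3-71dd-431a-bc94-9d2280acef79", 106850, "10.4225/25/58a3ff6e07d21")
print(permalink(warrachie))
```

The UUID never appears in the output, which matches the intent of promoting eCat IDs and DOIs rather than UUIDs.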
  • 13. GA’s provenance model • We use PROV
  • 15. GA’s provenance model • We use PROV • We do not use ISO19115 Lineage • Designed for satellite data processing • Limited to history of the catalogued item only • Not a database/graph model (de-normalised with respect to many objects)
  • 16. GA’s provenance model • We use PROV • We do not use ISO19115 Lineage • Some provenance stored in our GN eCat • We also link across multiple systems • Example: GN → ARGUS • Datasets → Surveys’ metadata online
  • 17. GA’s provenance model • We use PROV • We do not use ISO19115 Lineage • Some provenance stored in our GN eCat • We also link across multiple systems • We have had to define our dataset → dataset provenance relationships in ISO19115: • PROV: wasDerivedFrom • ISO 19115-1: AssociationTypeCode dependency • PROV: wasRevisionOf • ISO 19115-1: AssociationTypeCode revisionOf • PROV: hadPrimarySource • ISO 19115-1: AssociationTypeCode source
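The PROV-to-ISO correspondence on this slide can be kept as a two-way lookup so that dataset-to-dataset relationships recorded in either model can be translated. A minimal sketch of the mapping GA defined; the function names are illustrative, and relations without a defined equivalent raise an error rather than being silently dropped.

```python
# Dataset -> dataset relationships as mapped on the slide (PROV -> AssociationTypeCode).
PROV_TO_ISO = {
    "prov:wasDerivedFrom": "dependency",
    "prov:wasRevisionOf": "revisionOf",
    "prov:hadPrimarySource": "source",
}
ISO_TO_PROV = {iso: prov for prov, iso in PROV_TO_ISO.items()}

def to_iso(prov_relation: str) -> str:
    """Translate a PROV dataset-to-dataset relation into an ISO association type code."""
    try:
        return PROV_TO_ISO[prov_relation]
    except KeyError:
        raise ValueError(f"no ISO 19115 association defined for {prov_relation}") from None

def to_prov(iso_code: str) -> str:
    """Translate an ISO association type code back into its PROV relation."""
    try:
        return ISO_TO_PROV[iso_code]
    except KeyError:
        raise ValueError(f"no PROV relation defined for {iso_code}") from None

print(to_iso("prov:wasRevisionOf"), to_prov("dependency"))
```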
  • 18. GA’s provenance model • We use PROV • We do not use ISO19115 Lineage • Some provenance stored in our GN eCat • We also link across multiple systems • We have had to define our dataset → dataset provenance relationships in ISO19115 • We can have Dataset → other thing relationships • ARGUS example: • PROV: Dataset prov:wasGeneratedBy Activity • ISO 19115-1: Dataset ? Activity (not in GN)
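On the PROV side, the ARGUS case is a single `prov:wasGeneratedBy` statement linking a dataset (a `prov:Entity`) to the activity, such as a survey, that produced it. The sketch below emits it as N-Triples using only the standard library; both URIs are hypothetical placeholders, and a production system would more likely use an RDF library.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def iri_triple(s: str, p: str, o: str) -> str:
    """Format one N-Triples statement in which subject, predicate and object are IRIs."""
    return f"<{s}> <{p}> <{o}> ."

def was_generated_by(dataset: str, activity: str) -> list[str]:
    """Type the dataset as a prov:Entity, the activity as a prov:Activity, and link them."""
    return [
        iri_triple(dataset, RDF_TYPE, PROV + "Entity"),
        iri_triple(activity, RDF_TYPE, PROV + "Activity"),
        iri_triple(dataset, PROV + "wasGeneratedBy", activity),
    ]

# Hypothetical URIs: a catalogued dataset and the survey (activity) recorded in ARGUS.
for line in was_generated_by("http://example.org/dataset/1", "http://example.org/survey/1"):
    print(line)
```

This is exactly the kind of Dataset → Activity link that has no ISO 19115-1 association type, which is why it lives outside GN.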
  • 20. Vocabularies • Items in GN stored with keywords and the thesaurus they come from:
      <mri:descriptiveKeywords>
        <mri:MD_Keywords>
          <mri:keyword>
            <gco:CharacterString>Offshore Areas</gco:CharacterString>
          </mri:keyword>
          <mri:type>
            <mri:MD_KeywordTypeCode codeList="http://asdd.ga.gov.au/asdd/profileinfo/gmxCodelists.xml#MD_KeywordTypeCode" codeListValue="theme">theme</mri:MD_KeywordTypeCode>
          </mri:type>
        </mri:MD_Keywords>
      </mri:descriptiveKeywords>
  • 21. Vocabularies • Items in GN stored with keywords and the thesaurus they come from:
      <mri:descriptiveKeywords>
        <mri:MD_Keywords>
          <mri:keyword>
            <gco:CharacterString>Earth Sciences</gco:CharacterString>
          </mri:keyword>
          <mri:thesaurusName>
            <cit:CI_Citation>
              <cit:title>
                <gco:CharacterString>Australian and New Zealand Standard Research Classification (ANZSRC)</gco:CharacterString>
              </cit:title>
              ...
  • 22. Vocabularies • Items in GN stored with keywords and the thesaurus they come from:
      ...
      <cit:CI_OnlineResource>
        <cit:linkage>
          <gco:CharacterString>http://www.abs.gov.au/ausstats/abs@.nsf/mf/1297.0</gco:CharacterString>
        </cit:linkage>
      </cit:CI_OnlineResource>
      ...
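Because each `MD_Keywords` block carries its own optional `thesaurusName`, keywords and their source vocabulary can be read back together. A sketch using Python's `xml.etree.ElementTree`; the namespace URIs are an assumption based on the ISO 19115-3 schemas (check the versions your own records declare), and `SAMPLE` condenses the two keyword fragments into one document under a hypothetical `<root>` element standing in for a full record.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed ISO 19115-3 namespace URIs; use the versions declared in your own records.
NS = {
    "mri": "http://standards.iso.org/iso/19115/-3/mri/1.0",
    "gco": "http://standards.iso.org/iso/19115/-3/gco/1.0",
    "cit": "http://standards.iso.org/iso/19115/-3/cit/2.0",
}

SAMPLE = f"""
<root xmlns:mri="{NS['mri']}" xmlns:gco="{NS['gco']}" xmlns:cit="{NS['cit']}">
  <mri:descriptiveKeywords>
    <mri:MD_Keywords>
      <mri:keyword><gco:CharacterString>Offshore Areas</gco:CharacterString></mri:keyword>
    </mri:MD_Keywords>
  </mri:descriptiveKeywords>
  <mri:descriptiveKeywords>
    <mri:MD_Keywords>
      <mri:keyword><gco:CharacterString>Earth Sciences</gco:CharacterString></mri:keyword>
      <mri:thesaurusName>
        <cit:CI_Citation>
          <cit:title><gco:CharacterString>
            Australian and New Zealand Standard Research Classification (ANZSRC)
          </gco:CharacterString></cit:title>
        </cit:CI_Citation>
      </mri:thesaurusName>
    </mri:MD_Keywords>
  </mri:descriptiveKeywords>
</root>
"""

def keywords_with_thesaurus(xml_text: str) -> list[tuple[str, Optional[str]]]:
    """Return (keyword, thesaurus title or None) for every keyword in the document."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for block in root.iterfind(".//mri:MD_Keywords", NS):
        title = block.findtext(
            "mri:thesaurusName/cit:CI_Citation/cit:title/gco:CharacterString",
            namespaces=NS,
        )
        # A keyword block without a thesaurusName holds free-text keywords (title None).
        title = title.strip() if title else None
        for kw in block.iterfind("mri:keyword/gco:CharacterString", NS):
            pairs.append(((kw.text or "").strip(), title))
    return pairs

for keyword, thesaurus in keywords_with_thesaurus(SAMPLE):
    print(keyword, "|", thesaurus)
```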
  • 23. Vocabularies • Items in GN stored with keywords and the thesaurus they come from • GA is moving to using online SKOS-based vocabs for all code lists • E.g. “GA Data Classification” • Broad GA categorisation for all data • Will be compulsory, like ANZSRC, and enforced by GN • Can use specialised terms in other vocabs • GN will offer term selection • Live from the online vocab, not from stored XML
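Enforcing a compulsory vocabulary amounts to checking that at least one keyword in a record comes from each required vocabulary. A minimal sketch, assuming keywords are available as `(term, thesaurus title)` pairs; matching on titles is a simplification, and a system working live against online vocabs would more likely compare vocabulary URIs. The function name is hypothetical.

```python
from typing import Iterable, Optional

def missing_required_vocabs(
    keywords: Iterable[tuple[str, Optional[str]]],
    required: Iterable[str],
) -> list[str]:
    """Return the required vocabularies that no keyword in the record is drawn from."""
    used = {thesaurus for _, thesaurus in keywords if thesaurus}
    return [vocab for vocab in required if vocab not in used]

REQUIRED = [
    "Australian and New Zealand Standard Research Classification (ANZSRC)",
    "GA Data Classification",
]
record_keywords = [
    ("Offshore Areas", None),
    ("Earth Sciences", "Australian and New Zealand Standard Research Classification (ANZSRC)"),
]
print(missing_required_vocabs(record_keywords, REQUIRED))  # ['GA Data Classification']
```

A catalogue would reject (or flag) the record while this list is non-empty.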
  • 24. Vocabularies • Items in GN stored with keywords and the thesaurus they come from • GA is moving to using online SKOS-based vocabs for all code lists • We are keen to work with others testing GN/SPARQL service integration
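One way a catalogue could select terms live from a SKOS vocabulary is a SPARQL query against the vocabulary service. The sketch below builds a type-ahead query over `skos:prefLabel` and encodes it as an HTTP GET, which the SPARQL 1.1 Protocol allows; the endpoint URL is a placeholder, and the query shape is an assumption, not a description of how GN integrates with SPARQL services.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the vocabulary service's real SPARQL endpoint.
ENDPOINT = "http://example.org/vocabs/sparql"

def concept_lookup_query(label_prefix: str, lang: str = "en") -> str:
    """SPARQL finding SKOS concepts whose preferred label starts with a prefix."""
    # Escape backslashes and double quotes so user input can't break the string literal.
    escaped = label_prefix.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {{
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
  FILTER(LANG(?label) = "{lang}" && STRSTARTS(LCASE(STR(?label)), LCASE("{escaped}")))
}}
ORDER BY ?label
LIMIT 20"""

def query_url(query: str) -> str:
    """Encode the query as a GET request URL using the 'query' parameter."""
    return ENDPOINT + "?" + urlencode({"query": query})

print(query_url(concept_lookup_query("Earth")))
```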
  • 25. Vocabularies • Items in GN stored with keywords and the thesaurus they come from • GA is moving to using online SKOS-based vocabs for all code lists • Remediation of existing keywords anticipated • Automated keyword (KW) testing for term tidy-up • Text mining of abstracts with Natural Language Processing (NLP) to add KWs • Bulk addition, based on business knowledge of record data • E.g. thematic tagging based on GA section
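Bulk addition from business knowledge could be as simple as a lookup from a record's owning GA section to theme keywords. In this sketch the section names, theme terms and record dictionary layout are all hypothetical placeholders; in practice the themes would be terms from the controlled vocabulary. The step is idempotent, so it can be re-run safely.

```python
# Hypothetical mapping from owning GA section to theme keywords.
SECTION_THEMES: dict[str, list[str]] = {
    "Section A": ["theme-1"],
    "Section B": ["theme-2", "theme-3"],
}

def bulk_tag(records: list[dict]) -> int:
    """Add section-derived theme keywords to each record in place; return how many were added."""
    added = 0
    for record in records:
        keywords = record.setdefault("keywords", [])
        for theme in SECTION_THEMES.get(record.get("section"), []):
            if theme not in keywords:  # don't duplicate keywords already present
                keywords.append(theme)
                added += 1
    return added

records = [{"section": "Section B", "keywords": ["theme-2"]}, {"section": "Section A"}]
print(bulk_tag(records), records)
```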
  • 26. Vocabularies • Items in GN stored with keywords and the thesaurus they come from • GA is moving to using online SKOS-based vocabs for all code lists • Remediation of existing keywords anticipated • Automated keyword (KW) testing for term tidy-up • Text mining of abstracts with Natural Language Processing (NLP) to add KWs • Bulk addition, based on business knowledge of record data • Reverse vocab application • Existing free-text terms → vocabs
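Reverse vocab application means mapping existing free-text keywords onto vocabulary concepts. A sketch, assuming each vocabulary label (preferred or alternative) is available with its concept URI: it tries a normalised exact match first, then falls back to fuzzy matching with the standard library's `difflib.get_close_matches`. The 0.85 cutoff and the example concept URIs are arbitrary choices for illustration; unmatched terms return `None` so they can go to human review.

```python
import difflib
import re
from typing import Optional

def normalise(term: str) -> str:
    """Collapse whitespace and case so trivial variants compare equal."""
    return re.sub(r"\s+", " ", term).strip().lower()

def match_to_vocab(free_text: str, labels: dict[str, str], cutoff: float = 0.85) -> Optional[str]:
    """Map a free-text keyword to a concept URI via exact, then fuzzy, label matching."""
    index = {normalise(label): uri for label, uri in labels.items()}
    key = normalise(free_text)
    if key in index:
        return index[key]
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    return index[close[0]] if close else None

# Hypothetical label -> concept URI table.
LABELS = {
    "Offshore Areas": "http://example.org/vocab/offshore-areas",
    "Earth Sciences": "http://example.org/vocab/earth-sciences",
}
for term in ["  offshore   areas ", "Offshore Area", "Bathymetry"]:
    print(repr(term), "->", match_to_vocab(term, LABELS))
```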
  • 27. Vocabularies • Items in GN stored with keywords and the thesaurus they come from • GA is moving to using online SKOS-based vocabs for all code lists • Remediation of existing keywords anticipated • We will be registering our vocabs themselves as datasets in eCat!
  • 29. Afterword • Lots of extension work at GA using GN • Inter-system linking growing • Semantic richness beyond ISO19115 growing • GN still the only catalogue system for the foreseeable future • Other GN initiatives at GA, for another CoP meeting!