How to describe a dataset.
Interoperability issues
Valeria Pesce
Global Forum on Agricultural Research
Definition of “dataset”
The term “dataset” has been defined in several ways, all of which
further specify or extend the basic concept of “a collection of data”.
Definition given by the W3C Government Linked Data Working Group:
A dataset is “a collection of data, published or curated by a
single source, and available for access or download in one or
more formats”
The “instances” of the dataset “available for access or
download in one or more formats” are called
“distributions”. A dataset can have many distributions.
Examples of distributions include a downloadable CSV
file, an API or an RSS feed.
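The dataset / distribution relationship above can be sketched as a small data model. This is only an illustration: the class names, field names and example URLs are assumptions, not part of any vocabulary mentioned here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """One way of getting at the data: a file, an API, a feed."""
    access_url: str
    media_type: str  # e.g. "text/csv"


@dataclass
class Dataset:
    """A collection of data published or curated by a single source."""
    title: str
    publisher: str
    distributions: List[Distribution] = field(default_factory=list)

    def formats(self) -> List[str]:
        """Return the media types in which this dataset is available."""
        return sorted({d.media_type for d in self.distributions})


# Hypothetical example record; names and URLs are illustrative only.
rainfall = Dataset(
    title="Monthly rainfall",
    publisher="Example Institute",
    distributions=[
        Distribution("http://example.org/rainfall.csv", "text/csv"),
        Distribution("http://example.org/api/rainfall", "application/json"),
        Distribution("http://example.org/rainfall.rss", "application/rss+xml"),
    ],
)
```

One dataset, three distributions: the same data offered as a CSV download, an API and an RSS feed.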
Definition of “interoperability”
“Data interoperability is a feature of datasets -
and of information services that give access to
datasets - whereby data can easily be retrieved,
processed, re-used, and re-packaged
(“operated”) by other systems.”
Interim Proceedings of International Expert Consultation on “Building the CIARD
Framework for Data and Information Sharing”, CIARD (2011)
“Other systems” = software applications, so datasets have to be machine-readable
What applications need
Besides information common to any type of resource (name, author /
owner, date…), applications have to find enough metadata about
datasets to understand:
1. the specific coverage of the dataset (type of data, thematic
coverage, geographic coverage)
2. the necessary technical specifications to retrieve and parse a
distribution of the dataset (format, protocol etc.)
3. the conditions for re-use (rights, licenses)
4. the “dimensions” covered by the dataset (e.g. temperature,
time, salinity, gene, coordinates)
5. the semantics of the dimensions (units of measure, time
granularity, syntax, reference taxonomies)
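As a rough sketch, an application could check a metadata record against the five areas listed above before trying to use the dataset. The field names below are hypothetical, and treating an area as covered when at least one of its fields is filled is an assumption made for brevity.

```python
from typing import Dict, List

# The five areas listed above, each mapped to hypothetical metadata field names.
REQUIRED_AREAS: Dict[str, List[str]] = {
    "coverage": ["data_type", "topics", "spatial"],
    "technical": ["format", "protocol"],
    "reuse": ["license"],
    "dimensions": ["dimensions"],
    "semantics": ["units"],
}


def missing_areas(record: Dict[str, object]) -> List[str]:
    """Return the areas for which the record has no non-empty field."""
    missing = []
    for area, fields in REQUIRED_AREAS.items():
        # An area counts as covered if at least one of its fields is filled.
        if not any(record.get(name) for name in fields):
            missing.append(area)
    return missing


# Hypothetical record that says nothing about dimensions or their semantics.
record = {
    "topics": ["soil"],
    "format": "text/csv",
    "license": "http://creativecommons.org/licenses/by/4.0/",
}
gaps = missing_areas(record)
```

Here `gaps` lists "dimensions" and "semantics", the two areas that the rest of this presentation finds least covered by existing vocabularies.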
Partial answers in existing vocabularies
• DCAT vocabulary
– RDF vocabulary for describing any dataset
– Datasets can be standalone or part of a “catalog”
– Datasets are accessible through several “distributions”
– “Other, complementary vocabularies may be used together with DCAT to provide
more detailed format-specific information. For example, properties from the VoID
vocabulary can be used if that dataset is in RDF format.”
• VOID vocabulary
– RDF vocabulary for expressing metadata about RDF datasets
• (SDMX ) DataCube vocabulary
– RDF vocabulary for describing statistical datasets
– Useful for attaching metadata about the “data structure” to any dataset that
doesn’t follow a known published standard
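To make the DCAT bullets concrete, here is a minimal sketch that describes one dataset with one distribution and prints it as N-Triples, using only the standard library. The dataset and distribution URIs are hypothetical; the DCAT and Dublin Core terms are real. Writing the media type as a plain literal follows the examples in the 2014 DCAT Recommendation and is an assumption about which DCAT version is in use.

```python
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical identifiers, for illustration only.
dataset = "http://example.org/dataset/rainfall"
csv_dist = "http://example.org/dataset/rainfall/csv"

# Each triple is (subject, predicate, object, object_is_uri).
triples = [
    (dataset, RDF_TYPE, DCAT + "Dataset", True),
    (dataset, DCT + "title", "Monthly rainfall", False),
    (dataset, DCAT + "distribution", csv_dist, True),
    (csv_dist, RDF_TYPE, DCAT + "Distribution", True),
    (csv_dist, DCAT + "downloadURL", "http://example.org/rainfall.csv", True),
    (csv_dist, DCAT + "mediaType", "text/csv", False),
]


def to_ntriples(rows):
    """Serialize the triples as N-Triples, one statement per line."""
    lines = []
    for s, p, o, is_uri in rows:
        if is_uri:
            obj = f"<{o}>"
        else:
            # Escape backslashes first, then double quotes.
            obj = '"' + o.replace("\\", "\\\\").replace('"', '\\"') + '"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


output = to_ntriples(triples)
```

A real application would normally use an RDF library; plain tuples are used here only to keep the example self-contained.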
Coverage of a dataset
• This can be handled by common Dublin Core properties like subject and
coverage.
• DCAT re-uses these DC properties.
Issue 1: No specific property for the type of data covered in a dataset
The values of these properties have to be understood by machines:
- The value should be standardized, possibly a URI
- The URI should be de-referenceable to a thing
- The thing should be part of an authority list / taxonomy
Issue 2: There is no authority vocabulary for types of data
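The first requirement on values (standardized, possibly a URI) can be checked mechanically. The sketch below only tests whether a value is syntactically an absolute HTTP(S) URI; it does not dereference it, and it cannot tell whether the thing behind it belongs to an authority list. The example values are hypothetical.

```python
from urllib.parse import urlparse


def is_http_uri(value: str) -> bool:
    """True if the value looks like an absolute HTTP(S) URI."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def split_subjects(values):
    """Separate subject values into URIs and free-text strings."""
    uris = [v for v in values if is_http_uri(v)]
    free_text = [v for v in values if not is_http_uri(v)]
    return uris, free_text


# Hypothetical subject values as they might appear in a dataset record.
uris, free_text = split_subjects(["http://example.org/taxonomy/soil", "soil", "Soil science"])
```

Only the first value is usable by a machine without guessing; the two free-text values would need to be mapped to a taxonomy first.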
Conditions for re-use
• DCAT re-uses the license DC property at the level of
distributions
• DCAT re-uses the rights DC property at both the level
of dataset and the level of distribution
dc:license > dc:LicenseDocument
dc:rights > dc:RightsStatement
W3C DCAT > DCAT AP
DCAT core
Technical properties
The necessary technical specifications to retrieve and
parse a distribution of a dataset (format, protocol etc.)
• DCAT re-uses the DC format property;
Issue 1: No property for protocol
The values of these properties have to be understood by
machines, possibly URIs:
Issue 2: No comprehensive RDF authority lists for these
values (partial: DC Types; non-RDF: IANA types)
VOID
VOID can help with the protocol metadata but only for
RDF datasets:
- Property for data dump: dataDump
- Property for SPARQL endpoint: sparqlEndpoint
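A minimal sketch of how an application could pick up these two VoID properties from a list of triples. `void:dataDump` and `void:sparqlEndpoint` are real VoID terms; the dataset URI and the URLs are hypothetical.

```python
VOID = "http://rdfs.org/ns/void#"


def access_points(triples, dataset):
    """Collect data dump and SPARQL endpoint URLs declared for a dataset."""
    found = {"dataDump": [], "sparqlEndpoint": []}
    for s, p, o in triples:
        if s == dataset and p.startswith(VOID):
            name = p[len(VOID):]
            if name in found:
                found[name].append(o)
    return found


# Hypothetical description of an RDF dataset.
ds = "http://example.org/dataset/germplasm"
description = [
    (ds, VOID + "dataDump", "http://example.org/dumps/germplasm.nt"),
    (ds, VOID + "sparqlEndpoint", "http://example.org/sparql"),
    (ds, "http://purl.org/dc/terms/title", "Germplasm records"),
]
points = access_points(description, ds)
```

This tells the application how to retrieve the data only when the dataset is RDF; for other kinds of distribution the protocol gap described above remains.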
“Dimensions” and their semantics
DCAT does not describe the dimensions of a dataset,
except for a reference to a standard if the dataset
dimensions can be defined by a formalized standard
(e.g. an XML schema or an RDF vocabulary or an ISO
standard)
dc:conformsTo > dc:Standard
Statistical vocabularies can help
with the description of the dimensions
SDMX: data structure and dimensions
SDMX: Statistical Data and Metadata Exchange
The data structure definition is a description of all the metadata needed to
understand the data set structure.
This includes:
• identification of the dimensions (Dimension) according to standard
statistical terminology,
• the key structure (KeyDescriptor),
• the code-lists (CodeList) that enumerate valid values for each dimension
• coded attribute (CodedAttribute), information about whether attributes
are required or optional and coded or free text.
Given the metadata in the data structure definition, all of the data in the
data set becomes meaningful.
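The idea of a data structure definition can be sketched as follows: each dimension carries a code list, and an observation key is meaningful only if it supplies a valid code for every dimension. This is a simplified illustration, not the SDMX information model; the class names, dimension names and codes are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Dimension:
    name: str
    code_list: Set[str]  # valid values for this dimension


@dataclass
class DataStructureDefinition:
    dimensions: List[Dimension]  # their order defines the key structure

    def validate_key(self, key: Dict[str, str]) -> List[str]:
        """Return a list of problems; an empty list means the key is valid."""
        problems = []
        for dim in self.dimensions:
            if dim.name not in key:
                problems.append(f"missing dimension: {dim.name}")
            elif key[dim.name] not in dim.code_list:
                problems.append(f"invalid code for {dim.name}: {key[dim.name]}")
        return problems


# Hypothetical structure with two coded dimensions.
dsd = DataStructureDefinition([
    Dimension("country", {"IT", "KE"}),
    Dimension("year", {"2012", "2013"}),
])
ok = dsd.validate_key({"country": "IT", "year": "2013"})
bad = dsd.validate_key({"country": "FR"})
```

`ok` is empty, while `bad` reports an invalid country code and a missing year.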
DataCube: simplified SDMX in RDF
Reference to a concept scheme
“Semantic role” of the property
“Semantic role” of
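In the DataCube vocabulary a dimension is declared as a `qb:DimensionProperty`; `qb:codeList` links it to its code list (which may be a SKOS concept scheme) and `qb:concept` links it to the concept it represents. The sketch below uses these real terms with a hypothetical `http://example.org/def/` namespace and plain tuples instead of an RDF library.

```python
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/def/"  # hypothetical namespace

triples = [
    (EX + "refArea", RDF_TYPE, QB + "DimensionProperty"),
    (EX + "refArea", QB + "codeList", EX + "areas"),
    (EX + "refArea", QB + "concept", EX + "concept/refArea"),
    (EX + "rainfall", RDF_TYPE, QB + "MeasureProperty"),
]


def dimensions_with_code_lists(rows):
    """Map each qb:DimensionProperty to its qb:codeList (None if absent)."""
    dims = {s for s, p, o in rows if p == RDF_TYPE and o == QB + "DimensionProperty"}
    code_lists = {s: o for s, p, o in rows if p == QB + "codeList"}
    return {d: code_lists.get(d) for d in dims}


dims = dimensions_with_code_lists(triples)
```

An application reading this description learns that values of `refArea` come from the `areas` code list, without having to inspect the data itself.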
Combining different vocabularies
Catalog: the directory
• DATASET (DCAT model)
– Name, URL, Owner, Content type, Topic(s), Language, Metadata set(s), Data structure, Distribution(s), […]
• DISTRIBUTION (DCAT model)
– Name, Protocol, Endpoint URL, Media type, Format, Size
• DATA STRUCTURE (DataCube model)
– Dimensions, Attributes, Measures, Value lists
– If one or more known published metadata sets are used, just fill “metadata set(s)”, otherwise link to a “data structure” with custom “dimensions”
• RDF dataset info (VOID properties)
– Vocabulary(ies), SPARQL endpoint, Data dump, Serialization format, Number of triples
– Only if the media type is RDF or a SPARQL response
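The two conditions in the combined model can be written as a small decision function. The record layout and field names are hypothetical, and the set of media types treated as "RDF or SPARQL response" is an illustrative, incomplete assumption.

```python
# Media types treated here as RDF or SPARQL responses (not exhaustive).
RDF_MEDIA_TYPES = {
    "application/rdf+xml",
    "text/turtle",
    "application/sparql-results+xml",
    "application/sparql-results+json",
}


def optional_blocks(dataset: dict) -> dict:
    """Decide which optional blocks a dataset description needs."""
    return {
        # With known published metadata sets, naming them is enough;
        # otherwise a custom data structure has to be described.
        "data_structure": not dataset.get("metadata_sets"),
        # RDF dataset info applies only to RDF or SPARQL distributions.
        "rdf_dataset_info": any(
            d.get("media_type") in RDF_MEDIA_TYPES
            for d in dataset.get("distributions", [])
        ),
    }


# Two hypothetical records.
csv_with_standard = optional_blocks({
    "metadata_sets": ["Dublin Core"],
    "distributions": [{"media_type": "text/csv"}],
})
custom_rdf = optional_blocks({
    "distributions": [{"media_type": "text/turtle"}],
})
```

The first record needs neither optional block; the second needs both a custom data structure and the RDF dataset info.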
Tools for managing dataset metadata
• CKAN maintained by the Open Knowledge Foundation
Uses most of DCAT. Doesn’t describe dimensions.
Also provides a global dataset hub called the Datahub
• Dataverse created by Harvard University
Uses a custom vocabulary. Doesn’t describe dimensions.
• Commercial solutions
• Repositories and catalogs:
OpenAIRE, DataCite (using re3data to search repositories) and Dryad
use their own vocabularies.
• CIARD RING
Uses full DCAT AP with some extended properties (protocol, data
type) and local taxonomies with URIs mapped when possible to
authorities.
Next steps: adding DataCube properties for dimensions.
Major outstanding issues
• Some missing properties in existing vocabularies:
→ approach vocabulary owners OR extend vocabularies
• Missing vocabularies for protocols, formats
→ approach standardizing bodies?
→ perhaps specific dataset formats?
• Need for more standardized semantics for dimensions:
→ Joint discussions with the RDA Data Type Registries WG?
• Lack of interoperability metadata in existing tools
References
• W3C DCAT: http://www.w3.org/TR/vocab-dcat/
• DCAT AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final
• DataCube: http://purl.org/linked-data/cube#
• VOID: http://rdfs.org/ns/void-guide
• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology/
• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables/
• CKAN: http://ckan.org/
• Datahub: http://datahub.io/
• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture
• Re3data: http://www.re3data.org
• Dryad: http://datadryad.org/
• OpenAIRE: https://www.openaire.eu/
Thank you
Valeria Pesce
Global Forum on Agricultural Research

Information / software architectures based on Content Management Systems (CMS)Information / software architectures based on Content Management Systems (CMS)
Information / software architectures based on Content Management Systems (CMS)
 
The CIARD RING, an infrastructure for interoperability of agricultural resear...
The CIARD RING, an infrastructure for interoperability of agricultural resear...The CIARD RING, an infrastructure for interoperability of agricultural resear...
The CIARD RING, an infrastructure for interoperability of agricultural resear...
 
Libraries 2.0 and RSS
Libraries 2.0 and RSSLibraries 2.0 and RSS
Libraries 2.0 and RSS
 
The Ciard RING
The Ciard RINGThe Ciard RING
The Ciard RING
 
The Global ARD Web Ring
The Global ARD Web RingThe Global ARD Web Ring
The Global ARD Web Ring
 
The EGFAR web space: Using Web 2.0 technologies to electronically mimic GFAR
The EGFAR web space: Using Web 2.0 technologies to electronically mimic GFARThe EGFAR web space: Using Web 2.0 technologies to electronically mimic GFAR
The EGFAR web space: Using Web 2.0 technologies to electronically mimic GFAR
 

Recently uploaded

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

How to describe a dataset. Interoperability issues

  • 1. How to describe a dataset. Interoperability issues Valeria Pesce Global Forum on Agricultural Research
  • 2. Definition of “dataset” The term “dataset” has been defined in several ways, all of which further specify or extend the basic concept of “a collection of data”. Definition given by the W3C Government Linked Data Working Group: A dataset is “a collection of data, published or curated by a single source, and available for access or download in one or more formats” The “instances” of the dataset “available for access or download in one or more formats” are called “distributions”. A dataset can have many distributions. Examples of distributions include a downloadable CSV file, an API or an RSS feed.
  • 3. Definition of “interoperability” “Data interoperability is a feature of datasets - and of information services that give access to datasets - whereby data can easily be retrieved, processed, re-used, and re-packaged (“operated”) by other systems.” Source: Interim Proceedings of International Expert Consultation on “Building the CIARD Framework for Data and Information Sharing”, CIARD (2011). The “other systems” are software applications, so datasets have to be machine-readable.
  • 4. What applications need Besides information common to any type of resource (name, author / owner, date…), applications have to find enough metadata about datasets to understand: 1. the specific coverage of the dataset (type of data, thematic coverage, geographic coverage) 2. the necessary technical specifications to retrieve and parse a distribution of the dataset (format, protocol etc.) 3. the conditions for re-use (rights, licenses) 4. the “dimensions” covered by the dataset (e.g. temperature, time, salinity, gene, coordinates) 5. the semantics of the dimensions (units of measure, time granularity, syntax, reference taxonomies)
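The five needs listed in slide 4 can be read as a checklist that an application runs against a dataset description. The following is a minimal sketch in Python, assuming a hypothetical flat metadata record; the field names (`data_type`, `protocol`, `reference_taxonomies` and so on) and the sample record are invented for illustration and are not taken from any vocabulary named in the slides.

```python
# Hypothetical field names, grouped by the five needs listed on slide 4.
REQUIRED_GROUPS = {
    "coverage": ["data_type", "topic", "geographic_coverage"],
    "technical": ["format", "protocol"],
    "reuse": ["rights", "license"],
    "dimensions": ["dimensions"],
    "semantics": ["units", "reference_taxonomies"],
}


def missing_metadata(record):
    """Return, per group, the fields that are absent or empty in the record."""
    gaps = {}
    for group, fields in REQUIRED_GROUPS.items():
        absent = [f for f in fields if not record.get(f)]
        if absent:
            gaps[group] = absent
    return gaps


# Invented sample record, for illustration only.
example = {
    "name": "Soil salinity survey",
    "data_type": "observations",
    "topic": "soil",
    "geographic_coverage": "Kenya",
    "format": "text/csv",
    "license": "CC-BY",
    "dimensions": ["time", "salinity", "coordinates"],
}

print(missing_metadata(example))
```

A record that returns an empty dictionary carries at least one value for every need; whether those values are machine-understandable (URIs from an authority list) is a separate question raised in the following slides.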
  • 5. Partial answers in existing vocabularies • DCAT vocabulary – RDF vocabulary for describing any dataset – Datasets can be standalone or part of a “catalog” – Datasets are accessible through several “distributions” – “Other, complementary vocabularies may be used together with DCAT to provide more detailed format-specific information. For example, properties from the VoID vocabulary can be used if that dataset is in RDF format.” • VOID vocabulary – RDF vocabulary for expressing metadata about RDF datasets • (SDMX) DataCube vocabulary – RDF vocabulary for describing statistical datasets – Useful for attaching metadata about the “data structure” to any dataset that doesn’t follow a known published standard
  • 6. Coverage of a dataset • This can be handled by common Dublin Core properties like subject and coverage. • DCAT re-uses these DC properties. Issue 1: No specific property for the type of data covered in a dataset. The values of these properties have to be understood by machines: - The value should be standardized, possibly a URI - The URI should be de-referenceable to a thing - The thing should be part of an authority list / taxonomy. Issue 2: There is no authority vocabulary for types of data.
  • 7. Conditions for re-use • DCAT re-uses the license DC property at the level of distributions • DCAT re-uses the rights DC property at both the level of dataset and the level of distribution dc:license > dc:LicenseDocument dc:rights > dc:RightsStatement
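To make the placement of license and rights concrete, here is a minimal sketch of a dataset description as a JSON-LD-style Python dictionary. It follows the slide in putting the license on the distribution and rights on both levels. The `dct:` prefix stands for the Dublin Core Terms namespace (the slide writes `dc:`); every `example.org` URI, the title and the choice of license are assumptions made for illustration.

```python
import json

# Namespace URIs of DCAT and Dublin Core Terms.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# All example.org URIs and the license choice are illustrative assumptions.
dataset = {
    "@context": CONTEXT,
    "@id": "http://example.org/dataset/1",
    "@type": "dcat:Dataset",
    "dct:title": "Example dataset",
    "dct:rights": {"@id": "http://example.org/rights/dataset-1"},
    "dcat:distribution": [
        {
            "@id": "http://example.org/dataset/1/csv",
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": "http://example.org/files/1.csv"},
            "dct:format": "text/csv",
            "dct:license": {"@id": "http://creativecommons.org/licenses/by/4.0/"},
            "dct:rights": {"@id": "http://example.org/rights/dataset-1-csv"},
        }
    ],
}


def distribution_licenses(ds):
    """Map each distribution URI to its license URI (None if no license is stated)."""
    result = {}
    for dist in ds.get("dcat:distribution", []):
        lic = dist.get("dct:license")
        result[dist["@id"]] = lic["@id"] if lic else None
    return result


print(json.dumps(dataset, indent=2))
print(distribution_licenses(dataset))
```

Because the license is given as a URI rather than free text, an application can compare it against a list of licenses it knows how to handle.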
  • 8. W3C DCAT > DCAT AP
  • 10. Technical properties The necessary technical specifications to retrieve and parse a distribution of a dataset (format, protocol etc.) • DCAT re-uses the DC format property. Issue 1: No property for protocol. The values of these properties have to be understood by machines, possibly URIs. Issue 2: No comprehensive RDF authority lists for these values (partial: DC Types; non-RDF: IANA types).
  • 11. VOID VOID can help with the protocol metadata but only for RDF datasets: - Property for data dump: dataDump - Property for SPARQL endpoint: sparqlEndpoint
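As a rough illustration of how an application might use the two VoID properties named above, the sketch below looks them up in a description held as a plain dictionary keyed by property URI. The dictionary layout and the `example.org` URLs are assumptions; only the property names `sparqlEndpoint` and `dataDump` and the VoID namespace come from the vocabulary.

```python
VOID = "http://rdfs.org/ns/void#"


def rdf_access_points(description):
    """Return (kind, url) pairs found under the two VoID access properties."""
    points = []
    for prop, kind in ((VOID + "sparqlEndpoint", "sparql"),
                       (VOID + "dataDump", "dump")):
        for url in description.get(prop, []):
            points.append((kind, url))
    return points


# Illustrative description of an RDF dataset; the URLs are placeholders.
description = {
    VOID + "sparqlEndpoint": ["http://example.org/sparql"],
    VOID + "dataDump": [
        "http://example.org/dump.nt",
        "http://example.org/dump.ttl",
    ],
}

for kind, url in rdf_access_points(description):
    print(kind, url)
```

For a non-RDF dataset the function returns an empty list, which mirrors the limitation stated in the slide: VoID says nothing about protocols outside RDF.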
  • 12. “Dimensions” and their semantics DCAT does not describe the dimensions of a dataset, except for a reference to a standard if the dataset dimensions can be defined by a formalized standard (e.g. an XML schema or an RDF vocabulary or an ISO standard) dc:conformsTo > dc:Standard Statistical vocabularies can help with the description of the dimensions
  • 13. SDMX: data structure and dimensions SDMX: Statistical Data and Metadata Exchange The data structure definition is a description of all the metadata needed to understand the data set structure. This includes: • identification of the dimensions (Dimension) according to standard statistical terminology, • the key structure (KeyDescriptor), • the code-lists (CodeList) that enumerate valid values for each dimension • coded attribute (CodedAttribute), information about whether attributes are required or optional and coded or free text. Given the metadata in the data structure definition, all of the data in the data set becomes meaningful.
  • 15. DataCube: simplified SDMX in RDF Reference to a concept scheme
  • 16. DataCube: simplified SDMX in RDF “Semantic role” of the property
  • 17. DataCube: simplified SDMX in RDF “Semantic role” of
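The idea of a data structure definition can be sketched in code. The hypothetical helper below writes a DataCube `qb:DataStructureDefinition` in Turtle, listing dimensions and measures as components. The function name and all `example.org` URIs are assumptions; `qb:component`, `qb:dimension` and `qb:measure` are terms of the DataCube vocabulary. Code lists and attributes are left out to keep the sketch short.

```python
QB = "http://purl.org/linked-data/cube#"


def structure_to_turtle(dsd_uri, dimensions, measures):
    """Serialize a data structure definition with its components as Turtle."""
    if not dimensions and not measures:
        raise ValueError("a data structure needs at least one component")
    components = [f"    qb:component [ qb:dimension <{d}> ]" for d in dimensions]
    components += [f"    qb:component [ qb:measure <{m}> ]" for m in measures]
    lines = [
        f"@prefix qb: <{QB}> .",
        "",
        f"<{dsd_uri}> a qb:DataStructureDefinition ;",
        " ;\n".join(components) + " .",
    ]
    return "\n".join(lines)


# Placeholder URIs for a dataset with two dimensions and one measure.
turtle = structure_to_turtle(
    "http://example.org/dsd/salinity",
    ["http://example.org/dimension/time", "http://example.org/dimension/location"],
    ["http://example.org/measure/salinity"],
)
print(turtle)
```

Each dimension URI would itself be described elsewhere (for example as a `qb:DimensionProperty` with a link to a concept scheme), which is where the semantics discussed in these slides are attached.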
  • 18. Combining different vocabularies Catalog: the directory. DCAT model: DATASET (Name, URL, Owner, Content type, Topic(s), Language, Metadata set(s), Data structure, Distribution(s) […]); DISTRIBUTION (Name, Protocol, Endpoint URL, Media type, Format, Size). DataCube model: DATA STRUCTURE (Dimensions, Attributes, Measures, Value lists). VOID properties: RDF dataset info (Vocabulary(ies), SPARQL endpoint, Data dump, Serialization format, Number of triples), IF media type has RDF or SPARQL response. If one or more known published metadata sets are used, just fill “metadata set(s)”, otherwise link to a “data structure” with custom “dimensions”.
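The two rules stated on this slide (known metadata sets versus a custom data structure, and extra RDF information only for RDF or SPARQL distributions) can be sketched with a few Python dataclasses. The class and field names loosely mirror the boxes of the slide; the set of media types treated as "RDF or SPARQL" is an assumption, as are the sample values.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed list of media types that count as "RDF or SPARQL response".
RDF_MEDIA_TYPES = {
    "text/turtle",
    "application/rdf+xml",
    "application/n-triples",
    "application/sparql-results+json",
}


@dataclass
class Distribution:
    name: str
    endpoint_url: str
    media_type: str
    protocol: Optional[str] = None

    def needs_void_info(self) -> bool:
        """RDF dataset info (VoID) applies only to RDF or SPARQL distributions."""
        return self.media_type in RDF_MEDIA_TYPES


@dataclass
class DataStructure:
    dimensions: List[str] = field(default_factory=list)
    attributes: List[str] = field(default_factory=list)
    measures: List[str] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    metadata_sets: List[str] = field(default_factory=list)
    data_structure: Optional[DataStructure] = None
    distributions: List[Distribution] = field(default_factory=list)

    def structure_source(self) -> str:
        """Known metadata sets win; otherwise fall back to a custom structure."""
        if self.metadata_sets:
            return "metadata_sets"
        if self.data_structure is not None:
            return "data_structure"
        return "undescribed"


custom = Dataset(
    name="Salinity measurements",
    data_structure=DataStructure(dimensions=["time", "location"], measures=["salinity"]),
    distributions=[Distribution("SPARQL", "http://example.org/sparql",
                                "application/sparql-results+json")],
)
print(custom.structure_source(), custom.distributions[0].needs_void_info())
```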
  • 19. Tools for managing dataset metadata • CKAN, maintained by the Open Knowledge Foundation: uses most of DCAT; doesn’t describe dimensions; also provides a global dataset hub called the Datahub. • Dataverse, created by Harvard University: uses a custom vocabulary; doesn’t describe dimensions. • Commercial solutions. • Repositories and catalogs: OpenAIRE, DataCite (using re3data to search repositories) and Dryad use their own vocabularies. • CIARD RING: uses full DCAT AP with some extended properties (protocol, data type) and local taxonomies with URIs mapped when possible to authorities. Next steps: adding DataCube properties for dimensions.
  • 20. Major outstanding issues • Some missing properties in existing vocabularies: → approach vocabulary owners OR extend vocabularies • Missing vocabularies for protocols, formats: → approach standardizing bodies? → perhaps specific dataset formats? • Need for more standardized semantics for dimensions: → Joint discussions with the RDA Data Type Registries WG? • Lack of interoperability metadata in existing tools
  • 21. References • W3C DCAT: http://www.w3.org/TR/vocab-dcat/ • DCAT AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final • DataCube: http://purl.org/linked-data/cube# • VOID: http://rdfs.org/ns/void-guide • VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology/ • CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables/ • CKAN: http://ckan.org/ • Datahub: http://datahub.io/ • DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture • Re3data: http://www.re3data.org • Dryad: http://datadryad.org/ • OpenAIRE: https://www.openaire.eu/
  • 22. Thank you Valeria Pesce Global Forum on Agricultural Research