OGD Metadata standards – The ENGAGE metadata architecture
1. FP7‐INFRASTRUCTURES‐2011‐2, FP7‐ICT‐283700
Informatiks Workshop on Open Gov Data Standardisation, Koblenz, Germany, 17 Sept 2013
Anneke Zuiderwijk, Delft University of Technology, The Netherlands
Marijn Janssen, Delft University of Technology, The Netherlands
3. Introduction
0 Metadata are often defined as "data about data"
0 Metadata describe (internet) resources for the end user [1]
0 The purpose or type of use determines the distinction between data and
metadata
0 Metadata are not just about objects, but also about the relations
between these objects (e.g. A is creator of B) [2]
0 Metadata are key enablers for the effective use of OGD [3]; they are necessary to make
sense of OGD [4, 5]
1 Jeffery, K. (2013). Metadata models. Presented at Samos Summer School 2013.
2 Simons, E. (2013). Introduction. Presented at EuroCRIS Seminar 2013.
3 Zuiderwijk, A., Jeffery, K., & Janssen, M. (2012). The potential of metadata for linked open data and its value for users and publishers. Journal of e-Democracy and Open
Government, 4(2), 222-244.
4 Berners-Lee, T. (2009). Linked data. Retrieved December 8, 2011, from http://www.w3.org/DesignIssues/LinkedData.html
5 Dawes, S., Pardo, T., & Cresswell, A. (2004). Designing electronic government information access programs: A holistic approach. Government Information Quarterly, 21, 3-23.
4. Introduction
0 Benefits of metadata [6]:
0 create order within datasets
0 improve storage and preservation
0 make it easier to find, analyze, interpret, compare and
reproduce data
0 may make it possible to assess the quality of data
0 make it easier to link data
6 Zuiderwijk, A., Jeffery, K., & Janssen, M. (2012). The potential of metadata for linked open data and its value for users and publishers. Journal of e-Democracy and Open
Government, 4(2), 222-244.
5. Introduction
0 The current situation is a long way from the ideal situation:
0 In the domain of open data, the discussion about metadata is at a very
low level or sometimes absent altogether [7]
0 there are usually few, and insufficient, ways of managing metadata and
interpreting LOD (e.g. [8, 9])
0 adding metadata is often viewed as an additional activity that only
consumes resources
0 Objective: discuss the OGD metadata standards used by the
ENGAGE metadata architecture
7 Simons, E. (2013). Introduction. Presented at EuroCRIS Seminar 2013.
8 Schuurman, N., Deshpande, A., & Allen, D. M. (2008). Data integration across borders: A case study of the Abbotsford-Sumas Aquifer. Journal of the American Water
Resources Association, 44(4), 921-934.
9 Xiong, J., Hu, Y., Li, G., Tang, R., & Fan, Z. (2011). Metadata distribution and consistency techniques for large-scale cluster file systems. IEEE Transactions on Parallel and
Distributed Systems, 22(5), 803-816.
6. Metadata standards [8]
0 Hundreds of specific formats are used as a ‘standard’ within
specific communities, but the widely used ones are:
0 DC (Dublin Core): used to describe web pages
0 CKAN (Comprehensive Knowledge Archive Network): used in national
OGD sites – based on DC
0 eGMS (e-Government Metadata Standard) – based on DC
0 DCAT (Data Catalog Vocabulary): used for datasets on the web – based on DC
0 INSPIRE: used for datasets with geospatial coordinates
0 EU Directive and standard; some overlap with DC but extended
0 CERIF (Common European Research Information Format): used for all
research information
0 All but CERIF are ‘flat’ or ‘linear’
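The contrast between a 'flat' record and a relational model can be illustrated with a minimal Python sketch. It is only loosely inspired by CERIF's idea of linking entities through typed, time-bounded relations; all entity names, roles and field names below are hypothetical and do not come from the CERIF specification.

```python
# Flat (Dublin Core style): one record made of attribute/value pairs only.
flat_record = {
    "title": "Air quality measurements",
    "creator": "Example Environment Agency",
    "date": "2013",
}

# Relational (loosely CERIF-inspired): separate entities plus link records
# that carry a role and a validity period. All names are hypothetical.
entities = {
    "ds1": {"type": "dataset", "name": "Air quality measurements"},
    "org1": {"type": "organisation", "name": "Example Environment Agency"},
    "p1": {"type": "person", "name": "A. Researcher"},
}
links = [
    # (source, role, target, valid_from, valid_to)
    ("org1", "creator", "ds1", "2013-01-01", None),
    ("p1", "contact", "ds1", "2013-01-01", "2013-06-30"),
    ("p1", "employee", "org1", "2010-01-01", None),
]


def related(entity_id, role):
    """Return the names of entities linked to entity_id in the given role."""
    return [
        entities[source]["name"]
        for source, link_role, target, _start, _end in links
        if target == entity_id and link_role == role
    ]
```

In the flat record the creator is just a string. In the relational sketch the creator is an entity in its own right, so further statements (such as who is employed there, and during which period) can be attached to it.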
8 Jeffery, K. (2013). Metadata models. Presented at Samos Summer School 2013.
7. ENGAGE project
0 ENGAGE-project (FP7): An Infrastructure for Open, Linked
Governmental Data Provision towards Research Communities and
Citizens (www.engage-project.eu, www.engagedata.eu)
0 Main goal: the development and use of a data infrastructure,
incorporating distributed and diverse public sector information (PSI)
resources
0 Go beyond PSI (data.gov) sites in number of datasets, diversity of
datasets, quality of metadata
8. ENGAGE project - Functionalities
0 Contribution of ENGAGE over existing infrastructures:
1. Service for researchers and citizens
2. Metadata specification and content organisation (embracing
the Linked Data paradigm while ensuring the quality and
responsiveness of highly structured information models)
3. Automation in data entry and curation
4. Crowdsourcing and interaction with and between users of the
platform
5. Data curation tools and services
6. Dataset visualisation possibilities
7. Multilinguality
8. User help and training
0 These functionalities need to be supported by an appropriate OGD metadata
architecture
9. ENGAGE OGD architecture – the data model [8]
A 3-layer structure for metadata is used:
0 CERIF
0 Relational model
0 Maintained by euroCRIS since 2002 (on request of EC), in use in 43 countries
0 National standard for research information in 10 countries
0 Mapped to various other metadata standards
1. DISCOVERY (DC, eGMS…)
2. CONTEXT (CERIF)
3. DETAIL (SUBJECT OR TOPIC SPECIFIC)
[Figure: the three layers, with arrows labelled "Generate" and "Point to", and links to "Linked open data" and "Formal Information Systems"]
6 Zuiderwijk, A., Jeffery, K., & Janssen, M. (2012). The potential of metadata for linked open data and its value for users and publishers. Journal of e-Democracy and Open
Government, 4(2), 222-244.
10. ENGAGE OGD architecture [8]
0 CERIF provides much richer metadata than the standards used
commonly with PSI datasets.
0 Contextual metadata (CERIF) allow rich semantics to be
represented, thus making the PSI datasets understandable to
the end user (or software) through the metadata
0 This allows for:
0 Generating discovery metadata from CERIF;
0 Interconverting common metadata formats used in PSI, using CERIF as
the superset exchange mechanism;
0 Providing a semantic web / LOD representation of the metadata for
browsing or querying using SPARQL;
0 While maintaining a conventional information systems capability with
structured query
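The first two capabilities listed above (generating discovery metadata, and interconverting flat formats through a superset) can be sketched in a few lines of Python. This is a simplified illustration, not the ENGAGE implementation: the contextual record, the `TO_DC` and `TO_CKAN` mappings and all function names are assumptions made for the example, and real crosswalks are considerably larger.

```python
# Hypothetical contextual record, standing in for the rich superset model.
context = {
    "title": "Air quality measurements",
    "creator": {"name": "Example Environment Agency", "role": "creator"},
    "issued": "2013-09-17",
    "licence": "CC-BY",
}

# Illustrative field mappings from the superset to two flat formats.
TO_DC = {"title": "dc:title", "creator": "dc:creator", "issued": "dc:date"}
TO_CKAN = {"title": "title", "creator": "author", "licence": "license_id"}


def flatten(value):
    """Reduce a structured value to the plain string a flat format expects."""
    return value["name"] if isinstance(value, dict) else value


def generate(context_record, mapping):
    """Project the rich record onto a flat target format."""
    return {
        target: flatten(context_record[source])
        for source, target in mapping.items()
        if source in context_record
    }


def lift(flat_record, mapping):
    """Load a flat record into superset field names."""
    inverse = {target: source for source, target in mapping.items()}
    return {inverse[key]: value for key, value in flat_record.items() if key in inverse}


def convert(flat_record, source_mapping, target_mapping):
    """Convert between two flat formats by passing through the superset."""
    return generate(lift(flat_record, source_mapping), target_mapping)
```

For example, `generate(context, TO_DC)` yields a flat discovery record, and `convert(dc_record, TO_DC, TO_CKAN)` moves a record between formats. With one mapping per format to the superset, adding a new format requires one new mapping rather than one per existing format.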
6 Zuiderwijk, A., Jeffery, K., & Janssen, M. (2012). The potential of metadata for linked open data and its value for users and publishers. Journal of e-Democracy and Open
Government, 4(2), 222-244.
11. ENGAGE OGD architecture - Models for an infrastructure [8]
0 The data model (which controls data representation and data re-use),
together with its metadata, is only one of the relevant models
0 The other models are:
0 User Model
0 controls the way in which the end-user interacts with the e-infrastructure
0 Process Model
0 controls the way processes are constructed and executed in the e-infrastructure
0 Resource Model
0 catalogs the available computing resources in the e-infrastructure
8 Jeffery, K. (2013). Metadata models. Presented at Samos Summer School 2013.
12. ENGAGE OGD architecture - Integration with e-Infrastructure [8]
[Figure: four models shown between the "Complete cohort of researchers, research managers, civil servants, innovators, citizens, media" and the "Complete ICT environment for open data"]
0 User Model: interaction with data, processing, persons
0 Processing Model: providing what the user requires
0 Data Model: representing open datasets
0 Resource Model: representing ICT
8 Jeffery, K. (2013). Metadata models. Presented at Samos Summer School 2013.
13. Conclusions
0 Metadata are key enablers for OGD use and yield considerable
benefits
0 Objective: discuss OGD metadata standards used by the
ENGAGE metadata architecture
0 ENGAGE 3-layer metadata model
0 Discovery metadata
0 Contextual metadata (CERIF)
0 Detailed metadata
0 CERIF
0 provides much richer metadata than the standards used commonly
with PSI datasets
0 allows rich semantics to be represented, thus making the PSI datasets
understandable to the end user (or software) through the metadata
Data can be data to one person, but metadata to another person. Consider a library with catalogue cards and books on shelves. To a researcher or reader, the catalogue cards are metadata: they describe the book and point to where it is on the shelf (descriptive and navigational metadata). To a librarian, the catalogue cards are data: the librarian uses them, for example, to count the number of books on ‘information technology’. So data and metadata cannot be distinguished except by how they are used.
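The library example can be sketched in Python to show the same records being used in both ways. The catalogue contents and function names are invented for illustration only.

```python
from collections import Counter

# Hypothetical catalogue cards; titles, subjects and shelf codes are made up.
catalogue = [
    {"title": "Databases", "subject": "information technology", "shelf": "A3"},
    {"title": "Networks", "subject": "information technology", "shelf": "A4"},
    {"title": "Roman Law", "subject": "law", "shelf": "C1"},
]


def find_shelf(title):
    """Reader's view: the card is metadata that points to the book."""
    for card in catalogue:
        if card["title"] == title:
            return card["shelf"]
    return None


def books_per_subject():
    """Librarian's view: the cards themselves are the data being analysed."""
    return Counter(card["subject"] for card in catalogue)
```

Nothing in the `catalogue` structure marks it as data or metadata. Only the calling function, that is the type of use, makes that distinction.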
However, many public and private organizations that release their data simply put it on the internet without providing contextual information or links to other data. Many data-providing organizations do not consider how their open data can be reused or how they can get feedback on the data they publish. This is shown by the absence of advanced service e-infrastructures for the reuse of open data that include tools for cleaning, analyzing, visualizing and linking datasets (Charalabidis, Ntanos, & Lampathaki, 2011).
There are discrepancies between the benefits that are described in literature and the benefits that are obtained in reality. The current situation is insufficient.
Statements
- CERIF provides much richer metadata than the standards commonly used with PSI datasets, and so greatly improves the experience of the end user (or the software) in processing the PSI datasets described by the enhanced metadata.
- The representation of contextual metadata (CERIF) allows rich semantics to be represented simply over a formal syntax, thus making the PSI datasets understandable to the end user (or software) through the enhanced metadata.
- The Structured Query Language (SQL), usually presented to the end user through an easy-to-use Query By Example (QBE) interface, has a simpler structure than SPARQL and includes convenient primitive operations for simple statistical calculations such as sum, count and average.
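The SQL primitives mentioned in the last statement can be illustrated with a small, self-contained Python sketch using the standard-library `sqlite3` module. The `dataset` table, its columns and its rows are hypothetical and are not taken from the ENGAGE platform.

```python
import sqlite3

# In-memory database with a hypothetical table of dataset metadata.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (title TEXT, publisher TEXT, downloads INTEGER)")
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?)",
    [
        ("Air quality", "Agency A", 120),
        ("Road traffic", "Agency A", 80),
        ("School results", "Agency B", 40),
    ],
)

# Count, sum and average per publisher in a single structured query.
rows = conn.execute(
    "SELECT publisher, COUNT(*), SUM(downloads), AVG(downloads) "
    "FROM dataset GROUP BY publisher ORDER BY publisher"
).fetchall()
conn.close()
```

Each row in `rows` holds a publisher together with its number of datasets, total downloads and average downloads, which is the kind of simple statistical calculation the statement refers to.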