Validating
JSON, XML and CSV data
with SHACL-like constraints
Péter Király, GWDG (Göttingen)
pkiraly@gwdg.de
Deutsche Initiative für Netzwerkinformation e.V.
Kompetenzzentrum Interoperable Metadaten (KIM) Workshop
2022-05-02
https://github.com/pkiraly/metadata-qa-api
Shapes Constraint Language (SHACL)
a language for validating RDF graphs against a set of conditions (expressed as RDF graphs)
    ex:PersonShape
        a sh:NodeShape ;
        sh:targetClass ex:Person ;    # checks persons
        sh:property [
            sh:path ex:ssn ;          # checks social security nr.
            sh:maxCount 1 ;
            sh:datatype xsd:string ;
            sh:pattern "^\\d{3}-\\d{2}-\\d{4}$" ;
        ] .
Metadata Quality Assessment Framework (MQAF) API
★ open source software for metadata quality assessment
★ quality dimensions: completeness, multilinguality, uniqueness, etc.
★ extensions: Europeana, MARC, Deutsche Digitale Bibliothek
★ Java API + command line interface (in progress)
★ reads XML, JSON, CSV, MARC
★ highly configurable
★ adaptable to different metadata schemas
RDF agnostic SHACL tests*
Cardinality               minCount <number>, maxCount <number>
Value range               minExclusive <number>, minInclusive <number>, maxExclusive <number>, maxInclusive <number>
String                    minLength <number>, maxLength <number>, hasValue <String>, in [String1, ..., StringN], pattern <regular expression>, minWords <number>, maxWords <number>
Comparison of properties  equals <field label>, disjoint <field label>, lessThan <field label>, lessThanOrEquals <field label>
Logical operators         and [<rule1>, ..., <ruleN>], or [<rule1>, ..., <ruleN>], not [<rule1>, ..., <ruleN>]
* a subset of SHACL
MQAF API’s SHACL tests
Cardinality               minCount <number>, maxCount <number>
Value range               minExclusive <number>, minInclusive <number>, maxExclusive <number>, maxInclusive <number>
String                    minLength <number>, maxLength <number>, hasValue <String>, in [String1, ..., StringN], pattern <regular expression>, minWords <number>, maxWords <number>
Comparison of properties  equals <field label>, disjoint <field label>, lessThan <field label>, lessThanOrEquals <field label>
Logical operators         and [<rule1>, ..., <ruleN>], or [<rule1>, ..., <ruleN>], not [<rule1>, ..., <ruleN>]
Extras                    contentType [type1, ..., typeN], unique <boolean>, dependencies [id1, id2, ..., idN], dimension [criteria...] (min/max + Width/Height/Shortside/Longside)
Properties                id, description, failureScore, successScore, hidden, skip
abstracting the address of data element
XML, JSON, CSV and MARC21 all have addressable data elements (branches); each format has its own addressing language:
format   addressing language
XML      XPath
JSON     JSONPath
CSV      column names
MARC21   MARCSpec
abstracting data element retrieval
XML, JSON, CSV, MARC21 → data element selector → uniform data structure
The data element selector consults the schema definition: "May I get the title?" / "Title’s address is //head/title"
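The selector idea above can be sketched in a few lines of Java. This is an illustration only, not MQAF code: `SelectorSketch`, `Selector` and `CsvSelector` are hypothetical names, and only the CSV case (address = column name) is shown. A real implementation would delegate to XPath, JSONPath or MARCSpec engines for the other formats.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only; none of these names come from MQAF.
final class SelectorSketch {

  // Every format answers the same question: which values live at this address?
  interface Selector {
    List<String> get(String address);
  }

  // CSV case: the address is simply a column name.
  static final class CsvSelector implements Selector {
    private final Map<String, String> row = new LinkedHashMap<>();

    CsvSelector(List<String> header, List<String> cells) {
      for (int i = 0; i < header.size() && i < cells.size(); i++) {
        row.put(header.get(i), cells.get(i));
      }
    }

    @Override
    public List<String> get(String address) {
      String value = row.get(address);
      // An unknown column or an empty cell yields no instances.
      return (value == null || value.isEmpty()) ? List.of() : List.of(value);
    }
  }

  public static void main(String[] args) {
    Selector record = new CsvSelector(
        List.of("title", "url"),
        List.of("Hamlet", "https://example.org/hamlet"));
    System.out.println(record.get("title"));       // [Hamlet]
    System.out.println(record.get("description")); // []
  }
}
```

Because every selector returns a list of strings, the rule checks that follow can be written once and reused for all formats.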
schema definition

Java API:

    Schema schema = new BaseSchema()
      .setFormat(Format.CSV)
      .addField(
        new JsonBranch("title", "title")
          .setRule(
            new Rule()
              .withDisjoint("description")))
      .addField(
        new JsonBranch("url", "url")
          .setExtractable(true)
          .setRule(
            new Rule()
              .withMinCount(1)
              .withMaxCount(1)
              .withPattern("^https?://.*$")));

YAML configuration file:

    format: csv
    fields:
      - name: title
        rules:
          - disjoint: description
      - name: url
        extractable: true
        rules:
          - minCount: 1
            maxCount: 1
            pattern: ^https?://.*$

JSON configuration file:

    {
      "format": "csv",
      "fields": [
        {
          "name": "title",
          "rules": [
            {"disjoint": "description"}
          ]
        },
        {
          "name": "url",
          "extractable": true,
          "rules": [
            {
              "minCount": 1,
              "maxCount": 1,
              "pattern": "^https?://.*$"
            }
          ]
        }
      ]
    }
one and only one data element instance
    - name: about
      path: $.['about']
      rules:
        - minCount: 1
        - maxCount: 1
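What the two cardinality rules amount to can be shown with a minimal Java sketch. `CardinalitySketch` and `hasCount` are hypothetical names used for illustration; the list stands for the instances a selector found at the field's path.

```java
import java.util.List;

// Illustrative only; not part of the MQAF API.
final class CardinalitySketch {

  // True when the number of instances lies within [minCount, maxCount].
  static boolean hasCount(List<String> instances, int minCount, int maxCount) {
    int n = instances.size();
    return n >= minCount && n <= maxCount;
  }

  public static void main(String[] args) {
    System.out.println(hasCount(List.of("record-1"), 1, 1)); // exactly one instance: true
    System.out.println(hasCount(List.of(), 1, 1));           // missing: false
    System.out.println(hasCount(List.of("a", "b"), 1, 1));   // repeated: false
  }
}
```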
numeric value constraints
1.0 <= price <= 2.0:
    - name: price
      path: $.['price']
      rules:
        - and:
          - minInclusive: 1.0
          - maxInclusive: 2.0
1.0 < price < 2.0:
    - name: price
      path: $.['price']
      rules:
        - and:
          - minExclusive: 1.0
          - maxExclusive: 2.0
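The difference between the inclusive and exclusive variants only shows at the boundaries. The following Java sketch makes that explicit; `RangeSketch` and its methods are hypothetical names, not MQAF classes.

```java
// Illustrative only; not part of the MQAF API.
final class RangeSketch {

  // minInclusive / maxInclusive: the bounds themselves are accepted.
  static boolean inclusive(double value, double min, double max) {
    return value >= min && value <= max;
  }

  // minExclusive / maxExclusive: the bounds themselves are rejected.
  static boolean exclusive(double value, double min, double max) {
    return value > min && value < max;
  }

  public static void main(String[] args) {
    System.out.println(inclusive(1.0, 1.0, 2.0)); // true
    System.out.println(exclusive(1.0, 1.0, 2.0)); // false
    System.out.println(exclusive(1.5, 1.0, 2.0)); // true
  }
}
```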
string constraints / length
length(about) >= 1:
    - name: about
      path: $.['about']
      rules:
        - minLength: 1
5 >= length(about) >= 3:
    - name: about
      path: $.['about']
      rules:
        - and:
          - minLength: 3
          - maxLength: 5
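A length check of this kind might look as follows in Java. The sketch assumes that length is counted in Unicode code points; how MQAF counts characters is not stated in the slides, and `LengthSketch` is a hypothetical name.

```java
// Illustrative only; not part of the MQAF API.
final class LengthSketch {

  // Assumption: length means the number of Unicode code points.
  static boolean lengthBetween(String value, int minLength, int maxLength) {
    int length = value.codePointCount(0, value.length());
    return length >= minLength && length <= maxLength;
  }

  public static void main(String[] args) {
    System.out.println(lengthBetween("abc", 3, 5));    // true
    System.out.println(lengthBetween("ab", 3, 5));     // false
    System.out.println(lengthBetween("abcdef", 3, 5)); // false
  }
}
```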
string constraints / fixed values
status == "published":
    - name: status
      path: $.['status']
      rules:
        - hasValue: published
type == "dataverse" or type == "dataset" or type == "file":
    - name: type
      path: $.['type']
      rules:
        - in: [dataverse, dataset, file]
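The two fixed-value checks can be sketched in Java as below. The sketch follows the SHACL definitions (hasValue: at least one instance equals the given value; in: every instance is a member of the list); whether MQAF treats repeated fields in exactly this way is an assumption. `FixedValueSketch` is a hypothetical name.

```java
import java.util.List;
import java.util.Set;

// Illustrative only; not part of the MQAF API.
final class FixedValueSketch {

  // hasValue: at least one instance equals the expected value.
  static boolean hasValue(List<String> instances, String expected) {
    return instances.contains(expected);
  }

  // in: every instance is one of the allowed values.
  static boolean in(List<String> instances, Set<String> allowed) {
    return allowed.containsAll(instances);
  }

  public static void main(String[] args) {
    Set<String> types = Set.of("dataverse", "dataset", "file");
    System.out.println(hasValue(List.of("published"), "published")); // true
    System.out.println(in(List.of("dataset"), types));               // true
    System.out.println(in(List.of("image"), types));                 // false
  }
}
```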
string constraints / pattern
thumbnail is an image or PDF file:
    - name: thumbnail
      path: oai:record/dc:identifier[@type='binary']
      rules:
        - pattern: ^https?://.*\.(jpe?g|pdf|png|tiff?|gif)$
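A pattern rule of this kind can be tried out with `java.util.regex`. The sketch below uses a simplified, image-only variant of the pattern (an assumption made for illustration) and shows why the dot before the file extension needs a backslash: unescaped, it matches any character. `PatternSketch` and `passes` are hypothetical names, and using `find()` is an assumption about how such a rule would be evaluated.

```java
import java.util.regex.Pattern;

// Illustrative only; not part of the MQAF API.
final class PatternSketch {

  // The dot before the extension is escaped, so a literal "." is required.
  static final Pattern ESCAPED =
      Pattern.compile("^https?://.*\\.(jpe?g|png|tiff?|gif)$");

  // Same pattern with an unescaped dot, which matches any character.
  static final Pattern UNESCAPED =
      Pattern.compile("^https?://.*.(jpe?g|png|tiff?|gif)$");

  static boolean passes(Pattern pattern, String value) {
    return pattern.matcher(value).find();
  }

  public static void main(String[] args) {
    String good = "https://example.org/scan.jpeg";
    String bad = "https://example.org/scanjpeg";
    System.out.println(passes(ESCAPED, good));  // true
    System.out.println(passes(ESCAPED, bad));   // false
    System.out.println(passes(UNESCAPED, bad)); // true (false positive)
  }
}
```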
string constraints / number of words
nr_words(about) >= 2:
    - name: about
      path: $.['about']
      rules:
        - minWords: 2
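Counting words needs a definition of what a word is. The Java sketch below assumes that words are separated by whitespace; the tokenisation MQAF actually uses is not described in the slides. `WordCountSketch` is a hypothetical name.

```java
// Illustrative only; not part of the MQAF API.
final class WordCountSketch {

  // Assumption: a word is a maximal run of non-whitespace characters.
  static int countWords(String value) {
    String trimmed = value.trim();
    return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
  }

  static boolean minWords(String value, int min) {
    return countWords(value) >= min;
  }

  public static void main(String[] args) {
    System.out.println(countWords("Hamlet"));          // 1
    System.out.println(countWords("  The  Tempest ")); // 2
    System.out.println(minWords("Hamlet", 2));         // false
  }
}
```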
comparisons of data elements
id == isbn:
    fields:
      - name: id
        path: $.['id']
        rules:
          - equals: isbn
      - name: isbn
        path: $.['isbn']
title != description:
    fields:
      - name: title
        path: $.['title']
        rules:
          - disjoint: description
      - name: description
        path: $.['description']
startingPage <= endingPage:
    - name: startingPage
      path: startingPage
      rules:
        - lessThanOrEquals: endingPage
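The three comparisons can be sketched in Java as follows. The semantics are borrowed from SHACL (equals: the two value sets are identical; disjoint: they share no value; lessThanOrEquals: every value of the first field is less than or equal to every value of the second); treating MQAF fields as value lists in this way is an assumption. `ComparisonSketch` is a hypothetical name.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

// Illustrative only; not part of the MQAF API.
final class ComparisonSketch {

  // equals: both fields hold the same set of values.
  static boolean equalsField(List<String> a, List<String> b) {
    return new HashSet<>(a).equals(new HashSet<>(b));
  }

  // disjoint: the fields have no value in common.
  static boolean disjoint(List<String> a, List<String> b) {
    return Collections.disjoint(a, b);
  }

  // lessThanOrEquals: every value of a is <= every value of b.
  static boolean lessThanOrEquals(List<Integer> a, List<Integer> b) {
    for (int x : a) {
      for (int y : b) {
        if (x > y) {
          return false;
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(equalsField(List.of("978-3"), List.of("978-3")));  // true
    System.out.println(disjoint(List.of("Hamlet"), List.of("Hamlet")));   // false
    System.out.println(lessThanOrEquals(List.of(10), List.of(25)));       // true
  }
}
```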
logical operations
and:
    - name: id
      path: oai:record/dc:identifier
      rules:
        - and:
          - minCount: 1
          - maxCount: 1
          - minLength: 1
or:
    - name: thumbnail
      path: oai:record/dc:identifier
      rules:
        - or:
          - pattern: ^.*\.(jpe?g|png)$
          - contentType:
            - image/jpeg
            - image/png
not:
    - name: title
      path: $.['title']
      rules:
        - not:
          - equals: description
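Composing rules with and, or and not maps naturally onto `java.util.function.Predicate`. The sketch below is not how MQAF implements its operators; `LogicSketch` is a hypothetical name, and the reading of `not` over a list ("none of the child rules passes") is an assumption.

```java
import java.util.List;
import java.util.function.Predicate;

// Illustrative only; not part of the MQAF API.
final class LogicSketch {

  // and: every child rule must pass.
  static <T> Predicate<T> and(List<Predicate<T>> rules) {
    return value -> rules.stream().allMatch(rule -> rule.test(value));
  }

  // or: at least one child rule must pass.
  static <T> Predicate<T> or(List<Predicate<T>> rules) {
    return value -> rules.stream().anyMatch(rule -> rule.test(value));
  }

  // not (assumed semantics): no child rule may pass.
  static <T> Predicate<T> not(List<Predicate<T>> rules) {
    return value -> rules.stream().noneMatch(rule -> rule.test(value));
  }

  public static void main(String[] args) {
    Predicate<String> atLeast3 = v -> v.length() >= 3;
    Predicate<String> atMost5 = v -> v.length() <= 5;
    Predicate<String> between = and(List.of(atLeast3, atMost5));
    System.out.println(between.test("abcd"));               // true
    System.out.println(between.test("ab"));                 // false
    System.out.println(not(List.of(atLeast3)).test("ab"));  // true
  }
}
```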
extras
content type:
    - name: thumbnail
      path: oai:record/dc:identifier[@type='binary']
      rules:
        - contentType: [image/jpeg, image/png, …]
unique value:
    - name: id
      path: oai:record/dc:identifier
      rules:
        - unique: true
only if other tests have been passed:
    - name: url
      path: oai:record/dc:identifier[@type='URL']
      rules:
        - id: Q-4.4
          description: Both a media file and a link to an object are referenced in context.
          dependencies: [Q-3.0, Q-4.0]
image dimensions:
    - name: thumbnail
      path: oai:record/dc:identifier[@type='binary']
      rules:
        - id: 3.1
          dimension:
            minWidth: 200
            minHeight: 200
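Two of these extras, unique and dependencies, can be illustrated without any network or image access. The Java sketch below is an assumption about their meaning rather than MQAF's implementation: uniqueness is checked across the identifiers of all records, and a dependency is considered met only when every listed rule has the status PASSED. `ExtrasSketch` and its members are hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only; not part of the MQAF API.
final class ExtrasSketch {

  enum Status { PASSED, FAILED, NA }

  // unique: collect the identifiers that occur more than once.
  static Set<String> duplicates(List<String> ids) {
    Set<String> seen = new HashSet<>();
    Set<String> repeated = new HashSet<>();
    for (String id : ids) {
      if (!seen.add(id)) {
        repeated.add(id);
      }
    }
    return repeated;
  }

  // dependencies (assumed semantics): all listed rules must have passed.
  static boolean dependenciesMet(List<String> dependencies,
                                 Map<String, Status> results) {
    for (String ruleId : dependencies) {
      if (results.get(ruleId) != Status.PASSED) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(duplicates(List.of("a", "b", "a"))); // [a]
    Map<String, Status> results =
        Map.of("Q-3.0", Status.PASSED, "Q-4.0", Status.FAILED);
    System.out.println(dependenciesMet(List.of("Q-3.0", "Q-4.0"), results)); // false
  }
}
```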
other properties
id            identifier, used in the output and in internal references
description   explains what the rule checks
failureScore  a numerical score assigned if the test fails
successScore  a numerical score assigned if the test passes
hidden        run the test, but hide it from the output
skip          do not run the test for now (for debugging)
raw output
★ for each test:
○ status: PASSED, FAILED, NA (if the data element is not available)
○ score: the value of successScore (if passed), failureScore (if failed) or 0
★ total score
The output can be CSV, JSON or Java objects (configurable)
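How the status of each test turns into a score and a total can be sketched as below. This is a minimal illustration of the description above, not MQAF's output code; `ScoreSketch`, `RuleResult`, the rule identifiers and the score values are all hypothetical.

```java
import java.util.List;

// Illustrative only; not part of the MQAF API.
final class ScoreSketch {

  enum Status { PASSED, FAILED, NA }

  static final class RuleResult {
    final String id;
    final Status status;
    final int successScore;
    final int failureScore;

    RuleResult(String id, Status status, int successScore, int failureScore) {
      this.id = id;
      this.status = status;
      this.successScore = successScore;
      this.failureScore = failureScore;
    }

    // successScore if passed, failureScore if failed, 0 if not available.
    int score() {
      switch (status) {
        case PASSED: return successScore;
        case FAILED: return failureScore;
        default: return 0;
      }
    }
  }

  static int total(List<RuleResult> results) {
    int sum = 0;
    for (RuleResult result : results) {
      sum += result.score();
    }
    return sum;
  }

  public static void main(String[] args) {
    List<RuleResult> results = List.of(
        new RuleResult("Q-1.1", Status.PASSED, 2, -1),
        new RuleResult("Q-1.2", Status.FAILED, 2, -3),
        new RuleResult("Q-1.3", Status.NA, 2, -1));
    for (RuleResult r : results) {
      System.out.println(r.id + "," + r.status + "," + r.score());
    }
    System.out.println("total," + total(results)); // total,-1
  }
}
```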
visualization for metadata managers / single record
aggregation
status and scores
workflow
1. ingest
2. measure records
3. aggregate
4. report
5. evaluate with experts
(diagram labels: catalogue, quality assessment tool, improve records)
research partners
early adopters and contributors
★ Miel Vander Sande (meemoo, Belgium)
★ Richard Palmer (Victoria and Albert Museum, Great Britain)
Deutsche Digitale Bibliothek
★ Francesca Schulze
★ Cosmina Berta
★ Stefanie Rühle
★ Claudia Effenberger
★ Letitia-Venetia Mölck
special thanks
★ Juliane Stiller
Validating
JSON,
XML
and
CSV
data
with
SHACL-like
constraints
https://github.com/pkiraly/metadata-qa-api

Contenu connexe

Similaire à Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)

Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Guido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
confluent
 
Leveraging the Power of Graph Databases in PHP
Leveraging the Power of Graph Databases in PHPLeveraging the Power of Graph Databases in PHP
Leveraging the Power of Graph Databases in PHP
Jeremy Kendall
 

Similaire à Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022) (20)

Peggy elasticsearch應用
Peggy elasticsearch應用Peggy elasticsearch應用
Peggy elasticsearch應用
 
Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"
 
Document Conversion & Retrieve and Rank 一問一答
Document Conversion & Retrieve and Rank 一問一答Document Conversion & Retrieve and Rank 一問一答
Document Conversion & Retrieve and Rank 一問一答
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
 
Tk2323 lecture 9 api json
Tk2323 lecture 9   api jsonTk2323 lecture 9   api json
Tk2323 lecture 9 api json
 
CouchDB-Lucene
CouchDB-LuceneCouchDB-Lucene
CouchDB-Lucene
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
 
Data science at the command line
Data science at the command lineData science at the command line
Data science at the command line
 
DataFrame: Spark's new abstraction for data science by Reynold Xin of Databricks
DataFrame: Spark's new abstraction for data science by Reynold Xin of DatabricksDataFrame: Spark's new abstraction for data science by Reynold Xin of Databricks
DataFrame: Spark's new abstraction for data science by Reynold Xin of Databricks
 
Azure ARM Templates 101
Azure ARM Templates 101Azure ARM Templates 101
Azure ARM Templates 101
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
Keeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETLKeeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETL
 
REST API に疲れたあなたへ贈る GraphQL 入門
REST API に疲れたあなたへ贈る GraphQL 入門REST API に疲れたあなたへ贈る GraphQL 入門
REST API に疲れたあなたへ贈る GraphQL 入門
 
Import web resources using R Studio
Import web resources using R StudioImport web resources using R Studio
Import web resources using R Studio
 
SHACL by example
SHACL by exampleSHACL by example
SHACL by example
 
Leveraging the Power of Graph Databases in PHP
Leveraging the Power of Graph Databases in PHPLeveraging the Power of Graph Databases in PHP
Leveraging the Power of Graph Databases in PHP
 
Ams adapters
Ams adaptersAms adapters
Ams adapters
 
Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQL
 
Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout
 

Plus de Péter Király

Plus de Péter Király (20)

Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
 
Validating 126 million MARC records (DATeCH 2019)
Validating 126 million MARC records (DATeCH 2019)Validating 126 million MARC records (DATeCH 2019)
Validating 126 million MARC records (DATeCH 2019)
 
Measuring Metadata Quality (doctoral defense 2019)
Measuring Metadata Quality (doctoral defense 2019)Measuring Metadata Quality (doctoral defense 2019)
Measuring Metadata Quality (doctoral defense 2019)
 
Empirical evaluation of library catalogues (SWIB 2019)
Empirical evaluation of library catalogues (SWIB 2019)Empirical evaluation of library catalogues (SWIB 2019)
Empirical evaluation of library catalogues (SWIB 2019)
 
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
 
Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)
 
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
 
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
 
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
 
Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)
 
FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)
 
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
 
Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...
 
Measuring cultural heritage metadata quality (Semantics 2017)
Measuring cultural heritage metadata quality (Semantics 2017)Measuring cultural heritage metadata quality (Semantics 2017)
Measuring cultural heritage metadata quality (Semantics 2017)
 
Measuring Metadata Quality in Europeana (ADOCHS 2017)
Measuring Metadata Quality in Europeana (ADOCHS 2017)Measuring Metadata Quality in Europeana (ADOCHS 2017)
Measuring Metadata Quality in Europeana (ADOCHS 2017)
 
Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)
 
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
 
Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)
 
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
 
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)
Measuring Completeness as Metadata Quality Metric in Europeana (CAS 2018)
 

Dernier

Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
gajnagarg
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
SayantanBiswas37
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
HyderabadDolls
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 

Dernier (20)

TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 

Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)

  • 1. Validating JSON, XML and CSV data with SHACL-like constraints Péter Király, GWDG (Göttingen) pkiraly@gwdg.de Deutsche Initiative für Netzwerkinformation e.V. Kompetenzzentrum Interoperable Metadaten (KIM) Workshop 2022-05-02 https://github.com/pkiraly/metadata-qa-api
  • 2. Shapes Constraint Language (SHACL): a language for validating RDF graphs against a set of conditions (themselves expressed as RDF graphs)
        ex:PersonShape
          a sh:NodeShape ;
          sh:targetClass ex:Person ;    # checks persons
          sh:property [
            sh:path ex:ssn ;            # checks the social security number
            sh:maxCount 1 ;
            sh:datatype xsd:string ;
            sh:pattern "^\\d{3}-\\d{2}-\\d{4}$" ;
          ] .
  • 3. Metadata Quality Assessment Framework (MQAF) API
        ★ open source software for metadata quality assessment
        ★ quality dimensions: completeness, multilinguality, uniqueness, etc.
        ★ extensions: Europeana, MARC, Deutsche Digitale Bibliothek
        ★ Java API + command line interface (in progress)
        ★ reads XML, JSON, CSV, MARC
        ★ highly configurable
        ★ adaptable to different metadata schemas
  • 4. RDF-agnostic SHACL tests*
        Cardinality: minCount <number>, maxCount <number>
        Value range: minExclusive <number>, minInclusive <number>, maxExclusive <number>, maxInclusive <number>
        String: minLength <number>, maxLength <number>, hasValue <String>, in [String1, ..., StringN], pattern <regular expression>, minWords <number>, maxWords <number>
        Comparison of properties: equals <field label>, disjoint <field label>, lessThan <field label>, lessThanOrEquals <field label>
        Logical operators: and [<rule1>, ..., <ruleN>], or [<rule1>, ..., <ruleN>], not [<rule1>, ..., <ruleN>]
        * a subset of SHACL
  • 5. MQAF API’s SHACL tests
        Cardinality: minCount <number>, maxCount <number>
        Value range: minExclusive <number>, minInclusive <number>, maxExclusive <number>, maxInclusive <number>
        String: minLength <number>, maxLength <number>, hasValue <String>, in [String1, ..., StringN], pattern <regular expression>, minWords <number>, maxWords <number>
        Comparison of properties: equals <field label>, disjoint <field label>, lessThan <field label>, lessThanOrEquals <field label>
        Logical operators: and [<rule1>, ..., <ruleN>], or [<rule1>, ..., <ruleN>], not [<rule1>, ..., <ruleN>]
        Extras: contentType [type1, ..., typeN], unique <boolean>, dependencies [id1, id2, ..., idN], dimension [criteria...] (min/max + Width/Height/Shortside/Longside)
        Properties: id, description, failureScore, successScore, hidden, skip
  • 6. abstracting the address of a data element
        XML, JSON, CSV and MARC21 all have addressable data elements (branches). Addressing languages:
          XML: XPath
          JSON: JSONPath
          CSV: column names
          MARC21: MARCSpec
  • 7. abstracting data element retrieval
        [Diagram labels: schema definition; XML, JSON, CSV, MARC21; data element selector; uniform data structure. Dialogue: "May I get the title?" / "Title’s address is //head/title"]
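The selector idea on slides 6 and 7 can be sketched with the JDK alone. This is only an illustration under stated assumptions: the class `SelectorSketch`, the interface `Selector` and the factory methods `forCsv` and `forXml` are hypothetical names, not part of the MQAF API, and only two of the four formats are covered.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical names throughout; this is not the MQAF API.
class SelectorSketch {

  /** Uniform access: an address goes in, the matching values come out. */
  interface Selector {
    List<String> get(String address);
  }

  /** CSV: the address is a column name. */
  static Selector forCsv(String[] header, String[] row) {
    Map<String, String> cells = new HashMap<>();
    for (int i = 0; i < header.length && i < row.length; i++) {
      cells.put(header[i], row[i]);
    }
    return address -> cells.containsKey(address)
        ? List.of(cells.get(address))
        : List.of();
  }

  /** XML: the address is an XPath expression. */
  static Selector forXml(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      XPath xpath = XPathFactory.newInstance().newXPath();
      return address -> {
        try {
          NodeList nodes =
              (NodeList) xpath.evaluate(address, doc, XPathConstants.NODESET);
          List<String> values = new ArrayList<>();
          for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
          }
          return values;
        } catch (Exception e) {
          throw new IllegalArgumentException("Bad XPath: " + address, e);
        }
      };
    } catch (Exception e) {
      throw new IllegalArgumentException("Cannot parse XML", e);
    }
  }

  public static void main(String[] args) {
    Selector xml = forXml("<html><head><title>Hello</title></head></html>");
    System.out.println(xml.get("//head/title"));
  }
}
```

Rules written against `Selector` never need to know which format the record came from; only the address string changes between schemas.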
  • 8. schema definition
        Java API:
          Schema schema = new BaseSchema()
            .setFormat(Format.CSV)
            .addField(
              new JsonBranch("title", "title")
                .setRule(
                  new Rule()
                    .withDisjoint("description")))
            .addField(
              new JsonBranch("url", "url")
                .setExtractable(true)
                .setRule(
                  new Rule()
                    .withMinCount(1)
                    .withMaxCount(1)
                    .withPattern("^https?://.*$")));
        YAML configuration file:
          format: csv
          fields:
            - name: title
              rules:
                disjoint: description
            - name: url
              extractable: true
              rules:
                minCount: 1
                maxCount: 1
                pattern: ^https?://.*$
        JSON configuration file:
          {
            "format": "csv",
            "fields": [
              {
                "name": "title",
                "rules": [ {"disjoint": "description"} ]
              },
              {
                "name": "url",
                "extractable": true,
                "rules": [ {"minCount": 1, "maxCount": 1, "pattern": "^https?://.*$"} ]
              }
            ]
          }
  • 9. one and only one data element instance
        - name: about
          path: $.['about']
          rules:
            - minCount: 1
            - maxCount: 1
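Combining `minCount: 1` with `maxCount: 1` amounts to a bounds check on the number of extracted instances. A minimal sketch, assuming the instances arrive as a list; `CardinalitySketch` and `hasCount` are hypothetical names, not MQAF API:

```java
import java.util.List;

// Hypothetical helper, not part of the MQAF API.
class CardinalitySketch {

  /** True when the number of instances lies within [minCount, maxCount]. */
  static boolean hasCount(List<String> instances, int minCount, int maxCount) {
    int n = instances.size();
    return n >= minCount && n <= maxCount;
  }

  public static void main(String[] args) {
    System.out.println(hasCount(List.of("a"), 1, 1));       // exactly one instance
    System.out.println(hasCount(List.of(), 1, 1));          // element missing
    System.out.println(hasCount(List.of("a", "b"), 1, 1));  // element repeated
  }
}
```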
  • 10. numeric value constraints
        1.0 <= price <= 2.0
          - name: price
            path: $.['price']
            rules:
              - and:
                  - minInclusive: 1.0
                  - maxInclusive: 2.0
        1.0 < price < 2.0
          - name: price
            path: $.['price']
            rules:
              - and:
                  - minExclusive: 1.0
                  - maxExclusive: 2.0
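The two variants differ only at the boundary values, which a short sketch makes concrete. `RangeSketch` and its methods are hypothetical names used for illustration, not MQAF API:

```java
// Hypothetical helper, not part of the MQAF API.
class RangeSketch {

  /** minInclusive / maxInclusive: min <= value <= max. */
  static boolean inInclusive(double value, double min, double max) {
    return value >= min && value <= max;
  }

  /** minExclusive / maxExclusive: min < value < max. */
  static boolean inExclusive(double value, double min, double max) {
    return value > min && value < max;
  }

  public static void main(String[] args) {
    // A price of exactly 1.0 passes the inclusive check but not the exclusive one.
    System.out.println(inInclusive(1.0, 1.0, 2.0));
    System.out.println(inExclusive(1.0, 1.0, 2.0));
  }
}
```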
  • 11. string constraints / length
        length(about) >= 1
          - name: about
            path: $.['about']
            rules:
              - minLength: 1
        5 >= length(about) >= 3
          - name: about
            path: $.['about']
            rules:
              - and:
                  - minLength: 3
                  - maxLength: 5
  • 12. string constraints / fixed values
        status == "published"
          - name: status
            path: $.['status']
            rules:
              - hasValue: published
        type == "dataverse" or type == "dataset" or type == "file"
          - name: type
            path: $.['type']
            rules:
              - in: [dataverse, dataset, file]
  • 13. string constraints / pattern
        thumbnail is an image or PDF file
          - name: thumbnail
            path: oai:record/dc:identifier[@type='binary']
            rules:
              - pattern: ^https?://.*\.(jpe?g|pdf|png|tiff?|gif)$
  • 14. string constraints / number of words
        nr_words(about) >= 2
          - name: about
            path: $.['about']
            rules:
              - minWords: 2
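The string rules on slides 11 to 14 each reduce to a one-line check. The sketch below uses hypothetical names (`StringRulesSketch` and its methods are not MQAF API) and makes three assumptions: a pattern must match the whole value, words are separated by whitespace, and the thumbnail pattern is written with an escaped dot and a `pdf` alternative inferred from the slide's caption.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helpers, not part of the MQAF API.
class StringRulesSketch {

  /** minLength / maxLength */
  static boolean lengthBetween(String value, int min, int max) {
    int n = value.length();
    return n >= min && n <= max;
  }

  /** hasValue */
  static boolean hasValue(String value, String expected) {
    return value.equals(expected);
  }

  /** in [String1, ..., StringN] */
  static boolean isIn(String value, List<String> allowed) {
    return allowed.contains(value);
  }

  /** pattern (assumption: the whole value must match) */
  static boolean matches(String value, String regex) {
    return Pattern.compile(regex).matcher(value).matches();
  }

  /** Word count (assumption: words are separated by whitespace). */
  static int wordCount(String value) {
    String trimmed = value.trim();
    return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
  }

  /** minWords */
  static boolean minWords(String value, int min) {
    return wordCount(value) >= min;
  }

  public static void main(String[] args) {
    String imageOrPdf = "^https?://.*\\.(jpe?g|pdf|png|tiff?|gif)$";
    System.out.println(matches("https://example.org/thumb.jpeg", imageOrPdf));
    System.out.println(minWords("Metadata quality", 2));
  }
}
```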
  • 15. comparisons of data elements
        id == isbn
          fields:
            - name: id
              path: $.['id']
              rules:
                - equals: isbn
            - name: isbn
              path: $.['isbn']
        title != description
          fields:
            - name: title
              path: $.['title']
              rules:
                - disjoint: description
            - name: description
              path: $.['description']
        startingPage <= endingPage
          - name: startingPage
            path: startingPage
            rules:
              - lessThanOrEquals: endingPage
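Because a field may hold several instances, comparing two fields means comparing two collections of values. The sketch below follows the SHACL definitions of `sh:equals`, `sh:disjoint` and `sh:lessThanOrEquals` (equal value sets, no shared value, every pair ordered); whether MQAF applies exactly these semantics is an assumption, and `ComparisonSketch` is a hypothetical name.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

// Hypothetical helpers, not part of the MQAF API.
class ComparisonSketch {

  /** equals: both fields hold the same set of values. */
  static boolean equalsField(List<String> a, List<String> b) {
    return new HashSet<>(a).equals(new HashSet<>(b));
  }

  /** disjoint: the two fields share no value. */
  static boolean disjoint(List<String> a, List<String> b) {
    return Collections.disjoint(a, b);
  }

  /** lessThanOrEquals: every value of a is <= every value of b. */
  static boolean lessThanOrEquals(List<Double> a, List<Double> b) {
    for (double x : a) {
      for (double y : b) {
        if (x > y) {
          return false;
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(disjoint(List.of("A title"), List.of("A description")));
    System.out.println(lessThanOrEquals(List.of(10.0), List.of(25.0)));
  }
}
```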
  • 18. logical operations
        and
          - name: id
            path: oai:record/dc:identifier
            rules:
              - and:
                  - minCount: 1
                  - maxCount: 1
                  - minLength: 1
        or
          - name: thumbnail
            path: oai:record/dc:identifier
            rules:
              - or:
                  - pattern: ^.*\.(jpe?g|png)$
                  - contentType:
                      - image/jpeg
                      - image/png
        not
          - name: title
            path: $.['title']
            rules:
              - not:
                  - equals: description
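The three operators compose rules the way boolean connectives compose predicates. A minimal sketch built on `java.util.function.Predicate`; `LogicSketch` is a hypothetical name, and `not` is shown for a single rule because the slides do not say how a list of negated rules is combined.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helpers, not part of the MQAF API.
class LogicSketch {

  /** and: every rule must pass. */
  static <T> Predicate<T> and(List<Predicate<T>> rules) {
    return value -> rules.stream().allMatch(rule -> rule.test(value));
  }

  /** or: at least one rule must pass. */
  static <T> Predicate<T> or(List<Predicate<T>> rules) {
    return value -> rules.stream().anyMatch(rule -> rule.test(value));
  }

  /** not: the rule must fail. */
  static <T> Predicate<T> not(Predicate<T> rule) {
    return rule.negate();
  }

  public static void main(String[] args) {
    Predicate<String> isJpeg = v -> v.matches("^.*\\.jpe?g$");
    Predicate<String> isPng = v -> v.endsWith(".png");
    System.out.println(or(List.of(isJpeg, isPng)).test("cover.png"));
  }
}
```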
  • 21. extras
        content type
          - name: thumbnail
            path: oai:record/dc:identifier[@type='binary']
            rules:
              - contentType: [image/jpeg, image/png, …]
        unique value
          - name: id
            path: oai:record/dc:identifier
            rules:
              - unique: true
        only if other tests have been passed
          - name: url
            path: oai:record/dc:identifier[@type='URL']
            rules:
              - id: Q-4.4
                description: Both a media file and a link to an object are referenced in context.
                dependencies: [Q-3.0, Q-4.0]
        image dimensions
          - name: thumbnail
            path: oai:record/dc:identifier[@type='binary']
            rules:
              - id: 3.1
                dimension:
                  minWidth: 200
                  minHeight: 200
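The `dependencies` property makes one rule conditional on the outcome of others. A minimal sketch of that gate, assuming earlier results are kept in a map from rule id to pass/fail and that an unknown id counts as not passed; `DependencySketch` and `canRun` are hypothetical names, not MQAF API:

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the MQAF API.
class DependencySketch {

  /** A rule may run only when every rule it depends on has already passed. */
  static boolean canRun(List<String> dependencies, Map<String, Boolean> passedById) {
    for (String id : dependencies) {
      // Assumption: a rule id with no recorded result counts as not passed.
      if (!passedById.getOrDefault(id, false)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, Boolean> results = Map.of("Q-3.0", true, "Q-4.0", false);
    System.out.println(canRun(List.of("Q-3.0", "Q-4.0"), results));
  }
}
```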
  • 25. other properties
        id: identifier, used in the output and in internal references
        description: explains what the rule checks
        failureScore: a numerical score assigned if the test fails
        successScore: a numerical score assigned if the test passes
        hidden: run the test, but hide it from the output
        skip: do not run the test for now (for debugging)
  • 26. raw output
        ★ for each test:
            ○ status: PASSED, FAILED, NA (if the data element is not available)
            ○ score: the value of successScore (if passed), failureScore (if failed) or 0
        ★ total score
        The output can be CSV, JSON or Java objects (configurable)
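The scoring described on this slide can be sketched as a mapping from status to score. `OutputSketch`, `Status`, `score` and `total` are hypothetical names rather than MQAF API, and treating the total score as the plain sum of the per-test scores is an assumption.

```java
import java.util.List;

// Hypothetical helpers, not part of the MQAF API.
class OutputSketch {

  enum Status { PASSED, FAILED, NA }

  /** successScore if passed, failureScore if failed, otherwise 0. */
  static int score(Status status, int successScore, int failureScore) {
    switch (status) {
      case PASSED:
        return successScore;
      case FAILED:
        return failureScore;
      default:
        return 0;
    }
  }

  /** Assumption: the total score is the sum of the per-test scores. */
  static int total(List<Integer> scores) {
    int sum = 0;
    for (int s : scores) {
      sum += s;
    }
    return sum;
  }

  public static void main(String[] args) {
    List<Integer> scores = List.of(
        score(Status.PASSED, 1, -2),
        score(Status.FAILED, 1, -2),
        score(Status.NA, 1, -2));
    System.out.println(total(scores));
  }
}
```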
  • 27. visualization for metadata managers / single record
  • 30. workflow
        1. ingest
        2. measure records
        3. aggregate
        4. report
        5. evaluate with experts
        [Diagram labels: catalogue; improve records; quality assessment tool]
  • 31. research partners
        early adopters and contributors
          ★ Miel Vander Sande (meemoo, Belgium)
          ★ Richard Palmer (Victoria and Albert Museum, Great Britain)
        Deutsche Digitale Bibliothek
          ★ Francesca Schulze
          ★ Cosmina Berta
          ★ Stefanie Rühle
          ★ Claudia Effenberger
          ★ Letitia-Venetia Mölck
        special thanks
          ★ Juliane Stiller