Validata (http://hw-swel.github.io/Validata/) is an online web application for validating a dataset description expressed in RDF against a community profile expressed as a Shape Expression (ShEx). It also provides an API for programmatic access to the validator. Validata can be used with multiple community-agreed standards, e.g. DCAT, the HCLS community profile, or the Open PHACTS guidelines, and there are currently deployments to support each of these. It can be repurposed for a different deployment by providing it with a new ShEx schema. The Validata code is available from GitHub (https://github.com/HW-SWeL/Validata).
Presentation given at SDSVoc https://www.w3.org/2016/11/sdsvoc
1. Validata: A tool for testing profile conformance
Alasdair J G Gray
Heriot-Watt University
www.macs.hw.ac.uk/~ajg33
A.J.G.Gray@hw.ac.uk
@gray_alasdair
Andrew Beveridge
Jacob Baungard Hansen
Johnny Val
Leif Gehrmann
Roisin Farmer
Sunil Khutan
Tomas Robertson
2. HCLS Dataset Descriptions
https://www.w3.org/TR/hcls-dataset/
Dumontier M, Gray AJG, Marshall MS, et al. (2016) The health care
and life sciences community profile for dataset descriptions.
PeerJ 4:e2331 https://doi.org/10.7717/peerj.2331
1 December 2016
3. Requirements
• Online tool
– Deployable on W3C server
– GUI
– API
• Support multiple constraints
– Properties
– Data values
– …
• Requirement levels
– Different levels of user messages: Error, Warning, Information
• Configurable
– HCLS (Required)
– DCAT, Open PHACTS, etc. (Optional)
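The requirement levels above could be modelled roughly as follows. This is only a sketch: the mapping from requirement keywords (MUST, SHOULD, MAY, MUST NOT) to message levels is an assumption that the slides do not spell out, and the names `Level`, `SEVERITY` and `message_for` are hypothetical, not part of Validata.

```python
from enum import Enum


class Level(Enum):
    ERROR = "Error"
    WARNING = "Warning"
    INFORMATION = "Information"


# Assumed mapping from requirement keyword to user message level.
# The slides name the three levels but do not state this mapping.
SEVERITY = {
    "MUST": Level.ERROR,
    "MUST NOT": Level.ERROR,
    "SHOULD": Level.WARNING,
    "MAY": Level.INFORMATION,
}


def message_for(keyword, prop, satisfied):
    """Return (level, text) for an unmet constraint, or None when it is satisfied."""
    if satisfied:
        return None
    level = SEVERITY[keyword]
    return level, f"{level.value}: {keyword} constraint on {prop} not met"
```

A profile author could then change the reported level for a property by editing the table only, which is one way to keep the tool configurable across profiles.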
1 December 2016
@gray_alasdair
www.macs.hw.ac.uk/~ajg33
3
4. Example Constraint
• Shape
• A Dataset
– MUST be declared to be of type dctype:Dataset
– MUST have a dcterms:title as a language-typed string
– MUST NOT have a dcterms:created date
[Figure: shape diagram for <Dataset>, showing a value of type rdf:langString and a crossed-out (✗) element]
Note: dates are associated with versions in HCLS
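The three constraints on this slide can be checked by hand in a few lines of Python. This is a minimal sketch rather than Validata's ShEx-based implementation: triples are plain tuples, a literal is modelled as a (text, language tag) pair, and "language-typed" is interpreted as "at least one title carries a language tag". The function name `check_dataset` and the `ex:d` node are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCTYPE_DATASET = "http://purl.org/dc/dcmitype/Dataset"
DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_CREATED = "http://purl.org/dc/terms/created"


def check_dataset(node, triples):
    """Return a list of violation messages for `node`; an empty list means it conforms.

    IRIs are plain strings; literals are (lexical form, language tag or None) tuples.
    """
    # Group the objects of every triple about `node` by predicate.
    props = {}
    for s, p, o in triples:
        if s == node:
            props.setdefault(p, []).append(o)

    errors = []
    if DCTYPE_DATASET not in props.get(RDF_TYPE, []):
        errors.append("MUST be of type dctype:Dataset")
    titles = props.get(DCT_TITLE, [])
    if not any(isinstance(t, tuple) and t[1] for t in titles):
        errors.append("MUST have a language-tagged dcterms:title")
    if DCT_CREATED in props:
        errors.append("MUST NOT have dcterms:created")
    return errors


good = [
    ("ex:d", RDF_TYPE, DCTYPE_DATASET),
    ("ex:d", DCT_TITLE, ("Example dataset", "en")),
]
bad = good + [("ex:d", DCT_CREATED, ("2016-12-01", None))]

print(check_dataset("ex:d", good))  # []
print(check_dataset("ex:d", bad))   # ['MUST NOT have dcterms:created']
```

Expressing the same rules as a ShEx schema keeps them declarative, so a new profile needs a new schema rather than new code.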
5. Example Validation
• Shape
• Data
13. Validata
https://github.com/HW-SWeL/Validata
• RDF constraint validation tool
– Configurable to any profile
• Shape Expression (ShEx) constraints
• Open source JavaScript implementation
www.macs.hw.ac.uk/~ajg33/
A.J.G.Gray@hw.ac.uk
@gray_alasdair
Editor's notes
Motivation: how do we check descriptions conform?
Summary level: time unchanging information, e.g. name, description, publisher
Version level: version specific information, e.g. version number, creator, etc
Distribution level: file specific information, e.g. file location and format, number of triples
18 vocabularies: DCTerms, DCAT, VoID, FOAF, …
61 prescribed properties: MUST, SHOULD, MAY, MUST NOT for each level
Link into data publishing pipeline via API
Not tied to HCLS, only a motivation
No existing tool meets these needs
Constraints form a graph pattern that data must comply with
How do we validate that our example data conforms to a certain shape
Express expected shape as ShEx
Toy example, what about for real
ShEx:
Concise notation
Regular-expression based
W3C SHACL was not stable when the work was done
ShEx is a separate language that covers similar ground to SHACL, with some extra features
Step through validation process
Extended ShEx to allow arbitrary hierarchies
Toy example, what about for real
ShEx-validator has other dependencies too
minimist: argument parser
Promise: asynchronous callbacks
PEG.js: parser generator
Mocha: test framework, used for test-driven development