PoolParty Semantic Suite is Semantic Web Company’s platform for enterprise information integration based on Linked Data principles. PoolParty consists of several components that process and manage RDF-based data sets. These components have consistency requirements for the data they work on.
Users, in turn, have requirements for the quality of the data they manage. We want to express constraints for both in a standard way across PoolParty components. SKOS-based PoolParty Thesaurus project data requires both consistency and quality.
1. PoolParty Semantic Suite: Data Validation along the Linked Data Life Cycle
Robert David, CTO, Semantic Web Company
Maura Moran, Senior Content Consultant, Mekon
2. About
▸ Linked Data Lifecycle
▸ Software Components
▸ Data consistency requirements
▸ Data validation standards
▸ Validation use cases
▸ Live demo
4. Fact sheet: PoolParty
PoolParty Semantic Suite
▸ Most complete Semantic Middleware on the Global Market
▸ Semantic AI: Fusion of Knowledge Graphs, NLP, and Machine Learning
▸ Linked Data Management along the whole Data Life Cycle
▸ W3C standards compliant
▸ First release in 2009
▸ Current version 7.0
▸ Over 200 installations world-wide
▸ On-premises or cloud-based
▸ KMWorld listed PoolParty as Trend-Setting Product 2015, 2016 and 2017
▸ www.poolparty.biz
6. Knowledge Graph Management Along the Linked Data Life Cycle
[Figure: life cycle diagram with PoolParty components placed at its stages. Components shown: UnifiedViews, Extractor, Thesaurus Server, GraphEditor, Semantic Classifier, API, GraphSearch, 3rd party]
7. Data Consistency: Motivation
Data must be consistent so that:
▸ Applications can process it correctly
▸ Data quality is as expected
But data is often dirty and complicated, especially if sourced from several applications.
Perform checks to ensure that the data:
▸ conforms to the schema you’ve set out
▸ is accurate
Use relationships between concepts to perform better checks.
8. Data Validation Standards
Validation for the Linked Data Lifecycle
RDF-based validation approaches:
▸ SPARQL
▸ Closed World OWL
▸ ShEx
▸ SHACL
9. SHACL: Shapes Constraint Language
“a language for validating RDF graphs against a set of conditions”
▸ Use RDF to define the conditions
▸ Easy for humans to understand
▸ Can be processed by machines
▸ Well-defined semantics
▸ Extensible via SPARQL
▸ W3C Recommendation
10. SHACL: Shapes Constraint Language
How does it work?
▸ Define shapes using RDF
▸ Shapes define what the data should look like
▸ A processor validates existing data against the shapes
▹ detect inconsistencies
▹ improve quality
▸ The result is a validation report listing the violations where the data does not match the shapes
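To make the last point concrete, here is a rough sketch of what a processor's report could look like in Turtle. The report vocabulary (`sh:ValidationReport`, `sh:conforms`, `sh:result` and the result properties) comes from the SHACL Recommendation. The `ex:` namespace, the focus node `ex:cat`, the shape name `ex:PrefLabelShape` and the message text are assumptions made up for illustration.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/thesaurus/> .   # hypothetical namespace

[] a sh:ValidationReport ;
    sh:conforms false ;                       # at least one violation was found
    sh:result [
        a sh:ValidationResult ;
        sh:resultSeverity sh:Violation ;
        sh:focusNode ex:cat ;                 # the node that failed (hypothetical)
        sh:resultPath skos:prefLabel ;        # the property that was checked
        sh:sourceConstraintComponent sh:UniqueLangConstraintComponent ;
        sh:sourceShape ex:PrefLabelShape ;    # hypothetical name of the property shape
        sh:resultMessage "More than one skos:prefLabel for the same language" ;
    ] .
```

When the data matches all shapes, the report would contain `sh:conforms true` and no `sh:result` entries.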
11. SHACL: Shapes Constraint Language
shape:PoolPartyConceptShape               # defines a SHACL shape
    a sh:NodeShape ;                      # for a graph node
    sh:targetClass skos:Concept ;         # applied to all skos:Concepts
    sh:property [                         # which must satisfy:
        sh:path skos:prefLabel ;
        sh:disjoint skos:altLabel ;       # skos:prefLabel and skos:altLabel have to be disjoint
        sh:uniqueLang true ;              # at most one skos:prefLabel literal per language
    ] ;
    # ...
    sh:property [                         # there is a path from each skos:Concept to a skos:ConceptScheme
        sh:path (                         # via skos:broader and skos:topConceptOf (and their inverses)
            [ sh:zeroOrMorePath [ sh:alternativePath ( skos:broader [ sh:inversePath skos:narrower ] ) ] ]
            [ sh:alternativePath ( skos:topConceptOf [ sh:inversePath skos:hasTopConcept ] ) ]
        ) ;
        sh:minCount 1 ;
    ] ;
    sh:message "The concept violates PoolParty's concept definition" .   # reporting this message on violations
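As an illustration of what this shape accepts and rejects, the following made-up data graph may help. It only considers the constraints visible on the slide (the elided part is ignored). The `ex:` namespace and all concepts in it are hypothetical.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/thesaurus/> .   # hypothetical namespace

ex:scheme a skos:ConceptScheme .

# Conforms: reaches the scheme directly via skos:topConceptOf.
ex:animal a skos:Concept ;
    skos:prefLabel "animal"@en ;
    skos:topConceptOf ex:scheme .

# Conforms: one prefLabel per language, a distinct altLabel,
# and a route to the scheme via skos:broader, then skos:topConceptOf.
ex:dog a skos:Concept ;
    skos:prefLabel "dog"@en , "Hund"@de ;
    skos:altLabel "hound"@en ;
    skos:broader ex:animal .

# Violates three constraints:
#  - sh:uniqueLang: two English prefLabels
#  - sh:disjoint:   "cat"@en is both prefLabel and altLabel
#  - sh:minCount 1: no path to any concept scheme
ex:cat a skos:Concept ;
    skos:prefLabel "cat"@en , "feline"@en ;
    skos:altLabel "cat"@en .
```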
13. Use Case 1: SKOS Thesaurus Import Validation
Component: PoolParty Thesaurus Server
▸ SKOS-based data model
▸ Users can import RDF into a project
▸ The component has requirements:
▹ SKOS
▹ Additional component-specific constraints
▸ Data has to be validated on import
▸ Data can be repaired for conformance
14. Use Case 2: Graph Data Validation
Component: PoolParty GraphEditor
▸ Ontology-based data model
▸ Ontology-driven UI
▸ Users can connect to graphs
▸ Users can work freely with RDF data
▸ Not restricted to SKOS
▸ But the data is also less stable
▸ Flexible data validation is needed
▸ Define checks for different use cases
15. Use Case 3: Legal Data, Legal Definitions
Component: PoolParty GraphEditor
Constraint:
There must not be more than two active board members for each Legal Entity.
[Figure: data model with nodes Legal Entity, Board Member and Active, and relations hasBoardMembership and hasBoardMemberStatus]
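One way such a constraint might be written in SHACL is with a qualified value shape, which counts only those values that match a nested shape. This is a sketch under assumptions, not the shape used in PoolParty: the `ex:` namespace is made up, and it assumes that a Legal Entity points to its Board Members via `hasBoardMembership` and that each Board Member points to its status via `hasBoardMemberStatus`.

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/legal#> .   # hypothetical namespace

ex:LegalEntityBoardShape
    a sh:NodeShape ;
    sh:targetClass ex:LegalEntity ;
    sh:property [
        sh:path ex:hasBoardMembership ;
        # Count only the board members whose status is ex:Active.
        sh:qualifiedValueShape [
            sh:property [
                sh:path ex:hasBoardMemberStatus ;
                sh:hasValue ex:Active ;
            ] ;
        ] ;
        sh:qualifiedMaxCount 2 ;
        sh:message "A Legal Entity must not have more than two active board members" ;
    ] .
```

Board members with any other status are not counted, so an entity with two active and several inactive members would still conform.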
16. Use Case 4: Legal Data, Geo Consistency
Component: PoolParty GraphEditor
Constraint:
If a Legal Entity has a country and a city assigned, then the two must be connected by a skos:narrower path, so that the geo information is consistent.
[Figure: data model with nodes Legal Entity, Country and City, and relations isLocatedInCountry, isLocatedInCity and skos:narrower]
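A constraint of this kind relates two values of the same focus node, which is a natural fit for a SPARQL-based constraint. The following is a minimal sketch, assuming a made-up `ex:` namespace and assuming that the skos:narrower path runs from the country down to the city (possibly over several steps). The prefix declarations via `sh:declare` are needed so that the query can use `ex:` and `skos:`.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/legal#> .   # hypothetical namespace

# Prefix declarations that the SPARQL query below relies on.
ex: a owl:Ontology ;
    sh:declare [ sh:prefix "ex" ;
                 sh:namespace "http://example.org/legal#"^^xsd:anyURI ] ,
               [ sh:prefix "skos" ;
                 sh:namespace "http://www.w3.org/2004/02/skos/core#"^^xsd:anyURI ] .

ex:GeoConsistencyShape
    a sh:NodeShape ;
    sh:targetClass ex:LegalEntity ;
    sh:sparql [
        a sh:SPARQLConstraint ;
        sh:message "The city of the Legal Entity is not reachable from its country via skos:narrower" ;
        sh:prefixes ex: ;
        # Each solution is a violation: both values exist, but no narrower path links them.
        sh:select """
            SELECT $this
            WHERE {
                $this ex:isLocatedInCountry ?country ;
                      ex:isLocatedInCity ?city .
                FILTER NOT EXISTS { ?country skos:narrower+ ?city }
            }
        """ ;
    ] .
```

Entities that have only a country or only a city produce no query solution and therefore no violation, which matches the "if both are assigned" wording of the constraint.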
17. Use Case 5: UnifiedViews Validation
Component: PoolParty UnifiedViews
▸ Linked data orchestration tool
▸ Users process different formats (XLS, CSV, XML), creating “free-form” RDF
▸ RDF data processing works in pipelines
▸ Pipelines consist of Data Processing Units
▸ Data validation uses SPARQL queries, including ASK queries
▸ Standardized data validation is needed
18. Use Case 5: UnifiedViews Validation
Component: PoolParty UnifiedViews/GraphEditor
Constraint:
If a publication has a relation to a journal, that journal must have an impactFactor and a skos:prefLabel.
(journal ⇒ impactFactor) ⇔ (¬journal ∨ impactFactor)
[Figure: Publication dataset, showing a Publication linked via journal to a node with impactFactor and skos:prefLabel]
19. Use Case 5: Shape with logical operators vs SPARQL
:PublicationShape a sh:NodeShape ;
    sh:targetClass :Publication ;
    sh:property [
        sh:path sweb:journal ;
        sh:sparql [
            a sh:SPARQLConstraint ;
            # Prefixes used in the query (":" and "skos:") have to be declared via sh:prefixes.
            sh:select """
                SELECT $this WHERE {
                    $this $PATH ?journal .
                    # A journal is reported unless it has both properties.
                    FILTER NOT EXISTS {
                        ?journal :impactFactor ?impactFactor .
                        ?journal skos:prefLabel ?label .
                    }
                }""" ;
        ] ;
    ] .
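For comparison with the SPARQL variant, the same constraint could plausibly be written with SHACL's logical operators, mirroring the formula ¬journal ∨ impactFactor. This is a sketch, not the shape from the original talk: the shape name `:PublicationLogicShape` is invented, and the namespace IRIs bound to `:` and `sweb:` are placeholders, since the slides do not show them.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix :     <http://example.org/publications#> .   # placeholder IRI
@prefix sweb: <http://example.org/sweb#> .           # placeholder IRI

:PublicationLogicShape a sh:NodeShape ;
    sh:targetClass :Publication ;
    sh:or (
        # Branch 1 (¬journal): the publication has no journal at all.
        [ sh:path sweb:journal ; sh:maxCount 0 ]
        # Branch 2: every journal has an impactFactor and a skos:prefLabel.
        [ sh:path sweb:journal ;
          sh:node [
              a sh:NodeShape ;
              sh:property [ sh:path :impactFactor ; sh:minCount 1 ] ;
              sh:property [ sh:path skos:prefLabel ; sh:minCount 1 ] ;
          ] ;
        ]
    ) .
```

Strictly speaking, the second branch alone would be enough, because a property shape with no value nodes conforms trivially. The explicit `sh:or` is kept here only to show how the implication maps onto the logical operators.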
22. Data Consistency: Motivation
Why do we need data consistency?
Software components:
▸ Stability for application logic
▸ Correctness of processed results
Users:
▸ Correctness of analysis results
▸ Quality of data
23. Data Consistency: Software Components and the Linked Data Life Cycle
▸ Software components support the Linked Data Lifecycle
▸ Managed data has to conform to the requirements of the software components
▸ Components need input / output validation for data
▸ Ensure stability for software components
▸ Correctness of processed results