Presentation at the Ontology Engineering Group at UPM on Linked Data Quality and the work done in the Enterprise Information Systems (EIS) Group at Universität Bonn
1. Linked Data Quality Assessment
– daQ and Luzzu
Jeremy Debattista
University of Bonn
Presentation at the Ontology Engineering
Group (UPM)
2. …who am I?
• B.Sc (Hons) in Computer Science – University of
Malta
– Thesis: Collaborative Editing and Expert Finding
• M.App Sc in Computer Science – DERI, National
University of Ireland, Galway
– Thesis: Ontology-based rules for User-Controlled
Support in Ubiquitous Environments
• PhD Candidate – University of Bonn
3. … my PhD – the big picture
• Work related to Data Quality (in LD)
– representing quality metadata (daQ)
– assessing data quality (Luzzu)
– identifying new metrics from standard
vocabularies (like PROV-O)
4. … the need for Quality Metadata
• Convincing data consumers to use our
published data
• Filtering datasets
• Poor Quality Perspective – Big Data Veracity
7. … the daQ vocabulary
• Metadata as Named Graphs
• Usage of abstract class concept
• Metric assessment as Observations
• Preserving Provenance information
9. … daQ Applications
• daQ validator – Validates quality metric
schemas extending the daQ (will be online
soon)
– e.g. checking that each dimension is in exactly one category… (see the validation sketch after this slide)
• Luzzu – next slides
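As an illustration of the kind of check such a validator might perform, below is a minimal sketch in Java (Apache Jena) of a SPARQL query that flags dimensions linked to zero or to more than one category. The daq: terms and the instance-level modelling assumed here are illustrative only and should be checked against the published daQ vocabulary and the conventions of the schema being validated.

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;

public class DaqSchemaCheckSketch {

    // Finds dimensions that are linked from zero or from more than one category.
    // Assumes dimensions are typed as daq:Dimension (directly or via inference)
    // and that categories point to them via daq:hasDimension -- illustrative only.
    static final String QUERY =
        "PREFIX daq: <http://purl.org/eis/vocab/daq#>\n" +
        "SELECT ?dimension (COUNT(DISTINCT ?category) AS ?categories)\n" +
        "WHERE {\n" +
        "  ?dimension a daq:Dimension .\n" +
        "  OPTIONAL { ?category daq:hasDimension ?dimension . }\n" +
        "}\n" +
        "GROUP BY ?dimension\n" +
        "HAVING (COUNT(DISTINCT ?category) != 1)";

    public static void main(String[] args) {
        // Load the user's quality metric schema (an extension of daQ).
        Model schema = RDFDataMgr.loadModel(args[0]);

        try (QueryExecution exec = QueryExecutionFactory.create(QUERY, schema)) {
            ResultSet results = exec.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.printf("Dimension %s is in %d categories (expected exactly 1)%n",
                        row.getResource("dimension"),
                        row.getLiteral("categories").getInt());
            }
        }
    }
}
```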
10. … Luzzu – QA Framework
• A comprehensive QA framework
– assesses LD quality using user-provided metrics (we already have a number of LOD metrics; a sketch of a metric follows after this slide) in a scalable manner
– provides queryable metadata (daQ)
– provides quality reports which can be used for cleaning
• Java-based with Maven integration
• http://eis-bonn.github.io/Luzzu
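As a rough, self-contained sketch of what a user-provided, streaming-style metric could look like in Java, consider the snippet below. The StreamingQualityMetric interface here is an assumption made for illustration and is not the actual Luzzu metric interface, which should be taken from the project documentation; the example metric itself (share of non-blank-node subjects) is likewise only a toy.

```java
import org.apache.jena.graph.Node;
import org.apache.jena.sparql.core.Quad;

// Illustrative interface: the framework pushes quads one by one while streaming
// the dataset and asks for the final value at the end.
interface StreamingQualityMetric {
    void compute(Quad quad);   // called once per quad
    double metricValue();      // final (normalised) metric value in [0, 1]
}

// Toy metric: proportion of statements whose subject is an IRI (not a blank node).
public class NoBlankNodeSubjectsMetric implements StreamingQualityMetric {

    private long total = 0;
    private long iriSubjects = 0;

    @Override
    public void compute(Quad quad) {
        total++;
        Node subject = quad.getSubject();
        if (subject.isURI()) {
            iriSubjects++;
        }
    }

    @Override
    public double metricValue() {
        return total == 0 ? 1.0 : (double) iriSubjects / total;
    }
}
```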
13. …what’s missing in Luzzu
• Make Luzzu work better on Big Data Platforms
– We already have a Spark processor
– How can metrics be scaled on different cores?
Something like map-reduce maybe?
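The slide leaves this as an open question; the sketch below only illustrates one possible map-reduce-style answer for counting-type metrics, using Java parallel streams. The Partial record and the way the data is partitioned are hypothetical and not part of Luzzu.

```java
import java.util.List;

// Map-reduce-style scaling sketch: each partition of the data produces a partial
// result, and the partials are merged into the final metric value.
public class ParallelMetricSketch {

    // Partial result of a counting metric over one data partition.
    record Partial(long matching, long total) {
        Partial merge(Partial other) {
            return new Partial(matching + other.matching, total + other.total);
        }
    }

    public static void main(String[] args) {
        // Hypothetical partitions of subject identifiers (in practice: chunks of quads).
        List<List<String>> partitions = List.of(
                List.of("http://example.org/a", "_:b1"),
                List.of("http://example.org/c", "http://example.org/d"));

        Partial result = partitions.parallelStream()           // map: one partial per partition
                .map(part -> new Partial(
                        part.stream().filter(s -> s.startsWith("http")).count(),
                        part.size()))
                .reduce(new Partial(0, 0), Partial::merge);     // reduce: merge partials

        System.out.printf("metric value = %.2f%n",
                (double) result.matching() / result.total());
    }
}
```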
15. … quality metrics
• Traditional naïve way
• Probabilistic Techniques (A paper was
presented at ESWC this year)
16. … probabilistic technique hypothesis
Probabilistic approximation techniques would:
(H1) drastically improve computational time
(H2) give close to accurate results
17. … probabilistic techniques used
• Reservoir Sampling – Dereferenceability, Links to External Data Providers
• Bloom Filters – Extensional Conciseness
• Clustering Coefficient Estimation – Clustering Coefficient of a Network
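To illustrate the first of these techniques, below is a minimal, generic sketch of reservoir sampling (Algorithm R): instead of assessing every item (e.g. dereferencing every IRI), only a fixed-size uniform sample of the stream is kept and assessed. This is not Luzzu's implementation; the class and capacity handling are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Reservoir sampling (Algorithm R): keeps a uniform random sample of fixed size
// from a stream of unknown length, in a single pass.
public class ReservoirSamplerSketch<T> {

    private final List<T> reservoir = new ArrayList<>();
    private final int capacity;
    private long seen = 0;

    public ReservoirSamplerSketch(int capacity) {
        this.capacity = capacity;
    }

    // Offer each streamed item exactly once.
    public void add(T item) {
        seen++;
        if (reservoir.size() < capacity) {
            reservoir.add(item);
        } else {
            // Keep the new item with probability capacity / seen,
            // replacing a uniformly chosen element of the reservoir.
            long index = ThreadLocalRandom.current().nextLong(seen);
            if (index < capacity) {
                reservoir.set((int) index, item);
            }
        }
    }

    public List<T> sample() {
        return List.copyOf(reservoir);
    }

    public static void main(String[] args) {
        ReservoirSamplerSketch<Integer> sampler = new ReservoirSamplerSketch<>(10);
        for (int i = 0; i < 1_000_000; i++) {
            sampler.add(i);   // e.g. one IRI per add() while streaming the dataset
        }
        System.out.println("sample of 10 items: " + sampler.sample());
    }
}
```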
21. … what am I working on
• Large-scale (Web of Data scale) evaluation journal paper
– assessing the quality of LOD Cloud datasets
• daQ (Journal Paper)
22. … what do we do at Bonn
• Open Government Data – Publishing and
Consumption
– Data Value Chains, Value Creation, Budgeting
• Portal for publication and consumption of open
data
– Lowering of semantic data to shallower domain-specific formats (RDB, CSV, etc.)
• RDF Visualisations and Recommendations
23. … what do we do at Bonn
• Dataset Change Detection
• Collaborative Authoring and Open Educational
Content
• Low-threshold agile methodology for
collaborative vocabulary development
• Mapping of AutomationML to RDF
There are various reasons why a dataset should contain quality metadata:
Convincing data consumers: is the published data fit for the user's needs?
Filtering datasets: if the publisher does not care about their data, then why should a consumer use it?
Poor quality perspective: LD is a good use case for veracity in Big Data, but it is often overlooked due to the perception of its poor quality. If the Big Data community is convinced otherwise, LD might be used more often on bigger platforms. Therefore we have to start by assessing data quality and stamping our datasets with quality metadata in a machine-readable format.
Quality metadata is represented in named graphs that can be attached to datasets.
Category, Dimension and Metric are abstract classes; these are only conceptual, and more concrete classes should be represented as sub-classes.
A dataset can be assessed by multiple metrics. Each metric can be assessed over the dataset any number of times, with each new value represented as an observation.
Each observation is also a provenance entity, enabling the representation of concepts such as the activity and agent, and how a metric was executed (for example, parameter settings in reservoir sampling techniques).
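As a sketch of how such an observation might be materialised, the Java (Apache Jena) snippet below writes one metric observation, together with minimal provenance, into a named quality graph. The daq: and prov: terms used here (daq:Observation, daq:metric, daq:computedOn, daq:value, prov:wasGeneratedBy) are meant to illustrate the modelling and should be checked against the published vocabularies.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDF;

public class DaqObservationSketch {

    // Illustrative namespaces; check the published daQ vocabulary for exact terms.
    static final String DAQ  = "http://purl.org/eis/vocab/daq#";
    static final String PROV = "http://www.w3.org/ns/prov#";
    static final String EX   = "http://example.org/";

    public static void main(String[] args) {
        Dataset dataset = DatasetFactory.create();

        // Quality metadata lives in its own named graph, attached to the assessed dataset.
        Model qualityGraph = ModelFactory.createDefaultModel();
        qualityGraph.setNsPrefix("daq", DAQ);
        qualityGraph.setNsPrefix("prov", PROV);

        Resource metric = qualityGraph.createResource(EX + "DereferenceabilityMetric");

        // One assessment run of a metric over the dataset becomes one observation.
        Resource observation = qualityGraph.createResource(EX + "obs/1")
                .addProperty(RDF.type, qualityGraph.createResource(DAQ + "Observation"))
                .addProperty(qualityGraph.createProperty(DAQ, "metric"), metric)
                .addProperty(qualityGraph.createProperty(DAQ, "computedOn"),
                        qualityGraph.createResource(EX + "myDataset"))
                .addLiteral(qualityGraph.createProperty(DAQ, "value"), 0.75);

        // The observation is also a provenance entity, so agent/activity details can be attached.
        observation.addProperty(RDF.type, qualityGraph.createResource(PROV + "Entity"))
                .addProperty(qualityGraph.createProperty(PROV, "wasGeneratedBy"),
                        qualityGraph.createResource(EX + "assessmentActivity/1"));

        dataset.addNamedModel(EX + "qualityMetadata", qualityGraph);
        dataset.getNamedModel(EX + "qualityMetadata").write(System.out, "TURTLE");
    }
}
```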
The general architecture
The processing workflow
We identified the data quality lifecycle, which could be part of a bigger lifecycle like the LOD Stack, or even of bigger, more generic processes like data value chains:
Metric Identification and Definition – choosing the right metrics for a dataset and the task at hand;
Assessment – assessing the dataset based on the metrics chosen;
Data Repairing and Cleaning – ensuring that, following a quality assessment, a dataset is curated in order to improve its quality;
Storage, Cataloguing and Archiving – updating the improved dataset on the cloud whilst making the quality metadata available to the public;
Exploration and Ranking – finally, data consumers can explore cleaned datasets according to their quality metadata.
Our hypothesis is that probabilistic approximation techniques would drastically improve computational time when compared with the naïve implementations, which give 100% accurate results. Having said that, the probabilistic techniques should still give close-to-accurate results given the right parameter settings.
Therefore, to sum up the metrics using the reservoir sampling technique:
The dereferenceability metric gave around 75% precision, whilst the speed-up can easily exceed two orders of magnitude even for small datasets of 1M triples.
The links to external data providers metric gave us 100% precision, whilst the difference in time becomes easily noticeable as datasets grow larger.
From the results we saw that the precision was on average 97%, whilst the improvement in computational time was more than 3 orders of magnitude in most cases.