Linked Data Quality Assessment
– daQ and Luzzu
Jeremy Debattista
University of Bonn
Presentation at the Ontology Engineering
Group (UPM)
…who am I?
• B.Sc (Hons) in Computer Science – University of
Malta
– Thesis: Collaborative Editing and Expert Finding
• M.App Sc in Computer Science – DERI, National
University of Ireland, Galway
– Thesis: Ontology-based rules for User-Controlled
Support in Ubiquitous Environments
• PhD Candidate – University of Bonn
… my PhD – the big picture
• Work related to Data Quality (in LD)
– representing quality metadata (daQ)
– assessing data quality (Luzzu)
– identifying new metrics from standard
vocabularies (like PROV-O)
… the need for Quality Metadata
• Convincing data consumers to use our
published data
• Filtering datasets
• Poor Quality Perspective – Big Data Veracity
… the daQ vocabulary
• Metadata as Named Graphs
• Usage of abstract class concept
• Metric assessment as Observations
• Preserving Provenance information
… daQ on the Web
http://purl.org/eis/vocab/daq
… daQ Applications
• daQ validator – validates quality metric
schemas that extend daQ (will be online
soon)
– e.g. checking that each dimension is in exactly one
category…
• Luzzu – next slides
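The validator check mentioned above (each dimension belongs to exactly one category) can be sketched in a few lines. This is only an illustration, not the daQ validator itself: the function name, the dictionary-based schema representation, and the category and dimension names are all assumptions made for the example.

```python
from collections import defaultdict


def dimensions_with_wrong_category_count(category_to_dimensions, all_dimensions):
    """Return {dimension: [categories]} for each dimension that is NOT
    attached to exactly one category (zero or several categories)."""
    owners = defaultdict(list)
    for category, dimensions in category_to_dimensions.items():
        for dimension in dimensions:
            owners[dimension].append(category)
    return {
        d: sorted(owners.get(d, []))
        for d in sorted(all_dimensions)
        if len(owners.get(d, [])) != 1
    }


# Hypothetical schema: "Licensing" sits in two categories,
# "Interlinking" is declared but sits in none.
schema = {
    "Accessibility": ["Availability", "Licensing"],
    "Intrinsic": ["Conciseness", "Licensing"],
}
declared = {"Availability", "Licensing", "Conciseness", "Interlinking"}
violations = dimensions_with_wrong_category_count(schema, declared)
for dimension, categories in violations.items():
    print(dimension, "->", categories or "no category")
```

A real validator would read these relations from the RDF schema rather than from a Python dictionary; the counting logic would stay the same.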
… Luzzu – QA Framework
• A comprehensive QA framework
– assesses LD quality using user-provided metrics (we
have a number of LOD metrics already) in a scalable
manner
– provides queryable metadata (daQ)
– provides quality reports which can be used for cleaning
• Java-based, with Maven integration
• http://eis-bonn.github.io/Luzzu
[Figure: Luzzu architecture diagram. Labels: Knowledge Layer, Quality Assessment Unit, Processing Unit, Assessment Layer, Semantic Schema Layer, Annotation Unit, Operations Unit, Communication Layer, LQML Comp. Unit]
[Figure: Luzzu assessment flow diagram. Labels: Dataset, Processing Unit, Annotation Unit, Metric 1, Metric 2, …, Metric n, Quality Assessment Unit, Communication Layer]
…what’s missing in Luzzu
• Make Luzzu work better on Big Data platforms
– We already have a Spark processor
– How can metrics be scaled across multiple cores?
Something like map-reduce, maybe?
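One way to picture the map-reduce idea raised above: a metric keeps a small partial state per data partition, and the partial states are merged at the end. The sketch below is not Luzzu code. The toy metric (the share of triples that use `rdfs:label`), the function names, and the two partitions are assumptions chosen for illustration, and the map phase runs sequentially here although each call could be sent to a different core or worker.

```python
from functools import reduce


def map_partition(triples, predicate):
    """Map phase: summarise one partition as (matching, total) counts."""
    matching = sum(1 for t in triples if predicate(t))
    return (matching, len(triples))


def merge(a, b):
    """Reduce phase: partial counts combine by element-wise addition."""
    return (a[0] + b[0], a[1] + b[1])


def metric_value(partitions, predicate):
    """Compute the ratio of matching triples over all partitions."""
    partials = (map_partition(p, predicate) for p in partitions)
    matching, total = reduce(merge, partials, (0, 0))
    return matching / total if total else 0.0


# Toy (subject, predicate, object) triples in two hypothetical partitions.
partitions = [
    [("ex:a", "rdfs:label", "A"), ("ex:a", "ex:knows", "ex:b")],
    [("ex:b", "rdfs:label", "B"), ("ex:b", "ex:knows", "ex:c"),
     ("ex:c", "ex:knows", "ex:a")],
]
value = metric_value(partitions, lambda t: t[1] == "rdfs:label")
print(value)
```

This pattern only fits metrics whose partial state can be merged in any order; metrics that need a global view of the graph would need a different decomposition.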
… data quality lifecycle
1. Metric Identification and Definition
2. Assessment
3. Data Repairing and Cleaning
4. Storage/Cataloguing/Archiving
5. Exploration/Ranking
… quality metrics
• Traditional naïve way
• Probabilistic Techniques (A paper was
presented at ESWC this year)
… probabilistic technique hypothesis
Probabilistic approximation techniques would:
(H1) drastically improve computational time
(H2) give close to accurate results
… probabilistic techniques used
• Reservoir Sampling
– Dereferenceability
– Links to External Data Providers
• Bloom Filters
– Extensional Conciseness
• Clustering Coefficient Estimation
– Clustering Coefficient of a Network
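To make the first technique concrete, here is a minimal sketch of reservoir sampling (the classic single-pass "Algorithm R"), which keeps a fixed-size uniform sample of a stream whose length is not known in advance. It is not the Luzzu implementation; the function name, the sample size, and the example URIs are assumptions. A metric such as dereferenceability could then test only the sampled resources instead of every resource in the dataset.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items drawn uniformly at random from an iterable,
    using one pass and O(k) memory."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k/(i+1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Hypothetical stream of resource URIs; only the sample would be checked.
uris = (f"http://example.org/resource/{n}" for n in range(10_000))
sample = reservoir_sample(uris, k=100, rng=random.Random(42))
print(len(sample))
```

The sample size trades accuracy for time: a larger reservoir gives an estimate closer to the exact metric value at the cost of more checks.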
… some results
Precision: approx. 75%
Time Saved: > 2 Orders of Magnitude
Precision: 100%
Time Saved: > 2 Orders of Magnitude
… some results
Precision: approx. 97%
Time Saved: > 3 Orders of Magnitude
… some results
Precision: approx. 95%
Time Saved: > 1 Order of Magnitude
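As an illustration of another technique named earlier, the sketch below uses a Bloom filter to estimate a conciseness-style ratio (unique items over total items) without storing every item seen. It is not the code behind the results above. The class, the bit-array size, the number of hash functions, the use of SHA-256, and the reading of the metric as a unique-to-total ratio are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size probabilistic set: no false negatives, rare false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            # Derive independent-looking hashes by prefixing a seed byte.
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item):
        """Insert item; return True if it was possibly seen before."""
        seen = True
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                seen = False
                self.bits[byte] |= 1 << bit
        return seen


def estimate_unique_ratio(items, bloom):
    """Estimate unique/total in one pass over the items."""
    total = unique = 0
    for item in items:
        total += 1
        if not bloom.add(item):
            unique += 1
    return unique / total if total else 1.0


ratio = estimate_unique_ratio(["a", "b", "a", "c", "b", "a"], BloomFilter())
print(ratio)
```

Because a Bloom filter can report an unseen item as seen, the estimate can only undercount unique items; a larger bit array lowers that error.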
… what am I working on
• Large-scale / Data Web-scale evaluation (journal
paper)
– assessing the quality of LOD Cloud datasets
• daQ (journal paper)
… what do we do at Bonn
• Open Government Data – Publishing and
Consumption
– Data Value Chains, Value Creation, Budgeting
• Portal for publication and consumption of open
data
– Lowering of semantic data to shallower domain-
specific formats (RDB, CSV, etc.)
• RDF Visualisations and Recommendations
• Dataset Change Detection
• Collaborative Authoring and Open Educational
Content
• Low-threshold agile methodology for
collaborative vocabulary development
• Mapping of AutomationML to RDF
… some tools
• http://purl.org/net/exconquer/
• http://purl.org/net/dsaas
• http://slidewiki.org
• http://eis.iai.uni-bonn.de/Projects/LinkDaViz.html

BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
 
Air breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animalsAir breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animals
 
George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024
 
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfThe workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
 
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
 
Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)
 
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
 

Linked Data Quality Assessment – daQ and Luzzu

  • 1. Linked Data Quality Assessment – daQ and Luzzu Jeremy Debattista University of Bonn Presentation at the Ontology Engineering Group (UPM)
  • 2. …who am I? • B.Sc (Hons) in Computer Science – University of Malta – Thesis: Collaborative Editing and Expert Finding • M.App Sc in Computer Science – DERI, National University of Ireland, Galway – Thesis: Ontology-based rules for User-Controlled Support in Ubiquitous Environments • PhD Candidate – University of Bonn
  • 3. … my PhD – the big picture • Work related to Data Quality (in LD) – representing quality metadata (daQ) – assessing data quality (Luzzu) – identifying new metrics from standard vocabularies (like PROV-O)
  • 4. … the need for Quality Metadata • Convincing data consumers to use our published data • Filtering datasets • Poor Quality Perspective – Big Data Veracity
  • 5. … the daQ vocabulary
  • 6. … the daQ vocabulary
  • 7. … the daQ vocabulary • Metadata as Named Graphs • Usage of abstract class concept • Metric assessment as Observations • Preserving Provenance information
  • 8. … daQ on the Web http://purl.org/eis/vocab/daq
  • 9. … daQ Applications • daQ validator – Validates quality metric schemas extending the daQ (will be online soon) – e.g. checking that each dimension is in exactly one category… • Luzzu – next slides
  • 10. … Luzzu – QA Framework • A comprehensive QA framework – assesses LD quality using user-provided metrics (we have a number of LOD metrics already) in a scalable manner – provides queryable metadata (daQ) – provides quality reports which can be used for cleaning • Java-based with Maven integration • http://eis-bonn.github.io/Luzzu
  • 11. … Luzzu – QA Framework (architecture diagram labels): Knowledge Layer; Quality Assessment Unit; Processing Unit; Assessment Layer; Semantic Schema Layer; Annotation Unit; Operations Unit; Communication Layer; LQML Comp. Unit
  • 12. … Luzzu – QA Framework (processing workflow diagram labels): Dataset; Processing Unit; Annotation Unit; Metric 1, Metric 2, …, Metric n; Quality Assessment Unit; Communication Layer
  • 13. …what’s missing in Luzzu • Make Luzzu work better on Big Data Platforms – We already have a SPARK Processor – How can metrics be scaled on different cores? Something like map-reduce maybe?
  • 14. … data quality lifecycle: 1. Metric Identification and Definition; 2. Assessment; 3. Data Repairing and Cleaning; 4. Storage/Cataloguing/Archiving; 5. Exploration/Ranking
  • 15. … quality metrics • Traditional naïve way • Probabilistic Techniques (A paper was presented at ESWC this year)
  • 16. … probabilistic technique hypothesis Probabilistic approximation techniques would: (H1) drastically improve computational time; (H2) give close to accurate results
  • 17. … probabilistic techniques used • Techniques: Reservoir Sampling; Bloom Filters; Clustering Coefficient Estimation • Metrics: Dereferenceability; Links to External Data Providers; Extensional Conciseness; Clustering Coefficient of a Network
  • 18. … some results • Techniques: Reservoir Sampling; Bloom Filters; Clustering Coefficient Estimation • Metrics: Dereferenceability; Links to External Data Providers; Extensional Conciseness; Clustering Coefficient of a Network • Dereferenceability – Precision: approx. 75%; Time Saved: > 2 Orders of Magnitude • Links to External Data Providers – Precision: 100%; Time Saved: > 2 Orders of Magnitude
  • 19. … some results • Techniques: Reservoir Sampling; Bloom Filters; Clustering Coefficient Estimation • Metrics: Dereferenceability; Links to External Data Providers; Extensional Conciseness; Clustering Coefficient of a Network • Precision: approx. 97%; Time Saved: > 3 Orders of Magnitude
  • 20. … some results • Techniques: Reservoir Sampling; Bloom Filters; Clustering Coefficient Estimation • Metrics: Dereferenceability; Links to External Data Providers; Extensional Conciseness; Clustering Coefficient of a Network • Precision: approx. 95%; Time Saved: > 1 Order of Magnitude
  • 21. … what am I working on • Large-scale / Data Web-scale evaluation (journal paper) – assessing the quality of LOD Cloud datasets • daQ (journal paper)
  • 22. … what do we do at Bonn • Open Government Data – Publishing and Consumption – Data Value Chains, Value Creation, Budgeting • Portal for publication and consumption of open data – Lowering of semantic data to shallower domain-specific formats (RDB, CSV, etc.) • RDF Visualisations and Recommendations
  • 23. … what do we do at Bonn • Dataset Change Detection • Collaborative Authoring and Open Educational Content • Low-threshold agile methodology for collaborative vocabulary development • Mapping of AutomationML to RDF
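
Slides 17 to 20 list Bloom filters among the probabilistic techniques. The sketch below shows the general technique only; it is not Luzzu's implementation. The class name `BloomFilter`, the function `estimated_conciseness`, the sizing defaults, and the choice to measure conciseness as the share of distinct items are all assumptions made for illustration.

```python
import hashlib
import math
from typing import Iterable, Iterator


class BloomFilter:
    """Fixed-size probabilistic set: no false negatives, tunable false positives."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01) -> None:
        if expected_items <= 0:
            raise ValueError("expected_items must be positive")
        # Standard sizing formulas: m = -n ln p / (ln 2)^2 and k = (m / n) ln 2.
        self.size = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def estimated_conciseness(items: Iterable[str], expected_items: int,
                          false_positive_rate: float = 0.01) -> float:
    """Approximate the share of distinct items (assumed definition of conciseness)."""
    seen = BloomFilter(expected_items, false_positive_rate)
    total = unique = 0
    for item in items:
        total += 1
        if item not in seen:  # a false positive here undercounts unique items
            unique += 1
            seen.add(item)
    return unique / total if total else 1.0
```

Because a Bloom filter never reports a stored item as absent, the estimate can only undercount distinct items, never overcount them. Lowering `false_positive_rate` tightens the estimate at the cost of more memory.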

Editor's notes

  1. There are various reasons why a dataset should contain quality metadata. Convincing data consumers: is the published data fit for the user's needs? Filtering datasets: if the publisher does not care about their data, why should a consumer use it? Poor quality perspective: LD is a good use case for Veracity in Big Data, but it is often overlooked because of its poor quality perspective. If the big data community is convinced otherwise, LD might be used more often on bigger platforms. Therefore we have to start by assessing data quality and stamping our datasets in a machine-readable format.
  2. Quality metadata is represented in Named Graphs that can be attached to datasets. The CDM (category, dimension, metric) classes are abstract and purely conceptual; more concrete classes should be represented as sub-classes. A dataset can be assessed against multiple metrics. Each metric can be assessed over the dataset any number of times, with each new value represented as an observation. Each observation is also a Provenance Entity, enabling the representation of concepts such as the activity agent and how a metric was executed (for example, the parameter settings in reservoir techniques).
  3. The general architecture
  4. The processing workflow
  5. We identified the data quality lifecycle, which could be part of a bigger lifecycle like the LODStack, or even of bigger, more generic processes like data value chains. Metric Identification and Definition: choosing the right metrics for the dataset and task at hand. Assessment: dataset assessment based on the metrics chosen. Data Repairing and Cleaning: ensuring that, following a quality assessment, a dataset is curated in order to improve its quality. Storage, Cataloguing and Archiving: updating the improved dataset on the cloud whilst making the quality metadata available to the public. Exploration and Ranking: finally, data consumers can explore cleaned datasets according to their quality metadata.
  6. Our hypothesis is that probabilistic approximation techniques would drastically improve computational time compared with the naïve implementations, which give 100% accurate results. Having said that, the probabilistic techniques will still give close to accurate results, given the right parameter settings.
  7. To sum up the metrics using the reservoir sampling technique: the Dereferenceability metric gave around 75% precision, whilst the time saved can easily exceed 2 orders of magnitude with small datasets of 1M triples. The Links to External Data Providers metric gave us 100% precision, whilst the difference in time becomes easily noticeable as datasets grow larger.
  8. From the results we saw that the precision was on average 97%, whilst the computational time saved was more than 3 orders of magnitude in most cases.
  9. sum up
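
The notes above refer to reservoir sampling and its parameter settings. As a minimal sketch of the general technique (the classic Algorithm R), and not the code used in Luzzu, the following keeps a fixed-size uniform sample of a stream and uses it to estimate what share of items satisfies a check. The names `reservoir_sample` and `estimate_ratio`, and the idea of passing the per-item check as a `predicate`, are assumptions made for illustration.

```python
import random
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return up to k items drawn uniformly at random from a stream of unknown length."""
    if k <= 0:
        raise ValueError("k must be positive")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def estimate_ratio(stream: Iterable[T], k: int,
                   predicate: Callable[[T], bool],
                   rng: Optional[random.Random] = None) -> float:
    """Estimate the share of stream items satisfying predicate from a size-k sample."""
    sample = reservoir_sample(stream, k, rng)
    if not sample:
        return 0.0
    return sum(1 for item in sample if predicate(item)) / len(sample)
```

The reservoir size `k` is the parameter that trades accuracy for time: an expensive check, such as dereferencing a URI, runs on at most `k` items instead of on every item in the dataset.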