A Role for Provenance in Quality Assessment


Chris Baillie, Pete Edwards, and Edoardo Pignotti
c.baillie@abdn.ac.uk
Overview

- Motivation
- Evaluating Data Quality
- A Role for Provenance
- Future work




Motivation

- “we don’t know whether the information we find [on the Web] is accurate or not. So we have to teach people how to assess what they’ve found”
  Vint Cerf, 2010

- The Web of Documents has become the Web of documents, services, data, and people.

- Anyone can publish anything, so we need a way to evaluate quality.

- We are investigating these issues within the Internet of Things
    - Sensors are now at the centre of many applications


Example Scenario




Evaluating Data Quality
Entity (and context)
- To evaluate quality, we must examine the context around data

Quality Scores
- Quality is a multi-dimensional construct
    - Accuracy
    - Timeliness
    - Relevance

F(E, R) = Q

WIQA Framework
- Examines data content, context, and external ratings (Bizer et al. 2009)

Data Requirements
- Fürber and Hepp (2011) use rules to identify quality problems
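The relation F(E, R) = Q can be read as a function that applies a set of requirements R to an entity E (with its context) and returns one score per quality dimension. The following is a minimal sketch under stated assumptions, not the framework's implementation: the field names `gps_error` and `age_seconds`, the thresholds, and the two rules are hypothetical and chosen only to illustrate the shape of the function.

```python
from typing import Callable, Dict

# An entity E is modelled here as a plain dict holding a data value and its context.
Entity = Dict[str, float]
# A requirement in R maps an entity to a score for one quality dimension.
Requirement = Callable[[Entity], float]


def assess(entity: Entity, requirements: Dict[str, Requirement]) -> Dict[str, float]:
    """F(E, R) = Q: apply every requirement to the entity, one score per dimension."""
    return {dimension: rule(entity) for dimension, rule in requirements.items()}


# Hypothetical requirements; thresholds and field names are illustrative only.
requirements: Dict[str, Requirement] = {
    "accuracy": lambda e: 1.0 if e["gps_error"] <= 10 else 0.0,
    "timeliness": lambda e: 1.0 if e["age_seconds"] <= 60 else 0.0,
}

observation: Entity = {"gps_error": 25.0, "age_seconds": 30.0}
scores = assess(observation, requirements)
print(scores)
```

Keeping each dimension as a separate entry in Q reflects the point that quality is multi-dimensional: a consumer can weigh accuracy and timeliness differently rather than receiving a single collapsed number.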




Representing Sensor Observations


- Linked Data: “recommended best practice for exposing, sharing, and connecting pieces of data using URIs and RDF”
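To illustrate what a linked data representation of a sensor observation might look like, here is a minimal sketch that stores subject, predicate, object triples in plain Python. The `http://example.org/obs/` namespace and the `observedBy` property are hypothetical, since the slides do not give the actual URIs or vocabulary; only `rdf:type` is a real RDF term.

```python
# Hypothetical namespace; the slides do not state the URIs actually used.
EX = "http://example.org/obs/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(local_name: str) -> str:
    """Build a full URI in the hypothetical example namespace."""
    return EX + local_name


# Each statement is a (subject, predicate, object) triple.
triples = {
    (uri("observation1"), RDF_TYPE, uri("Observation")),
    (uri("observation1"), uri("distanceFromRoute"), 20.0),
    (uri("observation1"), uri("observedBy"), uri("iPhoneSensor")),  # hypothetical property
}


def objects(subject: str, predicate: str) -> list:
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects(uri("observation1"), uri("distanceFromRoute")))
```

Because every resource is named by a URI, a quality rule can refer to the same observation and sensor that other datasets describe, which is what makes the context around the data reachable.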




Performing Quality Assessment




R_relevance = 1 - (E distanceFromRoute X) / 100

CONSTRUCT {
  _:b0 a QualityScore .
  _:b0 score ?qs .
  _:b0 dqm:ruleViolation _:b1 .
  _:b1 a DataRequirementViolation .
  _:b1 dqm:affectedInstance ?instance .
} WHERE {
  ?instance a Observation .
  ?instance distanceFromRoute ?distance .
  LET (?qs := (1 - (?distance / 100))) .
}
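The arithmetic in the relevance rule can be checked outside the reasoner. The sketch below mirrors only the `LET` expression from the rule, `1 - (?distance / 100)`; the observation names and distances are made-up values, and the unit of distance is not stated in the slides.

```python
def relevance_score(distance_from_route: float) -> float:
    """Mirror of the rule body: ?qs := 1 - (?distance / 100)."""
    return 1 - (distance_from_route / 100)


# Hypothetical observations mapped to their distance from the route.
observations = {"obs1": 0.0, "obs2": 25.0, "obs3": 150.0}

relevance = {name: relevance_score(d) for name, d in observations.items()}
for name, qs in relevance.items():
    print(name, qs)
```

An observation exactly on the route scores 1, and the score falls linearly with distance. Note that, as written, the expression yields a negative score once the distance exceeds 100; whether scores should be clamped to the range 0 to 1 is not specified in the slides.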



Quality Assessment Results




Observation Provenance
- Provenance is a critical part of observation context

- Describes the entities, agents, and activities involved in data creation:
    - How was the observation value measured?
    - Who controlled the sensing process?
    - How has the observation been transformed since it was created?

- The W3C PROV-O model provides a linked data representation of provenance
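To make the entity, activity, and agent vocabulary concrete, the sketch below walks a small provenance chain backwards from a derived observation to everything that contributed to it. The relation names (`wasGeneratedBy`, `used`, `wasAssociatedWith`) are PROV-O terms and the node names come from the example diagram on the next slide. Attaching `wasAssociatedWith` to the sensing process is an assumption read off the diagram layout, and the traversal itself is illustrative rather than part of the framework.

```python
# Provenance statements as (subject, relation, object).
provenance = [
    ("Observation 2", "wasGeneratedBy", "Map matching"),
    ("Map matching", "used", "Observation 1"),
    ("Observation 1", "wasGeneratedBy", "Sensing Process"),
    ("Sensing Process", "used", "iPhoneSensor"),
    ("Sensing Process", "wasAssociatedWith", "Chris"),  # assumed attachment
]


def lineage(node, statements):
    """Return every node reachable from `node` by following provenance edges."""
    seen = []
    stack = [node]
    while stack:
        current = stack.pop()
        for subject, _relation, obj in statements:
            if subject == current and obj not in seen:
                seen.append(obj)
                stack.append(obj)
    return seen


history = lineage("Observation 2", provenance)
print(history)
```

A traversal like this answers the three questions above: which activity measured the value, which agent was involved, and which transformations lie between the original reading and the observation being assessed.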
Observation Provenance
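Entity "Observation 2" wasGeneratedBy Activity "Map matching"
Activity "Map matching" used Entity "Observation 1"
Entity "Observation 1" wasGeneratedBy Activity "Sensing Process"
Activity "Sensing Process" wasAssociatedWith Agent "Chris"
Activity "Sensing Process" used Entity "iPhoneSensor"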
Quality Score Provenance
Work To Date
- Developed a Quality Assessment Framework that enables:
    - Linked data representation of sensor observations
    - Definition of quality requirements using SPARQL rules
    - Generation of quality scores via reasoning

Future Work
- Implementation of quality rules that examine provenance
- Investigation of quality score re-use
Any questions?




Come and see the IRP demo (D9) to see quality assessment in action.
Implementation
- Observation Triple Store
- Reasoner (SPIN)
- Quality Rules
    - Relevance Rule
    - Timeliness Rule
    - Accuracy Rule
    - Availability Rule
- Apache Tomcat
    - Observation Service
    - Quality Service

Editor's notes

  1. In this talk I will outline why the need for quality assessment exists, describe how quality is perceived, outline our approach to quality assessment, provide an example scenario, and outline our future work.
  2. We don’t know whether information is accurate, so we need to assess it. The Web has evolved. The Web is an open platform. The Web is big, so we need a smaller platform for evaluation.
  3. Consider mobile phones providing passenger information regarding the location of buses. Sometimes we get lucky and observations land right on the bus route. However, there are many different sources of low-quality data: inaccurate GPS readings; malicious users, such as someone playing with the app while at home; and people who make mistakes, such as someone on the wrong bus.
  4. Animate this: ObservationValue -> [Motivate SSN here] Observation + foi -> disruption report
  5. DataRequirement1 -> wasAttributedTo -> Agent