Using Semantic Web Resources for Data Quality Management
       Christian Fürber and Martin Hepp
      christian@fuerber.com, mhepp@computer.org

  Presentation at the 17th International Conference on
 Knowledge Engineering and Knowledge Management,
        October 10-15, 2010, Lisbon, Portugal
Purpose of Data
  [Diagram: DATA at the center, connected to Measurement, Information & Knowledge, Automation, and Decisions]

C. Fürber, M. Hepp:                                          2
Using SemWeb Resources for DQM
Data Quality in Practice




       Reference: http://www.heise.de/newsticker/meldung/Comdirect-Bank-macht-Kunden-zu-Billiardaeren-996088.html


The Web of Messy Data?
 Retrieved from http://dbpedia.org/sparql on July 20th




 Which one is the correct population?




The Web of Messy Data?
 Retrieved from http://dbpedia.org/sparql on July 20th




 Places with negative population?!?




Risk of Failure
  [Diagram: DATA at the center, connected to Measurement, Information & Knowledge, Automation, and Decisions]

Data Quality Problem Types
• Inconsistent duplicates
• Invalid characters
• Missing classification
• Incorrect reference
• Approximate duplicates
• Character alignment violation
• Word transpositions
• Invalid substrings
• Mistyping / Misspelling errors
• Cardinality violation
• Missing values
• Referential integrity violation
• Misfielded values
• Unique value violation
• False values
• Functional dependency violation
• Out of range values
• Imprecise values
• Existence of homonyms
• Meaningless values
• Incorrect classification
• Existence of synonyms
• Contradictory relationships
• Outdated conceptual elements
• Untyped literals
• Outdated values

  Reference: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Goals

• Use Semantic Web data to identify data
  quality problems at the instance level

• Support the Data Quality Management (DQM)
  process


Total Data Quality Management
  for and based on the Semantic Web

  DQ cycle: Define, Measure, Analyze, Improve
• Define: Define what's good and / or what's poor data quality
• Measure: Develop and apply SPARQL queries based on DQ-Definition

  Reference: Richard Wang (1998)

How can the Semantic Web support
    Data Quality Management?

   Availability of FREE Data Quality Knowledge,
   e.g. for the identification of…

                • Legal value violations
                • Functional dependency violations


Using Trusted References
  [Diagram: DQ-Constraints compare a tested knowledge base (local:Location), which pairs Las Vegas with France, against a trusted reference (tref:Location), which pairs Las Vegas with USA]

Basic Architecture




Basic Characteristics of SPIN
  http://spinrdf.org/
• Allows definition of generalized SPARQL query templates
• Constraint checking based on SPARQL
• Definition of inferencing rules via SPARQL



Generic Data Quality Constraints
       Library for Easy DQ-Definition
                                                • Mandatory properties &
                                                  literals
                                                • Legal values*
                                                • Legal value ranges
                                                • Functional dependencies*
                                                • Legal syntaxes
                                                • Uniqueness

                                                * Designed to use trusted references

          available @ http://semwebquality.org/ontologies/dq-constraints#
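
The constraint types that do not need a trusted reference can be illustrated outside of SPARQL. The following is a minimal Python sketch, not the SPIN library itself: the records, property names, population bounds, and postal code pattern are all hypothetical and only stand in for instances of a tested knowledge base.

```python
import re
from collections import Counter

# Hypothetical records standing in for instances of a tested knowledge base.
records = [
    {"id": "p1", "name": "Lisbon", "population": 545000, "zip": "1000-001"},
    {"id": "p2", "name": "Nowhere", "population": -5, "zip": "1000-002"},
    {"id": "p3", "name": None, "population": 1200, "zip": "ABC"},
    {"id": "p3", "name": "Porto", "population": 230000, "zip": "4000-001"},
]


def missing_values(rows, prop):
    """Mandatory property check: ids of rows where prop is absent or empty."""
    return [r["id"] for r in rows if r.get(prop) in (None, "")]


def out_of_range(rows, prop, low, high):
    """Legal value range check: ids of rows whose value lies outside [low, high]."""
    return [r["id"] for r in rows
            if r.get(prop) is not None and not (low <= r[prop] <= high)]


def syntax_violations(rows, prop, pattern):
    """Legal syntax check: ids of rows whose value does not fully match pattern."""
    regex = re.compile(pattern)
    return [r["id"] for r in rows
            if r.get(prop) is not None and not regex.fullmatch(r[prop])]


def duplicates(rows, prop):
    """Uniqueness check: values of prop that occur more than once."""
    counts = Counter(r[prop] for r in rows if r.get(prop) is not None)
    return sorted(value for value, n in counts.items() if n > 1)


print(missing_values(records, "name"))
print(out_of_range(records, "population", 0, 10**9))
print(syntax_violations(records, "zip", r"\d{4}-\d{3}"))
print(duplicates(records, "id"))
```

Each function returns the offending identifiers or values rather than a boolean, which mirrors how a constraint query returns the violating instances.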
Definition of Data Quality
                Constraints based on SPIN




Constraint checking in Practice




Legal Value Constraints
   Return all instances of class vcard:Address whose value for property
   vcard:country-name has no matching value in property tref:country
                      SELECT ?s
                      WHERE {
                          ?s a vcard:Address .
                          ?s vcard:country-name ?value .
                          # Look for a trusted location with the same country string.
                          OPTIONAL {
                              ?s2 a tref:Location .
                              ?s2 tref:country ?value1 .
                              FILTER(str(?value1) = str(?value))
                          }
                          # Keep only addresses for which no match was found.
                          FILTER(!bound(?value1))
                      }
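
The legal value query above is an anti-join: keep every tested value that has no counterpart in the trusted reference. A minimal Python sketch of the same logic follows; the address identifiers and country lists are made up for illustration, and the comparison is on plain strings, analogous to the str() comparison in the query.

```python
def illegal_values(tested, trusted_values):
    """Return subjects whose value does not occur in the trusted reference.

    tested: iterable of (subject, value) pairs from the tested knowledge base.
    trusted_values: iterable of legal values taken from the trusted reference.
    """
    legal = {str(v) for v in trusted_values}
    return [subject for subject, value in tested if str(value) not in legal]


# Hypothetical data: addr3 carries a misspelled country name.
addresses = [("addr1", "Portugal"), ("addr2", "Germany"), ("addr3", "Germny")]
trusted_countries = ["Portugal", "Germany", "France"]

print(illegal_values(addresses, trusted_countries))
```

As in the query, subjects that have no country value at all are not reported here; that case belongs to a mandatory property check.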
Functional Dependency Constraints
   Return all locations whose vcard:ADR node has a city-country combination
   that does not have a matching pair in instances of gn:Location.

                     SELECT ?s
                     WHERE {
                         ?s a gr:LocationOfSalesOrServiceProvisioning .
                         ?s vcard:ADR ?node .
                         ?node vcard:city ?value1 .
                         ?node vcard:country ?value2 .
                         # SPARQL 1.1 spells the negation as FILTER NOT EXISTS.
                         FILTER NOT EXISTS {
                             ?s2 a gn:Location .
                             ?s2 gn:asciiname ?value1 .
                             ?s2 gn:country ?value2 .
                         }
                     }
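
The functional dependency check above tests whether a value pair from the tested data also occurs as a pair in the trusted reference. The sketch below shows that membership test in Python; the location identifiers, city names, and country codes are hypothetical examples, not data from the presentation.

```python
def unmatched_pairs(tested, trusted_pairs):
    """Return subjects whose (city, country) pair is absent from the reference.

    tested: iterable of (subject, city, country) triples.
    trusted_pairs: iterable of (city, country) pairs from the trusted reference.
    """
    trusted = set(trusted_pairs)
    return [s for s, city, country in tested if (city, country) not in trusted]


# Hypothetical data: loc2 combines a known city with the wrong country.
locations = [
    ("loc1", "Las Vegas", "US"),
    ("loc2", "Las Vegas", "FR"),
    ("loc3", "Lisbon", "PT"),
]
reference = [("Las Vegas", "US"), ("Lisbon", "PT"), ("Paris", "FR")]

print(unmatched_pairs(locations, reference))
```

Checking the pair rather than each value separately is what distinguishes this from a legal value check: both "Las Vegas" and "FR" are legal on their own, but the combination is not in the reference.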



Acquisition of Semantic Web
                 Sources for DQM
        (1)          Replication of relevant knowledge-bases
        (2)          On the fly via federated SPARQL queries:
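                            PREFIX dbo: <http://dbpedia.org/ontology/>
                            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
                            # The default prefix ":" (local vocabulary) must also be declared.
                            SELECT *
                            WHERE {
                                ?s1 :location_CITY ?city .
                                OPTIONAL {
                                    SERVICE <http://dbpedia.org/sparql> {
                                        ?s2 a dbo:City .
                                        ?s2 rdfs:label ?city .
                                        FILTER (lang(?city) = "en")
                                    }
                                }
                                # Keep only local cities without a DBpedia match.
                                FILTER(!bound(?s2))
                            }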
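
One detail is worth checking when adapting the federated query: in RDF, a plain literal and a language-tagged literal are different terms, so a local value "Lisbon" without a language tag does not join with a label "Lisbon"@en. Assuming the local city names carry no language tag, the comparison has to be made on the lexical form after selecting the wanted language, similar to the str() comparison on the legal value slide. The Python sketch below illustrates that matching rule with made-up labels; it does not contact DBpedia.

```python
def unmatched_cities(local_cities, reference_labels, lang="en"):
    """Return local city names that have no reference label in the given language.

    local_cities: iterable of plain strings (no language tag).
    reference_labels: iterable of (text, language_tag) pairs.
    """
    labels = {text for text, tag in reference_labels if tag == lang}
    return [city for city in local_cities if city not in labels]


# Hypothetical labels as a reference source might expose them.
local = ["Lisbon", "Lisboa", "Munich"]
labels = [("Lisbon", "en"), ("Lisboa", "pt"), ("Munich", "en"), ("München", "de")]

print(unmatched_cities(local, labels))
```

With the language fixed to English, "Lisboa" is reported even though it is a correct Portuguese name, which shows how the choice of language filter affects what counts as a violation.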

Limitations
• High degree of uncertainty about quality of Semantic
  Web resources
• Risk of data quality problems proliferating
• Lack of Semantic Web resources for certain domains
• Flexible design of RDF and structural heterogeneity
  complicate definition of generic DQ constraints
• Scalability on large data sets
• DQ constraints close the world (they apply a closed-world
  reading to data that RDF and OWL treat as open-world)



Contributions
• Data quality control for Semantic Web data
• Identification of potential inconsistencies
  between Semantic Web Resources
• Reduction of effort for the definition of functional
  dependency rules and legal value rules
• Reuse of shared data quality rules on a Web
  scale


Future Work
• Semantic Web information quality assessment
  framework (SWIQA) with computation of KPIs
• Analysis and identification of useful "trusted
  references" based on SWIQA
• Application on multi-source master data of
  information systems
• Evaluation on large data sets


Data Quality Constraints Library for SPIN @
http://semwebquality.org/ontologies/dq-constraints#

          Christian Fürber
          Researcher
          E-Business & Web Science Research Group

                        Werner-Heisenberg-Weg 39
                        85577 Neubiberg
                        Germany

                        skype            c.fuerber
                        email            christian@fuerber.com
                        web              http://www.unibw.de/ebusiness
                        homepage         http://www.fuerber.com
                        twitter          http://www.twitter.com/cfuerber




     Paper available at http://bit.ly/c5v6TM
