A DATABASE APPROACH TO MONITORING THE
                    QUALITY OF INFORMATION IN RDF STORES
                        Alexandre Rademaker and Edward Hermann




Wednesday, November 30, 11
NOTES



                  This is not a research report; it is a research
                  proposal!

                  Let us start by looking at results from database
                  researchers.




WHAT IS DATA QUALITY, AND HOW DO WE ENSURE IT?




                  Semantic properties of databases can be represented
                  by integrity constraints!

                  Integrity enforcement means maintaining the
                  correctness of the database. Truth maintenance!



                                                         Hendrik, 2011

HENDRIK DECKER


             http://web.iti.upv.es/~hendrik/
             Universidad Politécnica de Valencia




EXAMPLE




                  A marriage is between one man and one woman only.
                  How can we model such a constraint in a relational
                  DB?

                  We are talking about more than CHECK constraints,
                  foreign keys, and primary keys.




DB THEORY USES DATALOG




                  Datalog is more expressive than plain (non-recursive)
                  SQL: it can express transitive closure.

                  SQL is essentially FOL (decidable over finite models).

                  SELECT X WHERE Y (give me the bindings that satisfy
                  the clauses)
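
As an illustration of the transitive-closure remark, here is a minimal Python sketch of naive bottom-up evaluation of a two-rule Datalog program. The relation names `edge` and `path` and the sample facts are assumptions made for this example; they do not come from the slides.

```python
def transitive_closure(edges):
    """Naive bottom-up evaluation of the Datalog program
         path(X, Y) :- edge(X, Y).
         path(X, Z) :- path(X, Y), edge(Y, Z).
    The rules are re-applied until no new fact is derived (a fixpoint).
    """
    path = set(edges)  # first rule: every edge is a path
    while True:
        # second rule: extend each known path by one edge
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        if derived <= path:
            return path
        path |= derived


# Hypothetical facts: a -> b -> c -> d
edges = {("a", "b"), ("b", "c"), ("c", "d")}
closure = transitive_closure(edges)
print(sorted(closure))
```

The loop has no fixed bound on the number of joins, which is exactly what a single non-recursive SELECT cannot express.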




TWO WAYS TO ENFORCE INTEGRITY




                 On each update, check whether any integrity constraint
                 is violated. (In practice the check is not always
                 rigorous, because of its performance penalty.)

                 Repair extant violations of constraints. (Some
                 accumulation of inconsistency is inevitable.)



                                                           Hendrik, 2011

INCONSISTENCY-TOLERANT METHODS




                 The rigorous way is to eliminate all inconsistency:
                 repair the whole database.

                 Relaxation: partial (flexible) repairs!


                             Absolute consistency is out of the question
                                     due to its intractability!

                                                                Hendrik, 2011

FLEXIBILITY OF PARTIAL INCONSISTENCY


            Flexibility helps in two ways:
                 Integrity enforcement is more flexible. It does not have
                 to be done all at once. (Constraint violations can be
                 tolerated and resolved at an appropriate moment.)

                 Some inconsistency may be unknown at update time.
                 A total approach would fail in such a situation.

                 But...


                                                         Hendrik, 2011

PARTIAL REPAIRS


                 Absolute consistency is out of the question due to its intractability.

                 But naive inconsistency-tolerant repairs can be
                 data-destructive.

                 For a rational, flexible repair strategy, one needs criteria
                 (expressed in terms of metrics).

                 Only admit repairs that are integrity-preserving! That is, the total
                 amount of integrity violation must not increase after the repair.


                                                                   Hendrik, 2011

FORMAL DEFINITIONS

           For an update U (inserts, deletes) of a database D, we
           denote by $D^U$ the updated database.


           D      =    database
           IC     =    integrity theory (a set of constraints)
           I      =    constraint
           U      =    update

           D(F)   =    true if the formula F evaluates to true in D
           D(I)   =    true if I is satisfied in D
           D(IC)  =    true if every I in IC is satisfied in D




                                                                  Hendrik, 2011

FORMAL DEFINITIONS


            Let $\preceq$ be an ordering that is antisymmetric, reflexive and transitive.
            For two elements A and B of a lattice, $A \oplus B$ is their least upper bound.




                                                                    Hendrik, 2011

FORMAL DEFINITIONS

            We say that $(\mu, \preceq)$ is an inconsistency metric if
            $\mu$ maps tuples (D, IC) to some lattice that is partially ordered by $\preceq$.

           A simple example of a metric is given by $\mu(D, IC) = D(IC)$,
           with the natural order $true \prec false$ on the range of $\mu$.

                                                  That is, integrity satisfaction, D(IC) = true,
                                                  means lower inconsistency than integrity violation,
                                                  D(IC) = false.


              Non-trivial examples are given by comparing or
              counting violated constraints.
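
To make the three kinds of metric concrete, the following Python sketch evaluates each of them on a toy marriage database. The facts, the two constraint functions and all names are assumptions chosen for illustration, not definitions from the cited work.

```python
# Toy database: a set of (husband, wife) marriage facts. (Hypothetical data.)
db = {("adam", "eve"), ("adam", "lilith"), ("bob", "carol")}


def one_wife(db):
    """Constraint: no husband appears in two marriage facts."""
    husbands = [h for h, _ in db]
    return len(husbands) == len(set(husbands))


def one_husband(db):
    """Constraint: no wife appears in two marriage facts."""
    wives = [w for _, w in db]
    return len(wives) == len(set(wives))


# Integrity theory IC: a named set of constraints.
IC = {"one_wife": one_wife, "one_husband": one_husband}


def satisfied(db, ic):
    """D(IC): true only if every constraint holds (ordered true < false)."""
    return all(check(db) for check in ic.values())


def violated_set(db, ic):
    """Set of violated constraints, compared by subset inclusion."""
    return frozenset(name for name, check in ic.items() if not check(db))


def violated_count(db, ic):
    """Number of violated constraints, compared by <= on integers."""
    return len(violated_set(db, ic))


print(satisfied(db, IC), sorted(violated_set(db, IC)), violated_count(db, IC))
```

The boolean metric only says that something is wrong; the set and the count also say how much, which is what a partial repair needs in order to show progress.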



                                                                               Hendrik, 2011

INCONSISTENCY METRICS


                 Inconsistency metrics are used to decide whether an update preserves
                 integrity, that is, whether it avoids creating an integrity violation
                 that did not exist before the update.

                 Intuitively, an update preserves integrity if it does not increase
                 the measured inconsistency.

                             For a metric $(\mu, \preceq)$, an update U of a database D
                             with integrity theory IC is integrity-preserving with
                             regard to $(\mu, \preceq)$ if $\mu(D^U, IC) \preceq \mu(D, IC)$.
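
The definition can be tried out with a small Python sketch. It assumes a counting metric over marriage facts (the integrity theory is built into the hypothetical function `mu`) and models an update as a pair of insert and delete sets; none of these names come from the slides.

```python
from collections import Counter


def mu(db):
    """Counting metric: number of persons who appear in more than one marriage fact."""
    husbands = Counter(h for h, _ in db)
    wives = Counter(w for _, w in db)
    return (sum(1 for n in husbands.values() if n > 1)
            + sum(1 for n in wives.values() if n > 1))


def apply_update(db, inserts=(), deletes=()):
    """Return the updated database D^U for an update U = (inserts, deletes)."""
    return (set(db) - set(deletes)) | set(inserts)


def preserves_integrity(db, inserts=(), deletes=()):
    """True if mu(D^U) <= mu(D), i.e. the update does not add inconsistency."""
    return mu(apply_update(db, inserts, deletes)) <= mu(db)


# The database is already inconsistent: adam has two wives, so mu(db) == 1.
db = {("adam", "eve"), ("adam", "lilith")}

harmless = preserves_integrity(db, inserts=[("bob", "carol")])
harmful = preserves_integrity(db, inserts=[("dan", "eve")])
repair = preserves_integrity(db, deletes=[("adam", "lilith")])
print(harmless, harmful, repair)
```

The first update is accepted even though the database stays inconsistent, which is the point of inconsistency tolerance: only updates that make things worse are rejected.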

                                                                        Hendrik, 2011

AND MORE...




                 Inconsistency-tolerant integrity checking

                 Repairs

                 Computing and checking partial repairs

                 Computing integrity-preserving repairs




                                                             Hendrik, 2011

WHY ARE WE TALKING ABOUT IT?



                 Lattes@FGV Project (a unified KB of FGV research
                 publications, researchers, skills, etc.), http://dck092.fgv.br/

                 The Semantic Web brings RDF, description logics, linked data, etc.

                 Our research topics include logics and knowledge
                 representation.

                 RDF is the key concept of the Semantic Web.

                 A relational database has a fixed model (comparable to the TBox of an ontology).



TOPOS: THEORETICAL PART
                 (Scratching the surface!)

                 A topos (plural topoi or toposes) is a category with a quite expressive internal logic.

                 The category of graphs and graph homomorphisms can be viewed as a topos.

                 This topos already has a Heyting algebra that is used as the truth basis of its internal logic.

                 A Heyting algebra is a lattice with additional properties. This topos-theoretic view of RDF
                 stores can be investigated as a natural foundation for partial repairs in RDF stores.

                 Besides that, if we view traditional DBs as finite first-order logical structures, the category
                 of (finite) first-order structures and homomorphisms between them has its own internal
                 logic. This internal logic can also be investigated with regard to partial repairs.




LATTES@FGV




LATTES@FGV: THE RDF KB




                              http://dck092.fgv.br:10035/repositories/fgv (800k triples)



LATTES@FGV



                 480 Lattes CVs, plus data collected from other sources (Qualis,
                 Digital Library, etc.), in one triple store.

                 Lots of errors (inconsistencies) for different reasons: poor user
                 interface for data input, misinterpretation, etc.

                 How to identify the errors? (In a non-ad-hoc manner.)

                 How to fix what can be fixed automatically?




INTEGRITY CONSTRAINTS IN RDF




                 We can consider extending what was discussed so far to
                 non-SQL stores.

                 A KR/DB can be viewed as a graph.

                 The query language of RDF-based stores, SPARQL, can be
                 used to provide semantics to the store.




EXAMPLES




                                  An article referenced by a CV
                                  must have the author of this CV as
                                  one of its authors!
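
A minimal Python sketch of how this constraint could be checked over a set of triples follows. The predicates `ex:owner` and `ex:references`, the use of `dc:creator` for authorship, and the sample data are all assumptions; the slides do not state the vocabulary used in the store.

```python
# Hypothetical triples (subject, predicate, object).
triples = {
    ("cv:alice", "ex:owner", "person:alice"),
    ("cv:alice", "ex:references", "art:1"),
    ("cv:alice", "ex:references", "art:2"),
    ("art:1", "dc:creator", "person:alice"),
    ("art:2", "dc:creator", "person:bob"),
}


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the store."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def cv_violations(triples):
    """(cv, article) pairs where the CV's owner is not among the article's authors."""
    bad = set()
    for (cv, predicate, article) in triples:
        if predicate != "ex:references":
            continue
        owners = objects(triples, cv, "ex:owner")
        authors = objects(triples, article, "dc:creator")
        if not owners & authors:
            bad.add((cv, article))
    return bad


print(cv_violations(triples))
```

Returning the violating pairs, rather than a single yes/no answer, gives something that can be counted, which fits the counting style of inconsistency metric.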




EXAMPLES




                                  If two resources were identified by
                                  reference to the same article, every
                                  author of the first one should also
                                  be related to the second one!




IN THE LAST EXAMPLE

           Of course, two publications cannot be considered
           the same by comparing only their titles!

           We need entity alignment, a similarity checker...

           Suppose we have identified all resources that
           represent the same real “entity” using
           owl:sameAs, then ...

                    # true when some creator of ?p1 is not related to ?p2 (a violation)
                    ASK {
                      ?p1 owl:sameAs ?p2 ;
                          dc:creator ?c .
                      OPTIONAL {
                        ?p2 ?rel ?c .
                      }
                      FILTER( !bound(?rel) )
                    }
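
The logic of the query can be mirrored in a few lines of Python over an in-memory set of triples. This is only a sketch: the sample resources are invented, and prefixed names are treated as plain strings.

```python
SAME_AS = "owl:sameAs"
CREATOR = "dc:creator"

# Hypothetical data: pub:a and pub:b denote the same article,
# but only pub:a lists bob as a creator.
triples = {
    ("pub:a", SAME_AS, "pub:b"),
    ("pub:a", CREATOR, "person:alice"),
    ("pub:a", CREATOR, "person:bob"),
    ("pub:b", CREATOR, "person:alice"),
}


def missing_creators(triples):
    """(p2, c) pairs such that p1 owl:sameAs p2, p1 dc:creator c,
    and no triple at all links p2 to c (any predicate)."""
    linked = {(s, o) for (s, _, o) in triples}
    return {
        (p2, c)
        for (p1, p, p2) in triples if p == SAME_AS
        for (s, q, c) in triples if s == p1 and q == CREATOR
        if (p2, c) not in linked
    }


violations = missing_creators(triples)
print(violations)  # the ASK form corresponds to bool(violations)
```

Like the query, the check runs in one direction only, from the subject of `owl:sameAs` to its object; the reverse direction is covered only if the reverse `owl:sameAs` triple is also asserted.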



A LITTLE BIT ABOUT THE
                   IDENTIFICATION OF SIMILARITY

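           ;; Assert owl:sameAs for each pair in LIST. When the first element of
           ;; a pair is not a blank node, the pair is reversed first, so that
           ;; element ends up in the object position of the triple.
           (defun assert-same-list (list)
             (let ((new nil))
               (dolist (pair list)
                 (if (not (blank-node-p (first pair)))
                     (push (reverse pair) new)
                     (push pair new)))
               (dolist (pair new)
                 (add-triple (first pair) !owl:sameAs (second pair)))))


           ;; Pair up foaf:Agent resources that share a foaf:name; upi< keeps
           ;; only one ordering of each pair. The callback insert-same-as is
           ;; not shown on the slide.
           (select0/callback (?x ?y) #'insert-same-as
             (q- ?x !rdf:type !foaf:Agent)
             (q- ?y !rdf:type !foaf:Agent)
             (q- ?x !foaf:name ?n)
             (q- ?y !foaf:name ?n)
             (lispp (upi< ?x ?y)))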




                                                  Naive approach: Shaking hands!
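
The "shaking hands" idea, in which every agent is paired with every other agent of the same name, can be sketched in Python as follows. The agent identifiers and names are invented for the example.

```python
from itertools import combinations

# Hypothetical agents: identifier -> foaf:name value.
agents = {
    "a1": "Maria Silva",
    "a2": "Maria Silva",
    "a3": "Maria Silva",
    "a4": "João Souza",
}


def same_name_pairs(agents):
    """All unordered pairs of distinct agents that share exactly the same name."""
    by_name = {}
    for agent_id, name in agents.items():
        by_name.setdefault(name, []).append(agent_id)
    pairs = []
    for ids in by_name.values():
        # n agents with one name yield n * (n - 1) / 2 "handshakes"
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


pairs = same_name_pairs(agents)
print(pairs)
```

The number of pairs grows quadratically with the number of agents sharing a name, and exact string equality misses spelling variants, which is why the approach is called naive.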


A LITTLE BIT ABOUT THE
                   IDENTIFICATION OF SIMILARITY

           ;; Partition VERTICES into groups: take the first remaining vertex,
           ;; collect its ego group via GENERATOR, drop that group, repeat.
           (defun components (vertices n generator)
             (do ((res nil)
                  (vtx vertices
                       (set-difference vtx (car res) :test #'upi=)))
                 ((null vtx) res)
               (push (ego-group (car vtx) n generator) res)))


           ;; Neighbours of NODE: journals with the same valid ISSN and a similar title.
           (defsna-generator same-journal (node)
             (select0 (?j)
               (q- (?? node) !bibo:issn ?i)
               (q- ?j !bibo:issn ?i)
               (lispp (utils::check-issn (part->value ?i)))
               (lispp (upi< node ?j))
               (q- ?j !dc:title ?t2)
               (q- (?? node) !dc:title ?t1)
               (lispp (> (utils::jaro-winkler-distance (part->value ?t1)
                                                       (part->value ?t2))
                         0.7))))


           ;; Merge every group found among the subjects that have an ISSN.
           (let ((nodes (mapcar #'subject (get-triples-list :p !bibo:issn :limit nil))))
             (dolist (g (components nodes 2 'same-journal))
               (merge-nodes g)))


                    An ad-hoc solution: breadth-first search of connected components!
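
For readers unfamiliar with the technique, here is a self-contained Python sketch of finding connected components by breadth-first search. It is not a translation of the Lisp above: the search depth here is unbounded, whereas the call above passes a depth argument of 2, and the `edges` set is a hypothetical stand-in for the ISSN and title similarity test.

```python
from collections import deque


def components(vertices, neighbours):
    """Group vertices into connected components using breadth-first search."""
    seen, result = set(), []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        group, queue = [], deque([start])
        while queue:
            v = queue.popleft()
            group.append(v)
            for w in neighbours(v):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        result.append(group)
    return result


# Hypothetical "same journal" links between journal resources.
edges = {("j1", "j2"), ("j2", "j3"), ("j4", "j5")}


def neighbours(v):
    """Journals linked to v in either direction, in a stable order."""
    return sorted({b for a, b in edges if a == v} | {a for a, b in edges if b == v})


groups = components(["j1", "j2", "j3", "j4", "j5", "j6"], neighbours)
print(groups)
```

Each resulting group is a candidate set of resources to merge; j1 and j3 end up together although they are linked only through j2.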

