Tutorial at CAISE 2010
Information Quality in the Web Era
   C. Batini & Matteo Palmonari
        Department of Computer Science,
          Communication and Systems
          University of Milano Bicocca
      [batini;palmonari]@disco.unimib.it
Outline
•  Motivation [Palmonari]
  –  the Web era / information quality meeting
     ontologies / the ontology landscape /
•  Quality of data (conceptual level) [Batini]
  –  frameworks / metamodels / dimensions /
     metrics / groups of schemas
•  Quality of ontologies [Palmonari]
  –  frameworks / metamodels / dimensions /
     metrics
•  Conclusions [Palmonari+Batini]
The Web era is characterized by…
   The “Big Data” phenomenon

How to make sense of all these data?
Documents’ and duplicates’ size over time
How to make sense of all these data?
   Data management needs data quality
Data/information heterogeneity in Information Systems
Information is available in different formats and is represented
   according to different models

Structured data:
Place       Country   Population   Main economic activity
Portofino   Italy     7.000        Tourism

Image, Map: Portofino [figures]

Text: Dear Laure, I try to describe the wonderful harbour of Portofino as I have seen
   this morning a boat is going in, other boats are along the wharf. Small pretty buildings
   and villas are looking on to the harbour.

Need to consider information quality for heterogeneous information sources
Tutorial Background - Data Quality (Structured Data)
23rd International Conference on Conceptual Modeling (ER 2004), Shanghai

   A Survey of Data Quality Issues in Cooperative Information Systems
   Carlo Batini, Università di Milano “Bicocca”, batini@disco.unimib.it
   Monica Scannapieco, Università di Roma “La Sapienza”, monscan@dis.uniroma1.it
Tutorial Background – Towards Information Quality (Heterogeneous Data)
Tutorial at ER 08, Barcelona, Spain

   Quality of Data, Textual Information and Images: a comparative survey
   Speaker: C. Batini
   Other authors: F. Cabitza, G. Pasi, R. Schettini
   Dipartimento di Informatica, Sistemistica e Comunicazione, Università di
   Milano Bicocca, Milano, Italy
   batini@disco.unimib.it
How to make sense of all these data?

   Together with automatic techniques for information extraction, processing &
   integration, we also need automatic techniques for assessing the quality of
   information

   Information quality for information shared, consumed and delivered on the Web

   Increasing attention to information semantics
Of course, the “Semantic Web” perspective
•  Make the semantics of information explicit with Web-compliant ontologies* by
    –  sharing conceptualizations/terminologies on the Web
    –  sharing data on the Web
•  Models, languages & technologies
    –  E.g. RDF, RDFS, OWL, SKOS
[Figures labelled 1998 and 2006]

* For now, let’s consider a very broad definition:
An ontology is a specification of a conceptualization.
T. R. Gruber. A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2):199-220, 1993.
Ontologies outside the Semantic Web
•  Even for those who are skeptical about the Semantic Web,
   ontologies (e.g. OWL ontologies, linked data, thesauri)
   can be considered useful external resources to use in
   –    Conceptual modeling
   –    Data integration
   –    Document management
   –    Service Oriented Computing
   –    Information retrieval
   –    …
   –    Software Engineering
   –    Information System Design
Ontology + “Information Systems” [figure]
Ontology + “Software Engineering” [figure]
Ontologies & Semantic Resources
•  KB - Axiomatic ontologies (e.g. SUMO)
   –  Terminological (intensional/schema) level:
      concepts, relationships, axioms specifying logical
      constraints
   –  Assertional (extensional/data) level: instances,
      typing, relations between instances
•  LD - Linked data on the Web (e.g. DBpedia)
   –  RDF data, usually light-weight KBs
•  Th - Thesauri (e.g. WordNet)
   –  Lexical ontologies: terms, no distinction between schema and instances

•  In summary, the ontology landscape includes:
   –    Shared vocabulary (KB, LD, Th)
   –    Modeling principles (KB)
   –    Logical theories supporting reasoning (KB)
   –    Web-compliant representations of models and data (KB, LD, Th)
Need for ontology evaluation
•  Ontology “Quality” → Ontology Evaluation
•  Quality of ontologies matters!
  –  In particular, when ontologies:
    •  are built to support specific applications (their
       quality affects the effectiveness of the application)
    •  are searched on the Web, reused, extended
  –  Many ontologies to choose from!

  –  E.g. suppose that you need an ontology
     describing customers and the business domain
Searching for “Customer” with Sindice [screenshot]
Searching for “Customer” with Watson [screenshot]
Searching for “Customer” on Swoogle [screenshot]
Searching for “Customer” on Swoogle (refined search) [screenshot]
Ontologies and semantic resources should
 be considered in comprehensive studies
 about information quality in the Web era

              Tough work!

     Let’s start from the beginning:
     ontologies and structured data
Structured data and ontologies
•  Structured data
   –  Instances
   –  Logical schemas
   –  Conceptual schemas
   –  Diagrammatic models (ER, UML, ORM)
•  Ontologies (KB)
   –  Instances
   –  Schema
   –  Logical models supporting reasoning
•  Tight vs loose instance-schema coupling

   - Conceptual level representations
   - Externalized models (semiotic objects)
   - Constraints on domain (data)
Ontologies and their grandparents
•  Structured data
   –  Instances
   –  Logical schemas
   –  Conceptual schemas
•  Ontologies (KB)
   –  Instances
   –  Schema / Terminologies

In this (mini) tutorial we will:
   -  focus on the modeling level:
      “Quality of Conceptual Schemas and Ontologies”
   -  provide a guided tour of the topic by discussing only part of the
      material (soon available online)
   -  focus on common aspects and, in particular, differences
Outline
•  Motivation:
  –  the Web era / information quality meeting
     ontologies / the ontology landscape /
•  Quality of Conceptual Schemas
  –  frameworks / metamodels / dimensions /
     metrics / groups of schemas
•  Quality of ontologies
  –  frameworks / metamodels / dimensions /
     metrics
•  Conclusions
# of slides

•  About 130 → 30

•  I will mainly provide a guided
   introduction to the slides
In a database, quality can be investigated...

•  At model (language) level
•  At schema (model) level
•  At instance (value/data) level
Data quality dimensions
Acronym     Data Quality Dimension
TDQM        Accessibility, Appropriateness, Believability, Completeness, Concise/Consistent representation, Ease of manipulation, Value added, Free of
            error, Interpretability, Objectivity, Relevance, Reputation, Security, Timeliness, Understandability

DWQ         Correctness, Completeness, Minimality, Traceability, Interpretability, Metadata Evolution, Accessibility (System, Transactional, Security),
            Usefulness (Interpretability), Timeliness (Currency, Volatility), Responsiveness, Completeness, Credibility, Accuracy, Consistency,
            Interpretability

TIQM        Inherent dimensions: Definition conformance (consistency), Completeness, Business rules conformance, Accuracy (to surrogate source),
            Accuracy (to reality), Precision, Nonduplication, Equivalence of redundant data, Concurrency of redundant data, Pragmatic dimensions:
            accessibility, timeliness, contextual clarity, Derivation integrity, Usability, Rightness (fact completeness), cost.

AIMQ        Accessibility, Appropriateness, Believability, Completeness, Concise/Consistent representation, Ease of operation, Freedom from errors,
            Interpretability, Objectivity, Relevancy, Reputation, Security, Timeliness, Understandability

CIHI        Dimensions: Accuracy, Timeliness, Comparability, Usability, Relevance
            Characteristics: Over-coverage, Under-coverage, Simple/correlated response variance, Reliability, Collection and capture, Unit/Item
            non-response, Edit and imputation, Processing, Estimation, Timeliness, Comprehensiveness, Integration, Standardization, Equivalence,
            Linkage ability, Product/Historical comparability, Accessibility, Documentation, Interpretability, Adaptability, Value.

DQA         Accessibility, Appropriate amount of data, Believability, Completeness, Freedom from errors, Consistency, Concise Representation,
            Relevance, Ease of manipulation, Interpretability, Objectivity, Reputation, Security, Timeliness, Understandability, Value added.

IQM         Accessibility, Consistency, Timeliness, Conciseness, Maintainability, Currency, Applicability, Convenience, Speed, Comprehensiveness, Clarity,
            Accuracy, Traceability, Security, Correctness, Interactivity.

ISTAT       Accuracy, Completeness, Consistency

AMEQ        Consistent representation, Interpretability, Ease of understanding, Concise representation, Timeliness, Completeness, Value added, Relevance,
            Appropriateness, Meaningfulness, Lack of confusion, Arrangement, Readable, Reasonability, Precision, Reliability, Freedom from bias, Data
            Deficiency, Design Deficiency, Operation Deficiencies, Accuracy, Cost, Objectivity, Believability, Reputation, Accessibility, Correctness,
            Unambiguity, Consistency

COLDQ       Schema: Clarity of definition, Comprehensiveness, Flexibility, Robustness, Essentialness, Attribute granularity, Precision of domains,
            Homogeneity, Identifiability, Obtainability, Relevance, Simplicity/Complexity, Semantic consistency, Syntactic consistency.
            Data: Accuracy, Null Values, Completeness, Consistency, Currency, Timeliness, Agreement of Usage, Stewardship, Ubiquity, Presentation:
            Appropriateness, Correct Interpretation, Flexibility, Format precision, Portability, Consistency, Use of storage, Information policy:
            Accessibility, Metadata, Privacy, Security, Redundancy, Cost.

DaQuinCIS   Accuracy, Completeness, Consistency, Currency, Trustworthiness

QAFD        Syntactic/Semantic accuracy, Internal/External consistency, Completeness, Currency, Uniqueness.

CDQ         Schema: Correctness with respect to the model, Correctness with respect to Requirements, Completeness, Pertinence, Readability,
            Normalization, Data: Syntactic/Semantic Accuracy, Semantic Accuracy, Completeness, Consistency, Currency, Timeliness, Volatility,
            Completability, Reputation, Accessibility, Cost.
Reference for quality of data in databases (2006) [figure]
Here we focus on

•  Model level
•  Schema level
•  Data level
Quality of Conceptual Schemas - contents

•  Frameworks and Metamodels proposed
•  Quality of Schemas
  –  Classifications, Dimensions & Metrics: main
     proposals
  –  Comparison of proposals
  –  Improving the quality of schemas
•  Quality of groups of schemas
  –  Quality of Data Integration Architectures
  –  Quality of the documentation for large
     related groups of schemas
Quality of schemas
Some figures on proposed approaches in the literature
       (from Mehmood 2009, citing Moody 2005)

                         Research    Practice    Mixed
 # of proposals          29          8           2
 % of total              74%         21%         5%
 Empirically validated   6           0           1
 %                       20%         0%          50%
 Generalizable           5           0           0
 %                       17%         0%          0%
 Not generalizable       24          8           2
 %                       83%         100%        100%

Generalizable means that the proposal can be applied to
conceptual models in general and is not specific to, e.g., ER

Frameworks and metamodels
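
As a quick consistency check, the percentages in the table can be re-derived from the counts. The sketch below is only illustrative: the dictionary names and the `share` helper are assumptions introduced here, and only the counts come from the table. Rounded to the nearest integer, the shares of the 39 proposals come out at 74%, 21% and 5%, and the generalizable share of research proposals at about 17%. The empirically validated share of research proposals is about 20.7%, which the table reports as 20%.

```python
# Counts copied from the table (columns: Research, Practice, Mixed).
proposals = {"Research": 29, "Practice": 8, "Mixed": 2}
validated = {"Research": 6, "Practice": 0, "Mixed": 1}
generalizable = {"Research": 5, "Practice": 0, "Mixed": 0}


def share(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


total = sum(proposals.values())

# Share of each column in the total number of proposals.
pct_of_total = {k: share(v, total) for k, v in proposals.items()}

# Shares computed within each column (e.g. 6 of the 29 research proposals).
pct_validated = {k: share(validated[k], proposals[k]) for k in proposals}
pct_generalizable = {k: share(generalizable[k], proposals[k]) for k in proposals}

print(pct_of_total, pct_validated, pct_generalizable)
```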
Metaschema of approaches
[Figure: diagram relating Formal Framework, Metaschema, Classification, Quality dimension, Quality subdimension, Metrics, Examples, and Experiments. Annotations:]
   –  Concepts and paradigms involved in a formally grounded approach to quality
   –  Concepts and paradigms involved in the life cycle of quality, namely in the
      production, assessment and improvement activities
   –  One-, two- or three-level taxonomies
Proposals
[Figure: the metaschema diagram (Formal Framework, Metaschema, Classification, Quality dimension, Quality subdimension, Metrics, Examples, Experiments), annotated with groups of proposals:]
   –  Krogstie & Solvberg (the Scandinavians)
   –  Shanks; Arab French; Vassiliadis
   –  Quality origins – Batini et al.; the Scandinavians; Arab French; Moody
   –  Genero et al.; Herden; Poels
Frameworks for schema quality
Krogstie and Solvberg framework
[Figure: diagram of the framework. Elements: goal of modeling, modeling domain, participant knowledge, model externalization, social actor interpretation, technical actor interpretation, language extension. Quality types relating them: organizational, physical, empirical, syntactic, semantic, perceived semantic, social pragmatic, technical pragmatic, and social quality.]
Krogstie and Solvberg framework
[Figure: the framework diagram, with the annotation:]
   Correspondence between the conceptual model and the domain
Krogstie and Solvberg framework
[Figure: the framework diagram, with the annotation:]
   Correspondence between participant knowledge and individual interpretation
Krogstie and Solvberg framework
[Figure: the framework diagram, with the annotation:]
   Correspondence between the conceptual model and the language
Krogstie and Solvberg framework
[Figure: the framework diagram, with the annotation:]
   Correspondence between the conceptual model and the audience’s interpretation of it
Krogstie and Solvberg framework
[Figure: the framework diagram, with the annotation:]
   Correspondence between participant knowledge and the externalized conceptual model
   °  Externalization: the knowledge of social actors has been externalized in the model
   °  Internalizability: the model is persistent
Krogstie and Solvberg framework
[Figure: the framework diagram, with the annotation:]
   It is reflected by the error frequency when a model is read or written, so by
   readability and clarity
Krogstie and Solvberg framework
[Figure: the framework diagram, with the annotation:]
   Agreement on participant knowledge and individual interpretation
More formally
•  G, the goals of the modeling task.
•  L, the language extension, i.e., the set of all statements that are
   possible to make according to the graphemes, vocabulary, and
   syntax of the modeling languages used.
•  D, the domain, i.e., the set of all statements that can be stated
   about the situation at hand.
•  M, the model (schema) itself.
•  Ks, the relevant explicit knowledge of those being involved in
   modeling. A subset of these is actively involved in modeling, and
   their explicit knowledge is indicated by KM.
•  I, the social actor interpretation, i.e., the set of all statements that
   the audience thinks that an externalized model consists of.
•  T, the technical actor interpretation, i.e., the statements in the
   model as 'interpreted' by modeling tools.
Main quality types
•  Physical quality: The basic quality goal is that the model M is
  available for the audience.
•  Empirical quality deals with predictable error frequencies when a
  model is read or written by different users, coding (e.g. shapes of
  boxes) and HCI-ergonomics for documentation and modeling tools.
  For instance, graph layout to avoid crossing lines in a model is a
  means to address the empirical quality of a model.
•  Syntactic quality is the correspondence between the model M
  and the language extension L.
•  Semantic quality is the correspondence between the model M
  and the domain D. This includes validity and completeness.
•  Perceived semantic quality is the similar correspondence
  between the audience interpretation I of a model M and his or her
  current knowledge K of the domain D.
•  Pragmatic quality is the correspondence between the model M
  and the audience's interpretation and application of it (I).
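
Since M, L and D are all described as sets of statements, syntactic and semantic quality can be illustrated with plain set differences. The sketch below is one possible reading and not the framework's own formalization. It assumes finite sets, that syntactic correctness means M contains nothing outside L, that validity means M contains nothing outside D, and that completeness means D contains nothing missing from M. The names `QualityReport` and `assess` and the toy statements are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityReport:
    """Set differences behind syntactic and semantic quality (illustrative)."""
    syntactic_errors: frozenset  # statements of M outside the language extension L
    invalid: frozenset           # statements of M that are not in the domain D
    missing: frozenset           # statements of D that M does not contain

    @property
    def syntactically_correct(self) -> bool:
        return not self.syntactic_errors

    @property
    def valid(self) -> bool:
        return not self.invalid

    @property
    def complete(self) -> bool:
        return not self.missing


def assess(model, language, domain) -> QualityReport:
    """Compare M with L (syntactic quality) and with D (semantic quality)."""
    m, lang, d = frozenset(model), frozenset(language), frozenset(domain)
    return QualityReport(syntactic_errors=m - lang, invalid=m - d, missing=d - m)


# Toy statements, invented for this example.
L = {"entity Customer", "entity Order", "entity Invoice",
     "Customer places Order", "Order has date"}
D = {"entity Customer", "entity Order",
     "Customer places Order", "Order has date"}
M = {"entity Customer", "entity Order", "entity Invoice",
     "Customer plcaes Order"}  # misspelled on purpose: not a statement of L

report = assess(M, L, D)
print(sorted(report.syntactic_errors))
print(sorted(report.invalid))
print(sorted(report.missing))
```

In this toy case the model is neither syntactically correct, valid, nor complete: the misspelled statement falls outside L, "entity Invoice" is not part of D, and two domain statements are absent from M.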
Framework for language (model) quality
[Figure: diagram of the language quality framework. Elements: goal of modeling, modeling domain, participant knowledge, model externalization, social actor interpretation, technical actor interpretation, language extension. Appropriateness types relating them: organizational, domain, participant, modeler, comprehensibility, and tool appropriateness.]
Main quality types
Domain appropriateness. This relates the language and the domain.
  Ideally, the conceptual basis must be powerful enough to express
  anything in the domain, i.e. not having what is termed construct deficit. On
  the other hand, you should not be able to express things that are
  not in the domain, i.e. what is termed construct excess. Domain
  appropriateness is primarily a means to achieve semantic quality.
Participant appropriateness relates the social actors’ explicit
  knowledge to the language. Participant appropriateness is primarily a
  means to achieve pragmatic quality both for comprehension, learning
  and action.
Modeler appropriateness: This area relates the language
  extension to the participant knowledge. The goal is that there are
  no statements in the explicit knowledge of the modeler that cannot
  be expressed in the language. Modeler appropriateness is primarily a
  means to achieve semantic quality.
Main quality types
Comprehensibility appropriateness relates the language to the
  social actor interpretation. The goal is that the participants in the modeling
  effort using the language understand all the possible statements of the
  language. Comprehensibility appropriateness is primarily a means to achieve
  empirical and pragmatic quality.
Tool appropriateness relates the language to the technical audience
  interpretations. For tool interpretation, it is especially important that the
  language lend itself to automatic reasoning. This requires formality (i.e.
  both formal syntax and semantics being operational and/or logical), but
  formality is not necessarily enough, since the reasoning must also be
  efficient to be of practical use. This is covered by what we term
  analyzability (to exploit any mathematical semantics) and executability (to
  exploit any operational semantics). Different aspects of tool
  appropriateness are means to achieve syntactic, semantic and pragmatic
  quality (through formal syntax, mathematical semantics, and operational
  semantics).
Organizational appropriateness relates the language to standards
  and other organizational needs within the organizational context of
  modeling. These are means to support organizational quality.
Metamodels
Shanks et al. composite model
[Figure: composite metamodel combining a theory-based part and a practice-based part. Concepts shown: Domain, Language, Audience, Model, Quality type, Goal, Property, Means, Activity, Quality factor, Weighting, Rating, Evaluation method.]
Metamodels – Arab/French
 Mehmood, Cherfi et al. 2009, based on goals,
             questions, metrics

[Figure: metamodel with the concepts Quality goal, Q. Dimension, Q. Attribute, Q. Metric, Model element, Transformation step, Transformation rule.]
Metamodel instantiation

Quality goal: Ease of change
•  Dimension: Complexity
  –  Quality attributes: Simplicity, Structural complexity
  –  Quality metrics: # of associations, # of dependencies
  –  Transformations: Merge entities, Divide the model
•  Dimension: Maintainability
  –  Quality attributes: Modularity, Understandability, Modifiability
Metamodels – Vassiliadis et al. for DWs

[Figure: metamodel with the concepts Quality goal, Q. Dimension, Improvement process, Factor, Interaction, Q. Metric, Measurement method, Transformation, Information System object (specialized into Data object, Process object, Model object), Measurement value, Date.]
Comparison

[Figure: the two metamodels side by side. Vassiliadis: Quality goal, Q. Dimension, Improvement process, Factor, Interaction, Q. Metric, Measurement method, Transformation, Information System object (Data object, Process object, Model object), Measurement value, Date. Mehmood: Quality goal, Q. Dimension, Q. Attribute, Model element, Transformation step, Transformation rule, Q. Metric.]
Schema Quality Dimensions
The origins…
Batini, Ceri, Navathe 1991

[Figure: roadmap of the aspects considered for each approach: Formal framework, Meta schema, Classification, Quality dimension, Quality subdimension, Metrics, Examples, Experiments.]
Batini, Ceri, Navathe 1991
Q. Dimension              Definition
Completeness              Represents all relevant features of requirements
Pertinence                Represents only relevant features of requirements
Correctness - Syntactic   Concepts are properly defined in the schema
Correctness - Semantic    Concepts are used according to their definitions
Minimality                Every aspect of reqs. appears only once in the schema
Expressiveness            Can be easily understood
Readability               Diagram respects aesthetic criteria
Self-explanation          Other formalisms and languages not needed
Extensibility             Easily adapted to changing requirements
Normality                 From theory of normalization
Completeness
Completeness measures the extent to which a conceptual
schema includes all the conceptual elements necessary to
meet some specified requirements. It is possible that the
designer has not included in the schema certain characteristics
present in the requirements, e.g., attributes related to
an entity Person; in this case, the schema is incomplete.

Example. Requirements: Students have a code, a name, a place of birth.
Schema: entity Student with attributes Code and Name.
Pertinence
Pertinence measures how many unnecessary conceptual
elements are included in the conceptual schema. In the case
of a schema that is not pertinent, the designer has
gone too far in modeling the requirements, and has included
too many concepts.

Example. Requirements: Students have a code and a name.
Schema: entity Student with attributes Code, Name and Place_of_Birth.
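The two definitions above can be turned into simple ratios. The following is only a sketch under stated assumptions: requirements and schema are both reduced to sets of element names, and the ratio formulas and the function names `completeness` and `pertinence` are illustrative choices, not metrics given in the slides.

```python
def completeness(required: set, schema: set) -> float:
    """Fraction of required elements that the schema represents (assumed metric)."""
    if not required:
        return 1.0
    return len(required & schema) / len(required)


def pertinence(required: set, schema: set) -> float:
    """Fraction of schema elements that are actually required (assumed metric)."""
    if not schema:
        return 1.0
    return len(required & schema) / len(schema)


# Completeness example: the schema lacks the place of birth.
req_a = {"Student", "Student.Code", "Student.Name", "Student.Place_of_Birth"}
schema_a = {"Student", "Student.Code", "Student.Name"}

# Pertinence example: the schema contains an attribute nobody asked for.
req_b = {"Student", "Student.Code", "Student.Name"}
schema_b = {"Student", "Student.Code", "Student.Name", "Student.Place_of_Birth"}

print(completeness(req_a, schema_a), pertinence(req_a, schema_a))
print(completeness(req_b, schema_b), pertinence(req_b, schema_b))
```

With these ratios the first schema is fully pertinent but incomplete, and the second is complete but not fully pertinent.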
Correctness - syntactic
Concerns the correct use of the categories of the model in
representing requirements.
Example – In the Entity Relationship model we may represent the
logical link between persons and their first names using the two
entities Person and FirstName and a relationship between them.
The schema is not correct with respect to the model, since an entity
should be used only when the concept has a unique existence in the
real world and has an identifier.

[Figure: ER schema with entity Student (1,n), relationship has, and entity First Name (1,1).]
Correctness - semantic
Correctness with respect to requirements concerns the correct
representation of the requirements in terms of the model categories.
Example - In an organization each department is headed by exactly
one manager and each manager may head exactly one department.
If we represent Manager and Department as entities, the relationship
between them should be one-to-one; in this case, the schema is
correct with respect to requirements. If we use a one-to-many
relationship, the schema is incorrect.

[Figure: ER schema with entity Manager (1,n), relationship heads, and entity Department (1,1).]
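The Manager/Department example can be checked mechanically once cardinalities are written down. This is a minimal sketch: the `Participation` class and the two helper functions are hypothetical names introduced here, and `None` is an assumed encoding of the maximum cardinality "n".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Participation:
    entity: str
    min_card: int
    max_card: Optional[int]  # None stands for "n" (unbounded); assumed encoding


def is_one_to_one(left: Participation, right: Participation) -> bool:
    """A binary relationship is one-to-one when both maximum cardinalities are 1."""
    return left.max_card == 1 and right.max_card == 1


def matches_requirements(required, drawn) -> bool:
    """The drawn relationship is correct if it has exactly the required cardinalities."""
    return set(required) == set(drawn)


# Requirements: each department has exactly one manager and vice versa.
required = (Participation("Manager", 1, 1), Participation("Department", 1, 1))
# Schema as drawn in the slide: Manager (1,n) heads Department (1,1).
drawn = (Participation("Manager", 1, None), Participation("Department", 1, 1))

print(is_one_to_one(*required), is_one_to_one(*drawn))
print(matches_requirements(required, drawn))
```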
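Minimality/Redundancy
A schema is minimal if every part of the requirements is
represented only once in the schema. In other words, it is
not possible to eliminate some element from the schema
without compromising the information content.

[Figure: ER schema with entities Student, Course and Instructor. Relationships: Attends between Student and Course, Teaches between Course and Instructor, Assigned to between Student and Instructor. Cardinalities shown: 1,n throughout, with one label on Teaches reading 1,?.]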
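One way to look for violations of minimality is to search for relationships that lie on a cycle of the ER schema, since such a relationship may be derivable from the others. The sketch below assumes binary relationships stored as a name-to-entity-pair dictionary; `redundancy_candidates` is a hypothetical helper, and it only flags candidates: whether a flagged relationship is really redundant depends on its meaning and must be decided by the designer.

```python
from collections import defaultdict, deque


def redundancy_candidates(relationships: dict) -> list:
    """Return relationships whose two entities stay connected when the relationship is removed."""
    candidates = []
    for name, (src, dst) in relationships.items():
        # Build the undirected entity graph without the relationship under test.
        graph = defaultdict(set)
        for other, (a, b) in relationships.items():
            if other != name:
                graph[a].add(b)
                graph[b].add(a)
        # Breadth-first search from one endpoint.
        seen, queue = {src}, deque([src])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        if dst in seen:
            candidates.append(name)
    return candidates


schema = {
    "Attends": ("Student", "Course"),
    "Teaches": ("Instructor", "Course"),
    "Assigned to": ("Student", "Instructor"),
}
print(redundancy_candidates(schema))
```

In this schema all three relationships lie on the same cycle, so all three are reported as candidates.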
Expressiveness/Readability
Intuitively, a schema is readable whenever it represents
the meaning of the reality represented by the schema in a
clear way for its intended use. This simple, qualitative
definition is not easy to translate into a more formal one,
since the evaluation expressed by the word "clear"
conveys some elements of subjectivity. In models, such as
the Entity Relationship model, that provide a graphical
representation of the schema, called a diagram, readability
concerns both the diagram and the schema itself.
Diagrammatic readability

With regard to the diagrammatic representation,
readability can be expressed objectively by a
number of aesthetic criteria that human beings adopt in
drawing diagrams:
   1.  crossings between lines should be minimized,
   2.  graphic symbols should be embedded in a grid,
   3.  lines should be made of horizontal or vertical segments,
   4.  the number of bends in lines should be minimized,
   5.  the total area of the diagram should be minimized,
   6.  parents in generalization hierarchies should be positioned at a
       higher level in the diagram with respect to children, and, finally,
   7.  the children entities in a generalization hierarchy should be
       symmetrical with respect to the parent entity.
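The first criterion, minimizing crossings, is easy to measure once the diagram's lines are available as coordinates. The following sketch assumes each line is a straight segment given by two (x, y) endpoints; the function names are illustrative. Segments that merely touch at an endpoint, or that overlap collinearly, are not counted as crossings here.

```python
from itertools import combinations


def _orient(p, q, r) -> int:
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)


def segments_cross(s, t) -> bool:
    """True if the two segments cross at a single interior point."""
    (a, b), (c, d) = s, t
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def count_crossings(edges) -> int:
    """Number of pairs of segments that cross each other."""
    return sum(segments_cross(s, t) for s, t in combinations(edges, 2))


# Hypothetical layout: two diagonals that cross, plus a side sharing endpoints with both.
edges = [((0, 0), (2, 2)), ((0, 2), (2, 0)), ((0, 0), (0, 2))]
print(count_crossings(edges))
```

A layout tool could compare this count across alternative drawings of the same schema; the other criteria (bends, area, symmetry) would need their own measures.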
Unreadable schema

[Figure: ER diagram drawn with many crossings and bends. Labels: Department, Employee, Vendor, Worker, Engineer, Floor, Warehouse, Item, Type, Warranty, Order, Purchase, City, Works, Manages, Head, Located, Born, In, Of, Produces, Acquires.]
                            @C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008
A Readable schema

[Figure: the same ER diagram redrawn on a grid with horizontal and vertical lines. Labels: Department, Employee, Vendor, Worker, Engineer, Floor, Warehouse, Item, Type, Warranty, Order, Purchase, City, Works, Manages, Head, Located, Born, In, Of, Produces, Acquires.]
                      @C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008
Is diagrammatic readability objective?

[Figure: the unreadable and the readable versions of the same ER schema, annotated with layout rules. Syntactic rules (SYNT): Minimize bends; Minimize crossings; Use only horizontal. Semantic rules (SEM): Place most important concept in the middle; Place close entities in generalizations. A further annotation reads: Don’t change at all!]
                                                                              @C.Batini, 2009
But … personal experience in China,
                            Beda University, about 1985

Question to Chinese professors:
Which one of the two diagrams do you like more?

[Figure: the unreadable schema (left) and the readable schema (right) shown side by side.]

                         Answer: definitely the left one,
                         we like asymmetry and movement …
                                                                                     @C.Batini, 2009
Expressiveness

The second issue addressed by readability is the
compactness of schema representation. Among the
different conceptual schemas that equivalently represent a
certain reality, we prefer the one or the ones that are
more compact, because compactness favors readability.
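Transformation that preserves
              information content
    and enhances compactness/expressiveness

[Figure: two equivalent schemas. Before: Employee with the subentities Vendor, Worker and Engineer, each linked to City by its own relationship Born. After: a single relationship Born links Employee to City, with the same subentities Vendor, Worker and Engineer.]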
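Normalization
Unnormalized ER schema
   Entity Employee-Project with attributes Employee #, Salary,
   Project #, Budget, Role

Normalized ER schema
   Entity Employee with attributes Employee #, Salary
   Entity Project with attributes Project #, Budget
   Relationship Assigned to between Employee (1,n) and Project (1,n),
   with attribute Role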
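The decomposition from the unnormalized to the normalized schema can be illustrated on a few rows. This is a sketch only: the row values are made up, and the code assumes the dependencies suggested by the normalized schema (Employee # determines Salary, Project # determines Budget, and the pair determines Role).

```python
# Made-up rows of the unnormalized entity Employee-Project.
rows = [
    {"employee": "E1", "salary": 50000, "project": "P1", "budget": 200000, "role": "analyst"},
    {"employee": "E1", "salary": 50000, "project": "P2", "budget": 80000, "role": "tester"},
    {"employee": "E2", "salary": 60000, "project": "P1", "budget": 200000, "role": "manager"},
]


def normalize(rows):
    """Split the rows into Employee, Project and Assigned to."""
    employees = {r["employee"]: r["salary"] for r in rows}
    projects = {r["project"]: r["budget"] for r in rows}
    assigned_to = {(r["employee"], r["project"]): r["role"] for r in rows}
    return employees, projects, assigned_to


def denormalize(employees, projects, assigned_to):
    """Join the three structures back into the original wide rows."""
    return [
        {"employee": e, "salary": employees[e],
         "project": p, "budget": projects[p], "role": role}
        for (e, p), role in assigned_to.items()
    ]


employees, projects, assigned_to = normalize(rows)
print(employees)  # each salary is now stored once per employee
print(projects)   # each budget is now stored once per project
```

Joining the three structures back yields the original rows, so under the assumed dependencies the decomposition loses no information while removing the repeated salary and budget values.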
Scandinavians (1994-

[Figure: roadmap of the aspects considered for each approach: Formal framework, Meta schema, Classification, Quality dimension, Quality subdimension, Metrics, Examples, Experiments.]
Main model (schema) quality dimensions
Physical quality
•   Externalization: number of statements on the domain not yet stated in the model / total # of statements
•   Internalizability
      –  Persistence: protection against loss or damage
      –  Availability: usual meaning
Empirical quality: deals with readability by the audience
      Expressed in terms of graph aesthetics and graph layout criteria
Syntactic quality: correspondence between the model (schema) and the language (model), where errors are due to
      Syntactic invalidity: words or graphemes not part of the language are used
      Syntactic incompleteness: the model lacks constructs needed to obey the language’s grammar (e.g. it uses only one
          cardinality to express minimum and maximum cardinalities)
Semantic quality
      (Feasible) Validity: the statements in the model are correct and relevant for the problem
      (Feasible) Completeness: the model contains all the statements which would be correct and relevant
Perceived semantic quality: the correspondence between the actor's interpretation of the model and her current
    knowledge of the domain
      Validity
      Completeness
Pragmatic quality: the correspondence between the model and the audience's interpretation of it
      (Feasible) Comprehension: the actors understand the model, or else individual actors understand the part of the
          model relevant to them
Social quality
      Agreement in knowledge
      Agreement in model interpretation
Knowledge quality: perfect when the audience knows everything about the domain at a given time
      Validity
      Completeness
Language quality dimensions - 1
May refer
   a. to the language, or else
   b. to the relationship between the language and other issues.
In the first case they may refer to:
   –  the constructs of the language
   –  the external visual representation
For both:
   Perceptibility, how easy it is for persons to comprehend the language
   Expressive power, what it is possible to express in the language
   Expressive economy, how effectively things can be expressed in the
      language
   Method/tool potential, how easily the language lends itself to proper
      method or tool support
   Reducibility, what features are provided by the language to deal with
      large and complex models
Language quality dimensions - 2
Referring to the relationship between the language and other
  issues:
Domain appropriateness, there are no statements in the
  domain that cannot be expressed in the language
Participant knowledge appropriateness, statements in the language
  models are part of the explicit knowledge of participants
Knowledge externalizability appropriateness, there are no
  statements in the explicit knowledge of the participants that
  cannot be expressed in the language
Comprehensibility appropriateness
Technical actor interpretation appropriateness
More on pragmatic quality
•  Social pragmatic quality (to what extent
 people understand and are able to use the
 models) and technical pragmatic quality (to
 what extent tools can be made that interpret
 the models).
Arab French (2002-

[Figure: roadmap of the aspects considered for each approach: Formal framework, Meta schema, Classification, Quality dimension, Quality subdimension, Metrics, Examples, Experiments.]
Cherfi et al. classification
•  Specification
  –  Legibility
      •  Clarity
      •  Minimality
          –  Non Redundancy
          –  Factorization degree
          –  Aggregation degree
  –  Expressiveness
      •  Concept expressiveness
      •  Schema expressiveness
  –  Simplicity
  –  Correctness
•  Usage
  –  Understandability
      •  Documentation degree
      •  User vocabulary
      •  Concept independence degree
  –  Completeness
      •  Requirements coverage degree
      •  Cross modeling completeness
•  Implementation
  –  Implementability
  –  Maintainability
      •  Modifiability
      •  Cohesion
      •  Coupling
Definitions – 1
Q. Dimension                    Definition
Clarity                         An aesthetic criterion, based on the graphical arrangement
Minimality                      Every aspect of the requirements appears only once
Min - Non Redundancy            No concept can be canceled without decreasing the information content
Min - Factorization degree      Measures the effectiveness of inheritance hierarchies of the schema
Min - Aggregation degree        Measures the efficient use of aggregate attributes in the schema
Expressiveness                  The schema can be easily understood without additional explanation
Exp - Concept and schema expr.  Compactness
Simplicity                      The schema contains the minimum possible constructs
Correctness (syntactical)       Concepts are properly defined in the schema
Understandability (model)       The ease with which the data model can be interpreted by the user
Understandability (schema)      How far modeling features are made explicit
Und - Documentation degree      Presence of additional documentation for concepts
Und - User vocabulary rate      Users are able to make easy correspondences between schema and reqs.
Und - Concept independ. degree  “Short paths” for semantic interconnections (e.g. A ISA B)
Definitions – 2
Q. Dimension                  Definition
Completeness                  The schema represents all relevant features in the requirements
Comp - Requirements coverage  Correspondence between concepts in the schema and relevant terms in the reqs.
Comp - Cross modeling compl.  Presence in a schema S1 of all concepts in schemas in a set
Implementability              Amount of effort to implement the schema
Imp - Implementability        Overall semantic distance between concepts in the source model and
                              concepts in the target model
Maintainability               Ease with which the schema can evolve
Main - Modifiability          # of modifications induced by the modification of a concept, deriving
                              from dependencies
Main - Cohesion               Existence of clusters with a high # of internal links compared with
                              external links
Main - Coupling               Existence of clusters with a low # of links between clusters
Cherfi et al. classification – metrics
                   (examples)
Specification
  Legibility
  –  Clarity    # of concepts – number of crossings in the diagram
  –  Minimality
      •  Non Redundancy    (# of weighted concepts - # of redundant concepts) / total # of weighted concepts
      •  Factorization degree
      •  Aggregation degree
  Expressiveness
  –  Concept expressiveness
  –  Schema expressiveness
  Simplicity
  Correctness
Metrics for structural complexity

•    # of associations
•    # of dependencies
•    # of aggregations
•    Depth of inheritance tree: longest path from
     the root of a hierarchy to the leaves
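The last metric in the list, the depth of the inheritance tree, can be computed directly from the parent links of a hierarchy. The sketch below assumes single inheritance with no cycles and measures the path in edges; the dictionary encoding and the function name are illustrative choices.

```python
def depth_of_inheritance_tree(parent_of: dict) -> int:
    """Longest root-to-leaf path, in edges; parent_of maps a concept to its parent (None for a root)."""
    def depth(node) -> int:
        steps = 0
        while parent_of[node] is not None:
            node = parent_of[node]
            steps += 1
        return steps

    return max((depth(node) for node in parent_of), default=0)


# Generalization hierarchy used in the readability examples.
hierarchy = {
    "Employee": None,
    "Vendor": "Employee",
    "Worker": "Employee",
    "Engineer": "Employee",
}
print(depth_of_inheritance_tree(hierarchy))
```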
Moody 1998 -

[Figure: roadmap of the aspects considered for each approach: Method for, Formal framework, Meta schema, Classification, Quality dimension, Quality subdimension, Metrics, Examples, Experiments.]
Moody’s classification

•    Completeness
•    Integrity
•    Flexibility
•    Understandability
•    Correctness
•    Simplicity
•    Implementability
•    Integration: quality of related groups of
     schemas (see later)
Moody’s classific. of Quality dim. and metrics - 1

Dimension         Definition
Completeness      The schema contains all the information required to meet reqs.
Completeness M1   # of items that do not correspond to reqs.
Completeness M2   # of reqs. not represented in the schema
Completeness M3   # of items that inaccurately represent reqs.
Completeness M4   # of inconsistencies in the schema
Integrity         Extent to which the business rules on data are enforced by the schema
Integrity M1      # of business rules not enforced by the schema
Integrity M2      # of integrity constraints in the schema that do not accurately represent business rules
Flexibility       The ease with which the schema can cope with business change
Flexibility M1    # of elements in the schema which are subject to change in the future
Flexibility M2    Estimated cost of changes
Flexibility M3    Strategic importance of change
Moody’s classific. of Quality dim. and metrics - 2
Dimension              Definition
Understandability      Ease with which the schema can be understood
Understandability M1   User rating
Understandability M2   Ability of users to interpret the model correctly
Understandability M3   Application developer rating
Correctness            The schema conforms to the rules of the conceptual model
Correctness M1         # of violations to model conventions
Correctness M2         Intra-entity redundancy: # of normal form violations
Correctness M3.a       Inter-entity redundancy: # of redundant concepts in the schema
Moody’s classification of quality dimensions and metrics - 3

Dimension             Definition
Simplicity            The schema contains the minimum possible constructs
Simplicity M1         # of entities
Simplicity M2         # of entities + relationships
Simplicity M3         # of entities + relationships + attributes
Implementability      Ease with which the schema can be implemented within time, budget and technology constraints
Implementability M1   Technical risk rating
Implementability M2   Schedule risk rating
Implementability M3   Development cost estimate
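
The three Simplicity metrics are plain counts, so they can be sketched directly. The `Schema` class below is a hypothetical in-memory representation chosen for illustration; it also assumes that only entity attributes are counted, which the slide does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Schema:
    # entity name -> list of attribute names
    entities: Dict[str, List[str]] = field(default_factory=dict)
    # (relationship name, participating entity names)
    relationships: List[Tuple[str, Tuple[str, ...]]] = field(default_factory=list)


def simplicity_metrics(schema: Schema) -> Tuple[int, int, int]:
    """Return (M1, M2, M3): entities, + relationships, + attributes."""
    n_entities = len(schema.entities)
    n_relationships = len(schema.relationships)
    n_attributes = sum(len(attrs) for attrs in schema.entities.values())
    m1 = n_entities
    m2 = n_entities + n_relationships
    m3 = m2 + n_attributes
    return m1, m2, m3


example = Schema(
    entities={
        "Employee": ["EmployeeNo", "Salary"],
        "Project": ["ProjectNo", "Budget"],
    },
    relationships=[("AssignedTo", ("Employee", "Project"))],
)
print(simplicity_metrics(example))  # (2, 3, 7)
```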




                                                                        94
Moody’s monumental contribution to empirical quality/
      quality of diagrammatic notations (TSE 2009)
Semiotic clarity – there should be a 1:1 correspondence between
  semantic constructs and graphical symbols
   Symbol redundancy
   Symbol overload
   Symbol excess
   Symbol deficit
Perceptual discriminability: different symbols should be clearly
   distinguishable from each other
   Visual distance
   Discriminability threshold
Semantic transparency: use visual representations whose appearance
  suggests their meaning, where symbols can be
   Immediate
   Semantically opaque
   Semantically perverse
   Semantically translucent
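
The four semiotic clarity anomalies listed above are the ways a notation can depart from the 1:1 correspondence. A small sketch, assuming the notation is given as a set of (symbol, construct) pairs; the function name and the toy notation are illustrative only.

```python
from typing import Dict, Iterable, Set, Tuple


def semiotic_clarity_anomalies(
    constructs: Set[str],
    symbols: Set[str],
    represents: Iterable[Tuple[str, str]],
) -> Dict[str, Set[str]]:
    """Classify departures from a 1:1 symbol/construct correspondence.

    Every pair in `represents` must use a known symbol and construct.
    """
    symbols_of = {c: set() for c in constructs}
    constructs_of = {s: set() for s in symbols}
    for symbol, construct in represents:
        symbols_of[construct].add(symbol)
        constructs_of[symbol].add(construct)
    return {
        # one construct drawn with several symbols
        "redundancy": {c for c, ss in symbols_of.items() if len(ss) > 1},
        # one symbol standing for several constructs
        "overload": {s for s, cs in constructs_of.items() if len(cs) > 1},
        # symbols that stand for no construct
        "excess": {s for s, cs in constructs_of.items() if not cs},
        # constructs that have no symbol
        "deficit": {c for c, ss in symbols_of.items() if not ss},
    }


anomalies = semiotic_clarity_anomalies(
    constructs={"Entity", "Relationship", "Attribute", "Constraint"},
    symbols={"rectangle", "box", "diamond", "cloud"},
    represents={
        ("rectangle", "Entity"),
        ("box", "Entity"),
        ("diamond", "Relationship"),
        ("diamond", "Attribute"),
    },
)
print(anomalies)
```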
Moody’s monumental contribution to empirical quality/
     quality of diagrammatic notations (TSE 2009)
Complexity management: include explicit mechanisms for
  dealing with complexity
   Modularization
   Abstraction
Cognitive integration: include explicit mechanisms to
  support integration of information from different
  diagrams
   Conceptual integration
      Contextualization
   Perceptual integration
      Wayfinding
Moody’s monumental contribution to empirical quality/
     quality of diagrammatic notations (TSE 2009)
Visual expressiveness: use the full range and capacities of
  visual variables
   Degree of visual freedom
   Saturation
Dual coding: use text to complement graphics
Graphic economy: the number of different graphical
  symbols should be cognitively manageable
   Symbol deficit
Cognitive fit: use different visual dialects for different
  tasks and audiences
   Visual mono/plurilinguism
Moody’s monumental contribution to empirical quality/
      quality of diagrammatic notations (TSE 2009)
               Interactions among principles
Semiotic Clarity can affect Graphic Economy either positively or
   negatively: Symbol excess and symbol redundancy increase graphic
   complexity, while symbol overload and symbol deficit reduce it.
Perceptual Discriminability increases Visual Expressiveness as it
   involves using more visual variables and a wider range of values (a
   side effect of increasing visual distance); similarly, Visual
   Expressiveness is one of the primary ways of improving Perceptual
   Discriminability.
Increasing Visual Expressiveness reduces the effects of graphic
   complexity, while Graphic Economy defines limits on Visual
   Expressiveness (how much information can be effectively encoded
   graphically).
Increasing the number of symbols (Graphic Economy) makes it more
   difficult to discriminate between them (Perceptual Discriminability).
Perceptual Discriminability, Complexity Management, Semantic
   Transparency, Graphic Economy, and Dual Coding improve
   effectiveness for novices, though Semantic Transparency can
   reduce effectiveness for experts (Cognitive Fit).
Semantic Transparency and Visual Expressiveness can make hand
   drawing more difficult (Cognitive Fit)
Others…
Genero et al. 2005 -
[Diagram: components of the proposal. Labels shown: Formal Framework; Meta schema; Classification; Quality dimension; Quality subdimension; Metrics; Examples; Experiments]
Genero et al. classification
Maintainability is influenced by the following subcharacteristics:
•  Understandability: the ease with which the conceptual data model can be
   understood.
•  Legibility: the ease with which the conceptual data model can be read,
   with respect to certain aesthetic criteria [13].
•  Simplicity: the conceptual data model contains the minimum
   number of constructs possible.
•  Analysability: the capability of the conceptual data model to be diagnosed
   for deficiencies or for parts to be modified.
•  Modifiability: the capability of the conceptual data model to enable a
   specified modification to be implemented.
•  Stability: the capability of the conceptual data model to avoid unexpected
   effects from modifications.
•  Testability: the capability of the conceptual data model to enable
   modifications to be validated.

                                                                            101
Herden
[Diagram: components of the proposal. Labels shown: Formal Framework; Meta schema; Classification; Metadata; Quality dimension; Quality subdimension; Metrics; Examples; Experiments]
Herden classification
•    Correctness
•    Consistency
•    Scope
•    Level of detail
•    Completeness
•    Minimality
•    Ability of integration (see later)
•    Readability



                                          103
Herden
Dimension                 Definition
(Technical) Correctness   Correctness of concepts w.r.t reqs.

(Technical) Consistency   Absence of contradiction

Scope                     Comprehensive w.r.t. general user acceptance

Level of detail           Adequacy in detail w.r.t. user acceptance

Completeness              Completeness w.r.t. requirements

Minimality                Compactness and absence of redundancies

Readability               Completeness of documentation




                                                                         104
Metadata in Herden’s classification

•    Description
•    Relevance
•    Measuring
•    Metric
•    Degree of automation
•    Objectivity




                                             105
Poels et al
[Diagram: components of the proposal. Labels shown: Formal Framework; Meta schema; Classification; Quality dimension; Quality subdimension; Metrics; Examples; Experiments]
Poels et al
Interested in
•  Perceived semantic quality
•  Perceived pragmatic quality
To understand their relationship with
1.  Perceived ease of use (efficiency)
2.  Perceived usefulness (effectiveness)
and
3.  User information satisfaction
Poels et al. classification
Quality#   Quality dimension     Definition
PSQ1                             The schema represents the business process
                                 correctly
PSQ2                             The schema is a realistic representation of
                                 the business process
PSQ3                             The schema contains contradicting elements
PSQ4                             The schema contains redundant elements
PSQ5                             Elements must be added to faithfully
                                 represent the business process
PSQ6                             All the elements in the conceptual schema are
                                 relevant for the representation of the
                                 business process
PSQ7                             The schema gives a complete representation
                                 of the business process



                                                                             108
                        @C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008
Poels et al. classification
Quality#   Quality dimension               Definition
PSQ1       Correctness/Validity            The schema represents the business process correctly
PSQ2       Feasible correctness/validity   The schema is a realistic representation of the business process
PSQ3       Coherence                       The schema contains contradicting elements
PSQ4       Non-redundancy                  The schema contains redundant elements
PSQ5       ???                             Elements must be added to faithfully represent the business process
PSQ6       Relevance                       All the elements in the conceptual schema are relevant for the representation of the business process
PSQ7       Completeness                    The schema gives a complete representation of the business process



                                                                               109
Poels et al. general findings


                     Perceived
                     usefulness
            0,1                   0,38
Perceived                                     User
Semantic                                  Information
 quality               0,58               satisfaction
            0,35
                                   0,29
                    Perceived
                   ease of use
Comparison
among proposals
Physical and empirical quality

Author(s)/Types of       Batini     Scand.   Moody   ArabFrench   Genero et   Herden   Poels
qualities                et al 91   94-      98-     02-          al. 2005
Physical quality
Externalization                       x

Persistence                           x

Availability                          x

Empirical quality
Minimality                                                x                     x          x

Readability/legibility      x                  x          x           x         x

Expressiveness              x                  x          x

Simplicity/self-            x                  x          x           x
explanation
Graph aesthetics/           x         x        x
readability/Clarity
Understandability                             X-3         x           x




                                                                                          112
Syntactic and semantic quality

Author(s)/Types of           Batini     Scand.   Moody   ArabFrench   Genero et   Herden   Poels
qualities                    et al 91   94-      98-     02-          al. 2005
Syntactic quality
Invalidity                      x         x        x         x

Incompleteness                            x

Semantic quality
Validity/Correctness            x         x       X-1                             x          x

Feasible validity                         x                                                  x

Normality                       x

Integrity                                         X-2                             x          x

Completeness                    x         x       X-4        x                               x


Level of detail                                                                   x

Scope                                                                             x

Relevance/Pertinence            x         x                                                  x

Perceived semantic quality                x

Analyzability                                                            x

Testability                                                              x


                                                                                                 113
Pragmatic, knowledge and process quality

Author(s)/Types of        Batini et   Scand.   Moody   ArabFrench   Genero et   Herden   Poels
qualities                 al 91-      94-      98-     02-          al. 05
Pragmatic quality
Comprehension                         x
Social quality                        x
Agreement in knowledge                x
Agreement in model
interpret.
Knowledge quality
Completeness                          x
Validity                              x
Process quality
Implementability                                       x
Stability                                                           x
Maintainability/          x                    X-3     x
Flexibility/Extensibility

                                                                                              114
Specific dimensions
Sheldon classification for Inheritance hierarchies
Viewpoints.
•  (1) The deeper a class is in the hierarchy, the higher the
   degree of method inheritance, making its behavior
   harder to predict.
•  (2) Deeper trees constitute greater design complexity,
   since more methods and classes are involved.
•  (3) The deeper a particular class is in the hierarchy, the
   greater the potential reuse of inherited methods.
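
All three viewpoints depend on the depth of a class in the hierarchy and on how many methods it inherits. A minimal sketch of both counts, on a hypothetical single-inheritance hierarchy (the class and method names are invented for illustration):

```python
from typing import Dict, List, Optional, Set

# child -> parent (None marks the root); hypothetical hierarchy
parent: Dict[str, Optional[str]] = {
    "Person": None,
    "Employee": "Person",
    "Seller": "Employee",
    "Clerk": "Employee",
}
# methods declared directly in each class
own_methods: Dict[str, Set[str]] = {
    "Person": {"name"},
    "Employee": {"salary"},
    "Seller": {"sell"},
    "Clerk": {"file"},
}


def ancestors(cls: str) -> List[str]:
    """Return the chain of superclasses from the direct parent up to the root."""
    chain = []
    current = parent[cls]
    while current is not None:
        chain.append(current)
        current = parent[current]
    return chain


def depth(cls: str) -> int:
    """Depth of a class in the hierarchy (the root has depth 0)."""
    return len(ancestors(cls))


def inherited_methods(cls: str) -> Set[str]:
    """Methods available to a class through inheritance only."""
    result: Set[str] = set()
    for ancestor in ancestors(cls):
        result |= own_methods[ancestor]
    return result


print(depth("Seller"), sorted(inherited_methods("Seller")))
```

The deeper class "Seller" inherits more methods than "Employee", which is both the reuse benefit of viewpoint (3) and the predictability cost of viewpoint (1).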




                                                                         116
Sheldon classification for Inheritance hierarchies

•  Maintainability
•  Understandability
•  Modifiability




                                                                     117
Schema and Data Quality together
When a schema is defined, quality can be achieved working both on
the schema and on the instance

(a)
Person
ID    Name   Surname       Address
1     John   Smith         113 Sunset Avenue 60601 Chicago
2     Mark   Bauer         113 Sunset Avenue 60601 Chicago
3     Ann    Swenson       4 Heroes Street Denver

(b)
Person
ID   Name    Surname
1    John    Smith
2    Mark    Bauer
3    Ann     Swenson

Address
ID     StreetPrefix    StreetName   Number       City
A11    Avenue          Sunset       113          Chicago
A12    Street          4 Heroes     null         Denver

ResidenceAddress
PersonID        AddressID
1               A11
2               A11
3               A12

                                                                                                       119
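
A small sketch of the two designs above as Python data, to make the schema/instance interplay concrete. The table contents are taken from the slide; the helper functions are illustrative assumptions.

```python
# (a) one table: the address is free text, repeated for each resident
person_a = [
    {"ID": 1, "Name": "John", "Surname": "Smith",
     "Address": "113 Sunset Avenue 60601 Chicago"},
    {"ID": 2, "Name": "Mark", "Surname": "Bauer",
     "Address": "113 Sunset Avenue 60601 Chicago"},
    {"ID": 3, "Name": "Ann", "Surname": "Swenson",
     "Address": "4 Heroes Street Denver"},
]

# (b) three tables: each address is stored once and linked to persons
address = {
    "A11": {"StreetPrefix": "Avenue", "StreetName": "Sunset",
            "Number": "113", "City": "Chicago"},
    "A12": {"StreetPrefix": "Street", "StreetName": "4 Heroes",
            "Number": None, "City": "Denver"},
}
residence_address = {1: "A11", 2: "A11", 3: "A12"}


def copies_in_a(address_text):
    """How many rows of (a) repeat the same address string."""
    return sum(1 for row in person_a if row["Address"] == address_text)


def residents_of(address_id):
    """Persons linked to one address record in (b)."""
    return sorted(pid for pid, aid in residence_address.items() if aid == address_id)


def missing_address_fields():
    """Null values that the structured schema (b) makes visible."""
    return [(aid, name) for aid, rec in address.items()
            for name, value in rec.items() if value is None]


print(copies_in_a("113 Sunset Avenue 60601 Chicago"))
print(residents_of("A11"))
print(missing_address_fields())
```

In (a) a correction to the Sunset Avenue address must be applied to every copy, while in (b) it is a single update; (b) also exposes the missing street number as an explicit null at the instance level.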
Experimentally investigated
     by Arab French

     Quality at schema level


             Impact




     Quality at data level     Interdependencies
Improving the quality
    of schemas




                        121
Methods

•  Origins: achieving normal form
   (decomposition techniques)
•  Scandinavian: derived from the framework
•  Through schema transformations




                                        122
Derived from framework
            Syntactic quality

•  Error prevention through syntax directed
   editors
•  Error detection through syntactic checks




                                         123
Derived from framework
              Semantic quality
•  Consistency checking
  –  Based on a logical description
  –  Based on constructivity, namely through
     properties of the generation process
     (Langefors et al.)
  –  Use of driving questions to improve
     completeness




                                               124
Derived from framework
               Pragmatic quality
•  Audience training
•  Inspection and walkthroughs
•  Transformations (see also later)
  –  Rephrasing
  –  Filtering
•  Translation
  –  Explanation generation
  –  Model execution
•  Documentation
•  Prototyping
                                      125
Derived from framework
                 Social quality
•  Integration
  –  Intra project
  –  Inter project
  –  Inter organizational


•  Integration process
  –  Pre-integration
  –  Viewpoint comparison
  –  Viewpoint conforming
  –  Merging and restructuring
                                     126
Through schema transformation




                                127
The Assenova and Johannesson approach
              Dimensions considered

Dimension          Definition
Explicitness       Requirements are represented at the schema level, not at instance level
Size               # of entities + relationships + attributes
Rule simplicity    A high # of business rules are represented by simple types of constraints
Rule uniformity    Cardinality constraints are uniform
Query simplicity   Simple retrieval requirements correspond to simple queries on the schema
Stability          Small changes in requirements result in small changes in the schema
                                                                 128
Dimensions and transformations
                                        Explicitness   Size   Rule         Rule         Query        Stability
                                                              simplicity   uniformity   simplicity

Partial attributes                                     -                   +                         +
Non-surjective attributes                              -                   +                         +
Partial attributes which are total in                         +            +            -            +
union
Non-surjective attributes surjective                          +            +            -            +
in union
M-N attributes                                         -                   +            -            +
Lexical attributes                                     -                   +            -            +
Attributes with fixed ranges            +              -                                =            +
Two non disjoint entities               +              -      +                         +
Non unary “overloaded” attributes                                                       +/-          +
                                                                                         129
Example transformation
                 Partial attribute




- The size of the schema increases (-)
- Introducing the entity EMPLOYEE results in
    -  increased rule uniformity (+) (all attributes are total)
    -  increased stability (+) 
                                                            130
Example of increased stability
Introducing different categories of employees can be done
  in the new schema without violating rule simplicity
The same cannot be done in the old schema




          Old schema               New schema
                                                      131
Quality in data integration
      architectures
The approach of Akoka et al. (2007)
General statement: In DI Architectures quality of data
  and quality of schemas have to be considered together
•  Qualities at schema level
    –  Completeness
    –  Understandability
    –  Minimality
    –  Expressiveness
•  Qualities at data level
    –  Completeness
          •  Coverage
          •  Density
    –  Uniqueness
    –  Consistency
    –  Freshness
          •  Currency
          •  Timeliness
    –  Accuracy
          •  Semantic
          •  Syntactic
          •  Precision
The approach of De Conseicao et al (2007)
   Relevant qualities to be evaluated in DI arch.
Given a DI Architecture defined in terms of
•  [Data, Local Schemas, Global Schema, Data sources]

  DI Element     IQ Criteria

  Data Sources Reputation; Verifiability; Availability; Response
               Time

  Schema         Schema completeness, Minimality, Type
                 Consistency

  Data           Data Completeness, Timeliness, Accuracy
The approach of De Conseicao et al (2007)
      Relevant qualities to be evaluated in DI arch.
 Given a DI Architecture defined in terms of
 •  [Data, Local Schemas, Global Schema, Data sources]
Quality        Definition               Refers to       Metrics                Detailed in terms of
dimension

Completeness   Coverage of global       Global schema   1 – (# of incomplete
               schema concepts                          items / # total
               w.r.t. the application                   items)
               domain
Minimality     Extent to which the      Global schema   1 – (# redundant       Attributes in an entity
               schema is                                schema elements /      Attributes in different entities
               compactly modeled                        # total items)         Entity redundancy degree
               and without                                                     Redundant relationship
               redundancy                                                      Entity redundancy of a schema
                                                                               Relationship redundancy of a schema
                                                                               Schema minimality

Type           Data type                Global schema   1 – (# of              Data type consistency
consistency    uniformity across        +               inconsistent schema    Attribute type consistency
               the schemas              Local schemas   elements / # total     Schema data type consistency
                                                        schema elements)
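
All three metrics in the table share the form 1 – (# defective items / # total items). A minimal sketch of that common form; the function name and the counts in the example are invented for illustration.

```python
def ratio_quality(n_defective: int, n_total: int) -> float:
    """Return 1 - (defective / total), the form shared by the three metrics."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_defective <= n_total:
        raise ValueError("n_defective must lie between 0 and n_total")
    return 1 - n_defective / n_total


# Hypothetical counts for one global schema
completeness = ratio_quality(n_defective=2, n_total=10)      # incomplete items
minimality = ratio_quality(n_defective=1, n_total=10)        # redundant elements
type_consistency = ratio_quality(n_defective=3, n_total=12)  # inconsistent elements
print(completeness, minimality, type_consistency)
```

A value of 1 means no defective items were found; the metric says nothing about how the defective items are identified, which is where the "Detailed in terms of" column comes in.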
The H. Dai et al. approach (2006)
•  Focus on Column Heterogeneity
  Example columns, left to right:
  –  Only e-mail addresses
  –  E-mail addresses, phone numbers and social security numbers
  –  Many e-mail addresses and few phone numbers
  –  Phone numbers and social security numbers

  b more heterogeneous than a
  b more heterogeneous than c
  b more heterogeneous than d
The H. Dai et al. approach (2006)
Focus on Column Heterogeneity

Heterogeneity dimensions

  –  Number of semantic types resulting in
     different clusters
  –  Cluster entropy
  –  Probabilistic soft clustering
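
To give a feel for the entropy dimension, the sketch below assigns each value to one semantic type with simple regular expressions and computes the Shannon entropy of the resulting type distribution. This is a deliberate simplification: the regular expressions stand in for clustering, and the hard assignment replaces the probabilistic soft clustering named above. All patterns and sample values are assumptions.

```python
import math
import re
from collections import Counter

# Order matters: a social security number would also match the phone pattern.
PATTERNS = [
    ("email", re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")),
    ("ssn", re.compile(r"\d{3}-\d{2}-\d{4}")),
    ("phone", re.compile(r"\+?[\d\s().-]{7,}")),
]


def semantic_type(value: str) -> str:
    """Assign a value to the first pattern that matches it entirely."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(value.strip()):
            return name
    return "other"


def type_entropy(column) -> float:
    """Shannon entropy (in bits) of the semantic types found in a column."""
    counts = Counter(semantic_type(v) for v in column)
    total = sum(counts.values())
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


only_email = ["ann@x.org", "bob@y.com", "eve@z.net", "joe@w.io"]
three_types = ["ann@x.org", "555-123-4567", "123-45-6789"]
mostly_email = ["ann@x.org", "bob@y.com", "eve@z.net", "555-123-4567"]
phone_and_ssn = ["555-123-4567", "123-45-6789"]

for col in (only_email, three_types, mostly_email, phone_and_ssn):
    print(round(type_entropy(col), 3))
```

With these samples the three-type column has the highest entropy, which matches the ordering "b more heterogeneous than a, c, d" on the previous slide.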
Moody’s approach
        Classification of schemas related by integration

Quality categ.   Definition
Integration      Level of consistency of the schema with the rest of the organization’s data
Integr M1        # of data conflicts with the Corporate Schema
Integr M1.a      # of entity conflicts
Integr M1.b      # of data element conflicts, namely, definitions and domains
Integr M1.c      # of naming conflicts (synonyms + homonyms)
Integr M2        # of data conflicts with existing systems (ES)
Integr M2.a      # of data element conflicts, namely, definitions and domains with ES
Integr M2.b      # of key conflicts, namely, definitions and domains with ES
Integr M2.c      # of naming conflicts (synonyms + homonyms) with ES
Integr M3        # of data elements with duplicate data elements in ES
Integr M4        Rating by representatives of other business areas
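
Naming conflicts (M1.c) can be counted once each name has been resolved to the concept it denotes. The sketch assumes that resolution has already been done by an analyst; the concept identifiers and schema contents are hypothetical.

```python
from typing import Dict, Set, Tuple


def naming_conflicts(
    schema: Dict[str, str],
    corporate: Dict[str, str],
) -> Tuple[Set[str], Set[str]]:
    """Return (synonyms, homonyms) of `schema` with respect to `corporate`.

    Both arguments map a name to the identifier of the concept it denotes.
    """
    # Homonym: the same name denotes different concepts in the two schemas.
    homonyms = {
        name for name in schema.keys() & corporate.keys()
        if schema[name] != corporate[name]
    }
    # Names used by the corporate schema for each concept.
    corporate_names: Dict[str, Set[str]] = {}
    for name, concept in corporate.items():
        corporate_names.setdefault(concept, set()).add(name)
    # Synonym: the same concept appears under a different name.
    synonyms = {
        name for name, concept in schema.items()
        if concept in corporate_names and name not in corporate_names[concept]
    }
    return synonyms, homonyms


schema = {"Client": "c_customer", "Order": "c_purchase_order", "Item": "c_stock_item"}
corporate = {"Customer": "c_customer", "Order": "c_work_order", "Item": "c_stock_item"}

synonyms, homonyms = naming_conflicts(schema, corporate)
print(synonyms, homonyms, len(synonyms) + len(homonyms))
```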

                                                                            138
The Chai approach
           Matchability of schemas

•  Focus on the evolution of a Data
   Integration system, and the cost of
   maintaining the mediated schema S
•  Quality observed: the matchability of S
   against a matching tool M, defined as
   the average of the accuracies of matching S
   with future schemas F1, F2, …, Fn (that we
   assume known at least to some extent)
   using M
Cases for matching mistakes

•  Predict a spurious match
•  Miss a match
•  Predict a wrong match

•  Strategy to improve matchability
  –  Change concepts in M using rules that
     minimize error probability
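
A sketch of matchability as an average of accuracies, with the three kinds of matching mistakes counted explicitly. The slides do not define accuracy, so the formula below (correct matches divided by correct matches plus mistakes) is an assumption, as are the function names and the example schemas.

```python
from typing import Dict, List, Tuple

Match = Dict[str, str]  # element of S -> element of a future schema F


def classify(predicted: Match, gold: Match) -> Dict[str, List[str]]:
    """Split the elements of S by outcome of the matching tool."""
    return {
        "correct": [a for a in predicted if a in gold and predicted[a] == gold[a]],
        # predicted match for an element that has no true counterpart
        "spurious": [a for a in predicted if a not in gold],
        # true match that the tool did not predict
        "missed": [a for a in gold if a not in predicted],
        # predicted match that points to the wrong element
        "wrong": [a for a in predicted if a in gold and predicted[a] != gold[a]],
    }


def accuracy(predicted: Match, gold: Match) -> float:
    """Assumed definition: correct / (correct + all mistakes)."""
    outcome = classify(predicted, gold)
    total = sum(len(items) for items in outcome.values())
    return len(outcome["correct"]) / total if total else 1.0


def matchability(runs: List[Tuple[Match, Match]]) -> float:
    """Average accuracy over the (predicted, gold) pairs for F1 ... Fn."""
    if not runs:
        raise ValueError("at least one future schema is required")
    return sum(accuracy(p, g) for p, g in runs) / len(runs)


runs = [
    ({"name": "fullname", "phone": "fax"},
     {"name": "fullname", "phone": "telephone", "city": "town"}),
    ({"name": "n"}, {"name": "n"}),
]
print(matchability(runs))
```

In the first run "phone" is a wrong match and "city" is a missed match, so the accuracy is 1/3; the second run is fully correct.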
Batini et al. 2010
Potential information content
The data architecture of a set of databases is the
           allocation of concepts and tables
              across the DB data schemas
Example of change of data architecture due to improving
                    access efficiency


[Figure: tables Employee (Employee #, Salary), Assigned-to (Employee #, Project #, Role) and Project (Project #, Budget), shown for a Centralized DB and for a Distributed DB]
Data integration technologies
•    Virtual data integration
•    Data Warehouses
•    Application integration
•    Consolidation
Potential information content
Global schema: Tax payer has Boat; Tax payer declares Income
Sources: one schema with Tax payer declares Income; one schema with Tax payer has Boat
Query: Find CF, Name of Tax Payer that declares <= 30.000 € and has >= 1 Boat
Potential information content

•  Given a global schema I resulting from the
   virtual integration of schemas S1,
   S2, …, Sn, the potential information
   content of I is the set of queries that can
   be performed on I and cannot be
   performed on S1, S2, …, Sn.
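
The definition above can be sketched with sets, under a simplifying assumption: a schema is the set of concepts it contains, and a query can be performed on a schema when every concept it needs is in that schema. The concept names follow the tax payer example; the query names are invented.

```python
from typing import Dict, FrozenSet, List, Set

Schema = Set[str]


def performable(query: FrozenSet[str], schema: Schema) -> bool:
    """Assumed criterion: the schema contains every concept the query needs."""
    return query <= schema


def potential_information_content(
    queries: Dict[str, FrozenSet[str]],
    global_schema: Schema,
    sources: List[Schema],
) -> Set[str]:
    """Queries answerable on the global schema but on no single source."""
    return {
        name for name, concepts in queries.items()
        if performable(concepts, global_schema)
        and not any(performable(concepts, s) for s in sources)
    }


s1 = {"TaxPayer", "declares", "Income"}
s2 = {"TaxPayer", "has", "Boat"}
global_schema = s1 | s2

queries = {
    "low_income_boat_owners": frozenset({"TaxPayer", "declares", "Income", "has", "Boat"}),
    "boat_owners": frozenset({"TaxPayer", "has", "Boat"}),
}
print(potential_information_content(queries, global_schema, [s1, s2]))
```

Only the query that combines income and boat ownership belongs to the potential information content, since "boat_owners" can already be answered on the second source alone.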
Example
[Figure: schemas S1 and S2 over entities E1, E2, E3, E4, E5 and E6, with queries Q11, Q12 and Q21]
                                                  146
Quality of the documentation
      for large related
     groups of schemas




                               147
Why is integration alone not enough?




             ?
                                Hundreds
                               of schemas
Relationships investigated

•  Integration
•  Abstraction
•  Abstraction/Integration




                                      149
Abstraction




              150
Abstraction
[Figure: the same schema at different levels of abstraction, from the most abstract (Department, Employee, Item, Order), through an intermediate one (Department, Employee, City, Seller, Item in Order of Purchaser), to the most detailed (Floor, Department, Employee, City, Seller, Engineer, Clerk, Item in Order of Purchaser, Warranty, Warehouse)]
                                                                                               151
Integration + abstraction




                            152
First case: integration + abstraction
[Figure: schemas Company, Production, Sales and Department structure, built from the entities Floor, Department, Employee, City, Seller, Engineer, Clerk, Item, Order, Purchaser, Warranty and Warehouse, and related by integration and abstraction]
                                                                                                                                                        153
Information Quality in the Web Era
Information Quality in the Web Era
Information Quality in the Web Era
Information Quality in the Web Era
Information Quality in the Web Era
Information Quality in the Web Era
Information Quality in the Web Era
Information Quality in the Web Era
Information Quality in the Web Era
Information Quality in the Web Era
Information Quality in the Web Era
Information Quality in the Web Era
Information Quality in the Web Era
Information Quality in the Web Era
Information Quality in the Web Era
Information Quality in the Web Era
Information Quality in the Web Era
Information Quality in the Web Era
Information Quality in the Web Era

Information Quality in the Web Era

  • 1. Tutorial at CAISE 2010: Information Quality in the Web Era. C. Batini & Matteo Palmonari, Department of Computer Science, Communication and Systems, University of Milano Bicocca. [batini;palmonari]@disco.unimib.it
  • 2. Outline: Motivation [Palmonari] (the Web era / information quality meeting ontologies / the ontology landscape); Quality of data (conceptual level) [Batini] (frameworks / metamodels / dimensions / metrics / groups of schemas); Quality of ontologies [Palmonari] (frameworks / metamodels / dimensions / metrics); Conclusions [Palmonari+Batini]
  • 3. Outline: Motivation (the Web era / information quality meeting ontologies / the ontology landscape); Quality of data (conceptual level) (frameworks / metamodels / dimensions / metrics / groups of schemas); Quality of ontologies (frameworks / metamodels / dimensions / metrics); Conclusions
  • 4. The Web era is characterized by the “Big Data” phenomenon
  • 5. How to make sense of all these data?
  • 6. Size of documents and duplicates over time
  • 7. How to make sense of all these data? Data management needs data quality
  • 8. How to make sense of all these data? Data management needs data quality
  • 9. Data/information heterogeneity in Information Systems: information is available in different formats and is represented according to different models. Structured data: Place / Country / Population / Main economic activity = Portofino / Italy / 7.000 / Tourism. Image: Portofino map. Text: “Dear Laure, I try to describe the wonderful harbour of Portofino as I have seen this morning a boat is going in, other boats are along the wharf. Small pretty buildings and villas are looking on to the harbour.” Need to consider information quality for heterogeneous information sources.
  • 10. Tutorial Background - Data Quality (Structured Data). 23rd International Conference on Conceptual Modeling (ER 2004), Shanghai: A Survey of Data Quality Issues in Cooperative Information Systems. Carlo Batini, Università di Milano “Bicocca”, batini@disco.unimib.it; Monica Scannapieco, Università di Roma “La Sapienza”, monscan@dis.uniroma1.it
  • 11. Tutorial Background - Towards Information Quality (Heterogeneous Data). Tutorial at ER 08, Barcelona, Spain: Quality of Data, Textual Information and Images: a comparative survey. Speaker: C. Batini. Other authors: F. Cabitza, G. Pasi, R. Schettini. Dipartimento di Informatica, Sistemistica e Comunicazione, Università di Milano Bicocca, Milano, Italy. batini@disco.unimib.it
  • 12. How to make sense of all these data? Together with automatic techniques for information extraction, processing & integration, we also need automatic techniques for assessing the quality of information. Information quality for information shared, consumed and delivered on the Web. Increasing attention to information semantics.
  • 13. Of course, the “Semantic Web” perspective •  Make the semantics of information explicit with Web-compliant ontologies* by –  sharing conceptualizations/terminologies on the Web –  sharing data on the Web •  Models, languages & technologies –  E.g. RDF, RDFS, OWL, SKOS. *For now, let us consider a very broad definition: an ontology is a specification of a conceptualization. T. R. Gruber. A translation approach to portable ontologies. Knowledge Acquisition, 5(2):199-220, 1993.
  • 14. Ontologies outside the Semantic Web •  Even for those who are skeptical about the Semantic Web, ontologies (e.g. OWL ontologies, linked data, thesauri) can be considered useful external resources to use in –  Conceptual modeling –  Data integration –  Document management –  Service Oriented Computing –  Information retrieval –  Software Engineering –  Information System Design –  …
  • 15. Ontology + “Information Systems”
  • 16. Ontology + “Software Engineering”
  • 17. Ontologies & Semantic Resources •  KB - Axiomatic ontologies (e.g. SUMO) –  Terminological (intensional/schema) level: concepts, relationships, axioms specifying logical constraints –  Assertional (extensional/data) level: instances, typing, relations between instances •  LD - Linked data on the Web (e.g. DBpedia) –  RDF data, usually light-weight KBs •  Th - Thesauri (e.g. WordNet) –  Lexical ontologies: terms, no distinction between schema and instances •  In summary, the ontology landscape includes: –  Shared vocabulary (KB, LD, Th) –  Modeling principles (KB) –  Logical theories supporting reasoning (KB) –  Web-compliant representations of models and data (KB, LD, Th)
  • 18. Need for ontology evaluation •  Ontology “Quality”  Ontology Evaluation •  Quality of ontologies matters! –  In particular, when ontologies: •  are built to support specific applications (their quality impacts on the application effectiveness) •  are searched on the Web, reused, extended –  Many ontologies to choose from! –  E.g. suppose that you need an ontology describing customer and the business domain 18
  • 19. Searching for “Customer” with Sindice 19
  • 22. Searching for “Customer” on Swoogle (refined search) 22
  • 23. Ontologies and semantic resources should be considered in comprehensive studies about information quality in the Web era Tough work! Let’s start from the beginning: ontologies and structured data 23
  • 24. Structured data and ontologies •  Structured data: Instances; Logical schemas; Conceptual schemas (diagrammatic models: ER, UML, ORM) •  Ontologies (KB): Instances; Schema (logical models supporting reasoning) •  Tight vs loose instance-schema coupling •  Common to both: –  Conceptual level representations –  Externalized models (semiotic objects) –  Constraints on domain (data)
  • 25. Ontologies and their grandparents •  Structured data: Instances; Logical schemas; Conceptual schemas (diagrammatic models: ER, UML, ORM) •  Ontologies (KB): Instances; Schema / Terminologies (logical models supporting reasoning) •  Common to both: –  Conceptual level representations –  Externalized models (semiotic objects) –  Constraints on domain (data) •  In this (mini) tutorial we will: –  focus on the modeling level: “Quality of Conceptual Schemas and Ontologies” –  provide a guided tour on the topic by discussing only part of the material (soon available online) –  focus on common aspects and, in particular, differences
  • 27. Outline •  Motivation: –  the Web era / information quality meeting ontologies / the ontology landscape / •  Quality of Conceptual Schemas –  frameworks / metamodels / dimensions / metrics / groups of schemas •  Quality of ontologies –  frameworks / metamodels / dimensions / metrics •  Conclusions 27
  • 28. # of slides •  About 130  30 •  I will provide mainly a guided introduction to the slides
  • 29. In a database, quality can be investigated... •  At model (language) level •  At schema (model) level •  At instance (value/data) level
  • 31. Data quality dimensions by framework (Acronym: Data Quality Dimensions)
    TDQM: Accessibility, Appropriateness, Believability, Completeness, Concise/Consistent representation, Ease of manipulation, Value added, Free of error, Interpretability, Objectivity, Relevance, Reputation, Security, Timeliness, Understandability
    DWQ: Correctness, Completeness, Minimality, Traceability, Interpretability, Metadata Evolution, Accessibility (System, Transactional, Security), Usefulness (Interpretability), Timeliness (Currency, Volatility), Responsiveness, Completeness, Credibility, Accuracy, Consistency, Interpretability
    TIQM: Inherent dimensions: Definition conformance (consistency), Completeness, Business rules conformance, Accuracy (to surrogate source), Accuracy (to reality), Precision, Nonduplication, Equivalence of redundant data, Concurrency of redundant data. Pragmatic dimensions: Accessibility, Timeliness, Contextual clarity, Derivation integrity, Usability, Rightness (fact completeness), Cost
    AIMQ: Accessibility, Appropriateness, Believability, Completeness, Concise/Consistent representation, Ease of operation, Freedom from errors, Interpretability, Objectivity, Relevancy, Reputation, Security, Timeliness, Understandability
    CIHI: Dimensions: Accuracy, Timeliness, Comparability, Usability, Relevance. Characteristics: Over-coverage, Under-coverage, Simple/correlated response variance, Reliability, Collection and capture, Unit/Item non response, Edit and imputation, Processing, Estimation, Timeliness, Comprehensiveness, Integration, Standardization, Equivalence, Linkage ability, Product/Historical comparability, Accessibility, Documentation, Interpretability, Adaptability, Value
    DQA: Accessibility, Appropriate amount of data, Believability, Completeness, Freedom from errors, Consistency, Concise Representation, Relevance, Ease of manipulation, Interpretability, Objectivity, Reputation, Security, Timeliness, Understandability, Value added
    IQM: Accessibility, Consistency, Timeliness, Conciseness, Maintainability, Currency, Applicability, Convenience, Speed, Comprehensiveness, Clarity, Accuracy, Traceability, Security, Correctness, Interactivity
    ISTAT: Accuracy, Completeness, Consistency
    AMEQ: Consistent representation, Interpretability, Ease of understanding, Concise representation, Timeliness, Completeness, Value added, Relevance, Appropriateness, Meaningfulness, Lack of confusion, Arrangement, Readable, Reasonability, Precision, Reliability, Freedom from bias, Data Deficiency, Design Deficiency, Operation Deficiencies, Accuracy, Cost, Objectivity, Believability, Reputation, Accessibility, Correctness, Unambiguity, Consistency
    COLDQ: Schema: Clarity of definition, Comprehensiveness, Flexibility, Robustness, Essentialness, Attribute granularity, Precision of domains, Homogeneity, Identifiability, Obtainability, Relevance, Simplicity/Complexity, Semantic consistency, Syntactic consistency. Data: Accuracy, Null Values, Completeness, Consistency, Currency, Timeliness, Agreement of Usage, Stewardship, Ubiquity. Presentation: Appropriateness, Correct Interpretation, Flexibility, Format precision, Portability, Consistency, Use of storage. Information policy: Accessibility, Metadata, Privacy, Security, Redundancy, Cost
    DaQuinCIS: Accuracy, Completeness, Consistency, Currency, Trustworthiness
    QAFD: Syntactic/Semantic accuracy, Internal/External consistency, Completeness, Currency, Uniqueness
    CDQ: Schema: Correctness with respect to the model, Correctness with respect to requirements, Completeness, Pertinence, Readability, Normalization. Data: Syntactic/Semantic Accuracy, Semantic Accuracy, Completeness, Consistency, Currency, Timeliness, Volatility, Completability, Reputation, Accessibility, Cost
  • 32. Reference for quality of data in databases 2006 32
  • 33. Here we focus on •  Model level • Schema level •  Data level 33
  • 34. Quality of Conceptual Schemas - contents •  Frameworks and Metamodels proposed •  Quality of Schemas –  Classifications, Dimensions & Metrics: main proposals –  Comparison of proposals –  Improving the quality of schemas •  Quality of groups of schemas –  Quality of Data Integration Architectures –  Quality of the documentation for large related groups of schemas 34
  • 36. Some figures on proposed approaches in the literature (from Mehmood 2009, citing Moody 2005). Section label: Frameworks and metamodels.
                             Research   Practice   Mixed
    # of proposals           29         8          2
    % of total               74%        21%        5%
    Empirically validated    6          0          1
    %                        20%        0%         50%
    Generalizable            5          0          0
    %                        17%        0%         0%
    Not generalizable        24         8          2
    %                        83%        100%       100%
    Generalizable means that the proposal can be applied to conceptual models in general and is not specific to, e.g., ER.
  • 37. Metaschema of approaches •  Formal framework: concepts and paradigms involved in a formally grounded approach to quality •  Metaschema: concepts and paradigms involved in the life cycle of quality, namely in the production, assessment and improvement activities •  Classification (one, two or three level taxonomies): –  Quality dimension –  Quality subdimension –  Metrics •  Examples •  Experiments
  • 38. Proposals •  Formal framework: Krogstie & Solvberg (the Scandinavians) •  Metaschema: Shanks; Arab French; Vassiliadis •  Classification (quality dimension, quality subdimension, metrics): The origins – Batini et al.; Scandinavians; Arab French; Moody; Genero et al.; Herden; Poels •  Examples •  Experiments
  • 39. Proposals Formal Meta Classification Framework schema Quality dimension Quality subdimension Metrics Examples Experiments 39
  • 40. Frameworks for schema quality 40
  • 41. Krogstie and Solvberg framework [Diagram. Nodes: Participant knowledge (social actor), Social actor interpretation, Goal of modeling, Modeling domain, Model externalization, Language extension, Technical actor interpretation. Quality types on the links: Physical, Empirical, Syntactic, Semantic, Perceived semantic, Social pragmatic, Technical pragmatic, Social, Organizational.]
  • 42. Krogstie and Solvberg framework (diagram of slide 41, semantic quality highlighted): correspondence between the conceptual model and the domain.
  • 43. Krogstie and Solvberg framework (diagram of slide 41, perceived semantic quality highlighted): correspondence between participant knowledge and individual interpretation.
  • 44. Krogstie and Solvberg framework (diagram of slide 41, syntactic quality highlighted): correspondence between the conceptual model and the language.
  • 45. Krogstie and Solvberg framework (diagram of slide 41, pragmatic quality highlighted): correspondence between the conceptual model and the audience’s interpretation of it.
  • 46. Krogstie and Solvberg framework (diagram of slide 41, physical quality highlighted): correspondence between participant knowledge and the externalized conceptual model. Externalization: the knowledge of social actors has been externalized in the model. Internalizability: the model is persistent.
  • 47. Krogstie and Solvberg framework (diagram of slide 41, empirical quality highlighted): it is reflected by the error frequency when a model is read or written, so by readability and clarity.
  • 48. Krogstie and Solvberg framework (diagram of slide 41, social quality highlighted): agreement on participant knowledge and individual interpretation.
  • 49. More formally •  G, the goals of the modeling task. •  L, the language extension, i.e., the set of all statements that are possible to make according to the graphemes, vocabulary, and syntax of the modeling languages used. •  D, the domain, i.e., the set of all statements that can be stated about the situation at hand. •  M, the model (schema) itself. •  Ks, the relevant explicit knowledge of those being involved in modeling. A subset of these is actively involved in modeling, and their explicit knowledge is indicated by KM. •  I, the social actor interpretation, i.e., the set of all statements that the audience thinks that an externalized model consists of. •  T, the technical actor interpretation, i.e., the statements in the model as 'interpreted' by modeling tools. 49
  • 50. Main quality types •  Physical quality: the basic quality goal is that the model M is available to the audience. •  Empirical quality deals with predictable error frequencies when a model is read or written by different users, coding (e.g. shapes of boxes) and HCI-ergonomics for documentation and modeling tools. For instance, a graph layout that avoids crossing lines in a model is a means to address the empirical quality of a model. •  Syntactic quality is the correspondence between the model M and the language extension L. •  Semantic quality is the correspondence between the model M and the domain D. This includes validity and completeness. •  Perceived semantic quality is the analogous correspondence between the audience interpretation I of a model M and the audience’s current knowledge K of the domain D. •  Pragmatic quality is the correspondence between the model M and the audience’s interpretation and application of it (I).
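The set-based definitions of M and D lend themselves to a small illustration of semantic quality. The sketch below is only a reading aid, assuming statements can be modeled as plain strings; the function name `semantic_quality` and the sample statements are hypothetical and do not come from the Krogstie and Solvberg framework itself.

```python
def semantic_quality(model: set[str], domain: set[str]) -> dict[str, set[str]]:
    """Compare the model statements M with the domain statements D.

    Validity holds when M - D is empty (every model statement is in the domain);
    completeness holds when D - M is empty (no domain statement is missing).
    """
    return {
        "invalid": model - domain,  # statements in M that are not in D
        "missing": domain - model,  # statements in D not yet in M
    }


# Hypothetical statements, for illustration only.
D = {"student has code", "student has name", "student attends course"}
M = {"student has code", "student has name", "student has shoe size"}

report = semantic_quality(M, D)
print(report)
```

In practice D is rarely enumerable, which is why the later slides speak of feasible validity and feasible completeness.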
  • 51. Framework for language (model) quality 51
  • 52. Framework for language (model) quality [Diagram. Nodes: Participant knowledge, Social actor interpretation, Goal of modeling, Modeling domain, Model externalization, Language extension, Technical actor interpretation. Appropriateness types relating the language extension to the other nodes: Domain, Participant, Modeler, Comprehensibility, Tool, Organizational.]
  • 53. Main quality types Domain appropriateness: this relates the language and the domain. Ideally, the conceptual basis must be powerful enough to express anything in the domain, not having what is termed construct deficit. On the other hand, you should not be able to express things that are not in the domain, i.e. what is termed construct excess. Domain appropriateness is primarily a means to achieve semantic quality. Participant appropriateness relates the social actors’ explicit knowledge to the language. Participant appropriateness is primarily a means to achieve pragmatic quality for comprehension, learning and action. Modeler appropriateness: this area relates the language extension to the participant knowledge. The goal is that there are no statements in the explicit knowledge of the modeler that cannot be expressed in the language. Modeler appropriateness is primarily a means to achieve semantic quality.
  • 54. Main quality types Comprehensibility appropriateness relates the language to the social actor interpretation. The goal is that the participants in the modeling effort using the language understand all the possible statements of the language. Comprehensibility appropriateness is primarily a means to achieve empirical and pragmatic quality. Tool appropriateness relates the language to the technical audience interpretations. For tool interpretation, it is especially important that the language lends itself to automatic reasoning. This requires formality (i.e. both formal syntax and semantics being operational and/or logical), but formality is not necessarily enough, since the reasoning must also be efficient to be of practical use. This is covered by what we term analyzability (to exploit any mathematical semantics) and executability (to exploit any operational semantics). Different aspects of tool appropriateness are means to achieve syntactic, semantic and pragmatic quality (through formal syntax, mathematical semantics, and operational semantics). Organizational appropriateness relates the language to standards and other organizational needs within the organizational context of modeling. These are means to support organizational quality.
  • 56. Shanks et al. composite model Theory based Domain Quality type Means Language Goal Property Prqa Model Activity Audience Weighting Quality factor Rating Evaluation method Practice based 56
  • 57. Metamodels – Arab/French Mehmood, Chefri et al. 2009, based on goals, question, metrics Quality goal Q. Dimension Q. Attribute Model element Transformation Transformation Q. Metric step rule 57
  • 58. Metamodel instantiation •  Quality goal: Ease of change •  Dimensions: Complexity, Maintainability •  Quality attributes: Simplicity, Structural complexity, Modularity, Understandability, Modifiability •  Quality metrics: # of associations, # of dependencies •  Transformations: Merge entities, Divide the model
  • 59. Metamodels – Vassiliadis et al. For DWs Quality goal Q. Dimension Improvement Factor process Interaction Measurem. Q. Metric method Transformation Information Measurem. System object value Date Data o. Process o. Model o. 59
  • 60. Comparison •  Vassiliadis: Quality goal, Q. Dimension, Factor, Improvement process, Interaction, Measurem. method, Q. Metric, Transformation, Information System object (Data o., Process o., Model o.), Measurem. value, Date •  Mehmood: Quality goal, Q. Dimension, Q. Attribute, Model element, Transformation step, Transformation rule, Q. Metric
  • 62. The origins… Batini, Ceri, Navathe 1991 Formal Meta Classifica Frame schema tion work Quality dimension Quality subdimension Metrics Examples Experiments
  • 63. Batini, Ceri, Navathe 1991 (Q. Dimension: Definition)
    Completeness: Represents all relevant features of requirements
    Pertinence: Represents only relevant features of requirements
    Correctness - Syntactic: Concepts are properly defined in the schema
    Correctness - Semantic: Concepts are used according to their definitions
    Minimality: Every aspect of reqs. appears only once in the schema
    Expressiveness: Can be easily understood
    Readability: Diagram respects aesthetic criteria
    Self-explanation: Other formalisms and languages not needed
    Extensibility: Easily adapted to changing requirements
    Normality: From theory of normalization
  • 64. Completeness Completeness measures the extent to which a conceptual schema includes all the conceptual elements necessary to meet some specified requirements. It is possible that the designer has not included in the schema certain characteristics present in the requirements, e.g., attributes related to an entity Person; in this case, the schema is incomplete. Example requirement: Students have a code, a name, a place of birth. [Diagram: entity Student with attributes Code, Name]
  • 65. Pertinence Pertinence measures how many unnecessary conceptual elements are included in the conceptual schema. In the case of a schema that is not pertinent, the designer has gone too far in modeling the requirements, and has included too many concepts. Example requirement: Students have a code and a name. [Diagram: entity Student with attributes Code, Name, Place_of_Birth]
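The Student examples of slides 64 and 65 can be turned into numbers. The sketch below is one possible quantification, not a metric given in the tutorial: it assumes requirements and schema are both reduced to sets of attribute names, and the ratio definitions and function names are illustrative assumptions.

```python
def completeness(required: set[str], modeled: set[str]) -> float:
    """Share of required elements that appear in the schema (assumed ratio)."""
    return len(required & modeled) / len(required) if required else 1.0


def pertinence(required: set[str], modeled: set[str]) -> float:
    """Share of schema elements that are actually required (assumed ratio)."""
    return len(required & modeled) / len(modeled) if modeled else 1.0


# Slide 64: a place of birth is required but not modeled.
req_64 = {"Code", "Name", "Place_of_Birth"}
schema_64 = {"Code", "Name"}

# Slide 65: a place of birth is modeled but not required.
req_65 = {"Code", "Name"}
schema_65 = {"Code", "Name", "Place_of_Birth"}

print(completeness(req_64, schema_64), pertinence(req_64, schema_64))
print(completeness(req_65, schema_65), pertinence(req_65, schema_65))
```

The first schema is fully pertinent but incomplete; the second is complete but not fully pertinent, which mirrors the two slides.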
  • 66. Correctness - syntactic Concerns the correct use of the categories of the model in representing requirements. Example: in the Entity Relationship model we may represent the logical link between persons and their first names using the two entities Person and FirstName and a relationship between them. The schema is not correct wrt the model, since an entity should be used only when the concept has a unique existence in the real world and has an identifier. [Diagram: Student (1,n) has (1,1) First Name]
  • 67. Correctness - semantic Correctness with respect to requirements concerns the correct representation of the requirements in terms of the model categories. Example: in an organization each department is headed by exactly one manager and each manager may head exactly one department. If we represent Manager and Department as entities, the relationship between them should be one-to-one; in this case, the schema is correct wrt requirements. If we use a one-to-many relationship, the schema is incorrect. [Diagram: Manager (1,n) heads (1,1) Department]
  • 68. Minimality/Redundancy A schema is minimal if every part of the requirements is represented only once in the schema. In other words, it is not possible to eliminate some element from the schema without compromising the information content. [Diagram: entities Student, Course, Instructor; relationships Attends, Assigned to, Teaches; cardinalities shown as 1,n, with one shown as 1,?]
  • 69. Expressiveness/Readability Intuitively, a schema is readable whenever it represents the meaning of the reality represented by the schema in a clear way for its intended use. This simple, qualitative definition is not easy to translate in a more formal way, since the evaluation expressed by the word “clearly” conveys some elements of subjectivity. In models, such as the Entity Relationship model, that provide a graphical representation of the schema, called a diagram, readability concerns both the diagram and the schema itself.
  • 70. Diagrammatic readability With regard to the diagrammatic representation, readability can be expressed objectively by a number of aesthetic criteria that human beings adopt in drawing diagrams: 1.  crossings between lines should be minimized, 2.  graphic symbols should be embedded in a grid, 3.  lines should be made of horizontal or vertical segments, 4.  The number of bends in lines should be minimized, 5.  the total area of the diagram should be minimized, and, finally, 6.  Parents in generalization hierarchies should be positioned at a higher level in the diagram in respect to children. 7.  The children entities in the generalization hierarchy should be symmetrical with respect to the parent entity. 70
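Two of the aesthetic criteria above, minimizing crossings (criterion 1) and using only horizontal or vertical segments (criterion 3), can be checked mechanically. The following is a minimal sketch, assuming each line of the diagram is given as a straight segment between two (x, y) points; the function names and the sample layout are hypothetical.

```python
from itertools import combinations


def _orientation(a, b, c):
    """Sign of the turn a -> b -> c (z component of the cross product)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def properly_cross(s, t):
    """True if segments s and t cross at an interior point of both.

    Segments that only touch at an endpoint (e.g. two lines leaving the
    same symbol) are not counted as crossings.
    """
    (a, b), (c, d) = s, t
    return (_orientation(a, b, c) * _orientation(a, b, d) < 0
            and _orientation(c, d, a) * _orientation(c, d, b) < 0)


def count_crossings(segments):
    """Criterion 1: number of pairs of segments that cross."""
    return sum(properly_cross(s, t) for s, t in combinations(segments, 2))


def count_slanted(segments):
    """Criterion 3: number of segments that are neither horizontal nor vertical."""
    return sum(1 for (x1, y1), (x2, y2) in segments if x1 != x2 and y1 != y2)


# Hypothetical layout: two diagonals that cross, plus one vertical line.
layout = [((0, 0), (2, 2)), ((0, 2), (2, 0)), ((0, 0), (0, 2))]
print(count_crossings(layout), count_slanted(layout))
```

The pairwise check is quadratic in the number of segments, which is acceptable for diagrams of the size shown in this tutorial.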
  • 71. Unreadable schema Works Manages Head Employee Floor Purchase Vendor Located Born In Department Warehouse Engineer Worker Of Produces Acquires Order Item Type City Warranty 71 @C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008
  • 72. A Readable schema Floor Located Manages Head Born City Department Employee Works Produces Vendor Worker Engineer Item In Warehouse Type Acquires Order Of Purchase Warranty 72 @C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008
  • 73. Is diagrammatic readability objective? [Figure: the schema of slide 71 redrawn under different criteria. Syntactic (SYNT) criteria: minimize bends; minimize crossings; use only horizontal (label truncated). Semantic (SEM) criteria: place close entities in generalizations; place the most important concept in the middle. Further annotation: “Minimize crossings… Don’t change at all!”] @C.Batini, 2009
  • 74. But… personal experience in China, Beda University, about 1985. Question to Chinese professors: which one of the two diagrams do you like more? [Figure: the unreadable schema of slide 71 on the left, the readable schema of slide 72 on the right.] Answer: definitely the left one, we like asymmetry and movement… @C.Batini, 2009
  • 75. Expressiveness The second issue addressed by readability is the compactness of schema representation. Among the different conceptual schemas that equivalently represent a certain reality, we prefer the one or the ones that are more compact, because compactness favors readability. 75
  • 76. Transformation that preserves information content and enhances compactness/expressiveness [Figure: two equivalent schemas over Employee, Vendor, Worker, Engineer and City; one repeats the Born relationship several times, the other states it once.]
  • 77. Normalization •  Unnormalized ER schema: entity Employee-Project with attributes Employee #, Salary, Project #, Budget, Role •  Normalized ER schema: Employee (Employee #, Salary) (1,n) Assigned to (Role) (1,n) Project (Project #, Budget)
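The Employee-Project example can be replayed on data. The sketch below assumes the usual dependencies behind this schema (Employee # determines Salary, Project # determines Budget, and the pair determines Role); the function `normalize`, the field names and the sample rows are hypothetical.

```python
def normalize(rows):
    """Split unnormalized Employee-Project rows into three relations.

    Assumes employee_no -> salary, project_no -> budget and
    (employee_no, project_no) -> role hold in the input.
    """
    employees, projects, assignments = {}, {}, {}
    for r in rows:
        employees[r["employee_no"]] = r["salary"]
        projects[r["project_no"]] = r["budget"]
        assignments[(r["employee_no"], r["project_no"])] = r["role"]
    return employees, projects, assignments


# Hypothetical rows: salary and budget are repeated in the unnormalized form.
rows = [
    {"employee_no": 1, "salary": 50, "project_no": "P1", "budget": 100, "role": "analyst"},
    {"employee_no": 1, "salary": 50, "project_no": "P2", "budget": 200, "role": "tester"},
    {"employee_no": 2, "salary": 60, "project_no": "P1", "budget": 100, "role": "manager"},
]

employees, projects, assignments = normalize(rows)
print(employees, projects, assignments)
```

After the split each salary and each budget is stored once, which is the redundancy the normalized ER schema removes.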
  • 78. Scandinavians (1994- Formal Meta Classification Framework schema Quality dimension Quality subdimension Metrics Examples Experiments
  • 80. Main model (schema) quality dimensions •  Physical quality –  Externalization: number of statements on the domain not yet stated in the model / total # of statements –  Internalizability: Persistence (protection against loss or damage); Availability (usual meaning) •  Empirical quality: deals with readability by the audience, expressed in terms of graph aesthetics and graph layout criteria •  Syntactic quality: correspondence between the model (schema) and the language (model), where errors are due to –  Syntactic invalidity: words or graphemes not part of the language are used –  Syntactic incompleteness: the model lacks constructs needed to obey the language’s grammar (e.g. uses only one cardinality to express minimum and maximum cardinalities) •  Semantic quality –  (Feasible) Validity: the statements in the model are correct and relevant for the problem –  (Feasible) Completeness: the model contains all the statements which would be correct and relevant •  Perceived semantic quality: the correspondence between the actor’s interpretation of the model and her current knowledge of the domain –  Validity –  Completeness •  Pragmatic quality: the correspondence between the model and the audience’s interpretation of it –  (Feasible) Comprehension: the actors understood the model, or else individual actors understood the part of the model relevant to them •  Social quality: agreement in knowledge, agreement in model interpretation •  Knowledge quality: perfect when the audience knew everything about the domain at a given time –  Validity –  Completeness
  • 81. Language quality dimensions - 1 May refer (a) to the language itself or (b) to the relationship btwn the language and other issues. In the first case they may refer to: –  the constructs of the language –  the external visual representation For both: •  Perceptibility: how easy language comprehension is for persons •  Expressive power: what it is possible to express in the language •  Expressive economy: how effectively things can be expressed in the language •  Method/tool potential: how easily the language lends itself to proper method or tool support •  Reducibility: what features are provided by the language to deal with large and complex models
  • 82. Language quality dimensions - 2 Referring to the relationship btwn the language and other issues: •  Domain appropriateness: there are no statements in the domain that cannot be expressed in the language •  Participant kn. appr.: statements in the language models are part of the explicit knowledge of participants •  Knowledge externalizability appr.: there are no statements in the explicit kn. of the participants that cannot be expressed in the language •  Comprehensibility appr. •  Technical actor interpretation appr.
  • 83. More of Pragmatic quality •  Social pragmatic quality (to what extent people understand and are able to use the models) and technical pragmatic quality (to what extent tools can be made that interpret the models). 83
  • 84. Arab French (2002- Formal Meta Classification Framework schema Quality dimension Quality subdimension Metrics Examples Experiments
  • 85. Chefri et al. classification •  Specification –  Legibility •  Clarity •  Minimality –  Non Redundancy –  Factorization degree –  Aggregation degree –  Expressiveness •  Concept expressiveness •  Schema expressiveness –  Simplicity –  Correctness •  Usage –  Understandability •  Documentation degree •  User Vocabulary •  Concept independence degree –  Completeness •  Requirements coverage degree •  Cross modeling completeness •  Implementation –  Implementability –  Maintainability •  Modifiability •  Cohesion •  Coupling 85
  • 86. Definitions – 1 (Q. Dimension: Definition)
    Clarity: An aesthetic criterion, based on the graphical arrangement
    Minimality: Every aspect of the requirements appears only once
    Min - Non Redundancy: No concept can be canceled without decreasing the information content
    Min - Factorization degree: Measures the effectiveness of inheritance hierarchies of the schema
    Min - Aggregation degree: Measures the efficient use of aggregate attributes in the schema
    Expressiveness: The schema can be easily understood without additional explanation
    Exp - Concept and schema expr.: Compactness
    Simplicity: The schema contains the minimum possible constructs
    Correctness (syntactical): Concepts are properly defined in the schema
    Understandability (model): The ease with which the data model can be interpreted by the user
    Understandability (schema): How much modeling features are made explicit
    Und - Documentation degree: Presence of additional documentation for concepts
    Und - User vocabulary rate: Users are able to make easy correspondences btwn schema and reqs.
    Und - Concept independ. degree: “Short paths” for semantic interconnections (ex. A ISA B)
  • 87. Definitions – 2 (Q. Dimension: Definition)
    Completeness: The schema represents all relevant features in the requirements
    Comp - Requirements coverage: Correspondence btwn concepts in sch. and relevant terms in reqs.
    Comp - Cross modeling compl.: Presence in a sch. S1 of all concepts in schemas in a set
    Implementability: Amount of effort to implement the schema
    Imp - Implementability: Overall semantic distance btwn concepts in the source model and concepts in the target model
    Maintainability: Ease with which the schema can evolve
    Man - Modifiability: # of modif. related to a concept mod. deriving from dependencies
    Man - Cohesion: Existence of clusters with high # of internal links compared with external links
    Man - Coupling: Existence of clusters with low # of links btwn clusters
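Cohesion and coupling are both defined here through counts of links inside and between clusters. A minimal sketch of those two counts follows, assuming the clustering is given as a concept-to-cluster mapping; the function `link_counts`, the cluster names and the sample links are hypothetical.

```python
def link_counts(cluster_of, links):
    """Return (internal, external) link counts for a clustered schema.

    internal: links whose two concepts fall in the same cluster (cohesion)
    external: links that connect two different clusters (coupling)
    """
    internal = sum(1 for a, b in links if cluster_of[a] == cluster_of[b])
    external = len(links) - internal
    return internal, external


# Hypothetical clustering and links, for illustration only.
cluster_of = {
    "Employee": "staff", "Department": "staff",
    "Order": "sales", "Item": "sales",
}
links = [("Employee", "Department"), ("Order", "Item"), ("Department", "Order")]

internal, external = link_counts(cluster_of, links)
print(internal, external)
```

A clustering with many internal and few external links scores well on both attributes at once.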
  • 88. Chefri et al. classification – metrics (examples) •  Specification –  Legibility: Clarity (# of concepts – number of crossings in the diagram); Minimality: Non Redundancy ((# weighted conc. - # redundant conc.) / total # weighted conc.), Factorization degree, Aggregation degree –  Expressiveness: Concept expressiveness, Schema expressiveness –  Simplicity –  Correctness
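The Non Redundancy formula on this slide is abbreviated, so the sketch below shows only one possible reading of it: every concept carries a weight, and the weight of redundant concepts is subtracted before dividing by the total. The weighting scheme, the function name and the sample schema are assumptions, not the authors' definition.

```python
def non_redundancy(weights: dict[str, float], redundant: set[str]) -> float:
    """(weighted concepts - weighted redundant concepts) / weighted concepts.

    One possible reading of the slide's formula; the weights are assumed.
    """
    total = sum(weights.values())
    if total == 0:
        raise ValueError("schema has no weighted concepts")
    redundant_weight = sum(weights[c] for c in redundant)
    return (total - redundant_weight) / total


# Hypothetical schema with unit weights; one relationship is redundant.
weights = {"Student": 1.0, "Course": 1.0, "Attends": 1.0, "Assigned to": 1.0}
score = non_redundancy(weights, {"Assigned to"})
print(score)
```

With unit weights the score reduces to the fraction of non-redundant concepts, and a value of 1.0 means no redundancy was found.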
  • 89. Metrics for structural complexity •  # of associations •  # of dependencies •  # of aggregations •  Depth of inheritance tree: longest path from the root of a hierarchy to the leaves
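The last metric, depth of the inheritance tree, has a direct recursive formulation. The sketch below assumes the hierarchy is an acyclic parent-to-children mapping and measures depth in edges; the function name is hypothetical, and the `Senior Engineer` subclass is added only to make the example two levels deep.

```python
def depth_of_inheritance_tree(children: dict[str, list[str]], root: str) -> int:
    """Length, in edges, of the longest path from root to a leaf.

    Assumes the hierarchy is acyclic.
    """
    kids = children.get(root, [])
    if not kids:
        return 0
    return 1 + max(depth_of_inheritance_tree(children, k) for k in kids)


# Employee hierarchy from the earlier slides, plus one hypothetical subclass.
children = {
    "Employee": ["Vendor", "Worker", "Engineer"],
    "Engineer": ["Senior Engineer"],
}

print(depth_of_inheritance_tree(children, "Employee"))
```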
  • 90. Moody 1998 - Meta Classification Formal schema Method for Framework Quality dimension Quality subdimension Metrics Examples Experiments
  • 91. Moody’s classification •  Completeness •  Integrity •  Flexibility •  Understandability •  Correctness •  Simplicity •  Implementability •  Integration (quality of related groups of schemas, see later)
  • 92. Moody’s classific. of Quality dim. and metrics - 1 (Dimension: Definition)
    Completeness: The schema contains all the information required to meet reqs.
    Completeness M1: # of items that do not correspond to reqs.
    Completeness M2: # of reqs. not represented in the schema
    Completeness M3: # of items that inaccurately represent reqs.
    Completeness M4: # of inconsistencies in the schema
    Integrity: Extent to which the business rules on data are enforced by the sch.
    Integrity M1: # of business rules not enforced by the schema
    Integrity M2: # of integrity constr. in the schema not accurate in repr. bus. rules
    Flexibility: The ease with which the schema can cope with business change
    Flexibility M1: # of elements in the sch. which are subject to change in the future
    Flexibility M2: Estimated cost of changes
    Flexibility M3: Strategic importance of change
  • 93. Moody’s classific. of Quality dim. and metrics - 2 (Dimension: Definition)
    Understandability: Ease with which the schema can be understood
    Understandability M1: User rating
    Understandability M2: Ability of users to interpret the model correctly
    Understandability M3: Application developer rating
    Correctness: The schema conforms to the rules of the conceptual model
    Correctness M1: # of violations to model conventions
    Correctness M2: Intra ent. redundancy: number of normal form violations
    Correctness M3.a: Inter ent. redundancy: # of redund. concepts in the schema
  • 94. Moody’s classification of quality dimensions and metrics - 3
    •  Simplicity: the schema contains the minimum possible constructs
      –  M1: # of entities
      –  M2: # of entities + relationships
      –  M3: # of entities + relationships + attributes
    •  Implementability: ease with which the schema can be implemented within time, budget, technology constraints
      –  M1: technical risk rating
      –  M2: schedule risk rating
      –  M3: development cost estimate
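    The three simplicity metrics are simple counts, so they are easy to automate once a schema is available in machine-readable form. The sketch below assumes a deliberately minimal schema representation (the `Schema` class is hypothetical, and attributes of relationships are ignored). The example schema is illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Schema:
    # entity name -> list of attribute names
    entities: Dict[str, List[str]] = field(default_factory=dict)
    relationships: List[str] = field(default_factory=list)


def simplicity_metrics(schema: Schema) -> Dict[str, int]:
    """Simplicity M1-M3: cumulative counts of entities, relationships, attributes."""
    n_entities = len(schema.entities)
    n_relationships = len(schema.relationships)
    n_attributes = sum(len(attrs) for attrs in schema.entities.values())
    return {
        "M1": n_entities,
        "M2": n_entities + n_relationships,
        "M3": n_entities + n_relationships + n_attributes,
    }


example = Schema(
    entities={
        "Person": ["ID", "Name", "Surname"],
        "Address": ["ID", "StreetPrefix", "StreetName", "Number", "City"],
    },
    relationships=["ResidenceAddress"],
)
metrics = simplicity_metrics(example)
print(metrics)
```

    Lower values indicate a simpler schema, so the metrics are mainly useful for comparing alternative schemas for the same requirements.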
  • 95. Moody’s monumental contribution to empirical quality / quality of diagrammatic notations (TSE 2009)
    •  Semiotic clarity: there should be a 1:1 correspondence between semantic constructs and graphical symbols
      –  Symbol redundancy
      –  Symbol overload
      –  Symbol excess
      –  Symbol deficit
    •  Perceptual discriminability: different symbols should be clearly distinguishable from each other
      –  Visual distance
      –  Discriminability threshold
    •  Semantic transparency: use visual representations whose appearance suggests their meaning, where symbols can be
      –  Immediate
      –  Semantically opaque
      –  Semantically perverse
      –  Semantically translucent
  • 96. Moody’s monumental contribution to empirical quality / quality of diagrammatic notations (TSE 2009)
    •  Complexity management: include explicit mechanisms for dealing with complexity
      –  Modularization
      –  Abstraction
    •  Cognitive integration: include explicit mechanisms to support integration of information from different diagrams
      –  Conceptual integration
      –  Contextualization
      –  Perceptual integration
      –  Wayfinding
  • 97. Moody’s monumental contribution to empirical quality / quality of diagrammatic notations (TSE 2009)
    •  Visual expressiveness: use the full range and capacities of visual variables
      –  Degree of visual freedom
      –  Saturation
    •  Dual coding: use text to complement graphics
    •  Graphic economy: the number of different graphical symbols should be cognitively manageable
      –  Symbol deficit
    •  Cognitive fit: use different visual dialects for different tasks and audiences
      –  Visual mono/plurilinguism
  • 98. Moody’s monumental contribution to empirical quality/ quality of diagrammatic notations (TSE 2009) Interactions among principles Semiotic Clarity can affect Graphic Economy either positively or negatively: Symbol excess and symbol redundancy increase graphic complexity, while symbol overload and symbol deficit reduce it. Perceptual Discriminability increases Visual Expressiveness as it involves using more visual variables and a wider range of values (a side effect of increasing visual distance); similarly, Visual Expressiveness is one of the primary ways of improving Perceptual Discriminability. Increasing Visual Expressiveness reduces the effects of graphic complexity, while Graphic Economy defines limits on Visual Expressiveness (how much information can be effectively encoded graphically). Increasing the number of symbols (Graphic Economy) makes it more difficult to discriminate between them (Perceptual Discriminability). Perceptual Discriminability, Complexity Management, Semantic Transparency, Graphic Economy, and Dual Coding improve effectiveness for novices, though Semantic Transparency can reduce effectiveness for experts (Cognitive Fit). Semantic Transparency and Visual Expressiveness can make hand drawing more difficult (Cognitive Fit)
  • 100. Genero et al. 2005 - Formal framework / Meta schema / Classification / Quality dimension / Quality subdimension / Metrics / Examples / Experiments
  • 101. Genero et al. classification Maintainability is influenced by the following subcharacteristics: •  Understandability: the ease with which the conceptual data model can be understood. •  Legibility: the ease with which the conceptual data model can be read, with respect to certain aesthetic criteria [13]. •  Simplicity: the conceptual data model contains the minimum number of constructs possible. •  Analysability: the capability of the conceptual data model to be diagnosed for deficiencies or for parts to be modified. •  Modifiability: the capability of the conceptual data model to enable a specified modification to be implemented. •  Stability: the capability of the conceptual data model to avoid unexpected effects from modifications. •  Testability: the capability of the conceptual data model to enable modifications to be validated.
  • 102. Herden - Formal framework / Meta schema / Classification / Metadata / Quality dimension / Quality subdimension / Metrics / Examples / Experiments
  • 103. Herden classification •  Correctness •  Consistency •  Scope •  Level of detail •  Completeness •  Minimality •  Ability of integration (see later) •  Readability
  • 104. Herden: dimensions and definitions
    •  (Technical) Correctness: correctness of concepts w.r.t. requirements
    •  (Technical) Consistency: absence of contradiction
    •  Scope: comprehensive w.r.t. general user acceptance
    •  Level of detail: adequacy in detail w.r.t. user acceptance
    •  Completeness: completeness w.r.t. requirements
    •  Minimality: compactness and absence of redundancies
    •  Readability: completeness of documentation
  • 105. Metadata in Herden’s classification •  Description •  Relevance •  Measuring •  Metric •  Degree of automation •  Objectivity
  • 106. Poels et al. - Formal framework / Meta schema / Classification / Quality dimension / Quality subdimension / Metrics / Examples / Experiments
  • 107. Poels et al. Interested in •  Perceived semantic quality •  Perceived pragmatic quality To understand their relationship with 1.  Perceived ease of use (efficiency) 2.  Perceived usefulness (effectiveness) and 3.  User information satisfaction
  • 108. Poels et al. classification (Quality #: definition)
    •  PSQ1: The schema represents the business process correctly
    •  PSQ2: The schema is a realistic representation of the business process
    •  PSQ3: The schema contains contradicting elements
    •  PSQ4: The schema contains redundant elements
    •  PSQ5: Elements must be added to faithfully represent the business process
    •  PSQ6: All the elements in the conceptual schema are relevant for the representation of the business process
    •  PSQ7: The schema gives a complete representation of the business process
    © C. Batini, F. Cabitza, G. Pasi, R. Schettini, 2008
  • 109. Poels et al. classification (Quality #, quality dimension: definition)
    •  PSQ1, Correctness/Validity: The schema represents the business process correctly
    •  PSQ2, Feasible correctness/validity: The schema is a realistic representation of the business process
    •  PSQ3, Coherence: The schema contains contradicting elements
    •  PSQ4, Non-redundancy: The schema contains redundant elements
    •  PSQ5, ???: Elements must be added to faithfully represent the business process
    •  PSQ6, Relevance: All the elements in the conceptual schema are relevant for the representation of the business process
    •  PSQ7, Completeness: The schema gives a complete representation of the business process
  • 110. Poels et al. general findings [Figure: path diagram relating Perceived semantic quality, Perceived ease of use, Perceived usefulness and User information satisfaction; coefficients shown on the arrows: 0,1, 0,38, 0,58, 0,35, 0,29]
  • 112. Physical and empirical quality
    Author(s) compared: Batini et al. 91, Scand. 94-, Moody 98-, Arab-French 02-, Genero et al. 2005, Herden, Poels
    •  Physical quality
      –  Externalization: x
      –  Persistence: x
      –  Availability: x
    •  Empirical quality
      –  Minimality: x x x
      –  Readability/legibility: x x x x x
      –  Expressiveness: x x x
      –  Simplicity/self-explanation: x x x x
      –  Graph aesthetics/readability/clarity: x x x
      –  Understandability: X-3 x x
  • 113. Syntactic and semantic quality
    Author(s) compared: Batini et al. 91, Scand. 94-, Moody 98-, Arab-French 02-, Genero et al. 2005, Herden, Poels
    •  Syntactic quality
      –  Invalidity: x x x x
      –  Incompleteness: x
    •  Semantic quality
      –  Validity/Correctness: x x X-1 x x
      –  Feasible validity: x x
      –  Normality: x
      –  Integrity: X-2 x x
      –  Completeness: x x X-4 x x
      –  Level of detail: x
      –  Scope: x
      –  Relevance/Pertinence: x x x
      –  Perceived semantic quality: x
      –  Analyzability: x
      –  Testability: x
  • 114. Pragmatic, knowledge and process quality
    Author(s) compared: Batini et al. 91-, Scand. 94-, Moody 98-, Arab-French 02-, Genero et al. 05, Herden, Poels
    •  Pragmatic quality
      –  Comprehension: x
    •  Social quality: x
      –  Agreement in knowledge: x
      –  Agreement in model interpretation
    •  Knowledge quality
      –  Completeness: x
      –  Validity: x
    •  Process quality
      –  Implementability: x
      –  Stability: x
      –  Maintainability/Flexibility/Extensibility: x X-3 x
  • 116. Sheldon classification for inheritance hierarchies Viewpoints: •  (1) The deeper a class is in the hierarchy, the higher the degree of method inheritance, making it more complex to predict its behavior. •  (2) Deeper trees constitute greater design complexity, since more methods and classes are involved. •  (3) The deeper a particular class is in the hierarchy, the greater the potential reuse of inherited methods.
  • 117. Sheldon classification for inheritance hierarchies •  Maintainability •  Understandability •  Modifiability
  • 118. Schema and Data Quality together
  • 119. When a schema is defined, quality can be achieved working both on the schema and on the instance
    (a) Person [ID, Name, Surname, Address]
      –  1, John, Smith, 113 Sunset Avenue 60601 Chicago
      –  2, Mark, Bauer, 113 Sunset Avenue 60601 Chicago
      –  3, Ann, Swenson, 4 Heroes Street Denver
    (b) Person [ID, Name, Surname]
      –  1, John, Smith
      –  2, Mark, Bauer
      –  3, Ann, Swenson
    Address [ID, StreetPrefix, StreetName, Number, City]
      –  A11, Avenue, Sunset, 113, Chicago
      –  A12, Street, 4 Heroes, null, Denver
    ResidenceAddress [PersonID, AddressID]
      –  1, A11
      –  2, A11
      –  3, A12
  • 120. Experimentally investigated by Arab French Quality at schema level Impact Quality at data level Interdependencies
  • 121. Improving the quality of schemas
  • 122. Methods •  Origins: achieving normal form; decomposition techniques •  Scandinavian: derived from the framework •  Through schema transformations
  • 123. Derived from framework Syntactic quality •  Error prevention through syntax directed editors •  Error detection through syntactic checks
  • 124. Derived from framework Semantic quality •  Consistency checking –  Based on a logical description –  Based on constructivity, namely through properties of the generation process (Langefors et al.) –  Use of driving questions to improve completeness
  • 125. Derived from framework Pragmatic quality •  Audience training •  Inspection and walkthroughs •  Transformations (see also later) –  Rephrasing –  Filtering •  Translation –  Explanation generation –  Model execution •  Documentation •  Prototyping
  • 126. Derived from framework Social quality •  Integration –  Intra project –  Inter project –  Inter organizational •  Integration process –  Pre-integration –  Viewpoint comparison –  Viewpoint conforming –  Merging and restructuring
  • 128. The Assenova-Johannesson approach Dimensions considered
    •  Explicitness: requirements are represented at the schema level, not at the instance level
    •  Size: # of entities + relationships + attributes
    •  Rule simplicity: a high # of business rules are represented by simple types of constraints
    •  Rule uniformity: cardinality constraints are uniform
    •  Query simplicity: simple retrieval requirements correspond to simple queries on the schema
    •  Stability: small changes in requirements result in small changes in the schema
  • 129. Dimensions and transformations
    Dimensions (columns): Explicitness, Size, Rule simplicity, Rule uniformity, Query simplicity, Stability
    •  Partial attributes: - + +
    •  Non-surjective attributes: - + +
    •  Partial attributes which are total in union: + + - +
    •  Non-surjective attributes surjective in union: + + - +
    •  M-N attributes: - + - +
    •  Lexical attributes: - + - +
    •  Attributes with fixed ranges: + - = +
    •  Two non-disjoint entities: + - + +
    •  Non-unary “overloaded” attributes: +/- +
  • 130. Example transformation: partial attribute
    •  The size of the schema increases (-)
    •  Introducing the entity EMPLOYEE results in
      –  increased rule uniformity (+) (all attributes are total)
      –  increased stability (+)
  • 131. Example of increased stability Introducing different categories of employees can be done in the new schema without violating rule simplicity. The same cannot be done in the old schema. [Figure: old schema and new schema]
  • 132. Quality in data integration architectures
  • 133. The approach of Akoka et al. (2007)
    General statement: in DI architectures, quality of data and quality of schemas have to be considered together
    •  Qualities at schema level
      –  Completeness
      –  Understandability
      –  Minimality
      –  Expressiveness
    •  Qualities at data level
      –  Completeness: coverage, density
      –  Uniqueness
      –  Consistency
      –  Freshness: currency, timeliness
      –  Accuracy: semantic, syntactic, precision
  • 134. The approach of De Conseicao et al (2007) Relevant qualities to be evaluated in DI architectures
    Given a DI architecture defined in terms of [Data, Local Schemas, Global Schema, Data sources], the IQ criteria per DI element are:
    •  Data sources: reputation; verifiability; availability; response time
    •  Schema: schema completeness; minimality; type consistency
    •  Data: data completeness; timeliness; accuracy
  • 135. The approach of De Conseicao et al (2007) Relevant qualities to be evaluated in DI architectures
    Given a DI architecture defined in terms of [Data, Local Schemas, Global Schema, Data sources]:
    •  Completeness
      –  Definition: coverage of global schema concepts w.r.t. the application domain
      –  Refers to: global schema
      –  Metric: 1 – (# of incomplete items / # total items)
    •  Minimality
      –  Definition: extent to which the schema is compactly modeled and without redundancy
      –  Refers to: global schema
      –  Metric: 1 – (# redundant schema elements / # total items)
      –  Detailed in terms of: attributes in an entity; attributes in different entities; entity redundancy degree; redundant relationship; entity redundancy of a schema; relationship redundancy of a schema; schema minimality
    •  Type consistency
      –  Definition: data type uniformity across the schemas
      –  Refers to: global schema + local schemas
      –  Metric: 1 – (# of inconsistent schema elements / # total schema elements)
      –  Detailed in terms of: data type consistency; attribute type consistency; schema data type consistency
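    The three metrics on this slide share the same shape, 1 – (# defective elements / # total elements), so a single helper covers all of them. The sketch below is only an illustration: the function name `ratio_quality` and the element counts are hypothetical, and how "incomplete", "redundant" or "inconsistent" elements are detected is outside its scope.

```python
def ratio_quality(defective: int, total: int) -> float:
    """Generic metric of the form 1 - (# defective elements / # total elements)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= defective <= total:
        raise ValueError("defective must be between 0 and total")
    return 1 - defective / total


# Hypothetical counts for a global schema with its local schemas.
completeness = ratio_quality(defective=5, total=20)      # incomplete items
minimality = ratio_quality(defective=2, total=16)        # redundant schema elements
type_consistency = ratio_quality(defective=6, total=24)  # inconsistent schema elements
print(completeness, minimality, type_consistency)
```

    All three scores lie in [0, 1], with 1 meaning that no defective element was found.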
  • 136. The H. Dai et al. approach (2006) •  Focus on Column Heterogeneity e-mail, phone n. Many e-mail and Only E-mail addr. And socsec n. phone numb. And Few phone numb Socsec numbers B more heterogeneous than a B more heterogeneous than c B more heterogeneous than d
  • 137. The H. Dai et al. approach (2006) Focus on Column Heterogeneity Heterogeneity dimensions –  Number of semantic types resulting in different clusters –  Cluster entropy –  Probabilistic soft clustering
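    To give an intuition for cluster entropy, the sketch below computes the Shannon entropy of the distribution of column values over clusters. It is a simplification and not the authors' method: it assumes every value has already been assigned to exactly one cluster (hard clustering), whereas the slide mentions probabilistic soft clustering. The labels `email` and `phone` are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def cluster_entropy(cluster_labels: Iterable[Hashable]) -> float:
    """Shannon entropy (in bits) of the cluster sizes of one column."""
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the column must contain at least one value")
    return sum((n / total) * math.log2(total / n) for n in counts.values())


only_email = ["email"] * 4
mostly_email = ["email"] * 3 + ["phone"]
mixed = ["email", "email", "phone", "phone"]

for column in (only_email, mostly_email, mixed):
    print(cluster_entropy(column))
```

    A column holding a single semantic type gets entropy 0, and the value grows as the types become more numerous and more evenly mixed.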
  • 138. Moody’s approach Classification of schemas related by integration
    •  Integration: level of consistency of the schema with the rest of the organization's data
      –  M1: # of data conflicts with the Corporate Schema
      –  M1.a: # of entity conflicts
      –  M1.b: # of data element conflicts, namely, definitions and domains
      –  M1.c: # of naming conflicts (synonyms + homonyms)
      –  M2: # of data conflicts with existing systems (ES)
      –  M2.a: # of data element conflicts, namely, definitions and domains with ES
      –  M2.b: # of key conflicts, namely, definitions and domains with ES
      –  M2.c: # of naming conflicts (synonyms + homonyms) with ES
      –  M3: # of data elements with duplicate data elements in ES
      –  M4: rating by representatives of other business areas
  • 139. The Chai approach Matchability of schemas •  Focus on the evolution of a Data Integration system, and the cost of maintaining the mediated schema S •  Quality observed: the matchability of S against a matching tool M, defined as •  the average of accuracies of matching S with future schemas F1, F2, …Fn (that we assume known at least to some extent) using M
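    The definition of matchability as an average of accuracies can be sketched as follows. Everything beyond the averaging step is an assumption made for illustration: `name_matcher` is a toy stand-in for the matching tool M, schemas are reduced to lists of attribute names, and accuracy is taken as correct matches over all predicted or expected matches, which the slide does not specify.

```python
from statistics import mean
from typing import Callable, List, Sequence, Set, Tuple

Match = Tuple[str, str]  # (attribute of S, attribute of a future schema F)
Matcher = Callable[[Sequence[str], Sequence[str]], Set[Match]]


def name_matcher(s: Sequence[str], f: Sequence[str]) -> Set[Match]:
    """Toy matching tool: pairs attributes whose names are equal ignoring case."""
    return {(a, b) for a in s for b in f if a.lower() == b.lower()}


def accuracy(predicted: Set[Match], expected: Set[Match]) -> float:
    """Assumed accuracy: correct matches over all predicted or expected matches."""
    union = predicted | expected
    return len(predicted & expected) / len(union) if union else 1.0


def matchability(s: Sequence[str],
                 future: List[Tuple[Sequence[str], Set[Match]]],
                 matcher: Matcher) -> float:
    """Average accuracy of matching S against each (F_i, expected matches) pair."""
    return mean(accuracy(matcher(s, f), expected) for f, expected in future)


S = ["name", "phone", "city"]
future_schemas = [
    (["Name", "Phone", "town"],
     {("name", "Name"), ("phone", "Phone"), ("city", "town")}),
    (["NAME", "CITY"],
     {("name", "NAME"), ("city", "CITY")}),
]
result = matchability(S, future_schemas, name_matcher)
print(round(result, 3))
```

    In this toy run the matcher misses the `city`/`town` pair in the first future schema, which lowers the average. Renaming concepts in S so that such misses become less likely is the kind of improvement the approach aims at.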
  • 140. Cases for matching mistakes •  Predict a spurious match •  Miss a match •  Predict a wrong match •  Strategy to improve matchability –  Change concepts in M using rules that minimize error probability
  • 141. Batini et al. 2010 Potential information content
  • 142. The data architecture of a set of databases is the allocation of concepts and tables across the DB data schemas. Example of change of data architecture due to improving access efficiency [Figure: a centralized DB and a distributed DB holding the tables Employee (Employee #, Salary), Assigned-to (Employee #, Project #, Role) and Project (Project #, Budget)]
  • 143. Data integration technologies •  Virtual data integration •  Data Warehouses •  Application integration •  Consolidation
  • 144. Potential information content
    •  Global schema: Tax payer has Boat; Tax payer declares Income
    •  Sources: one schema with Tax payer declares Income; one schema with Tax payer has Boat
    •  Query on the global schema: find CF, Name of Tax payer that declares <= 30.000 € and has >= 1 Boat
  • 145. Potential information content •  Given a schema I, the global schema resulting from the virtual integration of schemas S1, S2, ..., Sn, the potential information content of I is the set of queries that can be performed on I and cannot be performed on S1, S2, ..., Sn.
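    The definition can be illustrated on a finite query workload. The sketch below makes strong simplifying assumptions that are not in the slides: a schema is modelled as a set of concept names, a query as the set of concepts it needs, and a query "can be performed" on a schema when all its concepts belong to it. The query names are hypothetical.

```python
from typing import Dict, Iterable, Set


def potential_information_content(queries: Dict[str, Set[str]],
                                  global_schema: Set[str],
                                  sources: Iterable[Set[str]]) -> Set[str]:
    """Queries answerable on the global schema but on no single source schema."""
    source_list = list(sources)
    return {
        name
        for name, needed in queries.items()
        if needed <= global_schema
        and not any(needed <= source for source in source_list)
    }


s1 = {"TaxPayer", "declares", "Income"}
s2 = {"TaxPayer", "has", "Boat"}
integrated = s1 | s2

workload = {
    "low_income_tax_payers": {"TaxPayer", "declares", "Income"},
    "boat_owners": {"TaxPayer", "has", "Boat"},
    "low_income_boat_owners": {"TaxPayer", "declares", "Income", "has", "Boat"},
}
gained = potential_information_content(workload, integrated, [s1, s2])
print(gained)
```

    Only the query that combines income and boat ownership needs the integrated schema, mirroring the tax payer example of the previous slide.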
  • 146. Example [Figure: schemas S1 and S2 over the entities E1 to E6, with queries Q11, Q12 and Q21]
  • 147. Quality of the documentation for large related groups of schemas
  • 148. Why is integration alone not enough? Hundreds of schemas
  • 149. Relationships investigated •  Integration •  Abstraction •  Abstraction/Integration
  • 150. Abstraction
  • 151. Abstraction [Figure: the same schema drawn at increasing levels of detail; entities shown include Department, Employee, City, Seller, Item, Order, Purchaser, Floor, Engineer, Clerk, Warranty and Warehouse]
  • 153. First case: integration + abstraction [Figure: the Company schema related to the Production, Sales and Department structure schemas, each drawn at several levels of abstraction; entities shown include Department, Employee, City, Seller, Item, Order, Purchaser, Floor, Engineer, Clerk, Warranty and Warehouse]