Data warehousing logical design

        Mirjana Mazuran
      mazuran@elet.polimi.it



        December 15, 2009




                                  1/18
Outline




      Data Warehouse logical design
          ROLAP model
               star schema
               snowflake schema
      Exercise 1: wine company
      Exercise 2: real estate agency




                                       2/18
Introduction
Logical design




          Starting from the conceptual design, it is necessary to determine
          the logical schema of the data
          We use the ROLAP (Relational On-Line Analytical Processing)
          model to represent multidimensional data
          ROLAP uses the relational data model, which means that
          data is stored in relations
          Given the DFM representation of multidimensional data, two
          schemas are used:
                 star schema
                 snowflake schema




                                                                            3/18
ROLAP
Star schema


         Each dimension is represented by a relation such that:
              the primary key of the relation is the primary key of the
              dimension
              the attributes of the relation describe all aggregation levels of
              the dimension
         A fact is represented by a relation such that:
              the primary key of the relation is the set of primary keys
              imported from all the dimension tables
              the attributes of the relation are the measures of the fact

    Pros and Cons
         few joins are needed during query execution
         dimension tables are denormalized
         denormalization introduces redundancy

                                                                                  4/18
ROLAP
Star schema: example


    DFM fact schema (left side of the slide):
          FACT Accident
      MEASURES NrOfAccidents, Cost
    DIMENSIONS IdPolicy (Class, Amount, MaxAmount),
               Time (Day, Month, Year),
               IDCustomer (BYear, Sex, City, Region),
               Motivation

    Star schema (right side of the slide):
          Time (IdTime, Day, Month, Year)
        Policy (IdPolicy, Class, Amount, MaxAmount)
      Customer (IdCustomer, Birthday, Gender, City, Region)
    Motivation (IdMotivation, Motivation)
      Accident (IdTime, IdPolicy, IdCustomer, IdMotivation,
               NrOfAccidents, Cost)
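
The star schema above can be tried out on a small in-memory database. The following is only a sketch: it assumes SQLite (through Python's sqlite3 module), the column types are guesses, and the sample rows are made up for illustration. It shows that a query aggregating at the Region and Year levels needs only one join per dimension involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one denormalized relation per dimension.
CREATE TABLE Time       (IdTime INTEGER PRIMARY KEY, Day INTEGER, Month INTEGER, Year INTEGER);
CREATE TABLE Policy     (IdPolicy INTEGER PRIMARY KEY, Class TEXT, Amount REAL, MaxAmount REAL);
CREATE TABLE Customer   (IdCustomer INTEGER PRIMARY KEY, Birthday TEXT, Gender TEXT,
                         City TEXT, Region TEXT);
CREATE TABLE Motivation (IdMotivation INTEGER PRIMARY KEY, Motivation TEXT);

-- Fact table: its key is the set of keys imported from the dimensions.
CREATE TABLE Accident (
    IdTime        INTEGER REFERENCES Time(IdTime),
    IdPolicy      INTEGER REFERENCES Policy(IdPolicy),
    IdCustomer    INTEGER REFERENCES Customer(IdCustomer),
    IdMotivation  INTEGER REFERENCES Motivation(IdMotivation),
    NrOfAccidents INTEGER,
    Cost          REAL,
    PRIMARY KEY (IdTime, IdPolicy, IdCustomer, IdMotivation)
);

-- Made-up sample rows (assumption, not from the slides).
INSERT INTO Time VALUES (1, 10, 1, 2009), (2, 5, 3, 2009);
INSERT INTO Policy VALUES (1, 'A', 500.0, 10000.0);
INSERT INTO Customer VALUES
    (1, '1980-04-02', 'F', 'Milano', 'Lombardia'),
    (2, '1975-11-20', 'M', 'Como',   'Lombardia'),
    (3, '1990-01-15', 'M', 'Torino', 'Piemonte');
INSERT INTO Motivation VALUES (1, 'Collision');
INSERT INTO Accident VALUES
    (1, 1, 1, 1, 1, 1200.0),
    (2, 1, 2, 1, 2,  800.0),
    (2, 1, 3, 1, 1,  300.0);
""")

# Region is stored directly in Customer, so one join reaches it.
rows = conn.execute("""
    SELECT C.Region, T.Year, SUM(A.NrOfAccidents), SUM(A.Cost)
    FROM Accident A
    JOIN Customer C ON A.IdCustomer = C.IdCustomer
    JOIN Time T     ON A.IdTime = T.IdTime
    GROUP BY C.Region, T.Year
    ORDER BY C.Region, T.Year
""").fetchall()
print(rows)
```

Note the redundancy the slide mentions: the Region value is repeated in every Customer row of the same city.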


                                                                                           5/18
ROLAP
Snowflake schema
        Each (primary) dimension is represented by a relation:
             the primary key of the relation is the primary key of the
             dimension
             the attributes of the relation depend directly on the primary key
             a set of foreign keys is used to access information at different
             levels of aggregation. Such information is part of the
             secondary dimensions and is stored in dedicated relations
        A fact is represented by a relation such that:
             the primary key of the relation is the set of primary keys
             imported from all and only the primary dimension tables
             the attributes of the relation are the measures of the fact

    Pros and Cons
        denormalization is reduced
        less memory space is required
        a lot of joins can be required if they involve attributes in
        secondary dimension tables
                                                                                 6/18
ROLAP
Snowflake schema: example



    DFM fact schema (left side of the slide):
          FACT Sale
      MEASURES Quantity, Income, Discount
    DIMENSIONS IdFurniture (Type, Category, Material),
               Time (Day, Month, Year),
               IDCustomer (BYear, Sex, City, Region, State)

    Snowflake schema (right side of the slide):
          Time (IdTime, Day, Month, Year)
     Furniture (IdFurniture, Material, Type, Category)
      Customer (IdCustomer, IdPlace, BirthYear, Sex)
         Place (IdPlace, City, Region, State)
          Sale (IdTime, IdFurniture, IdCustomer, Quantity, Income,
               Discount)
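
To see the extra join that a snowflake schema introduces, the example above can be loaded into an in-memory database. This is a sketch under assumptions: SQLite via Python's sqlite3 module, guessed column types, and made-up sample rows. Because Region lives in the secondary dimension table Place, the query has to go from Sale to Customer and then to Place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Time      (IdTime INTEGER PRIMARY KEY, Day INTEGER, Month INTEGER, Year INTEGER);
CREATE TABLE Furniture (IdFurniture INTEGER PRIMARY KEY, Material TEXT, Type TEXT, Category TEXT);
-- Secondary dimension: geographic attributes are stored once per place.
CREATE TABLE Place     (IdPlace INTEGER PRIMARY KEY, City TEXT, Region TEXT, State TEXT);
-- Primary dimension: reaches Place through a foreign key.
CREATE TABLE Customer  (IdCustomer INTEGER PRIMARY KEY,
                        IdPlace INTEGER REFERENCES Place(IdPlace),
                        BirthYear INTEGER, Sex TEXT);
-- Fact table: keyed only by the primary dimensions.
CREATE TABLE Sale (
    IdTime      INTEGER REFERENCES Time(IdTime),
    IdFurniture INTEGER REFERENCES Furniture(IdFurniture),
    IdCustomer  INTEGER REFERENCES Customer(IdCustomer),
    Quantity INTEGER, Income REAL, Discount REAL,
    PRIMARY KEY (IdTime, IdFurniture, IdCustomer)
);

-- Made-up sample rows (assumption, not from the slides).
INSERT INTO Time VALUES (1, 15, 12, 2009);
INSERT INTO Furniture VALUES (1, 'Wood', 'Table', 'Kitchen');
INSERT INTO Place VALUES (1, 'Milano', 'Lombardia', 'Italy'),
                         (2, 'Torino', 'Piemonte',  'Italy');
INSERT INTO Customer VALUES (1, 1, 1980, 'F'), (2, 1, 1975, 'M'), (3, 2, 1990, 'M');
INSERT INTO Sale VALUES (1, 1, 1, 2, 400.0,  0.0),
                        (1, 1, 2, 1, 250.0, 10.0),
                        (1, 1, 3, 3, 600.0,  0.0);
""")

# Two joins are needed to reach Region: Sale -> Customer -> Place.
rows = conn.execute("""
    SELECT P.Region, SUM(S.Income)
    FROM Sale S
    JOIN Customer C ON S.IdCustomer = C.IdCustomer
    JOIN Place P    ON C.IdPlace = P.IdPlace
    GROUP BY P.Region
    ORDER BY P.Region
""").fetchall()
print(rows)
```

In the star-schema variant the same question would need a single join, at the price of repeating City, Region and State in every customer row.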



                                                                                         7/18
Exercise 1
Wine company

    An online wine retailer needs a data warehouse to record the
    quantity and sales of the wines it sells to its customers. Part of
    the original database is composed of the following tables:
    CUSTOMER (Code, Name, Address, Phone, BDay, Gender)
          WINE (Code, Name, Type, Vintage, BottlePrice, CasePrice,
               Class)
         CLASS (Code, Name, Region)
          TIME (TimeStamp, Date, Year)
        ORDER (Customer, Wine, Time, nrBottles, nrCases)
    Note that the tables represent the main entities of the ER schema;
    it is therefore necessary to derive the significant relationships
    among them in order to design the data warehouse correctly.

                                                                         8/18
Exercise 1: A possible solution
Snowflake schema

          FACT Sales
    MEASURES Quantity, Cost
    DIMENSIONS Customer, Area, Time, Wine → Class

      Customer (CustomerCode, Birthday, Gender)
          Wine (WineCode, ClassCode, Type, Vintage)
         Class (ClassCode, Region)
          Time (TimeCode, Date, Year)
    SalesTable (WineCode, CustomerCode, OrderTimeCode, Quantity, Cost)
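
The slides do not say how the measures Quantity and Cost are derived from the source tables. One possible reading, sketched below with SQLite through Python's sqlite3 module, is that Quantity counts bottles and Cost is the order value at list prices. Everything specific here is an assumption: the constant BOTTLES_PER_CASE (12 is a hypothetical case size), the column types, and the sample rows.

```python
import sqlite3

BOTTLES_PER_CASE = 12  # hypothetical case size; not stated in the exercise

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Source tables (only the two needed for the measures).
CREATE TABLE WINE (Code INTEGER PRIMARY KEY, Name TEXT, Type TEXT, Vintage INTEGER,
                   BottlePrice REAL, CasePrice REAL, Class INTEGER);
-- ORDER is an SQL keyword, so the table name must be quoted.
CREATE TABLE "ORDER" (Customer INTEGER, Wine INTEGER, Time INTEGER,
                      nrBottles INTEGER, nrCases INTEGER);
-- Fact table of the proposed solution.
CREATE TABLE SalesTable (WineCode INTEGER, CustomerCode INTEGER, OrderTimeCode INTEGER,
                         Quantity INTEGER, Cost REAL,
                         PRIMARY KEY (WineCode, CustomerCode, OrderTimeCode));
""")

# Made-up sample rows (assumption).
conn.execute("INSERT INTO WINE VALUES (1, 'Barolo', 'Red', 2005, 30.0, 300.0, 1)")
conn.executemany('INSERT INTO "ORDER" VALUES (?, ?, ?, ?, ?)',
                 [(10, 1, 100, 3, 0), (11, 1, 100, 2, 1)])

# Assumed derivation: Quantity in bottles, Cost at list prices.
conn.execute("""
    INSERT INTO SalesTable
    SELECT O.Wine, O.Customer, O.Time,
           O.nrBottles + ? * O.nrCases,
           O.nrBottles * W.BottlePrice + O.nrCases * W.CasePrice
    FROM "ORDER" O JOIN WINE W ON O.Wine = W.Code
""", (BOTTLES_PER_CASE,))

sales = conn.execute("SELECT * FROM SalesTable ORDER BY CustomerCode").fetchall()
print(sales)
```

Other definitions are equally defensible, for example keeping bottles and cases as two separate measures; the choice belongs in the glossary of the fact.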




                                                      9/18
Exercise 2
Real estate agency
    Let us consider the case of a real estate agency whose database is
    composed of the following tables:
        OWNER (IDOwner, Name, Surname, Address, City, Phone)
        ESTATE (IDEstate, IDOwner, Category, Area, City, Province,
                  Rooms, Bedrooms, Garage, Meters)
    CUSTOMER (IDCust, Name, Surname, Budget, Address, City,
                  Phone)
         AGENT (IDAgent, Name, Surname, Office, Address, City,
                  Phone)
       AGENDA (IDAgent, Data, Hour, IDEstate, ClientName)
           VISIT (IDEstate,IDAgent, IDCust, Date, Duration)
           SALE (IDEstate,IDAgent, IDCust, Date, AgreedPrice,
                  Status)
          RENT (IDEstate,IDAgent, IDCust, Date, Price, Status,
                  Time)
                                                                         10/18
Exercise 2
Real estate agency
          Goal:
               Provide a supervisor with an overview of the situation. The
               supervisor must have a global view of the business, in terms of
               the estates the agency deals with and of the agents’ work.
          Questions:
            1. Design a conceptual schema for the DW.
            2. What facts and dimensions do you consider?
            3. Design a Star Schema or Snowflake Schema for the DW.
          Write the following SQL queries:
               How many customers have visited properties of at least 3
               different categories?
               What is the average duration of visits per property category?
               Who has paid the highest price among the customers that
               have viewed properties of at least 3 different categories?
               Who has bought a flat for the highest price w.r.t. each month?
               What kind of property sold for the highest price w.r.t. each
               city and month?
                                                                                  11/18
Exercise 2: A possible solution
Facts and dimensions
    Points 1 and 2 are left as homework. In particular it is required to
    discuss the facts of interest (with respect to your point of view)
    and then define, for each fact, the attribute tree, dimensions and
    measures with the corresponding glossary.
    The following ideas will be used during the solution of the exercise:
         supervisors should be able to control the sales of the agency
                FACT Sales
         MEASURES OfferPrice, AgreedPrice, Status
         DIMENSIONS EstateID, OwnerID, CustomerID, AgentID,
                       TimeID
         supervisors should be able to control the work of the agents by
         analyzing the visits to the estates, which the agents are in
         charge of
                FACT Viewing
         MEASURES Duration
         DIMENSIONS EstateID, CustomerID, AgentID, TimeID
                                                                            12/18
Exercise 2: A possible solution
Star schema
          Estate (EstateID, Category, Area, City, Province, Rooms,
                 Bedrooms, Garage, Meters, Sheet, Map)
           Owner (OwnerID, Name, Surname, Address, City, Phone)
        Customer (CustomerID, Name, Surname, Budget, Address, City,
                 Phone)
           Agent (AgentID, Name, Surname, Office, Address, City,
                 Phone)
            Time (TimeID, Day, Month, Year)
      SalesTable (EstateID, OwnerID, CustomerID, AgentID, TimeID,
                 OfferPrice, AgreedPrice, Status)
    ViewingTable (EstateID, CustomerID, AgentID, TimeID, Duration)
                                                  13/18
Exercise 2: A possible solution
Snowflake schema
          Estate (EstateID, PlaceID, Category, Rooms, Bedrooms,
                 Garage, Meters, Sheet, Map)
           Owner (OwnerID, PlaceID, Name, Surname, Address, Phone)
        Customer (CustomerID, PlaceID, Name, Surname, Budget,
                 Address, Phone)
           Agent (AgentID, PlaceID, Name, Surname, Office, Address,
                 Phone)
           Place (PlaceID, City, Province, Area)
            Time (TimeID, Day, Month, Year)
      SalesTable (EstateID, TimeID, CustomerID, OwnerID, AgentID,
                 OfferPrice, AgreedPrice, Status)
    ViewingTable (TimeID, EstateID, CustomerID, AgentID, Duration)

                                                  14/18
Exercise 2: A possible solution
SQL queries wrt the star schema



          How many customers have visited properties of at least 3
          different categories?

          CREATE VIEW Cust3Cat AS
             SELECT V.CustomerID
             FROM ViewingTable V, Estate E
             WHERE V.EstateID = E.EstateID
             GROUP BY V.CustomerID
             HAVING COUNT(DISTINCT E.Category) >=3

          SELECT COUNT(*)
          FROM Cust3Cat
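
The view and the count can be checked on a few rows. The sketch below assumes SQLite through Python's sqlite3 module, keeps only the Estate columns the query needs, and uses made-up sample data: customers 1 and 3 visit three categories, customer 2 only two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced star schema: only the columns used by this query (assumption).
CREATE TABLE Estate (EstateID INTEGER PRIMARY KEY, Category TEXT);
CREATE TABLE ViewingTable (EstateID INTEGER, CustomerID INTEGER, AgentID INTEGER,
                           TimeID INTEGER, Duration INTEGER,
                           PRIMARY KEY (EstateID, CustomerID, AgentID, TimeID));

-- Made-up sample rows.
INSERT INTO Estate VALUES (1, 'flat'), (2, 'villa'), (3, 'office'), (4, 'flat');
INSERT INTO ViewingTable VALUES
    (1, 1, 1, 1, 30), (2, 1, 1, 2, 45), (3, 1, 1, 3, 20),
    (1, 2, 1, 4, 25), (4, 2, 1, 5, 35), (2, 2, 1, 6, 40),
    (1, 3, 2, 7, 15), (2, 3, 2, 8, 50), (3, 3, 2, 9, 10), (4, 3, 2, 10, 20);

-- Customers who have visited estates of at least 3 distinct categories.
CREATE VIEW Cust3Cat AS
    SELECT V.CustomerID
    FROM ViewingTable V, Estate E
    WHERE V.EstateID = E.EstateID
    GROUP BY V.CustomerID
    HAVING COUNT(DISTINCT E.Category) >= 3;
""")

count = conn.execute("SELECT COUNT(*) FROM Cust3Cat").fetchone()[0]
print(count)
```

COUNT(DISTINCT ...) matters here: customer 2 has three visits but two of them are to flats, so that customer is not counted.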



                                                                     15/18
Exercise 2: A possible solution
SQL queries wrt the star schema
          Who has paid the highest price among the customers that
          have viewed properties of at least 3 different categories?
          SELECT C.CustomerID
          FROM Cust3Cat C, SalesTable S
          WHERE C.CustomerID = S.CustomerID AND S.AgreedPrice IN
          (SELECT MAX(S1.AgreedPrice)
                    FROM Cust3Cat C1, SalesTable S1
                    WHERE C1.CustomerID = S1.CustomerID)

          Average duration of visits per property category
          SELECT E.Category, AVG(V.Duration)
          FROM ViewingTable V, Estate E
          WHERE V.EstateID = E.EstateID
          GROUP BY E.Category
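
Both queries on this slide can be run against a small in-memory database. This is a sketch, assuming SQLite through Python's sqlite3 module, a reduced set of columns, and made-up rows; the view Cust3Cat is the one defined on the previous slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced star schema (assumption: only the columns these queries use).
CREATE TABLE Estate (EstateID INTEGER PRIMARY KEY, Category TEXT);
CREATE TABLE ViewingTable (EstateID INTEGER, CustomerID INTEGER, AgentID INTEGER,
                           TimeID INTEGER, Duration INTEGER);
CREATE TABLE SalesTable (EstateID INTEGER, CustomerID INTEGER, AgreedPrice REAL);

-- Made-up sample rows.
INSERT INTO Estate VALUES (1, 'flat'), (2, 'villa'), (3, 'office');
INSERT INTO ViewingTable VALUES
    (1, 1, 1, 1, 30), (2, 1, 1, 2, 60), (3, 1, 1, 3, 20),
    (1, 2, 1, 4, 10),
    (1, 3, 2, 5, 20), (2, 3, 2, 6, 40), (3, 3, 2, 7, 40);
INSERT INTO SalesTable VALUES (1, 1, 200000), (2, 2, 500000), (3, 3, 300000);

CREATE VIEW Cust3Cat AS
    SELECT V.CustomerID
    FROM ViewingTable V, Estate E
    WHERE V.EstateID = E.EstateID
    GROUP BY V.CustomerID
    HAVING COUNT(DISTINCT E.Category) >= 3;
""")

# Highest price, restricted to customers in Cust3Cat (customer 2 is excluded).
top_buyers = conn.execute("""
    SELECT C.CustomerID
    FROM Cust3Cat C, SalesTable S
    WHERE C.CustomerID = S.CustomerID AND S.AgreedPrice IN
        (SELECT MAX(S1.AgreedPrice)
         FROM Cust3Cat C1, SalesTable S1
         WHERE C1.CustomerID = S1.CustomerID)
""").fetchall()

# Average visit duration per property category.
avg_duration = conn.execute("""
    SELECT E.Category, AVG(V.Duration)
    FROM ViewingTable V, Estate E
    WHERE V.EstateID = E.EstateID
    GROUP BY E.Category
    ORDER BY E.Category
""").fetchall()

print(top_buyers, avg_duration)
```

Customer 2 paid the highest price overall, but has viewed only one category, so the first query returns customer 3.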
                                                                      16/18
Exercise 2: A possible solution
SQL queries wrt the star schema


          Who has bought a flat for the highest price w.r.t. each month?

          SELECT S.CustomerID, T.Month, T.Year, S.AgreedPrice
          FROM SalesTable S, Estate E, Time T
          WHERE S.EstateID = E.EstateID AND S.TimeID =
          T.TimeID AND E.Category = 'flat' AND (T.Month,
          T.Year, S.AgreedPrice) IN (
               SELECT T1.Month, T1.Year,
               MAX(S1.AgreedPrice)
               FROM SalesTable S1, Estate E1, Time T1
               WHERE S1.EstateID = E1.EstateID AND
               S1.TimeID = T1.TimeID AND E1.Category =
               'flat'
               GROUP BY T1.Month, T1.Year)
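
The "top flat per month" pattern can be tested on a handful of rows. The sketch below assumes SQLite through Python's sqlite3 module (row-value IN needs SQLite 3.15 or later), a reduced set of columns, and made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced star schema (assumption: only the columns this query uses).
CREATE TABLE Estate (EstateID INTEGER PRIMARY KEY, Category TEXT);
CREATE TABLE Time (TimeID INTEGER PRIMARY KEY, Day INTEGER, Month INTEGER, Year INTEGER);
CREATE TABLE SalesTable (EstateID INTEGER, CustomerID INTEGER, TimeID INTEGER,
                         AgreedPrice REAL);

-- Made-up sample rows: two flats and one villa sold in January, one flat in February.
INSERT INTO Estate VALUES (1, 'flat'), (2, 'flat'), (3, 'villa'), (4, 'flat');
INSERT INTO Time VALUES (1, 5, 1, 2009), (2, 20, 1, 2009), (3, 3, 2, 2009);
INSERT INTO SalesTable VALUES
    (1, 10, 1, 150000),
    (2, 11, 2, 180000),
    (3, 12, 2, 900000),
    (4, 13, 3, 120000);
""")

# The subquery computes the maximum flat price per (Month, Year);
# the outer query keeps the sales that match one of those triples.
rows = conn.execute("""
    SELECT S.CustomerID, T.Month, T.Year, S.AgreedPrice
    FROM SalesTable S, Estate E, Time T
    WHERE S.EstateID = E.EstateID AND S.TimeID = T.TimeID
      AND E.Category = 'flat'
      AND (T.Month, T.Year, S.AgreedPrice) IN (
          SELECT T1.Month, T1.Year, MAX(S1.AgreedPrice)
          FROM SalesTable S1, Estate E1, Time T1
          WHERE S1.EstateID = E1.EstateID AND S1.TimeID = T1.TimeID
            AND E1.Category = 'flat'
          GROUP BY T1.Month, T1.Year)
    ORDER BY T.Year, T.Month
""").fetchall()
print(rows)
```

The villa sold in January has the highest price of the month, but it is filtered out in both the outer query and the subquery, so customer 11 is reported for January.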

                                                                          17/18
Exercise 2: A possible solution
SQL queries wrt the star schema
          What kind of property sold for the highest price w.r.t. each city
          and month?
          SELECT E.Category, E.City, T.Month, T.Year,
                 S.AgreedPrice
          FROM SalesTable S, Time T, Estate E
          WHERE S.TimeID = T.TimeID AND E.EstateID =
          S.EstateID AND (S.AgreedPrice, E.City, T.Month,
          T.Year) IN (
                    SELECT MAX(S1.AgreedPrice), E1.City,
                    T1.Month, T1.Year
                    FROM SalesTable S1, Time T1, Estate E1
                    WHERE S1.TimeID = T1.TimeID AND
                    E1.EstateID = S1.EstateID
                    GROUP BY T1.Month, T1.Year, E1.City)
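
The same question can also be written without a row-value IN, by joining the fact table with a grouped subquery. This is an alternative formulation, not the one on the slide; it is sketched here with SQLite through Python's sqlite3 module, a reduced set of columns, and made-up rows. The alias M and the column name TopPrice are introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced star schema (assumption: only the columns this query uses).
CREATE TABLE Estate (EstateID INTEGER PRIMARY KEY, Category TEXT, City TEXT);
CREATE TABLE Time (TimeID INTEGER PRIMARY KEY, Day INTEGER, Month INTEGER, Year INTEGER);
CREATE TABLE SalesTable (EstateID INTEGER, CustomerID INTEGER, TimeID INTEGER,
                         AgreedPrice REAL);

-- Made-up sample rows.
INSERT INTO Estate VALUES (1, 'flat', 'Milano'), (2, 'villa', 'Milano'),
                          (3, 'flat', 'Como'),   (4, 'office', 'Como');
INSERT INTO Time VALUES (1, 5, 1, 2009), (2, 7, 2, 2009);
INSERT INTO SalesTable VALUES
    (1, 10, 1, 150000),
    (2, 11, 1, 700000),
    (3, 12, 1,  90000),
    (4, 13, 2, 250000);
""")

# M holds the maximum agreed price per (City, Month, Year);
# joining back on all four columns selects the sale(s) that reach it.
rows = conn.execute("""
    SELECT E.Category, E.City, T.Month, T.Year, S.AgreedPrice
    FROM SalesTable S
    JOIN Time T   ON S.TimeID = T.TimeID
    JOIN Estate E ON E.EstateID = S.EstateID
    JOIN (SELECT E1.City AS City, T1.Month AS Month, T1.Year AS Year,
                 MAX(S1.AgreedPrice) AS TopPrice
          FROM SalesTable S1
          JOIN Time T1   ON S1.TimeID = T1.TimeID
          JOIN Estate E1 ON E1.EstateID = S1.EstateID
          GROUP BY E1.City, T1.Month, T1.Year) M
      ON M.City = E.City AND M.Month = T.Month
     AND M.Year = T.Year AND M.TopPrice = S.AgreedPrice
    ORDER BY E.City, T.Year, T.Month
""").fetchall()
print(rows)
```

If two sales tie for the highest price in the same city and month, both formulations return both rows.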

                                                                             18/18

Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Development
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 

Dwlogical0910

  • 1. Data warehousing logical design. Mirjana Mazuran, mazuran@elet.polimi.it. December 15, 2009.
  • 2. Outline
      Data Warehouse logical design
          ROLAP model
              star schema
              snowflake schema
      Exercise 1: wine company
      Exercise 2: real estate agency
  • 3. Introduction: Logical design
      Starting from the conceptual design, it is necessary to determine the logical schema of the data.
      We use the ROLAP (Relational On-Line Analytical Processing) model to represent multidimensional data.
      ROLAP uses the relational data model, which means that data is stored in relations.
      Given the DFM (Dimensional Fact Model) representation of multidimensional data, two schemas are used:
          star schema
          snowflake schema
  • 4. ROLAP: Star schema
      Each dimension is represented by a relation such that:
          the primary key of the relation is the primary key of the dimension
          the attributes of the relation describe all aggregation levels of the dimension
      A fact is represented by a relation such that:
          the primary key of the relation is the set of primary keys imported from all the dimension tables
          the attributes of the relation are the measures of the fact
      Pros and cons
          few joins are needed during query execution
          dimension tables are denormalized
          denormalization introduces redundancy
  • 5. ROLAP: Star schema, example (fact Accident)
      Time (IdTime, Day, Month, Year)
      Policy (IdPolicy, Class, Amount, MaxAmount)
      Customer (IdCustomer, Birthday, Sex, City, Region)
      Motivation (IdMotivation, Motivation)
      Accident (IdTime, IdPolicy, IdCustomer, IdMotivation, NrOfAccidents, Cost)
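To make the "few joins" property of the star schema concrete, here is a minimal sketch that builds part of the Accident example in an in-memory SQLite database through Python's `sqlite3` module. The column types, the sample rows, and the decision to leave out the Customer and Motivation dimensions are assumptions made for illustration; they are not taken from the slides.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one relation per dimension, all levels in the same table.
CREATE TABLE Time (IdTime INTEGER PRIMARY KEY, Day INTEGER, Month INTEGER, Year INTEGER);
CREATE TABLE Policy (IdPolicy INTEGER PRIMARY KEY, Class TEXT, Amount REAL, MaxAmount REAL);

-- Fact table: its key is made of the keys imported from the dimensions
-- (Customer and Motivation are omitted here to keep the sketch short).
CREATE TABLE Accident (
    IdTime INTEGER REFERENCES Time(IdTime),
    IdPolicy INTEGER REFERENCES Policy(IdPolicy),
    NrOfAccidents INTEGER,
    Cost REAL,
    PRIMARY KEY (IdTime, IdPolicy)
);

-- Hypothetical sample data.
INSERT INTO Time VALUES (1, 10, 1, 2009), (2, 20, 1, 2009), (3, 5, 2, 2009);
INSERT INTO Policy VALUES (1, 'A', 500, 1000), (2, 'B', 300, 800);
INSERT INTO Accident VALUES (1, 1, 2, 400.0), (2, 1, 1, 100.0), (3, 2, 3, 900.0);
""")

# Roll up the measures by month and policy class: one join per dimension used.
rows = conn.execute("""
    SELECT T.Year, T.Month, P.Class, SUM(A.NrOfAccidents), SUM(A.Cost)
    FROM Accident A
    JOIN Time T ON A.IdTime = T.IdTime
    JOIN Policy P ON A.IdPolicy = P.IdPolicy
    GROUP BY T.Year, T.Month, P.Class
    ORDER BY T.Year, T.Month, P.Class
""").fetchall()

for row in rows:
    print(row)
```

Because every aggregation level of a dimension sits in a single table, grouping by month or by year needs the same single join to Time.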
  • 6. ROLAP: Snowflake schema
      Each (primary) dimension is represented by a relation such that:
          the primary key of the relation is the primary key of the dimension
          the attributes of the relation depend directly on the primary key
          a set of foreign keys is used to access information at different levels of aggregation; this information belongs to the secondary dimensions and is stored in dedicated relations
      A fact is represented by a relation such that:
          the primary key of the relation is the set of primary keys imported from all and only the primary dimension tables
          the attributes of the relation are the measures of the fact
      Pros and cons
          denormalization is reduced
          less storage space is required
          many joins may be required when a query involves attributes stored in secondary dimension tables
  • 7. ROLAP: Snowflake schema, example (fact Sale)
      Time (IdTime, Day, Month, Year)
      Furniture (IdFurniture, Material, Type, Category)
      Customer (IdCustomer, IdPlace, BirthYear, Sex)
      Place (IdPlace, City, Region, State)
      Sale (IdTime, IdFurniture, IdCustomer, Quantity, Income, Discount)
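The extra join cost of the snowflake schema can be seen in a small sketch of the Sale example, again using Python's `sqlite3` module with an in-memory database. Column types and sample rows are assumptions for illustration, and the Time and Furniture dimension tables are left out, so `IdTime` and `IdFurniture` appear only as plain key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Secondary dimension: geographic attributes live in their own relation.
CREATE TABLE Place (IdPlace INTEGER PRIMARY KEY, City TEXT, Region TEXT, State TEXT);

-- Primary dimension: reaches the geographic levels through a foreign key.
CREATE TABLE Customer (
    IdCustomer INTEGER PRIMARY KEY,
    IdPlace INTEGER REFERENCES Place(IdPlace),
    BirthYear INTEGER,
    Sex TEXT
);

-- Fact table: keys come only from the primary dimensions.
CREATE TABLE Sale (
    IdTime INTEGER,
    IdFurniture INTEGER,
    IdCustomer INTEGER REFERENCES Customer(IdCustomer),
    Quantity INTEGER,
    Income REAL,
    Discount REAL,
    PRIMARY KEY (IdTime, IdFurniture, IdCustomer)
);

-- Hypothetical sample data.
INSERT INTO Place VALUES (1, 'Milan', 'Lombardy', 'Italy'),
                         (2, 'Como', 'Lombardy', 'Italy'),
                         (3, 'Turin', 'Piedmont', 'Italy');
INSERT INTO Customer VALUES (1, 1, 1970, 'F'), (2, 2, 1985, 'M'), (3, 3, 1990, 'F');
INSERT INTO Sale VALUES (1, 1, 1, 2, 200.0, 0.0),
                        (1, 2, 2, 1, 150.0, 10.0),
                        (2, 1, 3, 4, 400.0, 0.0);
""")

# Income per region: the fact table must go through Customer to reach Place,
# so a regional roll-up costs two joins instead of one.
income_by_region = conn.execute("""
    SELECT P.Region, SUM(S.Income)
    FROM Sale S
    JOIN Customer C ON S.IdCustomer = C.IdCustomer
    JOIN Place P ON C.IdPlace = P.IdPlace
    GROUP BY P.Region
    ORDER BY P.Region
""").fetchall()

print(income_by_region)
```

In a star schema the `City`, `Region` and `State` columns would be stored directly in Customer, trading redundancy for one join fewer.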
  • 8. Exercise 1: Wine company
      An online wine retailer needs a data warehouse to record the quantity and sales of its wines to its customers. Part of the original database consists of the following tables:
          CUSTOMER (Code, Name, Address, Phone, BDay, Gender)
          WINE (Code, Name, Type, Vintage, BottlePrice, CasePrice, Class)
          CLASS (Code, Name, Region)
          TIME (TimeStamp, Date, Year)
          ORDER (Customer, Wine, Time, nrBottles, nrCases)
      Note that the tables represent the main entities of the ER schema, so the significant relationships among them must be derived in order to design the data warehouse correctly.
  • 9. Exercise 1: A possible solution (snowflake schema)
      FACT Sales
      MEASURES Quantity, Cost
      DIMENSIONS Customer, Area, Time, Wine → Class
      Customer (CustomerCode, Birthday, Gender)
      Wine (WineCode, ClassCode, Type, Vintage)
      Class (ClassCode, Region)
      Time (TimeCode, Date, Year)
      SalesTable (WineCode, CustomerCode, OrderTimeCode, Quantity, Cost)
  • 10. Exercise 2: Real estate agency
      Let us consider the case of a real estate agency whose database consists of the following tables:
          OWNER (IDOwner, Name, Surname, Address, City, Phone)
          ESTATE (IDEstate, IDOwner, Category, Area, City, Province, Rooms, Bedrooms, Garage, Meters)
          CUSTOMER (IDCust, Name, Surname, Budget, Address, City, Phone)
          AGENT (IDAgent, Name, Surname, Office, Address, City, Phone)
          AGENDA (IDAgent, Data, Hour, IDEstate, ClientName)
          VISIT (IDEstate, IDAgent, IDCust, Date, Duration)
          SALE (IDEstate, IDAgent, IDCust, Date, AgreedPrice, Status)
          RENT (IDEstate, IDAgent, IDCust, Date, Price, Status, Time)
  • 11. Exercise 2: Real estate agency
      Goal: provide a supervisor with an overview of the situation. The supervisor must have a global view of the business, in terms of the estates the agency deals with and of the agents' work.
      Questions:
          1. Design a conceptual schema for the DW.
          2. What facts and dimensions do you consider?
          3. Design a Star Schema or Snowflake Schema for the DW.
      Write the following SQL queries:
          How many customers have visited properties of at least 3 different categories?
          What is the average duration of visits per property category?
          Who has paid the highest price among the customers that have viewed properties of at least 3 different categories?
          Who has bought a flat for the highest price in each month?
          What kind of property sold for the highest price in each city and month?
  • 12. Exercise 2: A possible solution (facts and dimensions)
      Points 1 and 2 are left as homework. In particular, you are required to discuss the facts of interest (from your own point of view) and then define, for each fact, the attribute tree, dimensions and measures with the corresponding glossary.
      The following ideas will be used in the solution of the exercise:
          supervisors should be able to monitor the sales of the agency
              FACT Sales
              MEASURES OfferPrice, AgreedPrice, Status
              DIMENSIONS EstateID, OwnerID, CustomerID, AgentID, TimeID
          supervisors should be able to monitor the work of the agents by analyzing the visits to the estates that the agents are in charge of
              FACT Viewing
              MEASURES Duration
              DIMENSIONS EstateID, CustomerID, AgentID, TimeID
  • 13. Exercise 2: A possible solution (star schema)
      Estate (EstateID, Category, Area, City, Province, Rooms, Bedrooms, Garage, Meters, Sheet, Map)
      Owner (OwnerID, Name, Surname, Address, City, Phone)
      Customer (CustomerID, Name, Surname, Budget, Address, City, Phone)
      Agent (AgentID, Name, Surname, Office, Address, City, Phone)
      Time (TimeID, Day, Month, Year)
      SalesTable (EstateID, OwnerID, CustomerID, AgentID, TimeID, OfferPrice, AgreedPrice, Status)
      ViewingTable (EstateID, CustomerID, AgentID, TimeID, Duration)
  • 14. Exercise 2: A possible solution (snowflake schema)
      Estate (EstateID, PlaceID, Category, Rooms, Bedrooms, Garage, Meters, Sheet, Map)
      Owner (OwnerID, PlaceID, Name, Surname, Address, Phone)
      Customer (CustomerID, PlaceID, Name, Surname, Budget, Address, Phone)
      Agent (AgentID, PlaceID, Name, Surname, Office, Address, Phone)
      Place (PlaceID, City, Province, Area)
      Time (TimeID, Day, Month, Year)
      SalesTable (EstateID, TimeID, CustomerID, OwnerID, AgentID, OfferPrice, AgreedPrice, Status)
      ViewingTable (TimeID, EstateID, CustomerID, AgentID, Duration)
  • 15. Exercise 2: A possible solution (SQL queries w.r.t. the star schema)
      How many customers have visited properties of at least 3 different categories?
          CREATE VIEW Cust3Cat AS
          SELECT V.CustomerID
          FROM ViewingTable V, Estate E
          WHERE V.EstateID = E.EstateID
          GROUP BY V.CustomerID
          HAVING COUNT(DISTINCT E.Category) >= 3;

          SELECT COUNT(*)
          FROM Cust3Cat;
  • 16. Exercise 2: A possible solution (SQL queries w.r.t. the star schema)
      Who has paid the highest price among the customers that have viewed properties of at least 3 different categories?
          SELECT C.CustomerID
          FROM Cust3Cat C, SalesTable S
          WHERE C.CustomerID = S.CustomerID
            AND S.AgreedPrice IN (SELECT MAX(S1.AgreedPrice)
                                  FROM Cust3Cat C1, SalesTable S1
                                  WHERE C1.CustomerID = S1.CustomerID);
      Average duration of visits per property category
          SELECT E.Category, AVG(V.Duration)
          FROM ViewingTable V, Estate E
          WHERE V.EstateID = E.EstateID
          GROUP BY E.Category;
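As a sanity check, the view and the two queries on visits can be run on a toy instance of the star schema. The sketch below uses Python's `sqlite3` module; the tables are reduced to the columns the queries touch, `CustID` is read as `CustomerID` and the inner `MAX` is taken over alias `S1` (both assumptions about the intended query), and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced versions of the star-schema tables (only the columns used below).
CREATE TABLE Estate (EstateID INTEGER PRIMARY KEY, Category TEXT);
CREATE TABLE ViewingTable (
    EstateID INTEGER REFERENCES Estate(EstateID),
    CustomerID INTEGER, AgentID INTEGER, TimeID INTEGER,
    Duration INTEGER,
    PRIMARY KEY (EstateID, CustomerID, AgentID, TimeID)
);
CREATE TABLE SalesTable (EstateID INTEGER, CustomerID INTEGER, AgreedPrice REAL);

-- Hypothetical data: customers 1 and 3 view three categories, customer 2 only flats.
INSERT INTO Estate VALUES (1, 'flat'), (2, 'villa'), (3, 'office'), (4, 'flat');
INSERT INTO ViewingTable VALUES
    (1, 1, 1, 1, 30), (2, 1, 1, 2, 45), (3, 1, 1, 3, 20),
    (1, 2, 1, 4, 25), (4, 2, 1, 5, 35),
    (1, 3, 2, 6, 30), (2, 3, 2, 7, 60), (3, 3, 2, 8, 40), (4, 3, 2, 9, 15);
INSERT INTO SalesTable VALUES (2, 1, 300000.0), (4, 2, 400000.0), (3, 3, 250000.0);

-- Customers who viewed estates of at least 3 different categories.
CREATE VIEW Cust3Cat AS
SELECT V.CustomerID
FROM ViewingTable V, Estate E
WHERE V.EstateID = E.EstateID
GROUP BY V.CustomerID
HAVING COUNT(DISTINCT E.Category) >= 3;
""")

# How many customers have visited properties of at least 3 categories?
customer_count = conn.execute("SELECT COUNT(*) FROM Cust3Cat").fetchone()[0]

# Who paid the highest price among those customers?
top_buyers = conn.execute("""
    SELECT C.CustomerID
    FROM Cust3Cat C, SalesTable S
    WHERE C.CustomerID = S.CustomerID
      AND S.AgreedPrice IN (SELECT MAX(S1.AgreedPrice)
                            FROM Cust3Cat C1, SalesTable S1
                            WHERE C1.CustomerID = S1.CustomerID)
""").fetchall()

# Average duration of visits per property category.
avg_duration = conn.execute("""
    SELECT E.Category, AVG(V.Duration)
    FROM ViewingTable V, Estate E
    WHERE V.EstateID = E.EstateID
    GROUP BY E.Category
    ORDER BY E.Category
""").fetchall()

print(customer_count, top_buyers, avg_duration)
```

Customer 2 pays the highest price overall in this data but is not in `Cust3Cat`, so the second query returns customer 1 instead.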
  • 17. Exercise 2: A possible solution (SQL queries w.r.t. the star schema)
      Who has bought a flat for the highest price in each month?
          SELECT S.CustomerID, T.Month, T.Year, S.AgreedPrice
          FROM SalesTable S, Estate E, Time T
          WHERE S.EstateID = E.EstateID AND S.TimeID = T.TimeID
            AND E.Category = 'flat'
            AND (T.Month, T.Year, S.AgreedPrice) IN (
                SELECT T1.Month, T1.Year, MAX(S1.AgreedPrice)
                FROM SalesTable S1, Estate E1, Time T1
                WHERE S1.EstateID = E1.EstateID AND S1.TimeID = T1.TimeID
                  AND E1.Category = 'flat'
                GROUP BY T1.Month, T1.Year);
  • 18. Exercise 2: A possible solution (SQL queries w.r.t. the star schema)
      What kind of property sold for the highest price in each city and month?
          SELECT E.Category, E.City, T.Month, T.Year, S.AgreedPrice
          FROM SalesTable S, Time T, Estate E
          WHERE S.TimeID = T.TimeID AND E.EstateID = S.EstateID
            AND (S.AgreedPrice, E.City, T.Month, T.Year) IN (
                SELECT MAX(S1.AgreedPrice), E1.City, T1.Month, T1.Year
                FROM SalesTable S1, Time T1, Estate E1
                WHERE S1.TimeID = T1.TimeID AND E1.EstateID = S1.EstateID
                GROUP BY T1.Month, T1.Year, E1.City);
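The last query can be tried out the same way. The sketch below is one reading of it, with the aliases made consistent (`AgreedPrice` taken from SalesTable, `City` from Estate, and the subquery grouped on its own aliases); that reading is an assumption about what the slide intends. It uses Python's `sqlite3` module, relies on row-value comparison with `IN` (available in SQLite 3.15 and later), and runs on hypothetical rows with the tables reduced to the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced star-schema tables (only the columns the query needs).
CREATE TABLE Estate (EstateID INTEGER PRIMARY KEY, Category TEXT, City TEXT);
CREATE TABLE Time (TimeID INTEGER PRIMARY KEY, Day INTEGER, Month INTEGER, Year INTEGER);
CREATE TABLE SalesTable (
    EstateID INTEGER REFERENCES Estate(EstateID),
    TimeID INTEGER REFERENCES Time(TimeID),
    CustomerID INTEGER,
    AgreedPrice REAL
);

-- Hypothetical sample data.
INSERT INTO Estate VALUES (1, 'flat', 'Milan'), (2, 'villa', 'Milan'),
                          (3, 'flat', 'Rome'), (4, 'office', 'Rome');
INSERT INTO Time VALUES (1, 3, 1, 2009), (2, 17, 1, 2009), (3, 9, 2, 2009);
INSERT INTO SalesTable VALUES (1, 1, 10, 200000.0), (2, 2, 11, 350000.0),
                              (3, 1, 12, 180000.0), (4, 3, 13, 500000.0);
""")

# For each (city, month, year), keep the sale whose price equals the group maximum.
best_by_city_month = conn.execute("""
    SELECT E.Category, E.City, T.Month, T.Year, S.AgreedPrice
    FROM SalesTable S, Time T, Estate E
    WHERE S.TimeID = T.TimeID AND E.EstateID = S.EstateID
      AND (S.AgreedPrice, E.City, T.Month, T.Year) IN (
          SELECT MAX(S1.AgreedPrice), E1.City, T1.Month, T1.Year
          FROM SalesTable S1, Time T1, Estate E1
          WHERE S1.TimeID = T1.TimeID AND E1.EstateID = S1.EstateID
          GROUP BY T1.Month, T1.Year, E1.City)
    ORDER BY E.City, T.Year, T.Month
""").fetchall()

for row in best_by_city_month:
    print(row)
```

If two sales in the same city and month tie for the maximum price, both rows are returned, which matches the semantics of the `IN` formulation.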