Data Warehouse techniques on Intermediate Census
              and Demographic Statistics Web sites
                        “View data from different points of view”
                                                                    Vincenzo Patruno - ISTAT
                                                                    TES course:
                                                                    Techniques for Data
                                                                    Dissemination

                                                                    Madrid 9th April 2003

Http://cens.istat.it




Http://demo.istat.it




      M. C. Escher
           Relativity
                                                                                  1
Contents

•What a Data Warehouse is
•The User Interface: How users make queries
•Data modelling: Two ways to organise data
•Software environment
•Costs and maintenance




                                              2
Data Warehouse
•A data warehouse is a central repository for all or significant
parts of the data that an enterprise's various business systems
collect
•A data warehouse is a collection of data designed to support
management decision making
•A data warehouse is a computer system designed to give
business decision makers instant access to information by
copying data from existing systems and storing it for use by
executives.


                                                                   3
Data Warehouse
  A data warehouse is a copy of transaction data specifically
           structured for querying and reporting
        (Ralph Kimball's definition on page 310 of The Data Warehouse Toolkit - John Wiley & Sons 1996)




Queries and reports generated from data stored in a data warehouse may or
may not be used for analysis.




                                                                                                          4
User Interface
The user interface is the door to the data.
The goal of demo.istat.it and cens.istat.it has been a user
interface that offers:
   •Easy data access
   •Handy parameter selection
   •Fast database queries
and that makes it easy to show data from different points of view
using Internet technologies

                                                                   5
User Interface

On the Web that means:
•building the User interface as an HTML page
•using available technologies (DHTML)
•using HTTP protocol to send queries and to obtain results
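
As a rough illustration of sending a query over HTTP, the sketch below encodes the parameters a user has selected into a URL and decodes them again on the server side. The page name query.php and the parameter names region and age are hypothetical; the slides do not say how the two sites name their parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(base_url: str, params: dict) -> str:
    """Encode the selected parameters as an HTTP GET query string."""
    return f"{base_url}?{urlencode(params)}"


def read_query(url: str) -> dict:
    """Decode the query string again, as a server-side program would."""
    pairs = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; keep the first value of each
    return {key: values[0] for key, values in pairs.items()}


# Hypothetical endpoint and parameter names, for illustration only
url = build_query_url("http://example.org/query.php", {"region": "Lazio", "age": 18})
print(url)
print(read_query(url))
```

Every request carries all of its parameters, because HTTP itself keeps no memory of the previous request.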




                                                             6
User Interface

Building a user interface on the Web is completely different from
traditional programming because of:
   •DHTML limits
   •Cross-Browser problems
   •The nature of HTTP




                                                              7
User Interface

Because of this, in general:
•Web programmers tend to develop systems in depth, as a chain of nested pages
•This means users have to click many times to obtain results


      Our goal has been to obtain a user interface with
        a behaviour similar to traditional applications



                                                             8
User Interface
                                  2


                   1              3


The adopted solution places frames 1 and 2, used to select
parameters and variables, in the same window as frame 3, which
shows the results according to the reporting policy adopted

                                                             9
User Interface
In this way, users can always use the same window
   •to perform queries
   •to see results
   •to save data
Advantages
   •Easy and fast access to data
Disadvantages
   •It doesn’t look so good

                                                    10
User Interface




 How they work



                 11
Data Model
During the Interface analysis we talked about:
   •Analysis Variables
   •Dimensions
   •Data shown from different points of view



 Both systems are built according to the modelling techniques
              used to build up Data Warehouses

                                                                12
Data Model
The user interface is built around a data structure suitable to be
queried from different points of view.
                                      but
      To do this, it is very important to build first of all a good
                          conceptual data model


   “The conceptual data model isn’t an exercise in intellectual gymnastics for
        engineers but the starting point to build good software systems”



                                                                          13
Dimensional Modelling

•DM is a favourite modelling technique in data warehousing
•In DM, a model of tables and relations is built with the
purpose of maximising decision support and query
performance in relational databases
•It’s an excellent technique for building data models that optimise
OLAP performance




                                                                14
Dimensional Modelling

In contrast, conventional E-R models are designed to:
   •Remove redundancy from the data model
   •Facilitate the retrieval of individual records by certain
   critical identifiers
   •Optimise OLTP performance




                                                                 15
Dimensional Modelling

OLTP: Online Transaction Processing
A class of programs that facilitates and manages
transaction-oriented applications (typically for data entry and
retrieval transactions)


OLAP: Online Analytical Processing
Enables a user to easily and selectively extract and view data
from different points of view


                                                             16
Dimensional Modelling


OLTP based system=“gets the data in”


OLAP based system =“gets the data out”




                                         17
Dimensional Modelling
•The data warehouse exists to answer questions people have about
the “business”
•Dimensional modelling techniques ensure that the DW design
reflects the way users think about the “business” and that the DW
can be used to answer their questions.
•A dimensional model (DM) captures the measurements of
importance to a “business” and the parameters by which the
measurements are broken out.
•It is an excellent tool for identifying and classifying the important
business components in a subject area.

                                                                  18
Dimensional Modelling
                How to build a Dimensional Model


•DM is built around a “business subject” (in our case “statistical
subject”)
•This means we first have to identify the subject to be modelled
and all the measures that describe it.
•At the same time we have to identify the parameters by which a
measurement can be viewed.




                                                                     19
Dimensional Modelling
The measurements are referred to as FACTS
The parameters by which a fact can be viewed are referred to as
DIMENSIONS
The level of detail of measures in the fact table is referred to as
GRAIN


N.B.
   It is crucial that every row in the “fact” table be recorded at
                   exactly the same level of detail

                                                                      20
Dimensional Modelling

       Enterprise Facts (fact table)
          Employees
          Profits
          Loss
          EmployeesClass_FK
          Geography_FK

       Class of Employees (dimension)
          EmployeesClass_PK
          EmployeesClass_Name

       Geography (dimension)
          Geography_PK
          Municipality
          Province
          Region
          Geographical Area



                                                 21
Dimensional Modelling
… or, speaking about demo.istat.it


       Resident Population Facts (fact table)
          Male
          Female
          Male Married
          Female Married
          ……
          Age_FK
          Geography_FK

       Age (dimension)
          Age_PK
          Age_desc

       Geography (dimension)
          Geography_PK
          Municipality
          Province
          Region
          Geographical Area




                                                                     22
Dimensional Modelling

•A Dimensional Model does not change much when implemented
in a relational database. (The DM is referred to as Star Schema)
•Each box of dimension attributes becomes a table in the
database, referred to as a dimension table.
•The fact table becomes a very large table containing a very large
number of rows. It contains the measures plus foreign keys that
relate each measurement to the appropriate rows in each of the
dimension tables.
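
A minimal sketch of such a star schema, using Python's built-in sqlite3 module rather than the database used for the sites. The table and column names follow the Resident Population diagram; the rows are made up for illustration, and the query is only one plausible example of joining the fact table to a dimension table.

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create an in-memory star schema: two dimension tables and one fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE age (age_pk INTEGER PRIMARY KEY, age_desc TEXT);
        CREATE TABLE geography (
            geography_pk INTEGER PRIMARY KEY,
            municipality TEXT, province TEXT, region TEXT, geographical_area TEXT
        );
        CREATE TABLE resident_population_facts (
            male INTEGER, female INTEGER,
            age_fk INTEGER REFERENCES age(age_pk),
            geography_fk INTEGER REFERENCES geography(geography_pk)
        );
    """)
    # Made-up rows, one fact row per (age, municipality): the grain of the table
    conn.executemany("INSERT INTO age VALUES (?, ?)",
                     [(18, "18 years"), (19, "19 years")])
    conn.executemany("INSERT INTO geography VALUES (?, ?, ?, ?, ?)", [
        (1, "Town A", "Province X", "Region R", "Area C"),
        (2, "Town B", "Province X", "Region R", "Area C"),
        (3, "Town C", "Province Y", "Region R", "Area C"),
    ])
    conn.executemany("INSERT INTO resident_population_facts VALUES (?, ?, ?, ?)", [
        (10, 12, 18, 1), (20, 18, 18, 2), (5, 7, 18, 3), (11, 9, 19, 1),
    ])
    return conn


def males_by_province(conn: sqlite3.Connection, age: int) -> list:
    """Follow the foreign key from the fact table to the Geography dimension."""
    rows = conn.execute("""
        SELECT g.province, SUM(f.male)
        FROM resident_population_facts AS f
        JOIN geography AS g ON g.geography_pk = f.geography_fk
        WHERE f.age_fk = ?
        GROUP BY g.province
        ORDER BY g.province
    """, (age,))
    return rows.fetchall()


print(males_by_province(build_star_schema(), 18))
```

The same fact rows can be grouped by any attribute of any dimension, which is what "different points of view" means in practice.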


                                                              23
Dimensional Modelling

This is what we have done with demo.istat.it:

we have implemented the fact table in a
         relational database


                                          24
Dimensional Modelling

We have also created aggregate tables by “Geography” dimension

       Resident Population Facts AGG_BY_PROVINCE (aggregate fact table)
          Male
          Female
          Male Married
          Female Married
          ……
          Age_FK
          Province_FK

       Age (dimension)
          Age_PK
          Age_desc

       Province (dimension)
          Province_PK
          Province




                                                             25
Dimensional Modelling

Aggregate tables on the “Geography” hierarchy, from the coarsest
to the finest grain:

   Resident Population Facts AGG_BY_ITALY
   Resident Population Facts AGG_BY_GEOGRAPHICAL_AREA
   Resident Population Facts AGG_BY_REGION
   Resident Population Facts AGG_BY_PROVINCE
   BASE Resident Population Facts


                                                                      26
Dimensional Modelling

The aggregate tables contain the same facts as the base fact table,
          but they are recorded at a different GRAIN



       It’s an excellent way to manage hierarchies
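
The idea can be sketched with sqlite3 as follows. This is a simplification under stated assumptions: the geography attributes are stored directly in a single base table instead of a separate dimension table, the rows are made up, and the helper males_at and its level-to-table mapping are hypothetical.

```python
import sqlite3


def build_tables() -> sqlite3.Connection:
    """Create a base fact table and two aggregate tables at coarser grains."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE base_facts (
            municipality TEXT, province TEXT, region TEXT,
            age INTEGER, male INTEGER, female INTEGER
        );
        INSERT INTO base_facts VALUES
            ('Town A', 'Province X', 'Region R', 18, 10, 12),
            ('Town B', 'Province X', 'Region R', 18, 20, 18),
            ('Town C', 'Province Y', 'Region R', 18,  5,  7);
        -- Same facts, recorded at province grain
        CREATE TABLE agg_by_province AS
            SELECT province, region, age, SUM(male) AS male, SUM(female) AS female
            FROM base_facts GROUP BY province, region, age;
        -- Each level can be derived from the level just below it
        CREATE TABLE agg_by_region AS
            SELECT region, age, SUM(male) AS male, SUM(female) AS female
            FROM agg_by_province GROUP BY region, age;
    """)
    return conn


# Hypothetical mapping from hierarchy level to the table stored at that grain
TABLE_FOR_LEVEL = {
    "municipality": "base_facts",
    "province": "agg_by_province",
    "region": "agg_by_region",
}


def males_at(conn: sqlite3.Connection, level: str, age: int) -> list:
    """Read precomputed totals from the table whose grain matches the level."""
    table = TABLE_FOR_LEVEL[level]  # fixed whitelist, so safe to interpolate
    rows = conn.execute(
        f"SELECT {level}, male FROM {table} WHERE age = ? ORDER BY {level}", (age,)
    )
    return rows.fetchall()


conn = build_tables()
print(males_at(conn, "province", 18))
print(males_at(conn, "region", 18))
```

A query at province or region level reads a few precomputed rows instead of summing the base table every time.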




                                                                27
Dimensional Modelling

  Dimensional Models are not always implemented in relational
                           data bases.



Several vendors offer multidimensional databases (MDDB) which
 store information in a different format often referred to as cube




                                                              28
Dimensional Modelling

This is what we have done with cens.istat.it:

we have implemented the fact table in a
     multidimensional database


                                          29
MDDB

A multidimensional database (MDDB) is a specialised storage
facility that allows data to be stored in a matrix-like format


•It contains all possible values resulting from crossing all
dimensions and all measures.

•The whole of these values is referred to as Nway Cube
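
A toy version of the idea in plain Python, not SAS/MDDB: every combination of dimension values gets a cell, and each cell holds every measure. The dimension names, measure names and records are made up, and storing empty crossings as zero-filled cells is an assumption of this sketch.

```python
from itertools import product

DIMENSIONS = ("age", "province")   # hypothetical dimensions
MEASURES = ("male", "female")      # hypothetical measures


def build_nway(records: list) -> dict:
    """Return one cell per crossing of all dimension values, holding all measures."""
    values = [sorted({rec[d] for rec in records}) for d in DIMENSIONS]
    # Start with a zero-filled cell for every possible crossing
    cube = {key: dict.fromkeys(MEASURES, 0) for key in product(*values)}
    for rec in records:
        cell = cube[tuple(rec[d] for d in DIMENSIONS)]
        for measure in MEASURES:
            cell[measure] += rec[measure]
    return cube


RAW = [
    {"age": 18, "province": "X", "male": 10, "female": 12},
    {"age": 18, "province": "X", "male": 20, "female": 18},
    {"age": 19, "province": "Y", "male": 5, "female": 7},
]

cube = build_nway(RAW)
print(len(cube))
print(cube[(18, "X")])
```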




                                                                 30
MDDB

In addition, the process of building an MDDB also summarises the
raw data according to the hierarchical dimensions.
Summarised data is stored in data structures referred to as
subcubes


An MDDB stores its data as an Nway Cube and zero or more
                        subcubes




                                                              31
MDDB

Subcubes are built to enhance reporting speed
If a subcube does not exist for a particular aggregate query, that
is, if no subcube defines the exact crossing required to answer the
query, the aggregate data will be derived from the smallest
subcube that can provide the data.
If no subcube can provide the data, it is derived from the Nway
cube
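
The fallback rule can be sketched as below. This is not how SAS/MDDB is implemented: the subcube names and sizes are invented, "smallest" is taken to mean the fewest stored cells, and hierarchies between dimensions are ignored for simplicity.

```python
def choose_source(required: tuple, subcubes: dict, nway_dims: tuple) -> str:
    """Pick the smallest subcube that covers the required crossing, else the NWAY cube."""
    candidates = [
        (cells, name)
        for name, (dims, cells) in subcubes.items()
        if set(required) <= set(dims)
    ]
    if candidates:
        return min(candidates)[1]  # fewest cells wins
    if set(required) <= set(nway_dims):
        return "NWAY"
    raise ValueError(f"no structure can answer a query on {required}")


# Hypothetical subcubes: name -> (dimensions, number of stored cells)
SUBCUBES = {
    "by_region": (("region",), 20),
    "by_region_age": (("region", "age"), 2000),
    "by_province_age": (("province", "age"), 11000),
}
NWAY_DIMS = ("municipality", "province", "region", "age")

print(choose_source(("region",), SUBCUBES, NWAY_DIMS))
print(choose_source(("age",), SUBCUBES, NWAY_DIMS))
print(choose_source(("municipality",), SUBCUBES, NWAY_DIMS))
```

A query that matches a subcube exactly is answered from it directly; anything else costs extra summarisation at query time.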




                                                                  32
MDDB


If you know of common queries that can be answered using a
smaller set of crossings, you could create a subcube that specifies
the exact crossing required.


         Subcubes are often referred to as Data Marts




                                                                33
Summary

We have seen two ways to implement the fact table.
•As a Relational Database (for demo.istat.it)
•As a Multidimensional Database (for cens.istat.it)
Now we are going to see the tools used to implement each DB and
how the programs that connect the DB to the user interface were built




                                                             34
Development environment

To implement demo.istat.it we’ve used:
•mSQL as Relational Database (http://www.hughes.com.au/)
•PHP as programming language (http://www.php.net/)


 PHP is a server-side, cross-platform, HTML embedded scripting
                            language.




                                                           35
HTML
Welcome.html

<HTML>
<BODY bgcolor="#FFFFFF">
          <H1>Welcome!</H1>
</BODY>
</HTML>

[Diagram: the client requests an HTML page (1) and the Web server returns it (2)]

1. The client requests Welcome.html
2. The Web server sends Welcome.html to the client


Welcome.html is interpreted by the browser and
displayed on the screen
                                                         36
PHP
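Welcome.php

<?php
// Single quotes delimit the string so the double quotes in the markup need no escaping
print '<HTML>
     <BODY bgcolor="#FFFFFF">
           <H1>Welcome!</H1>
     </BODY>
      </HTML>';
?>

[Diagram: the client requests a PHP page (1), the PHP interpreter, with access to the RDBMS, runs it (2), and the results go back to the client (3)]

1. The client requests Welcome.php
2. The PHP interpreter runs Welcome.php
3. The results are sent to the client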

                                                            37
SAS

In contrast, to implement cens.istat.it we’ve used:
•SAS/MDDB (to build Multidimensional Databases)
•SAS/IntrNet (to run SAS programs on the Web)
•DAB (a tool that automatically generates all the programs and all
the JavaScript needed to query the MDDB via the Web)




                                                                 38
SAS
[Diagram: HTML pages, Cgi-bin broker, SAS programs, MDDB, with arrows numbered 1 to 4]

1. The client requests to run a SAS program (e.g. by sending a form)
2. The SAS broker (cgi-bin program) calls the SAS program, which is
stored in an independent area
3. The SAS program runs and accesses the data
4. The results are sent to the client


                                                                            39
Costs and maintenance

How much does it cost to build and maintain both systems in
terms of:

•Money
•People
•Time




                                                             40
Costs and maintenance
To build demo.istat.it we needed to build:

   •The relational database containing the Fact Table

   •The Aggregate tables

   •The Dimension tables

   •All programs to query tables and to format outputs as html
   page and as csv file.

   •All JavaScripts to manage the user interface
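
As an illustration of the output step, the sketch below formats one query result both as a CSV file and as an HTML table. It is written in Python rather than in the PHP used for demo.istat.it, and the function names are hypothetical; the sample rows are the single-male counts for two provinces from the Age=18, Lazio example table.

```python
import csv
import io
from html import escape


def to_csv(header: list, rows: list) -> str:
    """Format a query result as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def to_html_table(header: list, rows: list) -> str:
    """Format the same result as an HTML table, escaping every cell."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


header = ["Province", "Single Males"]
rows = [["Viterbo", 1574], ["Rieti", 830]]
print(to_csv(header, rows))
print(to_html_table(header, rows))
```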

                                                                 41
Costs and maintenance
To build cens.istat.it we needed to:

   •Build the MDDB containing the NWAY and all Data Marts
   using SAS tools.

   •Generate all SAS programs to query tables and to format
   outputs as html page and as csv file.

   •Generate all JavaScripts to manage the user interface.

SAS programs and JavaScripts are generated automatically by
SAS/DAB

                                                              42
Costs and maintenance
demo.istat.it

Software:
   •mSQL (free of charge for certain organisations, otherwise US $250)
   •PHP (completely free of charge)
Hardware: IBM RS/6000 43P workstation running AIX, 9 GB hard disk
Effort: 3 weeks of work, 2 people


                Each database changes once a year
                  Every year we create a new DB
             Time needed to load new data: 5 minutes

                                                         43
Costs and maintenance

cens.istat.it

Software:
   •SAS
   •SAS/MDDB
   •SAS/IntrNet
   •SAS/DAB (free of charge)
Hardware: IBM AIX server, 40 GB hard disk
Effort: 3 months of work, 10 people, 1 SAS adviser


                    The databases don’t change

                                                   44
Summary
  We have seen so far ...



What a Data Warehouse is
The features of the data structures
The features of the user interface
The development environment




                               45
46
Resident population on 1st January 2001
         Age=18
         Region = Lazio


Province     Single Males   Married Males   Divorced Males    …    Total Males    …

Viterbo              1574             ...              ...   ...          1575   ...
Rieti                 830             ...              ...   ...           830   ...
Rome                19510             ...              ...   ...         19511   ...
Latina               3314             ...              ...   ...          3317   ...
Frosinone            3297             ...              ...   ...          3304   ...




                                                                   47
Resident population on 1st January 2001
         Age=All
         Region = Lazio


Province     Single Males   Married Males   Divorced Males    …    Total Males    …

Viterbo             59131           79068              ...   ...        143470   ...
Rieti               31060           40010              ...   ...         73819   ...
Rome               829036          948885              ...   ...       1843238   ...
Latina             112325          133470              ...   ...        252280   ...
Frosinone          104665          129906              ...   ...        242108   ...




                                                                      48
Resident population on 1st January 2001
              Age=All
              Region = Lazio


Province     Single Males   Married Males   Divorced Males    …    Total Males   Total Males Density   F/M

Viterbo             59131           79068              ...   ...        143470                  79,0   ...
Rieti               31060           40010              ...   ...         73819                  52,6   ...
Rome               829036          948885              ...   ...       1843238                 668,7   ...
Latina             112325          133470              ...   ...        252280                 217,6   ...
Frosinone          104665          129906              ...   ...        242108                 147,3   ...




                                                                             49
50
51
52
GIS
                     Geographic Information System

Essentially, a GIS is a computer-assisted information management
system of geographically referenced data.
It contains two closely integrated databases:
•The spatial database contains information in the form of digital
coordinates. These can be points, lines, or polygons.
•The attribute database contains information about the
characteristics or qualities of the spatial features (e.g. demographic
information).


GIS is sometimes seen as a set of tools for analysing spatial data.
                                                                 53
Questions


Is it possible to build up a Web based GIS System?


Is it possible to combine a Web warehouse system with a
GIS component?




                                                          54
55
Web System Architecture

                                Client
     HTTP                                                 HTTP




Linux RedHat
Apache HTTP Server
PHP4
PostgreSQL
PostGIS
MapServer
MapScript


               Data Warehouse            Spatial and Statistical Data




                                                                        56
Conclusions

•A data warehouse is a central repository for all or significant
parts of the data that an enterprise's various business systems collect
•A data warehouse is a collection of data designed to support
management decision making
•A data warehouse is a computer system designed to give business
decision makers instant access to information by copying data from
existing systems and storing it for use by executives.
•A data warehouse is a copy of transaction data specifically
structured for querying and reporting

•Geomarketing is the use of geography to make efficient business
decisions.
•Geomarketing answers crucial questions concerning marketing,
company sales and other fields.
•Geomarketing is a complete database of commercial and marketing
information built around a geographical system




                                                                                              57
Thank you for your attention

My E-mail: patruno@istat.it

My address: Vincenzo Patruno
            ISTAT - DCIT
            - Central Direction of Information Technology
            - Security and Web Technologies
            Via C. Balbo, 16 00184 Rome - Italy




                                                            58

Contenu connexe

Tendances

algebraic&transdential equations
algebraic&transdential equationsalgebraic&transdential equations
algebraic&transdential equations8laddu8
 
1608 probability and statistics in engineering
1608 probability and statistics in engineering1608 probability and statistics in engineering
1608 probability and statistics in engineeringDr Fereidoun Dejahang
 
IMAGE SEGMENTATION TECHNIQUES
IMAGE SEGMENTATION TECHNIQUESIMAGE SEGMENTATION TECHNIQUES
IMAGE SEGMENTATION TECHNIQUESVicky Kumar
 
Longest common subsequences in Algorithm Analysis
Longest common subsequences in Algorithm AnalysisLongest common subsequences in Algorithm Analysis
Longest common subsequences in Algorithm AnalysisRajendran
 
Discreate time system and z transform
Discreate time system and z transformDiscreate time system and z transform
Discreate time system and z transformVIKAS KUMAR MANJHI
 
6 spatial filtering p2
6 spatial filtering p26 spatial filtering p2
6 spatial filtering p2Gichelle Amon
 
Fourier transforms of discrete signals (DSP) 5
Fourier transforms of discrete signals (DSP) 5Fourier transforms of discrete signals (DSP) 5
Fourier transforms of discrete signals (DSP) 5HIMANSHU DIWAKAR
 
Image degradation and noise by Md.Naseem Ashraf
Image degradation and noise by Md.Naseem AshrafImage degradation and noise by Md.Naseem Ashraf
Image degradation and noise by Md.Naseem AshrafMD Naseem Ashraf
 
Erosion and dilation
Erosion and dilationErosion and dilation
Erosion and dilationAkhil .B
 
Image processing SaltPepper Noise
Image processing SaltPepper NoiseImage processing SaltPepper Noise
Image processing SaltPepper NoiseAnkush Srivastava
 
Fixed point iteration
Fixed point iterationFixed point iteration
Fixed point iterationIsaac Yowetu
 
I/O devices - Computer graphics
I/O devices -  Computer graphicsI/O devices -  Computer graphics
I/O devices - Computer graphicsAmritha Davis
 

Tendances (20)

algebraic&transdential equations
algebraic&transdential equationsalgebraic&transdential equations
algebraic&transdential equations
 
Hasse diagram
Hasse diagramHasse diagram
Hasse diagram
 
Z transform
Z transformZ transform
Z transform
 
Information theory
Information theoryInformation theory
Information theory
 
1608 probability and statistics in engineering
1608 probability and statistics in engineering1608 probability and statistics in engineering
1608 probability and statistics in engineering
 
Secant Method
Secant MethodSecant Method
Secant Method
 
Color image processing
Color image processingColor image processing
Color image processing
 
Cs manual 2021
Cs  manual 2021Cs  manual 2021
Cs manual 2021
 
IMAGE SEGMENTATION TECHNIQUES
IMAGE SEGMENTATION TECHNIQUESIMAGE SEGMENTATION TECHNIQUES
IMAGE SEGMENTATION TECHNIQUES
 
Chapter 5
Chapter 5Chapter 5
Chapter 5
 
Longest common subsequences in Algorithm Analysis
Longest common subsequences in Algorithm AnalysisLongest common subsequences in Algorithm Analysis
Longest common subsequences in Algorithm Analysis
 
Discreate time system and z transform
Discreate time system and z transformDiscreate time system and z transform
Discreate time system and z transform
 
6 spatial filtering p2
6 spatial filtering p26 spatial filtering p2
6 spatial filtering p2
 
Fourier transforms of discrete signals (DSP) 5
Fourier transforms of discrete signals (DSP) 5Fourier transforms of discrete signals (DSP) 5
Fourier transforms of discrete signals (DSP) 5
 
Image degradation and noise by Md.Naseem Ashraf
Image degradation and noise by Md.Naseem AshrafImage degradation and noise by Md.Naseem Ashraf
Image degradation and noise by Md.Naseem Ashraf
 
Erosion and dilation
Erosion and dilationErosion and dilation
Erosion and dilation
 
Image processing SaltPepper Noise
Image processing SaltPepper NoiseImage processing SaltPepper Noise
Image processing SaltPepper Noise
 
Image transforms
Image transformsImage transforms
Image transforms
 
Fixed point iteration
Fixed point iterationFixed point iteration
Fixed point iteration
 
I/O devices - Computer graphics
I/O devices -  Computer graphicsI/O devices -  Computer graphics
I/O devices - Computer graphics
 

Similaire à Data Warehouse Techniques for Census and Demographic Sites

Multi dimensional model vs (1)
Multi dimensional model vs (1)Multi dimensional model vs (1)
Multi dimensional model vs (1)JamesDempsey1
 
Data Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationData Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationSunderland City Council
 
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...Denodo
 
3 Reasons Data Virtualization Matters in Your Portfolio
3 Reasons Data Virtualization Matters in Your Portfolio3 Reasons Data Virtualization Matters in Your Portfolio
3 Reasons Data Virtualization Matters in Your PortfolioDenodo
 
Dbms and it infrastructure
Dbms and  it infrastructureDbms and  it infrastructure
Dbms and it infrastructureprojectandppt
 
Myth Busters VII: I’m building a data mesh, so I don’t need data virtualization
Myth Busters VII: I’m building a data mesh, so I don’t need data virtualizationMyth Busters VII: I’m building a data mesh, so I don’t need data virtualization
Myth Busters VII: I’m building a data mesh, so I don’t need data virtualizationDenodo
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationDenodo
 
Government GraphSummit: And Then There Were 15 Standards
Government GraphSummit: And Then There Were 15 StandardsGovernment GraphSummit: And Then There Were 15 Standards
Government GraphSummit: And Then There Were 15 StandardsNeo4j
 
Unit 3 3 architectural design
Unit 3 3 architectural designUnit 3 3 architectural design
Unit 3 3 architectural designHiren Selani
 
Data visualization in a Nutshell
Data visualization in a NutshellData visualization in a Nutshell
Data visualization in a NutshellWingChan46
 
To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...
To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...
To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...Jochem van Grondelle
 
Big data visualization
Big data visualizationBig data visualization
Big data visualizationAnurag Gupta
 
Visualizing Healthcare Data with Tableau (Toronto Central LHIN Presentation)
Visualizing Healthcare Data with Tableau (Toronto Central LHIN Presentation)Visualizing Healthcare Data with Tableau (Toronto Central LHIN Presentation)
Visualizing Healthcare Data with Tableau (Toronto Central LHIN Presentation)Stefan Popowycz
 
WWV2015: Jibes Paul van der Hulst big data
WWV2015: Jibes Paul van der Hulst big dataWWV2015: Jibes Paul van der Hulst big data
WWV2015: Jibes Paul van der Hulst big datawebwinkelvakdag
 
Webinar: Successful Data Migration to Microsoft Dynamics 365 CRM | InSync
Webinar: Successful Data Migration to Microsoft Dynamics 365 CRM | InSyncWebinar: Successful Data Migration to Microsoft Dynamics 365 CRM | InSync
Webinar: Successful Data Migration to Microsoft Dynamics 365 CRM | InSyncAPPSeCONNECT
 
Health Plan Survey Paper
Health Plan Survey PaperHealth Plan Survey Paper
Health Plan Survey PaperLisa Olive
 

Similaire à Data Warehouse Techniques for Census and Demographic Sites (20)

Multi dimensional model vs (1)
Multi dimensional model vs (1)Multi dimensional model vs (1)
Multi dimensional model vs (1)
 
Data Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationData Warehousing, Data Mining & Data Visualisation
Data Warehouse Techniques for Census and Demographic Sites

  • 1. Data Warehouse techniques on Intermediate Census and Demographic Statistics Web sites: “View data from different points of view”. Vincenzo Patruno, ISTAT. TES course: Techniques for Data Dissemination, Madrid, 9th April 2003. http://cens.istat.it, http://demo.istat.it (Illustration: M. C. Escher, Relativity)
  • 2. Contents: •What a Data Warehouse is •The User Interface: how users make queries •Data modelling: two ways to organise data •Software environment •Costs and maintenance
  • 3. Data Warehouse: •A data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems collect •A data warehouse is a collection of data designed to support management decision making •A data warehouse is a computer system designed to give business decision makers instant access to information by copying data from existing systems and storing it for use by executives.
  • 4. Data Warehouse: A data warehouse is a copy of transaction data specifically structured for querying and reporting (Ralph Kimball's definition on page 310 of The Data Warehouse Toolkit, John Wiley & Sons, 1996). Queries and reports generated from data stored in a data warehouse may or may not be used for analysis.
  • 5. User Interface: It is the door to the data. The goal of demo.istat.it and cens.istat.it has been a user interface that permits: •Easy data access •Handy parameter selection •Fast database queries, and that can easily show data from different points of view using Internet technologies
  • 6. User Interface: On the Web that means: •building the user interface as an HTML page •using available technologies (DHTML) •using the HTTP protocol to send queries and to obtain results
  • 7. User Interface: Building a user interface on the Web is completely different from traditional programming because of: •DHTML limits •Cross-browser problems •The nature of HTTP (a stateless request/response protocol)
  • 8. User Interface: Because of this, in general: •Web programmers tend to develop systems in depth •This means users have to click many times to obtain results. Our goal has been a user interface that behaves like a traditional application
  • 9. User Interface: The adopted solution places frames 1 and 2, used to select parameters and variables, in the same window as frame 3, which shows the results according to the adopted reporting policy
  • 10. User Interface: In this way, users can always use the same window •to perform queries •to see results •to save data. Advantages: •Easy and fast access to data. Disadvantages: •It doesn’t look so good
  • 11. User Interface: How they work
  • 12. Data Model: During the interface analysis we talked about: •Analysis variables •Dimensions •Data shown from different points of view. Both systems are built according to the modelling techniques used to build Data Warehouses
  • 13. Data Model: The user interface is built around a data structure suitable to be queried from different points of view. To do this, however, it is very important first of all to build a good conceptual data model. “The conceptual data model isn’t an exercise in intellectual gymnastics for engineers but the starting point to build good software systems”
  • 14. Dimensional Modelling: •DM is a favourite modelling technique in data warehousing •In DM, a model of tables and relations is built with the purpose of maximising decision support and query performance in relational databases •It’s an excellent technique for building data models that optimise OLAP performance
  • 15. Dimensional Modelling: In contrast, conventional E-R models are designed to: •Remove redundancy in the data model •Facilitate the retrieval of individual records that have certain critical identifiers •Optimise OLTP performance
  • 16. Dimensional Modelling: OLTP (Online Transaction Processing): a class of programs that facilitates and manages transaction-oriented applications (typically for data entry and retrieval transactions). OLAP (Online Analytical Processing): enables a user to easily and selectively extract and view data from different points of view
  • 17. Dimensional Modelling: OLTP-based system = “gets the data in”; OLAP-based system = “gets the data out”
  • 18. Dimensional Modelling: •The data warehouse exists to answer questions people have about the “business” •Dimensional modelling techniques ensure that the DW design reflects the way users think about the “business” and that the DW can be used to answer their questions •A dimensional model (DM) captures the measurements of importance to a “business” and the parameters by which the measurements are broken out •It is an excellent tool for identifying and classifying the important business components in a subject area.
  • 19. Dimensional Modelling: How to build a Dimensional Model: •A DM is built around a “business subject” (in our case a “statistical subject”) •This means we first have to identify the subject to be modelled and all the measures that describe it •At the same time we have to identify the parameters by which a measurement can be viewed.
  • 20. Dimensional Modelling: The measurements are referred to as FACTS. The parameters by which a fact can be viewed are referred to as DIMENSIONS. The level of detail of the measures in the fact table is referred to as GRAIN. N.B. It is crucial that every row in the “fact” table be recorded at exactly the same level of detail
  • 21. Dimensional Modelling: Dimension table “Class of Employees” (EmployeesClass_PK, EmployeesClass_Name); fact table “Enterprise Facts” (Employees, Profits, Loss, EmployeesClass_FK, Geography_FK); dimension table “Geography” (Geography_PK, Municipality, Province, Region, Geographical Area)
  • 22. Dimensional Modelling: … or, speaking about demo.istat.it: fact table “Resident Population Facts” (Male, Female, Male Married, Female Married, ……, Age_FK, Geography_FK); dimension table “Age” (Age_PK, Age_desc); dimension table “Geography” (Geography_PK, Municipality, Province, Region, Geographical Area)
  • 23. Dimensional Modelling: •A Dimensional Model does not change much when implemented in a relational database (in that form the DM is referred to as a star schema) •Each box of dimension attributes becomes a table in the database, referred to as a dimension table •The fact table becomes a very large table containing a very large number of rows. It contains the measures plus foreign keys that relate each measurement to the appropriate rows in each of the dimension tables.
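The star schema described in slides 22 and 23 can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than the mSQL database used for demo.istat.it, the column names follow the slide but are lower-cased, and the municipalities, provinces and counts are invented sample values, not ISTAT data.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema: two dimension tables and one fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE geography (
            geography_pk INTEGER PRIMARY KEY,
            municipality TEXT, province TEXT, region TEXT
        );
        CREATE TABLE age (
            age_pk   INTEGER PRIMARY KEY,
            age_desc TEXT
        );
        -- The fact table holds the measures plus one foreign key per dimension.
        CREATE TABLE resident_population_facts (
            geography_fk INTEGER REFERENCES geography(geography_pk),
            age_fk       INTEGER REFERENCES age(age_pk),
            male INTEGER, female INTEGER
        );
    """)
    # Hypothetical sample rows, for illustration only.
    conn.executemany("INSERT INTO geography VALUES (?, ?, ?, ?)", [
        (1, "Town A", "Province X", "Region R"),
        (2, "Town B", "Province X", "Region R"),
        (3, "Town C", "Province Y", "Region R"),
    ])
    conn.executemany("INSERT INTO age VALUES (?, ?)",
                     [(18, "18 years"), (19, "19 years")])
    conn.executemany("INSERT INTO resident_population_facts VALUES (?, ?, ?, ?)", [
        (1, 18, 10, 12), (2, 18, 20, 18), (3, 18, 5, 7),
        (1, 19, 11, 9), (2, 19, 21, 22), (3, 19, 6, 4),
    ])
    return conn


def males_by_province(conn, age_pk):
    """View the 'male' measure by province for one age: a join on the Geography dimension."""
    return conn.execute("""
        SELECT g.province, SUM(f.male)
        FROM resident_population_facts AS f
        JOIN geography AS g ON g.geography_pk = f.geography_fk
        WHERE f.age_fk = ?
        GROUP BY g.province
        ORDER BY g.province
    """, (age_pk,)).fetchall()
```

Calling `males_by_province(build_star_schema(), 18)` sums the municipal rows up to province level. Every such query has to scan and group the base fact table, which is the cost that precomputed aggregate tables are meant to avoid.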
  • 24. Dimensional Modelling: This is what we have done with demo.istat.it: we have implemented the fact table in a relational database
  • 25. Dimensional Modelling: We have also created aggregate tables along the “Geography” dimension: fact table “Resident Population Facts AGG_BY_PROVINCE” (Male, Female, Male Married, Female Married, ……, Age_FK, Province_FK); dimension table “Province” (Province_PK, Province); dimension table “Age” (Age_PK, Age_desc)
  • 26. Dimensional Modelling: Resident Population Facts tables, from the most aggregated to the base table: AGG_BY_ITALY, AGG_BY_GEOGRAPHICAL_AREA, AGG_BY_REGION, AGG_BY_PROVINCE, BASE
  • 27. Dimensional Modelling: The aggregate tables contain the same facts as the base fact table, but they are recorded at a different GRAIN. It’s an excellent way to manage hierarchies
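As a rough sketch of the aggregate tables in slides 25 to 27, the example below derives a province-level table from a base fact table and checks that both hold the same facts at a different grain. It assumes Python's sqlite3 module instead of mSQL, flattens the dimensions into plain columns for brevity, and uses invented table names and sample counts.

```python
import sqlite3


def build_tables():
    """Create a base fact table and an aggregate table at province grain."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Base grain: one row per municipality and age (hypothetical sample data).
        CREATE TABLE base_facts (
            municipality TEXT, province TEXT, age INTEGER, male INTEGER
        );
        INSERT INTO base_facts VALUES
            ('Town A', 'Province X', 18, 10),
            ('Town B', 'Province X', 18, 20),
            ('Town C', 'Province Y', 18, 5),
            ('Town A', 'Province X', 19, 11),
            ('Town B', 'Province X', 19, 21),
            ('Town C', 'Province Y', 19, 6);
        -- Coarser grain: one row per province and age, computed once in advance.
        CREATE TABLE agg_by_province AS
            SELECT province, age, SUM(male) AS male
            FROM base_facts
            GROUP BY province, age;
    """)
    return conn


def total_males(conn, table):
    """Sum the 'male' measure in one of the two known tables."""
    if table not in ("base_facts", "agg_by_province"):
        raise ValueError(f"unknown table: {table}")
    return conn.execute(f"SELECT SUM(male) FROM {table}").fetchone()[0]


def row_count(conn, table):
    """Count rows, to show how much smaller the aggregate table is."""
    if table not in ("base_facts", "agg_by_province"):
        raise ValueError(f"unknown table: {table}")
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

The totals agree between the two tables while the aggregate has fewer rows, so a province-level query can read `agg_by_province` directly. The same pattern would repeat for region, geographical area and the national level.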
  • 28. Dimensional Modelling: Dimensional Models are not always implemented in relational databases. Several vendors offer multidimensional databases (MDDB), which store information in a different format often referred to as a cube
  • 29. Dimensional Modelling: This is what we have done with cens.istat.it: we have implemented the fact table in a multidimensional database
  • 30. MDDB: A multidimensional database (MDDB) is a specialised storage facility that allows data to be stored in a matrix-like format •It contains all possible values resulting from crossing all dimensions and all measures •The whole of these values is referred to as the Nway cube
  • 31. MDDB: In addition, the process of building an MDDB also summarises the raw data according to hierarchical dimensions. Summarised data is stored in data structures referred to as subcubes. An MDDB stores its data as an Nway cube and zero or more subcubes
  • 32. MDDB: Subcubes are built to enhance reporting speed. If a subcube does not exist for a particular aggregate query, that is, if no subcube defines the exact crossing required to answer the query, the aggregate data will be derived from the smallest subcube that can provide the data. If no subcube can provide the data, it is derived from the Nway cube
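The selection rule in slide 32 can be illustrated with a small sketch. It is not SAS/MDDB's actual implementation: the subcube names, their sizes and the `pick_source` function are hypothetical, and each hierarchy level (region, province) is treated as an independent crossing variable for simplicity.

```python
def pick_source(required, subcubes, nway_name="NWAY"):
    """Return the name of the data structure that should answer a query.

    required: set of dimension names the query crosses.
    subcubes: dict mapping a subcube name to (set of dimensions, number of cells).
    A subcube can answer the query if it contains every required dimension;
    among those, the smallest one wins. Otherwise fall back to the Nway cube.
    """
    candidates = [
        (size, name)
        for name, (dims, size) in subcubes.items()
        if required <= dims  # subset test: all required dimensions are present
    ]
    if not candidates:
        return nway_name
    return min(candidates)[1]


# Hypothetical subcubes with made-up cell counts.
SUBCUBES = {
    "by_region": ({"region"}, 20),
    "by_region_age": ({"region", "age"}, 2000),
    "by_province_age_sex": ({"province", "age", "sex"}, 40000),
}
```

With these definitions, a query crossing only `age` is served by `by_region_age` (the smaller of the two subcubes that contain it), while a query on `municipality` falls back to the Nway cube because no subcube holds that dimension.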
  • 33. MDDB: If you know of common queries that can be answered using a smaller set of crossings, you could create a subcube that specifies the exact crossing required. Subcubes are often referred to as Data Marts
  • 34. Summary: We have seen two ways to implement the fact table: •As a relational database (for demo.istat.it) •As a multidimensional database (for cens.istat.it). Now we are going to see the tools used to implement the databases and how the programs that connect the database with the user interface have been built
  • 35. Development environment: To implement demo.istat.it we’ve used: •mSQL as relational database (http://www.hughes.com.au/) •PHP as programming language (http://www.php.net/). PHP is a server-side, cross-platform, HTML-embedded scripting language.
  • 36. HTML: Welcome.html (HTML page): <HTML> <BODY bgcolor="#FFFFFF"> <H1>Welcome!</H1> </BODY> </HTML> 1. The client requests Welcome.html 2. The Web server sends Welcome.html to the client. Welcome.html is interpreted by the browser and displayed on the screen
  • 37. PHP: Welcome.php (PHP page): <?php print '<HTML> <BODY bgcolor="#FFFFFF"> <H1>Welcome!</H1> </BODY> </HTML>'; ?> 1. The client requests Welcome.php 2. The PHP interpreter runs Welcome.php, with access to the RDBMS 3. Results are sent to the client
  • 38. SAS: In contrast, to implement cens.istat.it we’ve used: •SAS/MDDB (to build multidimensional databases) •SAS/IntrNet (to run SAS programs on the Web) •DAB (a tool that automatically generates all the programs and all the JavaScript needed to query the MDDB via the Web)
  • 39. SAS: 1. The client requests to run a SAS program (for example by sending a form from an HTML page) 2. The SAS broker (a cgi-bin program) calls the SAS program, which is stored in an independent area 3. The SAS program runs and accesses the data in the MDDB 4. Results are sent to the client
  • 40. Costs and maintenance: How much does it cost to build and maintain both systems in terms of •Money •People •Time
  • 41. Costs and maintenance: To build demo.istat.it we needed to build: •The relational database containing the fact table •The aggregate tables •The dimension tables •All the programs to query the tables and to format the output as an HTML page and as a CSV file •All the JavaScript to manage the user interface
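Slide 41 mentions programs that format query output both as an HTML page and as a CSV file. The original programs were written in PHP and are not shown in the deck; the sketch below is a Python stand-in with hypothetical function names, using two of the province figures from the Lazio table later in the deck as sample rows.

```python
import csv
import html
import io


def to_csv(header, rows):
    """Serialise a query result as CSV text, one line per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def to_html_table(header, rows):
    """Serialise the same result as an HTML table, escaping every cell."""
    head = "".join(f"<th>{html.escape(str(h))}</th>" for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


# Sample result set: single males aged 18 in two provinces of Lazio.
HEADER = ["Province", "Single males"]
ROWS = [("Viterbo", 1574), ("Rieti", 830)]
```

Keeping the two formatters separate from the query means the same result set can be shown in the results frame or offered as a download without running the query twice.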
  • 42. Costs and maintenance: To build cens.istat.it we needed to: •Build the MDDB containing the Nway cube and all the Data Marts using SAS tools •Generate all the SAS programs to query the tables and to format the output as an HTML page and as a CSV file •Generate all the JavaScript to manage the user interface. The SAS programs and the JavaScript are generated automatically by SAS/DAB
  • 43. Costs and maintenance (demo.istat.it): Software: mSQL (free of charge for certain organisations, otherwise US $250); PHP (completely free of charge). Effort: 3 weeks of work, 2 people. Hardware: IBM RS/6000 43P workstation running AIX, 9 GB hard disk. Each database changes once a year; every year we create a new DB. Time needed to load new data: 5 minutes
  • 44. Costs and maintenance (cens.istat.it): Software: SAS, SAS/MDDB, SAS/IntrNet, SAS/DAB (free of charge). Effort: 3 months of work, 10 people, 1 SAS adviser. Hardware: IBM AIX server, 40 GB hard disk. Databases don’t change
  • 45. Summary: We have seen so far: •What a Data Warehouse is •The features of the data structures •The features of the user interface •The development environment
  • 47. Resident population on 1st January 2001 (Age = 18, Region = Lazio)
        Province     Single Males   Married Males   Divorced Males   …     Total Males   …
        Viterbo      1574           ...             ...              ...   1575          ...
        Rieti        830            ...             ...              ...   830           ...
        Rome         19510          ...             ...              ...   19511         ...
        Latina       3314           ...             ...              ...   3317          ...
        Frosinone    3297           ...             ...              ...   3304          ...
  • 48. Resident population on 1st January 2001 (Age = All, Region = Lazio)
        Province     Single Males   Married Males   Divorced Males   …     Total Males   …
        Viterbo      59131          79068           ...              ...   143470        ...
        Rieti        31060          40010           ...              ...   73819         ...
        Rome         829036         948885          ...              ...   1843238       ...
        Latina       112325         133470          ...              ...   252280        ...
        Frosinone    104665         129906          ...              ...   242108        ...
  • 49. Resident population on 1st January 2001 (Age = All, Region = Lazio) Single Married Divorced Total Total Province … Males F/M Males Males Males Males Density Viterbo 59131 79068 ... ... 143470 79,0 ... Rieti 31060 40010 ... ... 73819 52,6 ... Rome 829036 948885 ... ... 1843238 668,7 ... Latina 112325 133470 ... ... 252280 217,6 ... Frosinone 104665 129906 ... ... 242108 147,3 ...
  • 53. GIS (Geographic Information System): Essentially, a GIS is a computer-assisted information management system for geographically referenced data. It contains two closely integrated databases: •The spatial database contains information in the form of digital co-ordinates. These can be points, lines, or polygons •The attribute database contains information about the characteristics or qualities of the spatial features (e.g. demographic information). GIS is sometimes seen as a set of tools for analysing spatial data.
  • 54. Questions: Is it possible to build a Web-based GIS? Is it possible to combine a Web warehouse system with a GIS component?
  • 56. Web System Architecture: The client communicates over HTTP with a server running Red Hat Linux, Apache HTTP Server, PHP4, PostgreSQL, PostGIS, MapServer and MapScript, on top of a Data Warehouse of spatial and statistical data
  • 57. Conclusions: •A data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems collect •A data warehouse is a collection of data designed to support management decision making •A data warehouse is a computer system designed to give business decision makers instant access to information by copying data from existing systems and storing it for use by executives •A data warehouse is a copy of transaction data specifically structured for querying and reporting •Geomarketing is the use of geography to make efficient business decisions •Geomarketing answers crucial questions concerning marketing, company sales and other fields •Geomarketing is a complete database of commercial and marketing information built around a geographical system
  • 58. Thank you for your attention. My e-mail: patruno@istat.it. My address: Vincenzo Patruno, ISTAT - DCIT - Central Direction of Information Technology - Security and Web Technologies, Via C. Balbo, 16, 00184 Rome, Italy