It is possible to build a web-based GIS that combines spatial and attribute data. Such a system would include:
- A client (web browser) that communicates with a web server over HTTP
- A web server running Linux with Apache, PHP and a spatial database such as PostgreSQL with PostGIS
- Mapserver and Mapscript to serve maps and answer spatial queries
- A data warehouse containing statistical and demographic data
- Spatial and statistical data stored and linked so that they can be analysed together across dimensions such as geography
This architecture integrates the capabilities of a data warehouse with GIS functionality for the visualisation and spatial analysis of demographic and statistical data.
Data Warehouse Techniques for Census and Demographic Sites
1. Data Warehouse techniques on Intermediate Census and Demographic Statistics Web sites
“View data from different points of view”
Vincenzo Patruno - ISTAT
TES course: Techniques for Data Dissemination
Madrid, 9th April 2003
http://cens.istat.it
http://demo.istat.it
[Image: M. C. Escher, Relativity]
2. Contents
•What a Data Warehouse is
•The User Interface: How users make queries
•Data modelling: Two ways to organise data
•Software environment
•Costs and maintenance
3. Data Warehouse
•A data warehouse is a central repository for all or significant
parts of the data that an enterprise's various business systems
collect
•A data warehouse is a collection of data designed to support
management decision making
•A data warehouse is a computer system designed to give
business decision makers instant access to information by
copying data from existing systems and storing it for use by
executives.
4. Data Warehouse
A data warehouse is a copy of transaction data specifically
structured for querying and reporting
(Ralph Kimball's definition on page 310 of The Data Warehouse Toolkit - John Wiley & Sons 1996)
Queries and reports generated from data stored in a data warehouse may or
may not be used for analysis.
5. User Interface
The user interface is the door to the data.
The goal for demo.istat.it and cens.istat.it has been a user interface that permits:
•Easy data access
•Handy parameter selection
•Fast database queries
and that makes it easy to show data from different points of view using Internet technologies
6. User Interface
On the Web that means:
•building the User interface as an HTML page
•using available technologies (DHTML)
•using HTTP protocol to send queries and to obtain results
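As a rough illustration of the last point, a query can travel as the parameters of an HTTP GET request. The sketch below only builds and decodes such a URL. The host `example.org`, the path `/query.php` and the parameter names `year`, `region` and `measure` are hypothetical and are not taken from demo.istat.it or cens.istat.it.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_query_url(base_url, params):
    """Encode the user's selections as the query string of a GET request."""
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and parameter names, for illustration only.
url = build_query_url(
    "http://example.org/query.php",
    {"year": 2002, "region": "Lazio", "measure": "female"},
)

# A server-side program would decode the same parameters
# before querying the database.
decoded = parse_qs(urlsplit(url).query)

print(url)
print(decoded)
```

Because every selection is carried in the URL, each request is self-contained, which fits the stateless nature of HTTP.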
7. User Interface
Building a user interface on the Web is completely different from
traditional programming because of:
•DHTML limits
•Cross-Browser problems
•The nature of HTTP
8. User Interface
Because of this, in general:
•Web programmers tend to develop systems in depth
•This means users have to click many times to obtain results
Our goal has been a user interface that behaves like a traditional application
9. User Interface
[Figure: a single window divided into three frames, numbered 1, 2 and 3]
The adopted solution places frames 1 and 2, used to select parameters and variables, in the same window as frame 3, which shows the results according to the adopted reporting policy
10. User Interface
In this way, users can always use the same window
•to perform queries
•to see results
•to save data
Advantages
•Easy and fast access to data
Disadvantages
•It doesn’t look so good
12. Data Model
During the Interface analysis we talked about:
•Analysis Variables
•Dimensions
•Data shown from different points of view
Both systems are built according to the modelling techniques
used to build up Data Warehouses
13. Data Model
The user interface is built around a data structure that can be
queried from different points of view.
To achieve this, it is very important first of all to build a good
conceptual data model
“The conceptual data model isn’t an exercise in intellectual gymnastics for
engineers but the starting point to build good software systems”
14. Dimensional Modelling
•DM is a favourite modelling technique in data warehousing
•In DM, a model of tables and relations is built with the
purpose of maximising decision support and query
performance in relational databases
•It’s an excellent technique for building data models that optimise
OLAP performance
15. Dimensional Modelling
In contrast, conventional E-R models are designed to:
•Remove redundancy from the data model
•Facilitate the retrieval of individual records by certain
critical identifiers
•Optimise OLTP performance
16. Dimensional Modelling
OLTP: On-Line Transaction Processing
A class of programs that facilitates and manages
transaction-oriented applications (typically for data entry and
retrieval transactions)
OLAP: On-Line Analytical Processing
Enables a user to easily and selectively extract and view data
from different points of view
18. Dimensional Modelling
•The data warehouse exists to answer questions people have about
the “business”
•Dimensional modelling techniques ensure that the DW design
reflects the way users think about the “business” and that the DW
can be used to answer their questions.
•A dimensional model (DM) captures the measurements of
importance to a “business” and the parameters by which the
measurements are broken out.
•It is an excellent tool for identifying and classifying the important
business components in a subject area.
19. Dimensional Modelling
How to build a Dimensional Model
•DM is built around a “business subject” (in our case “statistical
subject”)
•This means we first have to identify the subject to be modelled
and all the measures that describe it.
•At the same time we have to identify the parameters by which a
measurement can be viewed.
20. Dimensional Modelling
The measurements are referred to as FACTS
The parameters by which a fact can be viewed are referred to as
DIMENSIONS
The level of detail of measures in the fact table is referred to as
GRAIN
N.B.
It is crucial that every row in the “fact” table be recorded at
exactly the same level of detail
21. Dimensional Modelling
[Diagram: a fact box linked to two dimension boxes]
Class of Employees (dimension): EmployeesClass_PK, EmployeesClass_Name
Enterprise Facts (facts): Employees, Profits, Loss, EmployeesClass_FK, Geography_FK
Geography (dimension): Geography_PK, Municipality, Province, Region, Geographical Area
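The enterprise model can be sketched as tables in a relational database. The example below is only an illustration: it uses Python's built-in SQLite module as a stand-in database, the column types are assumptions, and every row is invented. Only the table and column names follow the diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two dimension tables and one fact table, named after the diagram.
conn.executescript("""
CREATE TABLE employees_class (
    employees_class_pk   INTEGER PRIMARY KEY,
    employees_class_name TEXT NOT NULL
);
CREATE TABLE geography (
    geography_pk      INTEGER PRIMARY KEY,
    municipality      TEXT NOT NULL,
    province          TEXT NOT NULL,
    region            TEXT NOT NULL,
    geographical_area TEXT NOT NULL
);
CREATE TABLE enterprise_facts (
    employees          INTEGER NOT NULL,
    profits            REAL NOT NULL,
    loss               REAL NOT NULL,
    employees_class_fk INTEGER NOT NULL
        REFERENCES employees_class(employees_class_pk),
    geography_fk       INTEGER NOT NULL
        REFERENCES geography(geography_pk)
);
""")

# Invented sample rows, for illustration only.
conn.executemany("INSERT INTO employees_class VALUES (?, ?)",
                 [(1, "1-9"), (2, "10-49")])
conn.executemany("INSERT INTO geography VALUES (?, ?, ?, ?, ?)", [
    (1, "Roma", "Roma", "Lazio", "Centre"),
    (2, "Milano", "Milano", "Lombardia", "North-West"),
])
# Every fact row is recorded at the same grain:
# one employees class in one municipality.
conn.executemany("INSERT INTO enterprise_facts VALUES (?, ?, ?, ?, ?)", [
    (5, 100.0, 0.0, 1, 1),
    (30, 500.0, 20.0, 2, 1),
    (8, 50.0, 10.0, 1, 2),
])

# View the "employees" fact through the geography dimension.
by_region = conn.execute("""
    SELECT g.region, SUM(f.employees)
    FROM enterprise_facts AS f
    JOIN geography AS g ON g.geography_pk = f.geography_fk
    GROUP BY g.region
    ORDER BY g.region
""").fetchall()

print(by_region)
```

Joining through `employees_class_fk` instead of `geography_fk` would show the same facts from the other point of view without touching the fact table.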
22. Dimensional Modelling
… or, speaking about demo.istat.it
[Diagram: a fact box linked to two dimension boxes]
Resident Population Facts (facts): Male, Female, Male Married, Female Married, ……, Age_FK, Geography_FK
Age (dimension): Age_PK, Age_desc
Geography (dimension): Geography_PK, Municipality, Province, Region, Geographical Area
23. Dimensional Modelling
•A Dimensional Model does not change much when implemented
in a relational database. (The DM is referred to as Star Schema)
•Each box of dimension attributes becomes a table in the
database, referred to as a dimension table.
•The fact table becomes a very large table containing a very large
number of rows. It contains the measures plus foreign keys that
relate each measurement to the appropriate rows in each of the
dimension tables.
24. Dimensional Modelling
This is what we have done with demo.istat.it:
we have implemented the fact table in a relational database
25. Dimensional Modelling
We have also created aggregate tables by “Geography” dimension
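[Diagram: the aggregate fact box linked to two dimension boxes]
Resident Population Facts AGG_BY_PROVINCE (facts): Male, Female, Male Married, Female Married, ……, Age_FK, Province_FK
Age (dimension): Age_PK, Age_desc
Province (dimension): Province_PK, Province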
26. Dimensional Modelling
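[Diagram: hierarchy of Resident Population Facts tables, from the most aggregated down to the base table]
•Resident Population Facts AGG_BY_ITALY
•Resident Population Facts AGG_BY_GEOGRAPHICAL_AREA
•Resident Population Facts AGG_BY_REGION
•Resident Population Facts AGG_BY_PROVINCE
•BASE Resident Population Facts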
27. Dimensional Modelling
The aggregate tables contain the same facts as the base fact table,
but they are recorded at a different GRAIN
It’s an excellent way to manage hierarchies
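One possible way to derive such an aggregate table is to group the base fact table by a coarser level of the geography hierarchy. The sketch below assumes SQLite as a stand-in database and uses lower-case table names, a reduced set of columns and invented counts. It is not the procedure actually used for demo.istat.it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE geography (
    geography_pk INTEGER PRIMARY KEY,
    municipality TEXT, province TEXT, region TEXT
);
CREATE TABLE base_facts (
    male INTEGER, female INTEGER,
    age_fk INTEGER,
    geography_fk INTEGER REFERENCES geography(geography_pk)
);
""")

# Invented counts at the base grain: one age in one municipality.
conn.executemany("INSERT INTO geography VALUES (?, ?, ?, ?)", [
    (1, "Roma", "Roma", "Lazio"),
    (2, "Tivoli", "Roma", "Lazio"),
    (3, "Viterbo", "Viterbo", "Lazio"),
])
conn.executemany("INSERT INTO base_facts VALUES (?, ?, ?, ?)", [
    (10, 12, 30, 1), (4, 5, 30, 2), (3, 2, 30, 3),
    (11, 9, 31, 1), (6, 4, 31, 2), (1, 3, 31, 3),
])

# Same facts, coarser grain: one age in one province.
conn.executescript("""
CREATE TABLE agg_by_province AS
SELECT g.province AS province,
       f.age_fk   AS age_fk,
       SUM(f.male)   AS male,
       SUM(f.female) AS female
FROM base_facts AS f
JOIN geography AS g ON g.geography_pk = f.geography_fk
GROUP BY g.province, f.age_fk;
""")

rows = conn.execute("""
    SELECT province, age_fk, male, female
    FROM agg_by_province
    ORDER BY province, age_fk
""").fetchall()

base_total = conn.execute("SELECT SUM(male) FROM base_facts").fetchone()[0]
agg_total = conn.execute("SELECT SUM(male) FROM agg_by_province").fetchone()[0]

for row in rows:
    print(row)
```

The totals of the aggregate table match those of the base table, so a query at province level can read the smaller table and return the same answer.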
28. Dimensional Modelling
Dimensional Models are not always implemented in relational
databases.
Several vendors offer multidimensional databases (MDDB), which
store information in a different format, often referred to as a cube
29. Dimensional Modelling
This is what we have done with cens.istat.it:
we have implemented the fact table in a multidimensional database
30. MDDB
A multidimensional database (MDDB) is a specialised storage
facility that allows data to be stored in a matrix-like format
•It contains all possible values resulting from crossing all
dimensions and all measures.
•The full set of these values is referred to as the Nway cube
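A minimal sketch of the idea, assuming a single measure and three invented dimensions: the cube holds one cell for every crossing of dimension values, whether or not any raw record falls into it. The dictionary layout is an illustrative choice and says nothing about how SAS/MDDB stores its data.

```python
from itertools import product

# Invented dimensions and records, for illustration only.
dimensions = {
    "province": ["Roma", "Viterbo"],
    "age": [30, 31],
    "sex": ["male", "female"],
}
records = [  # (province, age, sex, residents)
    ("Roma", 30, "male", 14),
    ("Roma", 30, "female", 17),
    ("Roma", 31, "male", 17),
    ("Viterbo", 30, "female", 2),
]


def build_nway(dimensions, records):
    """Create one cell per crossing of all dimension values, then fill it."""
    cube = {cell: 0 for cell in product(*dimensions.values())}
    for *cell, value in records:
        cube[tuple(cell)] += value
    return cube


nway = build_nway(dimensions, records)

print(len(nway))
print(nway[("Roma", 30, "female")])
```

With 2 provinces, 2 ages and 2 sexes the cube has 2 × 2 × 2 = 8 cells, four of which stay at zero because no record matches them.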
31. MDDB
In addition, the process of building an MDDB also summarises the
raw data along the hierarchical dimensions.
Summarised data is stored in data structures referred to as
subcubes.
An MDDB stores its data as an Nway cube and zero or more
subcubes.
32. MDDB
Subcubes are built to enhance reporting speed
If a subcube does not exist for a particular aggregate query, that
is, if no subcube defines the exact crossing required to answer the
query, the aggregate data will be derived from the smallest
subcube that can provide the data.
If no subcube can provide the data, it is derived from the Nway
cube
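The selection rule described above can be sketched as follows. This is only an illustration of the rule as stated on the slide, not the actual SAS/MDDB algorithm. The subcube names and cell counts are invented, and dimensions are treated as flat names, so hierarchies such as province within region are ignored.

```python
def pick_source(required, subcubes):
    """Return the smallest subcube whose crossing covers the required
    dimensions, or "NWAY" when no subcube can provide the data."""
    required = set(required)
    covering = [
        (cells, name)
        for name, (dims, cells) in subcubes.items()
        if required <= set(dims)
    ]
    return min(covering)[1] if covering else "NWAY"


# Invented subcubes: name -> (crossed dimensions, number of cells).
subcubes = {
    "by_region": (("region",), 20),
    "by_region_age": (("region", "age"), 2000),
    "by_region_age_sex": (("region", "age", "sex"), 4000),
}

print(pick_source(["region"], subcubes))      # exact crossing exists
print(pick_source(["age"], subcubes))         # derived from a larger subcube
print(pick_source(["province"], subcubes))    # falls back to the Nway cube
```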
33. MDDB
If you know of common queries that can be answered using a
smaller set of crossings, you could create a subcube that specifies
the exact crossing required.
Subcubes are often referred to as Data Marts
34. Summary
We have seen two ways to implement the fact table.
•As a Relational Database (for demo.istat.it)
•As a Multidimensional Database (for cens.istat.it)
Next we look at the tools used to implement the databases and at how the
programs that connect the databases to the user interface have been built
35. Developing environment
To implement demo.istat.it we’ve used:
•mSQL as Relational Database (http://www.hughes.com.au/)
•PHP as programming language (http://www.php.net/)
PHP is a server-side, cross-platform, HTML embedded scripting
language.
36. HTML
Welcome.html
<HTML>
<BODY bgcolor="#FFFFFF">
<H1>Welcome!</H1>
</BODY>
</HTML>
[Diagram: the client (1) requests an HTML page from the web server, which (2) returns it]
1. The client requests Welcome.html
2. The web server sends Welcome.html to the client
Welcome.html is interpreted by the browser and
displayed on the screen
37. PHP
Welcome.php
<?php
print '<HTML>
<BODY bgcolor="#FFFFFF">
<H1>Welcome!</H1>
</BODY>
</HTML>';
?>
[Diagram: client, web server holding the PHP pages, PHP interpreter, RDBMS]
1. The client requests Welcome.php
2. The PHP interpreter runs Welcome.php
3. The results are sent to the client
38. SAS
In contrast, to implement cens.istat.it we’ve used:
•SAS/MDDB (to build Multidimensional Databases)
•SAS/IntrNet (to run SAS programs on the Web)
•DAB (a tool that automatically generates all the programs and all the
JavaScript needed to query the MDDB via the Web)
39. SAS
[Diagram: client, cgi-bin broker and HTML pages on the web server, SAS programs, MDDB]
1. The client asks to run a SAS program (e.g. by sending a form)
2. The SAS broker (a cgi-bin program) calls the SAS program, which is stored in an independent area
3. The SAS program runs and accesses the data in the MDDB
4. The results are sent to the client
40. Costs and maintenance
How much does it cost to build and maintain both systems in
terms of
•Money
•People
•Time
41. Costs and maintenance
To build demo.istat.it we needed to build:
•The relational database containing the Fact Table
•The Aggregate tables
•The Dimension tables
•All programs to query tables and to format outputs as an HTML
page and as a CSV file.
•All JavaScripts to manage the user interface
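The output-formatting step can be sketched as two small functions that render the same query result either as an HTML table or as CSV text. The original programs were written in PHP. This Python version, its function names and the sample rows are illustrative assumptions only.

```python
import csv
import io
from html import escape


def to_csv(header, rows):
    """Render a query result as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def to_html(header, rows):
    """Render the same result as an HTML table, escaping every value."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


# Invented result set, for illustration only.
header = ["province", "male"]
rows = [("Roma", 14), ("Viterbo", 3)]

print(to_csv(header, rows))
print(to_html(header, rows))
```

Keeping the query result separate from its rendering lets the same program serve both the on-screen report and the downloadable file.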
42. Costs and maintenance
To build cens.istat.it we needed to:
•Build the MDDB containing the NWAY and all Data Marts
using SAS tools.
•Generate all SAS programs to query tables and to format
outputs as an HTML page and as a CSV file.
•Generate all JavaScripts to manage the user interface.
SAS programs and JavaScripts are generated automatically by
SAS/DAB
43. Costs and maintenance
demo.istat.it
Software: mSQL (free of charge for certain organisations, otherwise US $250); PHP (completely free of charge)
Work: 3 weeks
People: 2
Hardware: IBM RS/6000 43P workstation running AIX, 9 GB hard disk
Each database changes once a year; every year we create a new DB
Time needed to load new data: 5 minutes
44. Costs and maintenance
cens.istat.it
Software: SAS, SAS/MDDB, SAS/IntrNet, SAS/DAB (free of charge)
Work: 3 months
People: 10
1 SAS adviser
Hardware: IBM server running AIX, 40 GB hard disk
Databases don’t change
45. Summary
So far we have seen:
What a data warehouse is
The features of the data structures
The features of the user interface
The development environment
53. GIS
Geographic Information System
Essentially, a GIS is a computer-assisted information management
system of geographically referenced data.
It contains two closely integrated databases:
•The spatial database contains information in the form of digital
co-ordinates. These can be points, lines, or polygons.
•The attribute database contains information about the
characteristics or qualities of the spatial features (e.g. demographic
information).
GIS is sometimes seen as a set of tools for analysing spatial data.
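The link between the two databases can be sketched with a shared key: a location is matched to a polygon in the spatial data, and the polygon's identifier is then used to look up its attributes. The area identifiers, co-ordinates and resident counts below are invented, and the ray-casting test is a textbook technique chosen for illustration, not a description of any particular GIS product.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count the polygon edges crossed by a horizontal ray
    running from (x, y) to the right; an odd count means inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Spatial database: polygons as lists of (x, y) co-ordinates (invented).
spatial_db = {
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "B": [(4, 0), (8, 0), (8, 4), (4, 4)],
}
# Attribute database: characteristics keyed by the same identifiers (invented).
attribute_db = {
    "A": {"residents": 1200},
    "B": {"residents": 800},
}


def attributes_at(x, y):
    """Find the area containing the point and return its attributes."""
    for area_id, polygon in spatial_db.items():
        if point_in_polygon(x, y, polygon):
            return area_id, attribute_db[area_id]
    return None


print(attributes_at(1, 1))
print(attributes_at(6, 2))
print(attributes_at(9, 9))
```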
54. Questions
Is it possible to build up a Web based GIS System?
Is it possible to combine a Web warehouse system with a
GIS component?
56. Web System Architecture
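[Diagram: a client connected over HTTP to a server, with the data stores behind it]
Client, connected over HTTP to a server running:
•Linux RedHat
•Apache HTTP Server
•PHP4
•PostgreSQL
•PostGIS
•Mapserver
•Mapscript
Data: Data Warehouse; Spatial and Statistical Data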
57. Conclusions
•A data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems collect
•A data warehouse is a collection of data designed to support management decision making
•A data warehouse is a computer system designed to give business decision makers instant access to information by copying data from existing systems and storing it for use by executives.
•A data warehouse is a copy of transaction data specifically structured for querying and reporting
•Geomarketing is the use of geography to make efficient business decisions.
•Geomarketing answers crucial questions concerning marketing, company sales and other fields.
•Geomarketing is a complete database of commercial and marketing information built around a geographical system
58. Thank you for your attention
My E-mail: patruno@istat.it
My address: Vincenzo Patruno
ISTAT - DCIT
- Central Direction of Information Technology
- Security and Web Technologies
Via C. Balbo, 16 00184 Rome - Italy