3. Scenario
Unilever is a company with branches in the UK,
India, America and Japan. The Sales Manager
wants a quarterly sales report. Each branch has a
separate operational system.
4. Scenario 1 : Unilever company.
[Diagram: India, UK, America and Japan branches -> sales per item type per branch for the first quarter -> Sales Manager]
5. Solution : Unilever company.
Extract sales information from each database.
Store the information in a common repository at a
single site.
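The consolidation step can be sketched in a few lines of Python. This is only an illustration under assumptions: the branch extracts, the field names (item_type, quarter, units) and the helpers consolidate and sales_per_item_per_branch are hypothetical, and a real extraction would read from each branch's operational database.

```python
from collections import defaultdict

# Hypothetical extracts, one list of rows per branch system (illustrative data only).
branch_extracts = {
    "India": [{"item_type": "soap", "quarter": "Q1", "units": 10}],
    "UK": [
        {"item_type": "soap", "quarter": "Q1", "units": 7},
        {"item_type": "paste", "quarter": "Q1", "units": 8},
    ],
}


def consolidate(extracts):
    """Copy every branch's rows into one common repository, tagging each row with its branch."""
    repository = []
    for branch, rows in extracts.items():
        for row in rows:
            repository.append({"branch": branch, **row})
    return repository


def sales_per_item_per_branch(repository, quarter):
    """Total units per (branch, item type) for one quarter."""
    totals = defaultdict(int)
    for row in repository:
        if row["quarter"] == quarter:
            totals[(row["branch"], row["item_type"])] += row["units"]
    return dict(totals)


repository = consolidate(branch_extracts)
report = sales_per_item_per_branch(repository, "Q1")
```

Once the rows sit in one place, the quarterly report is a single aggregation instead of one query per branch system.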
6. Solution : Unilever company.
[Diagram: India, UK, America and Japan -> Data Warehouse -> Query & Analysis tools -> Report -> Sales Manager]
7. Scenario :
Hindustan Unilever is a small, new company.
The President wants the company to grow.
He needs information so that he can make
correct decisions.
8. Solution :
Improve the quality of data before loading it
into the warehouse.
Perform data cleaning and transformation
before loading the data.
Use query and analysis tools to support ad hoc
queries.
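A minimal sketch of cleaning records before they are loaded, assuming raw rows arrive as dictionaries with product and units fields. The field names, the sample rows and the clean_record helper are hypothetical; real cleaning rules depend on the source systems.

```python
def clean_record(raw):
    """Return a cleaned copy of one raw sales record, or None if it cannot be repaired."""
    product = (raw.get("product") or "").strip().lower()
    if not product:
        return None  # no way to tell what was sold
    try:
        units = int(raw.get("units"))
    except (TypeError, ValueError):
        return None  # missing or non-numeric quantity
    if units < 0:
        return None  # negative quantities are treated as invalid here
    return {"product": product, "units": units}


# Illustrative raw rows with typical defects: stray spaces, mixed case, gaps.
raw_rows = [
    {"product": " Soaps ", "units": "3"},
    {"product": "PASTE", "units": 16},
    {"product": "", "units": 5},
    {"product": "bread", "units": None},
]

cleaned = [r for r in (clean_record(row) for row in raw_rows) if r is not None]
```

Here unrepairable rows are simply dropped; filling missing values from another source is an equally valid policy.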
9. Solution
[Diagram: Data Warehouse -> Query and Analysis tool -> President. Labels on the slide: sales, time, Expansion, Improvement.]
11. Inmon’s definition :
A data warehouse is a
-subject-oriented,
-integrated,
-time-variant,
-nonvolatile
collection of data in support of management’s
decision-making process.
12. Subject-oriented
A data warehouse is organized around subjects such
as sales, product and customer.
It focuses on modeling and analysis of data for
decision makers.
It excludes data that is not useful in the decision support
process.
13. Integration
A data warehouse is constructed by integrating
multiple heterogeneous sources.
Data preprocessing techniques are applied to ensure
consistency.
[Diagram: RDBMS, Legacy System and Flat File -> Data Processing, Data Transformation -> Data Warehouse]
14. Integration
Sources must be made consistent in terms of:
– Encoding structures
– Measurement of attributes
– Physical attributes of data, remarks
– Naming conventions
– Data type format
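The kinds of inconsistency listed above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the gender codes, the length units, the source field names and the integrate helper are hypothetical and not taken from any particular source system.

```python
# Hypothetical mappings from per-source conventions to one warehouse convention.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}
TO_CM = {"cm": 1.0, "inch": 2.54, "m": 100.0}
FIELD_NAMES = {"cust_nm": "customer_name", "CustomerName": "customer_name"}


def integrate(record, length_unit):
    """Map one source record onto the warehouse's names, encodings and units."""
    out = {}
    for key, value in record.items():
        out[FIELD_NAMES.get(key, key)] = value  # naming conventions
    if "gender" in out:
        # encoding structures: every source code becomes "M" or "F"
        out["gender"] = GENDER_CODES[str(out["gender"]).strip().lower()]
    if "length" in out:
        # measurement of attributes: store every length in centimetres
        out["length_cm"] = round(float(out.pop("length")) * TO_CM[length_unit], 2)
    return out


row_a = integrate({"cust_nm": "A", "gender": 1, "length": 10}, "inch")
row_b = integrate({"CustomerName": "B", "gender": "female", "length": "30"}, "cm")
```

After integration both rows use the same field names, the same gender codes and the same unit, whatever their source used.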
15. Time-variant
Provides information from a historical perspective,
e.g. the past 5-10 years.
Every key structure contains, either implicitly or
explicitly, an element of time.
16. Nonvolatile
Data once recorded cannot be updated.
A data warehouse requires only two data access
operations:
– Initial loading of data
– Access of data
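As a rough sketch, a nonvolatile store exposes only those two operations. The NonvolatileStore class below is hypothetical and kept in memory for illustration; it has no update or delete method by design.

```python
class NonvolatileStore:
    """Toy store that supports loading and read access, but no update or delete."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Append new rows; tuples are used so stored rows cannot be modified in place."""
        self._rows.extend(tuple(row) for row in rows)

    def access(self, predicate=lambda row: True):
        """Return a copy of the rows matching the predicate."""
        return [row for row in self._rows if predicate(row)]


warehouse = NonvolatileStore()
warehouse.load([("India", "Soaps", 3), ("UK", "Soaps", 3)])
india_rows = warehouse.access(lambda row: row[0] == "India")
```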
18. Data Warehouse Architecture
Data Warehouse server
– almost always a relational DBMS, rarely flat
files
OLAP servers
– to support and operate on multi-dimensional
data structures
Clients
– Query and reporting tools
– Analysis tools
– Data mining tools
21. Star Schema
A star schema consists of at least one
fact table and a number of dimension
tables.
The star schema is the highly recommended
schema for SSAS cubes.
22. Star Schema
Fact Table: Store Key, Product Key, Period Key, Units, Price
Store Dimension: Store Key, Store Name, City, State, Region
Time Dimension: Period Key, Year, Quarter, Month
Product Dimension: Product Key, Product Desc
Benefits: Easy to understand, easy to define hierarchies, reduces
the number of physical joins.
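The star schema on this slide can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the snake_case table and column names are adapted from the slide, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT,
                          city TEXT, state TEXT, region TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE time_dim    (period_key INTEGER PRIMARY KEY, year INTEGER,
                          quarter INTEGER, month INTEGER);
CREATE TABLE sales_fact  (store_key   INTEGER REFERENCES store_dim(store_key),
                          product_key INTEGER REFERENCES product_dim(product_key),
                          period_key  INTEGER REFERENCES time_dim(period_key),
                          units INTEGER, price REAL);
""")

# Invented sample rows, for illustration only.
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?, ?, ?)", [
    (1, "Store A", "Mumbai", "Maharashtra", "West"),
    (2, "Store B", "Delhi", "Delhi", "North"),
])
conn.executemany("INSERT INTO product_dim VALUES (?, ?)", [(1, "Soaps"), (2, "Paste")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?)", [
    (1, 2024, 1, 1),
    (2, 2024, 1, 2),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 3, 2.48),
    (1, 2, 2, 16, 2.65),
    (2, 1, 1, 10, 2.48),
])

# Each dimension is exactly one join away from the fact table.
units_by_region_quarter = conn.execute("""
    SELECT s.region, t.quarter, SUM(f.units)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_key = f.store_key
    JOIN time_dim  AS t ON t.period_key = f.period_key
    GROUP BY s.region, t.quarter
    ORDER BY s.region, t.quarter
""").fetchall()
```

Because region lives directly in the store dimension, totals by region need only one join per dimension used.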
24. SnowFlake Schema
A variant of the star schema model.
A single, large, central fact table and one
or more tables for each dimension.
Dimension tables are normalized, i.e. dimension
table data is split into additional tables.
25. SnowFlake Schema
Fact Table: Store Key, Product Key, Period Key, Units, Price
Store Dimension: Store Key, Store Name, City Key
City Dimension: City Key, City, State, Region
Time Dimension: Period Key, Year, Quarter, Month
Product Dimension: Product Key, Product Desc
Drawbacks: Time-consuming joins, slow report generation.
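The extra join that a snowflake schema introduces can be seen in a short sqlite3 sketch. As before, the snake_case names are adapted from the slide and the sample rows are invented; only the store and city dimensions are built, to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city_dim   (city_key INTEGER PRIMARY KEY, city TEXT, state TEXT, region TEXT);
CREATE TABLE store_dim  (store_key INTEGER PRIMARY KEY, store_name TEXT,
                         city_key INTEGER REFERENCES city_dim(city_key));
CREATE TABLE sales_fact (store_key INTEGER REFERENCES store_dim(store_key),
                         product_key INTEGER, period_key INTEGER,
                         units INTEGER, price REAL);
""")

# Invented sample rows, for illustration only.
conn.executemany("INSERT INTO city_dim VALUES (?, ?, ?, ?)", [
    (1, "Mumbai", "Maharashtra", "West"),
    (2, "Delhi", "Delhi", "North"),
])
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?)", [
    (1, "Store A", 1),
    (2, "Store B", 2),
    (3, "Store C", 1),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 3, 2.48),
    (2, 1, 1, 10, 2.48),
    (3, 2, 1, 6, 2.65),
])

# Region is now two joins away from the fact table: fact -> store -> city.
units_by_region = conn.execute("""
    SELECT c.region, SUM(f.units)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_key = f.store_key
    JOIN city_dim  AS c ON c.city_key = s.city_key
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
```

City, state and region are stored once per city rather than once per store, at the price of one more join in every query that groups by region.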
26. Fact Constellation
Multiple fact tables share dimension tables.
This schema can be viewed as a collection of stars,
hence it is called a galaxy schema or fact
constellation.
Sophisticated applications require such a
schema.
27. Fact Constellation
Sales Fact Table: Store Key, Product Key, Period Key, Units, Price
Shipping Fact Table: Shipper Key, Store Key, Product Key, Period Key, Units, Price
Product Dimension: Product Key, Product Desc
Store Dimension: Store Key, Store Name, City, State, Region
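A fact constellation can be sketched with sqlite3 as two fact tables that reference the same product dimension. The snake_case names are adapted from the slide, the sample rows are invented, and the store dimension is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim   (product_key INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE sales_fact    (store_key INTEGER,
                            product_key INTEGER REFERENCES product_dim(product_key),
                            period_key INTEGER, units INTEGER, price REAL);
CREATE TABLE shipping_fact (shipper_key INTEGER, store_key INTEGER,
                            product_key INTEGER REFERENCES product_dim(product_key),
                            period_key INTEGER, units INTEGER, price REAL);
""")

# Invented sample rows, for illustration only.
conn.executemany("INSERT INTO product_dim VALUES (?, ?)", [(1, "Soaps"), (2, "Paste")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 3, 2.48),
    (1, 2, 1, 16, 2.65),
    (1, 1, 2, 10, 2.48),
])
conn.executemany("INSERT INTO shipping_fact VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 1, 20, 2.48),
    (1, 1, 2, 1, 16, 2.65),
])

# Aggregate each fact table separately, then line the results up on the
# shared dimension; joining the two fact tables directly would multiply rows.
sold_vs_shipped = conn.execute("""
    SELECT p.product_desc,
           (SELECT COALESCE(SUM(units), 0) FROM sales_fact AS s
             WHERE s.product_key = p.product_key),
           (SELECT COALESCE(SUM(units), 0) FROM shipping_fact AS h
             WHERE h.product_key = p.product_key)
    FROM product_dim AS p
    ORDER BY p.product_key
""").fetchall()
```

The shared dimension is what lets sales and shipping figures be compared product by product.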
29. Building Data Warehouse
Data Selection
Data Preprocessing
– Fill missing values
– Remove inconsistency
Data Transformation & Integration
Data Loading
Data in the warehouse is stored in the form of fact tables
and dimension tables.
30. Case Study
Unilever is a new company which produces
soaps, paste and beverage products, with its
production unit located at NA.
Their products are sold in the North, North West and
Western regions of India.
They have sales units in India, America, the UK and
Japan.
The President of the company wants sales
information.
31. Sales Information
Report: The number of units sold.
113
Report: The number of units sold over time
January  February  March  April
14       41        33     25
32. Sales Information
Report: The number of items sold for each product over time
(Dimensions: Product, Time. A dash marks an empty cell.)
        Jan  Feb  Mar  Apr
Soaps   -    -    6    17
Paste   6    16   6    8
Bread   8    25   21   -
33. Sales Information
Report: The number of items sold in each country for each
product over time
(Dimensions: City, Product, Time. A dash marks an empty cell.)
               Jan  Feb  Mar  Apr
India  Soaps   -    -    3    10
       Paste   3    16   6    -
       Bread   4    16   6    -
UK     Soaps   -    -    3    7
       Paste   3    -    -    8
       Bread   4    9    15   -
34. Sales Information
Report: The number of items sold (U) and income (Rs) in each region for
each product over time. A dash marks an empty cell.
               Jan         Feb         Mar         Apr
               Rs     U    Rs     U    Rs     U    Rs     U
India  Soaps   -      -    -      -    7.44   3    24.80  10
       Paste   7.95   3    42.40  16   15.90  6    -      -
       Bread   7.32   4    29.98  16   10.98  6    -      -
UK     Soaps   -      -    -      -    7.44   3    17.36  7
       Paste   7.95   3    -      -    -      -    21.20  8
       Bread   7.32   4    16.47  9    27.45  15   -      -
36. Sales Data Warehouse Model
Fact Table
Country  Product  Month     Units  Rupees
India    Soaps    January   3      7.95
India    Paste    January   4      7.32
UK       Soaps    January   3      7.95
UK       Paste    January   4      7.32
India    Bread    February  16     42.40
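The reports on the earlier slides are aggregations of fact rows over different sets of dimensions. A minimal sketch of such a roll-up, using the five fact rows shown on this slide; the roll_up helper and the lowercase field names are hypothetical.

```python
from collections import defaultdict

# The five fact rows listed on the slide.
FACT_ROWS = [
    {"country": "India", "product": "Soaps", "month": "January", "units": 3, "rupees": 7.95},
    {"country": "India", "product": "Paste", "month": "January", "units": 4, "rupees": 7.32},
    {"country": "UK", "product": "Soaps", "month": "January", "units": 3, "rupees": 7.95},
    {"country": "UK", "product": "Paste", "month": "January", "units": 4, "rupees": 7.32},
    {"country": "India", "product": "Bread", "month": "February", "units": 16, "rupees": 42.40},
]


def roll_up(rows, dims, measure="units"):
    """Sum one measure over the rows, grouped by the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)


total_units = roll_up(FACT_ROWS, [])                       # one grand total
units_by_month = roll_up(FACT_ROWS, ["month"])             # drill down by time
units_by_country_product = roll_up(FACT_ROWS, ["country", "product"])
```

Adding a dimension to the list drills down into finer detail; removing one rolls the figures up again.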
We invent something only when there is a need for it. Today we are going to see what data warehousing is. The data warehouse evolved to satisfy certain needs, and we will look at some of those needs now.
When we need to extract data from various sources, some may be maintained manually on paper and some on different legacy systems, so integrating the data is laborious work. Many systems provide DTS tools to convert data into the appropriate format and apply the necessary transformations.
We need a subject-oriented, multidimensional data model for the data warehouse, one which facilitates online analysis.