A data warehouse is a database used for reporting and analysis that integrates data from multiple sources. It provides strategic information through analysis that cannot be done by operational systems. A data warehouse contains integrated, subject-oriented data that is periodically updated and stored over time for decision making. It supports analytical tools and access for management rather than daily transactions.
2. What is data warehousing?
A data warehouse is a database used for reporting and analysis
An integrated collection of enterprise-wide data, oriented to decision making
Provides strategic information
Performs information analysis that cannot be done by operational systems
3. Need for data warehousing
Maintains data history, even if the source transaction systems do not
Integrates data from multiple source systems
Improves data quality by providing consistent codes and descriptions
Provides a flexible, conducive and interactive source of strategic information
Performs information analysis that cannot be done by operational systems
4. Data Rich, but Information Poor
• Data is stored, not explored: because of its volume and complexity, it is a burden rather than a support
• Data overload results in uninformed decisions, contradictory information, higher overhead, wrong decisions and increased costs
• Data is neither designed nor structured to support management decision making
6. Operational data stores (ODS)
Data focuses on transaction functions such as bank card withdrawals and deposits
It is organised by application
It contains the current values
It supports day-to-day operational decisions and information needs
It is detailed, nonredundant and updateable
7. Informational data stores
It is organised around subjects such as customer and product
It is summarized, archived and derived
Data is static until refreshed
Data is nonupdateable
8. Difference between operational & informational data stores
Characteristic      Operational data                     Informational data
Data content        Current values                       Summarized, archived, derived
Data organization   By application                       By subject
Data stability      Dynamic                              Static until refreshed
Data structure      Optimized for transactions           Optimized for complex queries
Access frequency    High                                 Medium to low
Access type         Read/update/delete, field by field   Read/aggregate, added to
Response time       Subsecond (<1 s) to 2-3 s            Several seconds to minutes
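The contrast in the table can be illustrated with a minimal Python sketch, assuming a toy bank example. The account numbers, the functions `post_transaction` and `refresh_informational`, and the summary fields are all hypothetical. The operational store is updated field by field and always holds current values; the informational copy is derived and summarized, and stays static until the next refresh.

```python
from types import MappingProxyType

# Hypothetical operational store: one entry per account, holding only the current balance.
operational_balances: dict[str, float] = {"A-100": 500.0, "A-200": 120.0}

def post_transaction(account: str, amount: float) -> None:
    """Operational access: read and update a single field in place (dynamic data)."""
    operational_balances[account] = operational_balances[account] + amount

def refresh_informational(balances: dict[str, float]) -> MappingProxyType:
    """Informational access: derive a summarized copy that stays static until the next refresh."""
    summary = {
        "account_count": len(balances),
        "total_balance": sum(balances.values()),
    }
    return MappingProxyType(summary)  # read-only view; callers cannot update it

post_transaction("A-100", -200.0)  # bank card withdrawal
post_transaction("A-200", 80.0)    # deposit
snapshot = refresh_informational(operational_balances)
post_transaction("A-100", 50.0)    # the operational data keeps changing...
print(dict(snapshot))              # ...but the refreshed summary does not
```

Wrapping the summary in a `MappingProxyType` is just one way to make the "nonupdateable" property visible in Python; assigning to `snapshot` raises `TypeError`.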
9. Definition of a data warehouse
A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision making.
A data warehouse is designed for easy access by users to large amounts of information, and data access is typically supported by specialized analytical tools and applications.
11. Data Warehouse Characteristics
It is a database designed for analytical tasks, using data from multiple applications
It supports a relatively small number of users with relatively long interactions
Its content is periodically updated
It contains current and historical data to provide a historical perspective of information
It contains a few large tables
12. Integrated
• Data is stored once, in a single integrated location (example: an insurance company)
Figure: customer data stored in several databases (Auto Policy Processing System, Fire Policy Processing System, and the FACTS, LIFE, Commercial and Accounting applications) is integrated into one data warehouse database organised by subject (Subject = Customer).
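A minimal Python sketch of integration, assuming two hypothetical policy systems that hold the same customers under different field names and codes. All field names, the gender codes and the `GENDER_CODES` mapping are invented for illustration. Each source gets a converter to one agreed format, and each customer is then stored once.

```python
# Hypothetical extracts: the same customers held by two application systems,
# each with its own field names and coding conventions.
auto_policy_customers = [
    {"cust_no": "C1", "name": "SMITH, JOHN", "sex": "M"},
    {"cust_no": "C2", "name": "DOE, JANE", "sex": "F"},
]
fire_policy_customers = [
    {"customer_id": "C1", "full_name": "John Smith", "gender": 1},
    {"customer_id": "C3", "full_name": "Ann Lee", "gender": 2},
]

# Assumed mapping from each system's codes to one consistent description.
GENDER_CODES = {"M": "male", "F": "female", 1: "male", 2: "female"}

def from_auto(rec):
    """Convert an auto-policy record ("LAST, FIRST" in capitals) to the common format."""
    last, first = [part.strip().title() for part in rec["name"].split(",")]
    return rec["cust_no"], {"name": f"{first} {last}", "gender": GENDER_CODES[rec["sex"]]}

def from_fire(rec):
    """Convert a fire-policy record (numeric gender code) to the common format."""
    return rec["customer_id"], {"name": rec["full_name"], "gender": GENDER_CODES[rec["gender"]]}

def integrate(*sources):
    """Store each customer once, with consistent codes and descriptions."""
    customers = {}
    for converter, records in sources:
        for rec in records:
            key, row = converter(rec)
            customers.setdefault(key, row)  # keep the first version seen for a duplicate key
    return customers

warehouse_customers = integrate(
    (from_auto, auto_policy_customers),
    (from_fire, fire_policy_customers),
)
for key, row in sorted(warehouse_customers.items()):
    print(key, row)
```

Keeping the first version of a duplicate customer is a deliberate simplification; a real load would need explicit rules for resolving conflicting values.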
13. Time-Variant
• Data is stored as a series of snapshots or views which record how it is collected across time.
Figure: each data warehouse record combines a time element, which forms part of the key, with the data.
Data is tagged with an element of time, such as a creation date or an as-of date.
Data is available online for long periods (for example, five or more years) for trend analysis and forecasting.
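A minimal Python sketch of time-variant storage, assuming hypothetical account balances loaded as yearly snapshots. The key combines the business key with an as-of date, so each load adds history instead of replacing it, and the history can be read back in time order for trend analysis. The names `snapshots`, `load_snapshot` and `history` are illustrative only.

```python
from datetime import date

# Hypothetical snapshot table: the key combines a business key with a time element.
snapshots: dict[tuple[str, date], dict] = {}

def load_snapshot(as_of: date, balances: dict[str, float]) -> None:
    """Add one row per account, tagged with its as-of date; earlier snapshots are kept."""
    for account, balance in balances.items():
        snapshots[(account, as_of)] = {"balance": balance}

def history(account: str) -> list[tuple[date, float]]:
    """Return the account's balances in time order, ready for trend analysis."""
    return sorted(
        (as_of, row["balance"])
        for (acct, as_of), row in snapshots.items()
        if acct == account
    )

load_snapshot(date(2023, 12, 31), {"A-100": 500.0, "A-200": 120.0})
load_snapshot(date(2024, 12, 31), {"A-100": 650.0, "A-200": 90.0})
print(history("A-100"))
```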
14. Non-Volatile
• Existing data in the warehouse is not overwritten or updated.
Figure: in the production environment, production applications insert, update and delete data in production databases; in the data warehouse environment, data from production databases and external sources is loaded into the warehouse database and is then read-only.
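A minimal Python sketch of non-volatility, assuming a hypothetical `WarehouseTable` class that only supports the two warehouse operations, load and read. The `update` and `delete` methods exist only to show that they are refused; the `PermissionError` is an illustrative choice, not a convention of any real product.

```python
class WarehouseTable:
    """Hypothetical append-only table: rows can be loaded and read, never updated or deleted."""

    def __init__(self) -> None:
        self._rows: list[tuple] = []

    def load(self, rows: list[dict]) -> None:
        # Each batch is appended as immutable tuples; existing rows are never overwritten.
        self._rows.extend(tuple(sorted(row.items())) for row in rows)

    def read(self) -> list[dict]:
        # Readers get fresh dict copies, so they cannot change the stored rows.
        return [dict(row) for row in self._rows]

    def update(self, *args, **kwargs):
        raise PermissionError("the data warehouse is read-only after load")

    delete = update  # deleting is refused in the same way


table = WarehouseTable()
table.load([{"policy": "P1", "premium": 300}])
table.load([{"policy": "P1", "premium": 320}])  # a changed premium arrives as a new row
print(table.read())
```

In practice the second load would usually also carry a time element (as in the time-variant slide), so that the two premium values can be told apart.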
15. Subject Oriented
• Example for an insurance company:
Figure: application areas (Auto and Fire Policy Processing Systems, Commercial and Life Insurance Systems, Claims Processing System, Billing System, Accounting System) compared with data warehouse subjects (Customer, Policy, Premium and Losses data).
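A minimal sketch of a subject-oriented query, using Python's built-in `sqlite3` module with an in-memory database standing in for the warehouse's relational DBMS. The tables `customer`, `premium` and `losses` and all values are hypothetical. Premiums (from billing) and losses (from claims processing) are organised around the Customer subject, so one query can answer a question that crosses application boundaries.

```python
import sqlite3

# In-memory SQLite stands in for the relational DBMS; all table and column names are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (customer_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE premium  (customer_id TEXT REFERENCES customer, amount REAL);  -- from the billing system
    CREATE TABLE losses   (customer_id TEXT REFERENCES customer, amount REAL);  -- from claims processing
    INSERT INTO customer VALUES ('C1', 'John Smith'), ('C2', 'Jane Doe');
    INSERT INTO premium  VALUES ('C1', 300), ('C1', 320), ('C2', 450);
    INSERT INTO losses   VALUES ('C1', 200);
""")

# A question about the Customer subject: total premiums versus total losses per customer.
query = """
    SELECT c.customer_id,
           (SELECT COALESCE(SUM(amount), 0) FROM premium p WHERE p.customer_id = c.customer_id) AS premium,
           (SELECT COALESCE(SUM(amount), 0) FROM losses  l WHERE l.customer_id = c.customer_id) AS losses
    FROM customer c
    ORDER BY c.customer_id
"""
rows = con.execute(query).fetchall()
for row in rows:
    print(row)
```

The two correlated subqueries are used instead of joining both tables at once, because joining premium and losses rows together for the same customer would multiply rows and inflate the sums.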
16. Data Warehouse Architecture
It is based on a relational database management system (RDBMS) server that functions as the central repository for informational data
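17. From operational system to data warehouse
Figure: operational systems feed, through conversion and interface processes, an ODS and a staging area; the data warehouse and its data marts then serve ad-hoc reporting, OLAP cubes and canned reports.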
18. Data Warehouse Architecture
Its source data comes from operational applications
During processing, data is transformed into an integrated structure and format
The transformation process may involve conversion, summarization, filtering and condensation of data
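The four transformation steps can be sketched in Python, assuming hypothetical billing rows extracted from an operational application. How each step is interpreted here is an assumption: conversion turns text into typed values in one format, filtering drops cancelled bills, condensation removes fields that analysis does not need, and summarization totals amounts per customer per month. The function names `convert`, `keep`, `condense` and `summarize` are illustrative.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw rows extracted from an operational billing application.
raw_rows = [
    {"cust": "C1", "date": "2024-01-05", "amount": "120.50", "status": "PAID", "clerk": "u17"},
    {"cust": "C1", "date": "2024-01-20", "amount": "79.50", "status": "PAID", "clerk": "u03"},
    {"cust": "C2", "date": "2024-01-11", "amount": "300.00", "status": "CANCELLED", "clerk": "u17"},
    {"cust": "C2", "date": "2024-02-02", "amount": "45.00", "status": "PAID", "clerk": "u08"},
]

def convert(row):
    """Conversion: text fields become typed values in one agreed format."""
    return {
        "cust": row["cust"],
        "month": datetime.strptime(row["date"], "%Y-%m-%d").strftime("%Y-%m"),
        "amount": float(row["amount"]),
        "status": row["status"],
        "clerk": row["clerk"],
    }

def keep(row):
    """Filtering: cancelled bills are not loaded."""
    return row["status"] != "CANCELLED"

def condense(row):
    """Condensation: drop fields (status, clerk) that analysis does not need."""
    return {"cust": row["cust"], "month": row["month"], "amount": row["amount"]}

def summarize(rows):
    """Summarization: total billed amount per customer per month."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["cust"], row["month"])] += row["amount"]
    return dict(totals)

monthly_totals = summarize(condense(row) for row in map(convert, raw_rows) if keep(row))
print(monthly_totals)
```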