1. Data Warehouses
Presented to: Prof. A.M. Wani
Presented by: Nazir Ahmad
Enroll No.: 110215
2nd semester
2. Data warehouse
A “data warehouse” is a repository of data collected
from the various operational systems of an organization.
A Data Warehouse is a
• Subject-oriented
• Integrated
• Time-variant
• Non-volatile
collection of data in support of management decisions.
3. Subject Oriented Data
• Data is stored by subjects, not by applications.
• In a bank, SB account, CD account, etc. are application areas.
• Account is a data subject.
4. Integrated Data
• The data in the data warehouse comes from several operational
systems.
• Data formats and attributes differ across operational systems.
• Integrating the data and removing inconsistencies therefore becomes necessary.
[Diagram: data from the applications (SB Account, CD Account, Loan Account) is integrated under the subject Account.]
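The integration step can be sketched in Python. This is only an illustration: the record layouts, the field names (`acct_no`, `AccountNumber`, `bal`, `Outstanding`, and so on), and the helper functions are hypothetical, not taken from any real banking system.

```python
from datetime import date

# Hypothetical records as two operational applications might store them.
# Field names and formats are assumptions made for illustration only.
sb_records = [{"acct_no": "SB-001", "bal": "1,500.00", "opened": "2010-03-15"}]
loan_records = [{"AccountNumber": 7001, "Outstanding": 25000.0, "OpenDate": "15/03/2011"}]


def from_sb(rec):
    """Map a savings-bank record onto the common Account layout."""
    return {
        "account_id": rec["acct_no"],
        "account_type": "SB",
        "balance": float(rec["bal"].replace(",", "")),
        "opened_on": date.fromisoformat(rec["opened"]),
    }


def from_loan(rec):
    """Map a loan record onto the common Account layout."""
    day, month, year = (int(part) for part in rec["OpenDate"].split("/"))
    return {
        "account_id": f"LN-{rec['AccountNumber']}",
        "account_type": "LOAN",
        "balance": float(rec["Outstanding"]),
        "opened_on": date(year, month, day),
    }


def integrate(sb, loans):
    """Combine both sources into one list organized by the Account subject."""
    return [from_sb(r) for r in sb] + [from_loan(r) for r in loans]


accounts = integrate(sb_records, loan_records)
```

After integration, every row has the same attributes and the same date and number formats, regardless of which application it came from.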
5. Time Variant Data
• Each record in a data warehouse relates to a specific time element.
Time-stamped data is maintained.
• Because of its very nature and purpose, a data warehouse has to
contain historical data.
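A minimal sketch of time-stamped data, assuming a hypothetical list of monthly balance snapshots. The `snapshots` rows and the `balance_as_of` helper are invented for illustration.

```python
from datetime import date

# Hypothetical balance snapshots: each row carries the date it describes.
snapshots = [
    {"account_id": "SB-001", "as_of": date(2023, 1, 31), "balance": 1200.0},
    {"account_id": "SB-001", "as_of": date(2023, 2, 28), "balance": 1500.0},
    {"account_id": "SB-001", "as_of": date(2023, 3, 31), "balance": 900.0},
]


def balance_as_of(rows, account_id, when):
    """Return the latest recorded balance on or before `when`, or None."""
    candidates = [
        r for r in rows if r["account_id"] == account_id and r["as_of"] <= when
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["as_of"])["balance"]
```

Because older snapshots are kept rather than overwritten, a question such as "what was the balance in mid-March?" can still be answered later.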
6. Non-Volatile Data
• Data from the operational systems is moved into the data
warehouse at specific time intervals.
• Contains historical data, in the form of summaries.
• Once loaded into the warehouse, data is not changed, i.e. it is read-only.
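The load-then-read-only behaviour can be sketched as an append-only container. The `WarehouseTable` class below is a hypothetical illustration, not the interface of any real warehouse product.

```python
class WarehouseTable:
    """Append-only table: rows can be loaded and read, never changed."""

    def __init__(self):
        self._rows = []

    def load(self, batch):
        """Append a batch of rows; rows loaded earlier are left untouched."""
        # Each row is stored as a tuple of sorted items so it cannot be
        # mutated after loading.
        self._rows.extend(tuple(sorted(row.items())) for row in batch)

    def rows(self):
        """Return fresh dictionary copies of the rows for analysis."""
        return [dict(r) for r in self._rows]


table = WarehouseTable()
# Periodic loads, e.g. one batch per month (sample data is made up).
table.load([{"account_id": "SB-001", "month": "2023-01", "balance": 1200.0}])
table.load([{"account_id": "SB-001", "month": "2023-02", "balance": 1500.0}])
```

The class deliberately offers no update or delete method; changing a dictionary returned by `rows()` does not affect the stored data.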
7. Operational systems vs Data warehouse systems
Operational system                               | Data warehouse
Holds current data                               | Holds historical data
Data is dynamic                                  | Data is largely static
Read/write access                                | Read-only access
Application-oriented                             | Subject-oriented
Used by clerical staff for day-to-day operations | Used by top managers for analysis
Must be optimized for small queries              | Must be optimized for complex queries
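The difference between the two query styles can be illustrated with Python's built-in `sqlite3` module. The `txn` table and its rows are hypothetical sample data; a real operational system and a real warehouse would be separate databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (account_id TEXT, txn_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO txn VALUES (?, ?, ?)",
    [
        ("SB-001", "2023-01-10", 500.0),
        ("SB-001", "2023-02-12", -200.0),
        ("SB-002", "2023-01-20", 1000.0),
        ("SB-002", "2023-02-25", 300.0),
    ],
)

# Operational-style query: fetch a few rows for a single account.
operational = conn.execute(
    "SELECT txn_date, amount FROM txn WHERE account_id = ? ORDER BY txn_date",
    ("SB-001",),
).fetchall()

# Warehouse-style query: scan all history and summarize it by month.
analytical = conn.execute(
    "SELECT substr(txn_date, 1, 7) AS month, SUM(amount) "
    "FROM txn GROUP BY month ORDER BY month"
).fetchall()
```

The first query reads only the rows of one account, while the second has to read every row to build the monthly totals, which is why the two kinds of system are tuned differently.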
8. Components
• Source data component
• Data staging component
• Data storage component
• Metadata component
• Information delivery component
9. Data Source component
Sources from which the data is extracted.
Can be categorized into
• Production Data: data that comes from the various operational
systems of the enterprise.
• Archived Data: in every operational system, old data is
periodically taken out and stored in archive files.
• External Data: data from sources outside the organization, e.g. statistics
produced by an external agency.
10. Data Staging Component (ETL)
Data Extraction: This function has to deal with numerous data
sources. An appropriate extraction technique has to be employed
for each data source.
Data Transformation: Data is cleaned, i.e. inconsistencies are
removed. Standardization of data elements forms a large part of
data transformation. Synonyms are resolved.
Data Loading: In this stage, the data is loaded into the DW. The initial
load moves large volumes of data. Once the DW starts functioning,
incremental data revisions are fed in. Refreshing the DW means
completely overwriting it.
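The three staging functions can be sketched as plain Python functions. Everything here is a simplified assumption: the sources are in-memory lists, the `SYNONYMS` mapping and field names are invented, and the warehouse is just a list.

```python
def extract(sources):
    """Pull raw rows from every source; each source is a plain list here."""
    return [row for source in sources for row in source]


# Hypothetical synonym table: different source names for the same attribute.
SYNONYMS = {"cust_name": "customer", "client": "customer", "amt": "amount"}


def transform(rows):
    """Resolve synonyms, then clean and standardize the values."""
    cleaned = []
    for row in rows:
        std = {SYNONYMS.get(key, key): value for key, value in row.items()}
        std["customer"] = std["customer"].strip().title()
        std["amount"] = float(std["amount"])
        cleaned.append(std)
    return cleaned


def load(warehouse, rows, refresh=False):
    """Append rows (initial or incremental load); overwrite all on refresh."""
    if refresh:
        warehouse.clear()
    warehouse.extend(rows)
    return warehouse


warehouse = []
source_a = [{"cust_name": " alice ", "amt": "100"}]
source_b = [{"client": "BOB", "amount": 250}]

# Initial load from both sources.
load(warehouse, transform(extract([source_a, source_b])))
# Later incremental load with newly arrived data.
load(warehouse, transform(extract([[{"client": "carol", "amt": "75.5"}]])))
```

Calling `load(..., refresh=True)` discards the existing contents before loading, which corresponds to the complete overwrite described for a refresh.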
11. Data Storage Component
• It is a separate repository.
• Kept separate from the data storage for operational systems
• Read-only repository
• Available and accessible
• Data is kept in structures suitable for analysis, less normalized
than those of operational system databases.
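What "less normalized" can look like is sketched below with Python's built-in `sqlite3` module. The `branch`, `account`, and `account_fact` tables and their contents are hypothetical; in practice the warehouse table would live in a separate repository, not in the same database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized, operational-style layout (hypothetical tables).
    CREATE TABLE branch (branch_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE account (
        account_id TEXT PRIMARY KEY, branch_id INTEGER, balance REAL
    );
    INSERT INTO branch VALUES (1, 'North'), (2, 'South');
    INSERT INTO account VALUES
        ('SB-001', 1, 1500.0), ('SB-002', 1, 500.0), ('CD-001', 2, 2000.0);

    -- Less normalized, warehouse-style layout: region repeats on every row.
    CREATE TABLE account_fact AS
        SELECT a.account_id, b.region, a.balance
        FROM account a JOIN branch b ON a.branch_id = b.branch_id;
    """
)

# The analysis query needs no join against the wide table.
by_region = conn.execute(
    "SELECT region, SUM(balance) FROM account_fact "
    "GROUP BY region ORDER BY region"
).fetchall()
```

Repeating the region on each row costs some storage, but analysis queries become simpler because the join has already been done at load time.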