Data Warehouse
A data warehouse is a repository of data used for analysis and reporting.
The data stored in a data warehouse is loaded from different business source systems.
Data may pass through an operational data store (ODS) for cleaning before it is used in the data warehouse for reporting.
Operational Source Systems:
In a data warehouse, data may be loaded from one or many operational source systems.
For example, suppose a retail company sells consumer products and generates revenue through different channels.
The different channels are:
1. Sales
2. E-commerce
3. POS (point of sale)
4. Marketing
Non-Persistent Staging Area: Data from Sales, E-commerce, or other channels is simply replicated into a non-persistent storage area during a specific time window (for example, 8-11 a.m.).
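A minimal sketch of the non-persistent staging step, assuming the replication job may run only inside the 8-11 a.m. window from the example. The function name `replicate_to_staging` and the row layout are hypothetical. Each run builds a brand-new staging list and copies rows as-is without cleaning, which is what makes the area non-persistent.

```python
from datetime import datetime, time

# Hypothetical replication window taken from the example (8-11 a.m.).
WINDOW_START = time(8, 0)
WINDOW_END = time(11, 0)


def replicate_to_staging(channel_records, now):
    """Copy every channel's rows into a fresh, non-persistent staging area.

    channel_records maps a channel name (e.g. "sales", "ecom") to a list of
    row dicts. Returns None when called outside the replication window.
    """
    if not (WINDOW_START <= now.time() < WINDOW_END):
        return None  # outside the window: no replication run
    staging = []  # rebuilt from scratch on every run; nothing is kept
    for channel, rows in channel_records.items():
        for row in rows:
            staging.append({"channel": channel, **row})  # copied as-is, no cleaning
    return staging


sources = {
    "sales": [{"order_id": 1, "amount": 120.0}],
    "ecom": [{"order_id": 7, "amount": 35.5}, {"order_id": 8, "amount": 60.0}],
}
staged = replicate_to_staging(sources, datetime(2024, 1, 5, 9, 0))
```

Calling the function at 12:00 returns `None`, so downstream steps would wait for the next window; any cleaning is left to the ODS.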
Operational Data Store (ODS): It holds a refined form of the data. It may store data from different channels in a single table or in multiple tables, but in a readable format.
This is called the integration area.
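The integration idea can be sketched as mapping each channel's field names onto one common, readable layout and applying light cleaning on the way. The field names in `FIELD_MAPS` and the cleaning rules (trimmed upper-case customer IDs, numeric amounts) are illustrative assumptions, not a prescribed ODS design.

```python
# Hypothetical per-channel field names: each channel reports the same facts differently.
FIELD_MAPS = {
    "sales": {"cust": "customer_id", "amt": "amount"},
    "ecom": {"customerId": "customer_id", "orderTotal": "amount"},
    "pos": {"customer": "customer_id", "total": "amount"},
}


def integrate(staged_rows):
    """Rename channel-specific fields to a common layout and clean the values."""
    ods_table = []
    for row in staged_rows:
        record = {"channel": row["channel"]}
        for source_field, common_field in FIELD_MAPS[row["channel"]].items():
            record[common_field] = row[source_field]
        # Light cleaning so that every channel looks the same in the ODS.
        record["customer_id"] = str(record["customer_id"]).strip().upper()
        record["amount"] = float(record["amount"])
        ods_table.append(record)
    return ods_table


rows = [
    {"channel": "sales", "cust": "C1", "amt": 120.0},
    {"channel": "ecom", "customerId": " c2 ", "orderTotal": "35.50"},
    {"channel": "pos", "customer": "c1", "total": 9.99},
]
ods = integrate(rows)
```

Whether the integrated rows go into one table or one table per channel is a design choice; the ODS description above allows either.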
Informatica PowerCenter:
A data integration tool that accesses, transforms, and integrates data from any system, in any format, and delivers the data throughout the enterprise.
ETL (extract, transform, load) is used to put records into the data warehouse.
Extraction: We extract data from the source system and make it accessible for further processing.
Extraction strategies:
o Full extraction
o Partial extraction
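A minimal sketch of the two strategies using Python's built-in `sqlite3` module, assuming the source table has a `last_modified` column. Treating partial extraction as "rows changed since the last run" is one common approach and an assumption here; the `orders` table and its columns are hypothetical.

```python
import sqlite3


def full_extract(conn):
    """Full extraction: read every row from the source table."""
    return conn.execute(
        "SELECT id, amount, last_modified FROM orders ORDER BY id"
    ).fetchall()


def partial_extract(conn, since):
    """Partial extraction: read only rows changed after the previous run."""
    return conn.execute(
        "SELECT id, amount, last_modified FROM orders WHERE last_modified > ? ORDER BY id",
        (since,),
    ).fetchall()


# Hypothetical source system table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, last_modified TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 120.0, "2024-01-01"), (2, 35.5, "2024-01-03"), (3, 60.0, "2024-01-05")],
)

all_rows = full_extract(source)                    # every row
changed_rows = partial_extract(source, "2024-01-02")  # only rows 2 and 3
```

ISO-8601 date strings sort correctly as text, which is why the `>` comparison works. A real pipeline would also need to store the timestamp of the last successful extraction between runs.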
Transform:
Data extracted into a staging server is raw data and can't be used as it is.
It needs to be cleaned, mapped, and transformed before it is finally loaded into the data warehouse.
Basic Transformation Tasks:
Selection
Matching
Data cleaning and enrichment
Consolidation or summarization
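The four tasks can be illustrated on a few staged rows. This sketch assumes hypothetical order records and a small customer master list. It runs cleaning before matching, because customer codes must be normalized before they can be matched, and it performs the enrichment (adding the customer name) during matching, so the order differs slightly from the list above.

```python
from collections import defaultdict

# Hypothetical staged order rows and customer master list.
staged = [
    {"order_id": 1, "cust": " c1 ", "region": "north", "amount": "120.00", "status": "OK"},
    {"order_id": 2, "cust": "C2", "region": "SOUTH", "amount": "35.50", "status": "OK"},
    {"order_id": 3, "cust": "c1", "region": "North", "amount": "60.00", "status": "CANCELLED"},
    {"order_id": 4, "cust": "C3", "region": "south", "amount": "9.99", "status": "OK"},
]
customers = {"C1": "Alice", "C2": "Bob"}

# Selection: keep only the rows the warehouse needs (completed orders).
selected = [r for r in staged if r["status"] == "OK"]

# Cleaning: normalize codes, labels, and data types.
cleaned = [
    {
        "order_id": r["order_id"],
        "customer_id": r["cust"].strip().upper(),
        "region": r["region"].strip().title(),
        "amount": float(r["amount"]),
    }
    for r in selected
]

# Matching and enrichment: link rows to known customers and add their names;
# rows with no match are set aside for review.
matched = [
    dict(r, customer_name=customers[r["customer_id"]])
    for r in cleaned
    if r["customer_id"] in customers
]
rejected = [r for r in cleaned if r["customer_id"] not in customers]

# Consolidation/summarization: total amount per region.
totals = defaultdict(float)
for r in matched:
    totals[r["region"]] += r["amount"]
```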
Loading:
In the loading step, we load the data from the staging server into the data warehouse.
Data loading fetches the prepared data, applies it to the data warehouse, and stores it in the database.
Types of Loading:
Initial Load: performed when data is loaded into the data warehouse for the first time.
Incremental Load: applying ongoing changes as necessary in a periodic manner.
Full Refresh: completely erasing the contents of one or more tables and reloading them with fresh data.
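The three loading types can be sketched against an in-memory SQLite table. The `sales_fact` table and the use of SQLite's `INSERT OR REPLACE` (an upsert by primary key) for the incremental load are assumptions for illustration; some warehouses append corrections as new rows instead of overwriting.

```python
import sqlite3


def initial_load(dw, rows):
    """Initial load: create the target table and fill it for the first time."""
    dw.execute("CREATE TABLE sales_fact (order_id INTEGER PRIMARY KEY, amount REAL)")
    dw.executemany("INSERT INTO sales_fact VALUES (?, ?)", rows)
    dw.commit()


def incremental_load(dw, changed_rows):
    """Incremental load: apply only new or changed rows (upsert by key)."""
    dw.executemany("INSERT OR REPLACE INTO sales_fact VALUES (?, ?)", changed_rows)
    dw.commit()


def full_refresh(dw, rows):
    """Full refresh: erase the table contents and reload everything."""
    dw.execute("DELETE FROM sales_fact")
    dw.executemany("INSERT INTO sales_fact VALUES (?, ?)", rows)
    dw.commit()


dw = sqlite3.connect(":memory:")
initial_load(dw, [(1, 120.0), (2, 35.5)])
incremental_load(dw, [(2, 40.0), (3, 60.0)])  # order 2 corrected, order 3 new
after_incremental = dw.execute("SELECT * FROM sales_fact ORDER BY order_id").fetchall()

full_refresh(dw, [(10, 5.0)])
after_refresh = dw.execute("SELECT * FROM sales_fact ORDER BY order_id").fetchall()
```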
Why do we store data from the data warehouse in data marts?
A data warehouse may hold records that are 10 years old, so querying it is a time-consuming process.
In a data mart, we create tables that store only summarized data, such as weekly data.
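A minimal sketch of building a summarized data mart table from detailed warehouse rows with SQLite. The table names are hypothetical, and SQLite's `strftime('%Y-%W', ...)` (year plus Monday-based week number) stands in for whatever week definition the business actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Detailed warehouse table: one row per sale, potentially years of history.
conn.execute("CREATE TABLE dw_sales (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO dw_sales VALUES (?, ?)",
    [("2024-01-01", 100.0), ("2024-01-03", 50.0), ("2024-01-08", 70.0)],
)

# Data mart table: only summarized rows, one per week.
conn.execute("""
    CREATE TABLE mart_weekly_sales AS
    SELECT strftime('%Y-%W', sale_date) AS sales_week,
           SUM(amount) AS total_amount
    FROM dw_sales
    GROUP BY sales_week
""")

mart_rows = conn.execute(
    "SELECT sales_week, total_amount FROM mart_weekly_sales ORDER BY sales_week"
).fetchall()
```

Reports then read the small weekly table instead of scanning every detailed row in the warehouse.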
Data Warehousing - Schemas
A schema is a logical description of the entire database. It includes the name and description of records of all record types, including all associated data items and aggregates. Much like a database, a data warehouse also needs to maintain a schema. A database uses the relational model, while a data warehouse uses:
• Star schema
• Snowflake schema
• Fact Constellation schema
Star Schema:
• Each dimension in a star schema is represented with only one dimension table.
• This dimension table contains the set of attributes.
• The following diagram shows the sales data of a company with respect to the four dimensions, namely time, item, branch, and location.
• There is a fact table at the center. It contains the keys to each of the four dimensions.
• The fact table also contains the attributes, namely dollars sold and units sold.
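The star schema above can be sketched as SQLite DDL. The dimension attributes other than the keys (for example `item_name` or `city`) and the sample rows are hypothetical; the point is that every dimension table joins directly to the central fact table, which holds `dollars_sold` and `units_sold`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per dimension (attribute names beyond the keys are illustrative).
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- Central fact table: a key for each of the four dimensions plus two measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);

INSERT INTO time_dim VALUES (1, 'Q1', 2024);
INSERT INTO item_dim VALUES (1, 'Laptop', 'BrandA', 'Electronics'), (2, 'Mouse', 'BrandB', 'Electronics');
INSERT INTO branch_dim VALUES (1, 'Downtown');
INSERT INTO location_dim VALUES (1, 'Springfield', 'USA');
INSERT INTO sales_fact VALUES (1, 1, 1, 1, 1800.0, 2), (1, 2, 1, 1, 50.0, 5);
""")

# A typical star join: each dimension is one join away from the fact table.
result = conn.execute("""
    SELECT l.city, i.item_name, t.year, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM sales_fact f
    JOIN item_dim i ON f.item_key = i.item_key
    JOIN time_dim t ON f.time_key = t.time_key
    JOIN location_dim l ON f.location_key = l.location_key
    GROUP BY l.city, i.item_name, t.year
    ORDER BY i.item_name
""").fetchall()
```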
Snowflake Schema:
• Some dimension tables in the snowflake schema are normalized.
• The normalization splits up the data into additional tables.
• Unlike in the star schema, the dimension tables in a snowflake schema are normalized. For example, the item dimension table in the star schema is normalized and split into two dimension tables, namely the item and supplier tables.
• Now the item dimension table contains the attributes item_key, item_name, type, brand, and supplier_key.
• The supplier_key is linked to the supplier dimension table. The supplier dimension table contains the attributes supplier_key and supplier_type.
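A sketch of the normalized item dimension in SQLite, using the attribute names listed above. The fact table is trimmed to the item key and the two measures for brevity, and the sample rows are hypothetical. Reaching `supplier_type` now takes one extra join (fact to item to supplier), which is the usual trade-off of snowflaking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Supplier attributes moved out of the item dimension into their own table.
CREATE TABLE supplier_dim (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item_dim (
    item_key     INTEGER PRIMARY KEY,
    item_name    TEXT,
    type         TEXT,
    brand        TEXT,
    supplier_key INTEGER REFERENCES supplier_dim(supplier_key)
);
-- Fact table trimmed to the item key and the two measures.
CREATE TABLE sales_fact (
    item_key     INTEGER REFERENCES item_dim(item_key),
    dollars_sold REAL,
    units_sold   INTEGER
);

INSERT INTO supplier_dim VALUES (1, 'Wholesale'), (2, 'Direct');
INSERT INTO item_dim VALUES
    (1, 'Laptop', 'Electronics', 'BrandA', 1),
    (2, 'Mouse', 'Electronics', 'BrandB', 1),
    (3, 'Desk', 'Furniture', 'BrandC', 2);
INSERT INTO sales_fact VALUES (1, 1800.0, 2), (2, 50.0, 5), (3, 300.0, 1);
""")

# Sales by supplier type: fact -> item -> supplier (two joins instead of one).
rows = conn.execute("""
    SELECT s.supplier_type, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN item_dim i ON f.item_key = i.item_key
    JOIN supplier_dim s ON i.supplier_key = s.supplier_key
    GROUP BY s.supplier_type
    ORDER BY s.supplier_type
""").fetchall()
```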
Fact Constellation Schema:
• A fact constellation has multiple fact tables. It is also known as a galaxy schema.
• The following diagram shows two fact tables, namely sales and shipping.
• The sales fact table is the same as that in the star schema.
• The shipping fact table has five dimensions, namely item_key, time_key, shipper_key, from_location, and to_location.
• The shipping fact table also contains two measures, namely dollars sold and units sold.
• It is also possible to share dimension tables between fact tables. For example, the time, item, and location dimension tables are shared between the sales and shipping fact tables.
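A compact SQLite sketch of a galaxy schema in which the sales and shipping fact tables share the time, item, and location dimensions. Column names follow the description above; the sales fact table is trimmed (no branch key), and `shipper_dim` plus all sample rows are hypothetical. Because `from_location` and `to_location` both point at the same location dimension, that table is joined twice under different aliases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions shared by both fact tables (plus a shipper dimension).
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE shipper_dim  (shipper_key INTEGER PRIMARY KEY, shipper_name TEXT);

-- Fact 1: sales (trimmed for brevity).
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL, units_sold INTEGER);

-- Fact 2: shipping, referencing location_dim twice.
CREATE TABLE shipping_fact (
    item_key INTEGER REFERENCES item_dim(item_key),
    time_key INTEGER REFERENCES time_dim(time_key),
    shipper_key INTEGER REFERENCES shipper_dim(shipper_key),
    from_location INTEGER REFERENCES location_dim(location_key),
    to_location INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL, units_sold INTEGER);

INSERT INTO time_dim VALUES (1, 2024);
INSERT INTO item_dim VALUES (1, 'Laptop');
INSERT INTO location_dim VALUES (1, 'Springfield'), (2, 'Shelbyville');
INSERT INTO shipper_dim VALUES (1, 'FastShip');
INSERT INTO sales_fact VALUES (1, 1, 2, 1800.0, 2);
INSERT INTO shipping_fact VALUES (1, 1, 1, 1, 2, 40.0, 2);
""")

# The shared item dimension lets both facts be compared per item.
per_item = conn.execute("""
    SELECT i.item_name,
           (SELECT SUM(s.units_sold) FROM sales_fact s WHERE s.item_key = i.item_key),
           (SELECT SUM(sh.units_sold) FROM shipping_fact sh WHERE sh.item_key = i.item_key)
    FROM item_dim i
    ORDER BY i.item_name
""").fetchall()

# The location dimension plays two roles in the shipping fact table.
routes = conn.execute("""
    SELECT src.city, dst.city, sh.units_sold
    FROM shipping_fact sh
    JOIN location_dim src ON sh.from_location = src.location_key
    JOIN location_dim dst ON sh.to_location = dst.location_key
""").fetchall()
```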
Types of Data Stored in a Data Warehouse:
The term data warehouse is used to distinguish a database that is used for business analysis (OLAP) rather than transaction processing (OLTP).
Your data warehouse will store these types of data:
• Historical data
• Derived data
• Metadata
Historical Data
A data warehouse typically contains several years of
historical data. The amount of data that you decide to
make available depends on available disk space and the
types of analysis that you want to support. This data can
come from your transactional database archives or other
sources.
Some applications might perform analyses that require
data at lower levels than users typically view it. You
will need to check with the application builder or the
application's documentation for those types of data
requirements.
Derived Data
Derived data is generated from existing data using a
mathematical operation or a data transformation. It can be
created as part of a database maintenance operation or
generated at run-time in response to a query.
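Both ways of producing derived data can be shown with a price-per-unit figure computed from the `dollars_sold` and `units_sold` measures. The table names and the choice of derived measure are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (item_key INTEGER, dollars_sold REAL, units_sold INTEGER)")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", [(1, 1800.0, 2), (2, 50.0, 5)])

# Run-time derivation: computed inside the query; nothing extra is stored.
runtime = conn.execute(
    "SELECT item_key, dollars_sold / units_sold FROM sales_fact ORDER BY item_key"
).fetchall()

# Maintenance-time derivation: computed once and stored in a summary table.
conn.execute("""
    CREATE TABLE item_price_summary AS
    SELECT item_key, dollars_sold / units_sold AS avg_unit_price
    FROM sales_fact
""")
stored = conn.execute(
    "SELECT item_key, avg_unit_price FROM item_price_summary ORDER BY item_key"
).fetchall()
```

The stored version answers queries faster but must be recomputed whenever the underlying facts change.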
Metadata
Metadata is data that describes the data and schema objects,
and is used by applications to fetch and compute the data
correctly.
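As an illustration, SQLite exposes column metadata through `PRAGMA table_info`, and an application can read it to decide which columns to aggregate instead of hard-coding them. The rule "every column not ending in `_key` is a measure" is a hypothetical naming convention used only for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (item_key INTEGER, dollars_sold REAL, units_sold INTEGER)")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", [(1, 1000.0, 1), (2, 800.0, 1)])

# Schema metadata: one row per column as (cid, name, type, notnull, dflt_value, pk).
columns = conn.execute("PRAGMA table_info(sales_fact)").fetchall()

# Hypothetical convention: key columns end in "_key"; everything else is a measure.
measure_names = [col[1] for col in columns if not col[1].endswith("_key")]

# Build the aggregate query from the metadata. The names come from the schema
# itself, not from user input, so interpolating them into SQL is acceptable here.
select_list = ", ".join(f"SUM({name})" for name in measure_names)
totals = conn.execute(f"SELECT {select_list} FROM sales_fact").fetchone()
```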