2. The issues I'll focus on:
What is a data warehouse?
What is the architecture of a data warehouse?
What is an OLAP server, what are its types, and how does each work?
What are data marts?
4. A data warehouse is a
Subject-oriented -> a database and a data warehouse are two different things, so data is stored in them using different approaches.
Integrated -> data from different sources is brought into a common format.
Time-variant -> it holds historical data, and every piece of data is associated with a time.
Non-volatile -> data is not deleted or updated once it has been stored.
collection of data that is used primarily in organizational decision making.
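5. Subject oriented?
(Figure: operational databases are application-oriented; a data warehouse is subject-oriented.)
Business: operational databases for Order processing, Stock management and Billing; data warehouse subject: Sales.
Bank: operational databases for Savings account, Current account and Loan account; data warehouse subject: Account.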
6. Explanation
As we can see in both the business and the bank examples, the databases store data application-wise. This simply means that for every operational application of the organization there is a storage in which that application's data is stored. These storages are called databases.
In the data warehouse of the organization, however, data is stored subject-wise, where a subject is an important aspect of the organization: for a bank it is accounts, and for a business it is sales.
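A minimal sketch of the difference, in Python. The three lists stand in for hypothetical application-specific databases of a bank. The function gathers them into one subject-oriented "account" collection and tags each row with the application it came from. All table and field names are made up for illustration.

```python
# Hypothetical operational stores: one per application (application-oriented).
savings_db = [{"acct_no": "S001", "balance": 500.0}]
current_db = [{"acct_no": "C001", "balance": 1200.0}]
loan_db = [{"acct_no": "L001", "balance": -3000.0}]

def build_account_subject(sources):
    """Collect rows from every application store into one subject-oriented
    'account' collection, tagging each row with its source application."""
    account = []
    for app_name, rows in sources.items():
        for row in rows:
            account.append({"account_type": app_name, **row})
    return account

account_subject = build_account_subject(
    {"savings": savings_db, "current": current_db, "loan": loan_db}
)
for row in account_subject:
    print(row)
```

The analyst now queries one "account" subject instead of three application databases.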
7. Integrated?
• Data in a DW comes from several operational systems.
• Different datasets in these operational systems have different file formats.
• Example: data for the subject Account comes from 3 different data sources (as shown in the figure).
(Figure: in the operational environment, the savings, current and loan account systems all feed the single subject Account.)
8. So there can be variations, such as:
1. Naming conventions and field formats could be different.
Example: a savings account number could be 8 bytes long but only 6 bytes long for checking (current) accounts.
2. The number of attributes for data items could be different.
Example: a savings account can have 5 attributes while a checking account can have 7 attributes associated with it.
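A minimal sketch of bringing such records into a common format. It assumes a hypothetical seven-field common schema for the Account subject; the field names are illustrative and come from no real system. Account numbers are zero-padded to an assumed common width of 8 characters, and attributes a source lacks become `None`.

```python
# Hypothetical common schema for the "account" subject (illustrative field names).
COMMON_FIELDS = ["acct_no", "holder", "branch", "balance",
                 "opened", "overdraft_limit", "cheque_book"]
ACCT_NO_WIDTH = 8  # assumed common width for account numbers

def normalize(record):
    """Map a source record onto the common schema.

    Attributes the source does not have become None; account numbers are
    zero-padded on the left to the common width.
    """
    out = {field: record.get(field) for field in COMMON_FIELDS}
    out["acct_no"] = str(record["acct_no"]).zfill(ACCT_NO_WIDTH)
    return out

savings = {"acct_no": "12345678", "holder": "Asha", "branch": "Pune",
           "balance": 500.0, "opened": "2019-04-01"}            # 5 attributes
checking = {"acct_no": "123456", "holder": "Ravi", "branch": "Delhi",
            "balance": 80.0, "opened": "2021-07-15",
            "overdraft_limit": 1000.0, "cheque_book": True}      # 7 attributes

for rec in (savings, checking):
    print(normalize(rec))
```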
9. Time variant?
The operational database stores only current data, but the data warehouse stores present as well as past data in order to fulfill its purpose.
Data is stored as a series of snapshots, each representing a period of time.
Data is tagged with some element of time: creation date, as-of date, etc.
Data is available online for long periods of time (for example, five or more years) for trend analysis and forecasting.
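A minimal sketch of time-variant storage, assuming a hypothetical dictionary of account balances as the operational database. The operational side keeps only the current value. The warehouse keeps a copy of each periodic state, tagged with its as-of date, so the history of any account can be read back for trend analysis.

```python
from datetime import date

# Operational database: only the current balance of each account.
operational = {"A1": 100.0, "A2": 50.0}

# Warehouse: a series of snapshots, each tagged with its as-of date.
snapshots = []

def take_snapshot(as_of, state):
    # Copy the state so later operational updates do not alter history.
    snapshots.append((as_of, dict(state)))

def history(account):
    """Value of one account in every snapshot, oldest first."""
    return [(as_of, state.get(account)) for as_of, state in snapshots]

take_snapshot(date(2023, 1, 31), operational)
operational["A1"] = 180.0          # the operational DB overwrites the old value
take_snapshot(date(2023, 2, 28), operational)

print(operational["A1"])           # 180.0 (current value only)
print(history("A1"))
```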
10. Non-volatile?
Data from operational systems is moved into the DW at specific intervals (this process is called refreshing).
Business transactions do not update data in the data warehouse.
Data in the data warehouse is not deleted.
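A minimal sketch of a non-volatile, periodically refreshed store. It assumes each operational row carries a hypothetical, monotonically increasing `txn_id`. Each refresh copies in only the rows added since the last refresh. The class offers no update or delete operation.

```python
class Warehouse:
    """Non-volatile store: periodic refreshes add rows; nothing updates or deletes them."""

    def __init__(self):
        self._rows = []
        self._last_txn_id = 0  # highest operational transaction id loaded so far

    def refresh(self, source_rows):
        """Copy in rows that arrived since the last refresh; return how many."""
        new_rows = [dict(r) for r in source_rows if r["txn_id"] > self._last_txn_id]
        self._rows.extend(new_rows)
        if new_rows:
            self._last_txn_id = max(r["txn_id"] for r in new_rows)
        return len(new_rows)

    def rows(self):
        # A tuple, so callers cannot add or remove rows through it.
        return tuple(self._rows)


oltp = [{"txn_id": 1, "amt": 10}, {"txn_id": 2, "amt": 20}]
dw = Warehouse()
print(dw.refresh(oltp))   # 2
oltp.append({"txn_id": 3, "amt": 5})
oltp[0]["amt"] = 99       # a later change in the operational system...
print(dw.refresh(oltp))   # 1
print(dw.rows()[0])       # ...does not touch the row already in the warehouse
```

Rows are copied on load, so later edits in the operational system do not leak into the warehouse.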
11. The 3-tier architecture of a data warehouse
• When all the components of a system are combined to form the complete system, the style of designing (combining) that structure is known as the architecture of the system (e.g. the architecture of a school building).
• In a data warehouse the components are:
1. Data acquisition
2. Data storage
3. Data processing
4. Data delivery
Layers (e.g. the OSI reference model in computer networks) mean that the system is made of logically separated components, while tiers mean that the system is made of physically separated components.
12. The various possible architectures when dealing with a database:
1-tier: here the database (in the form of files) is itself stored on the client computer.
2-tier: here the database server is at a distant place, and the client machine and the database are connected via a network.
13. 3-tier: here, between the client machine and the database server, we include an application server. It sits mainly on the server side, does the processing, and returns results to the client machine.
15. The architecture of a data warehouse
(Figure: three-tier data warehouse architecture.)
Information sources: operational DBs and external sources -> extract, transform, load ->
Tier 1 (data tier): data warehouse server, holding the data warehouse and data marts ->
Tier 2 (logic tier): OLAP servers (ROLAP, MOLAP) ->
Tier 3 (presentation tier): clients for query/reporting and data mining.
16. The bottom-most tier:
Operational databases
• These are the application-specific databases used to store all the daily transactional data of the organization.
External sources
• This is the database used to store all important external information.
17. Database vs. data warehouse
OLTP (online transaction processing)
The major task of a traditional relational DBMS.
Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
OLAP (online analytical processing)
The major task of a data warehouse system.
Data analysis and decision making.
Forecasting and monitoring of the business.
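A minimal sketch of the two workloads on an in-memory SQLite database, using Python's standard `sqlite3` module. The `stock` and `orders` tables are hypothetical. `place_order` is an OLTP-style short transaction that writes a few current rows. `monthly_demand` is an OLAP-style read-only query that aggregates over all the history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("CREATE TABLE orders (item TEXT, qty INTEGER, month TEXT)")
conn.execute("INSERT INTO stock VALUES ('pen', 100), ('ink', 40)")

def place_order(item, qty, month):
    """OLTP: a short transaction that reads and writes a few current rows."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (item, qty, month))
        conn.execute("UPDATE stock SET qty = qty - ? WHERE item = ?", (qty, item))

def monthly_demand():
    """OLAP: a read-only query that scans and aggregates many rows."""
    return conn.execute(
        "SELECT month, item, SUM(qty) FROM orders "
        "GROUP BY month, item ORDER BY month, item"
    ).fetchall()

place_order("pen", 5, "2024-01")
place_order("pen", 3, "2024-02")
place_order("ink", 2, "2024-02")
print(monthly_demand())
```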
18. How is the warehouse loaded?
This is done using back-end tools, which are described on the next page.
19. Data extraction:
get data from multiple, heterogeneous, and external sources.
Data cleaning:
correct wrong or inconsistent values.
Data transformation:
convert data from one format to another (e.g. pounds to kilograms, date of birth to age).
Load:
summarized tables are loaded into the data warehouse.
Refresh:
propagate the updates from the data sources to the warehouse.
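A minimal sketch of the extract, clean, transform and load steps, using two hypothetical sources. One is a list of records with weights in pounds; the other is CSV text with weights in kilograms. Ages are computed against a fixed reference date so the result is reproducible. The summary table (customers, average age, average weight per city) is an illustrative choice. Refresh is not shown; here it would amount to re-running the pipeline on the new source rows.

```python
import csv
import io
from datetime import date

LB_TO_KG = 0.45359237
AS_OF = date(2024, 1, 1)  # fixed reference date so computed ages are reproducible

# Two hypothetical heterogeneous sources.
source_a = [  # an operational export, weight in pounds
    {"name": " Asha ", "city": "Pune", "weight_lb": "132", "dob": "1990-05-17"},
    {"name": "Ravi", "city": "Pune", "weight_lb": "", "dob": "1985-11-02"},
]
source_b_csv = "name,city,weight_kg,dob\nMeera,Delhi,55.5,2000-01-01\n"

def extract():
    """Pull rows from both sources, remembering each row's weight unit."""
    rows = [dict(r, unit="lb", weight=r["weight_lb"]) for r in source_a]
    for r in csv.DictReader(io.StringIO(source_b_csv)):
        rows.append(dict(r, unit="kg", weight=r["weight_kg"]))
    return rows

def clean(rows):
    """Trim names and drop rows whose weight is missing."""
    return [dict(r, name=r["name"].strip()) for r in rows if r["weight"].strip()]

def age_on(dob, today):
    """Whole years between dob and today."""
    before_birthday = (today.month, today.day) < (dob.month, dob.day)
    return today.year - dob.year - before_birthday

def transform(rows):
    """Convert pounds to kilograms and date of birth to age."""
    out = []
    for r in rows:
        w = float(r["weight"])
        out.append({
            "city": r["city"],
            "weight_kg": round(w * LB_TO_KG if r["unit"] == "lb" else w, 1),
            "age": age_on(date.fromisoformat(r["dob"]), AS_OF),
        })
    return out

def load(rows, warehouse):
    """Load a summarized table: customers, average age and weight per city."""
    groups = {}
    for r in rows:
        groups.setdefault(r["city"], []).append(r)
    for city, members in groups.items():
        n = len(members)
        warehouse[city] = {
            "customers": n,
            "avg_age": sum(m["age"] for m in members) / n,
            "avg_weight_kg": sum(m["weight_kg"] for m in members) / n,
        }
    return warehouse

warehouse = load(transform(clean(extract())), {})
print(warehouse)
```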
20. Tier 1: data warehouse
This is the data warehouse, which is loaded with information used for strategy making.
This tier also contains the data marts.
21. Tier 2
This tier consists of OLAP servers, which are used for processing. The following issues are also handled here:
Security of data (users do not communicate directly with the database).
Business logic (deciding what kind of information is shown for a particular kind of query).
Translation (users' high-level queries are converted into low-level SQL queries).
Intermediate calculations (this removes the burden from the user interface and the database).
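A minimal sketch of the security and translation duties of this tier. A high-level request, given as a small dictionary, is turned into SQL, and only allowlisted measures and dimensions may appear in it. The `fact_sales` table and the allowlists are hypothetical. Real OLAP servers use their own query languages and metadata, so this only shows the idea.

```python
# Hypothetical allowlists: what users may ask for, and how it maps to SQL.
ALLOWED_MEASURES = {"sales": "SUM(amount)", "orders": "COUNT(*)"}
ALLOWED_DIMENSIONS = {"region", "product", "year"}

def translate(request):
    """Turn a high-level request into SQL, rejecting anything not allowlisted."""
    measure = request["measure"]
    dims = list(request.get("by", []))
    if measure not in ALLOWED_MEASURES:
        raise PermissionError(f"measure {measure!r} not allowed")
    bad = [d for d in dims if d not in ALLOWED_DIMENSIONS]
    if bad:
        raise PermissionError(f"dimensions not allowed: {bad}")
    select = ", ".join(dims + [f"{ALLOWED_MEASURES[measure]} AS {measure}"])
    sql = f"SELECT {select} FROM fact_sales"
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    return sql

print(translate({"measure": "sales", "by": ["region", "year"]}))
```

Because the user never writes SQL directly, this tier controls which columns can be reached.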
22. OLAP server
ROLAP server: choose this if space is important for you.
MOLAP server: choose this if time is important for you.
24. ROLAP
(Figure: data warehouse -> RDBMS server -> ROLAP server, which creates the data cube dynamically (on the fly) -> desktop client with a multidimensional view.)
25. DETAILS
Relational online analytical processing (ROLAP) is a form of online analytical processing (OLAP) that performs multidimensional analysis of data stored in a relational database rather than in a multidimensional database.
In a three-tier architecture, the user submits a request for multidimensional analysis, and the ROLAP engine converts the request to SQL for submission to the relational database. Then the operation is performed in reverse: the engine converts the resulting data from SQL into a multidimensional format (on the fly) before it is returned to the client for viewing.
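A minimal sketch of that round trip on an in-memory SQLite copy of the `sale` fact table used in the following slides. The function name `rolap_query` is hypothetical. It performs three steps:
1. It converts a two-dimension request into a GROUP BY query.
2. It runs that query on the relational database.
3. It reshapes the flat result into a nested cross-tab on the fly.

```python
import sqlite3

# (prodId, storeId, date, amt): the sale facts from the slides.
SALES = [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
         ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)]
DIMENSIONS = {"prodId", "storeId", "date"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", SALES)

def rolap_query(row_dim, col_dim):
    """ROLAP round trip: multidimensional request -> SQL -> 2-D cross-tab."""
    if not {row_dim, col_dim} <= DIMENSIONS:
        raise ValueError("unknown dimension")
    # 1. Translate the request into SQL (dimension names are allowlisted above).
    sql = (f"SELECT {row_dim}, {col_dim}, SUM(amt) FROM sale "
           f"GROUP BY {row_dim}, {col_dim}")
    # 2. Run it on the relational database.
    rows = conn.execute(sql).fetchall()
    # 3. Convert the flat result into a multidimensional (cross-tab) form.
    cube = {}
    for r, c, total in rows:
        cube.setdefault(r, {})[c] = total
    return cube

print(rolap_query("prodId", "storeId"))
```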
26. Add up total sale amount by day
Fact table sale:
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4
Query in SQL: SELECT date, SUM(amt) FROM sale GROUP BY date
Answer:
date  sum
1     81
2     48
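The query above can be checked by running it on an in-memory SQLite table holding the same six rows. The only change from the slide is an `ORDER BY`, added so the output order is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
     ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)],
)
# The query from the slide, with ORDER BY added so the output order is fixed.
result = conn.execute("SELECT date, SUM(amt) FROM sale GROUP BY date ORDER BY date").fetchall()
print(result)  # [(1, 81), (2, 48)]
```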
28. MOLAP
(Figure: data warehouse -> RDBMS server -> MOLAP server with a multidimensional database -> desktop client with a multidimensional view.)
29. POINTS ABOUT MOLAP:
Here a multidimensional database is used to fetch data when a user submits an analytical query.
Facts (the fact table) are stored in multidimensional arrays.
Dimensions (the dimension tables) are used to index the arrays.
One of the major distinctions of MOLAP from ROLAP is that the data is pre-summarized, pre-calculated, and stored in an optimized format in a multidimensional cube instead of in a relational database, in accordance with the client's reporting requirements.
30. MOLAP is better optimized for fast query performance and retrieval of summarized information.
There are certain limitations to implementing a MOLAP system. Its primary weakness is that a MOLAP tool is less scalable than a ROLAP tool, as it can handle only a limited amount of data.
Pre-calculating or pre-consolidating transactional data improves speed.
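A minimal sketch of why pre-calculation helps, using the sale facts from the slides. Daily totals are computed once, at load time. After that, a query is a single dictionary lookup instead of a scan over every detail row. The function names are hypothetical.

```python
from collections import defaultdict

# (prodId, storeId, date, amt): the sale facts from the slides.
FACTS = [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
         ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)]

def precompute_daily_totals(facts):
    """Run once, when the cube is loaded or refreshed."""
    totals = defaultdict(int)
    for _prod, _store, day, amt in facts:
        totals[day] += amt
    return dict(totals)

DAILY_TOTALS = precompute_daily_totals(FACTS)

def total_on_the_fly(day):
    # Scans every detail row each time the question is asked.
    return sum(amt for _p, _s, d, amt in FACTS if d == day)

def total_precomputed(day):
    # Answers with a single lookup in the stored aggregate.
    return DAILY_TOTALS.get(day, 0)

for day in (1, 2):
    print(day, total_on_the_fly(day), total_precomputed(day))
```

The trade-off matches the slides: the stored aggregates cost extra space and must be rebuilt on refresh, in exchange for faster answers.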
31. The MOLAP Cube
Add up total sale amount by day
Fact table view:
prodId  storeId  amt
p1      s1       12
p2      s1       11
p1      s3       50
p2      s2       8
Multidimensional cube (dimensions = 2):
      s1   s2   s3
p1    12    -   50
p2    11    8    -
32. Add up total sale amount by day
Fact table view:
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4
Multidimensional cube (dimensions = 3):
day 1:
      s1   s2   s3
p1    12    -   50
p2    11    8    -
day 2:
      s1   s2   s3
p1    44    4    -
p2     -    -    -
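A minimal sketch of the 3-dimensional cube above, stored MOLAP-style as a dense nested array. Each dimension value is mapped to an array position, so the dimensions index the array; `None` marks an empty cell. Slicing by day gives the two 2-D tables shown. Plain Python lists are used here for simplicity; a real MOLAP engine uses its own compressed storage.

```python
PRODUCTS = ["p1", "p2"]
STORES = ["s1", "s2", "s3"]
DAYS = [1, 2]
# (prodId, storeId, date, amt): the sale facts from the slides.
FACTS = [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
         ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)]

# Dimension values -> array positions (the dimensions index the array).
p_idx = {p: i for i, p in enumerate(PRODUCTS)}
s_idx = {s: i for i, s in enumerate(STORES)}
d_idx = {d: i for i, d in enumerate(DAYS)}

# cube[day][product][store]; None marks an empty cell.
cube = [[[None] * len(STORES) for _ in PRODUCTS] for _ in DAYS]
for prod, store, day, amt in FACTS:
    d, p, s = d_idx[day], p_idx[prod], s_idx[store]
    cube[d][p][s] = amt if cube[d][p][s] is None else cube[d][p][s] + amt

def slice_day(day):
    """2-D product x store slice of the cube for one day."""
    return cube[d_idx[day]]

for day in DAYS:
    print(f"day {day}")
    for prod, row in zip(PRODUCTS, slice_day(day)):
        print(prod, row)
```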
33. The total sale of computers in year 2008 at the location Asia is 200 units.
The total sale of books in year 2008 at the location Europe is 200 units.
34. Hybrid OLAP (HOLAP)
HOLAP = Hybrid OLAP:
Best of both worlds
Storing detailed data in RDBMS
Storing aggregated data in MDBMS
User access via MOLAP tools
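A minimal sketch of the HOLAP split, with stand-ins for both servers. An in-memory SQLite table plays the RDBMS holding the detailed sale rows. A plain dictionary plays the MDBMS holding pre-aggregated daily totals. The hypothetical `holap_query` answers summary questions from the aggregate store and drills down to the relational detail only when asked.

```python
import sqlite3

# (prodId, storeId, date, amt): the sale facts from the slides.
FACTS = [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
         ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)]

# Detailed data stays in the relational store.
rdbms = sqlite3.connect(":memory:")
rdbms.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
rdbms.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", FACTS)

# Aggregates are precomputed into a small multidimensional store (a dict here).
mdbms = {day: total for day, total in
         rdbms.execute("SELECT date, SUM(amt) FROM sale GROUP BY date")}

def holap_query(day, detail=False):
    """Summary from the aggregate store; detail rows from the RDBMS."""
    if not detail:
        return mdbms.get(day, 0)
    return rdbms.execute(
        "SELECT prodId, storeId, amt FROM sale WHERE date = ? ORDER BY prodId, storeId",
        (day,),
    ).fetchall()

print(holap_query(1))
print(holap_query(2, detail=True))
```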
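35. Data flow in HOLAP
(Figure: data flow in HOLAP. The RDBMS server holds user data, metadata and derived data. The MDBMS server reads multidimensional data from it via SQL-Read. On the client, a multidimensional viewer accesses the MDBMS server (multidimensional access), and a relational viewer reaches the RDBMS server directly via SQL-Reach-Through.)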
36. Front end tools
Query results are delivered to a computer or a mobile phone as reports, graphs, bar charts and pie charts.