SlideShare une entreprise Scribd logo
1  sur  17
Data Warehouse - Dimensional Modeling and
Design
data-warehouse-dimensional-
modeling-and-design-
150225083141-conversion-
gate01.doc
Ver Rev. 1.0 i
Table of Contents
1 Introduction............................................................................................................1
2 What is Data Warehousing....................................................................................1
2.1 DATA WAREHOUSE TERMINOLOGY.........................................................................2
2.1.1 Data Mart....................................................................................................2
2.1.2 Metadata.....................................................................................................3
2.1.3 Cube...........................................................................................................3
2.1.4 Data Cleansing............................................................................................4
2.1.5 Extraction Transformation and loading (ETL)..............................................4
2.1.6 Data Mining.................................................................................................5
2.2 ISSUES IN BUILDING A DATA WAREHOUSE................................................................5
2.3 BENEFITS OF DATA WAREHOUSING........................................................................6
3 Data Warehouse Modeling Techniques...............................................................6
3.1 WHAT IS ENTITY RELATIONSHIP (E-R) MODELING..................................................6
3.2 E-R MODELING APPROACH....................................................................................7
3.2.1 Entities........................................................................................................7
3.2.2 Attributes.....................................................................................................7
3.2.3 Relationships...............................................................................................7
3.2.4 Normalization..............................................................................................7
3.3 LIMITATIONS OF E-R MODELING FOR DW DESIGNING.............................................7
3.4 DIMENSIONAL MODELING APPROACH FOR DW DESIGNING......................................8
3.5 WHAT IS DIMENSIONAL MODELING (DM)................................................................8
3.6 DM TERMINOLOGY................................................................................................8
3.6.1 Grain...........................................................................................................8
3.6.2 Fact Table...................................................................................................8
3.6.3 Dimension Table.........................................................................................9
3.6.4 Star Schema...............................................................................................9
3.6.5 Snow Flake Schema...................................................................................9
3.7 APPROACH FOR DIMENSIONAL MODELING..............................................................9
3.8 WHY DM FOR DESIGNING DATA WAREHOUSE......................................................11
3.9 DESIGNING DM FOR E-R MODELING....................................................................11
4 What is OLAP.......................................................................................................12
4.1 OLAP FEATURES................................................................................................12
4.1.1 Multidimensional views..............................................................................12
4.1.2 Calculation-intensive.................................................................................13
4.1.3 Time Intelligence.......................................................................................13
4.2 BENEFITS OF OLAP............................................................................................13
4.3 TYPES OF OLAP SYSTEM (DATA STORAGE FORMAT)...........................................14
4.3.1 Multidimensional OLAP (MOLAP).............................................................14
4.3.2 Relational OLAP (ROLAP)........................................................................14
4.3.3 Hybrid OLAP (HOLAP)..............................................................................15
5 Abbreviations Used.............................................................................................15
6 References...........................................................................................................15
data-warehouse-dimensional-
modeling-and-design-
150225083141-conversion-
gate01.doc
Ver Rev. 1.0 ii
1 Introduction
For most organizations managing and using the enterprise-wide information to make
effective business decisions is one of the most challenging tasks in today’s ever
changing business environment. Business data is the key asset for any organization
and these organizations have started to realize the need to make best use of this
data for making timely business decisions. In a typical business environment
thousands of transactions take place every day and collecting this information in such
a way that can help business users to analyze and present it in a effective manner is
a crucial requirement for organization’s growth.
One of the most dramatic turnaround in database design, the dimensional Data
Warehouse is a powerful database model and a prominent doorway, that enhances
the capability of a business user to analyze huge, multidimensional sets of data.
Data Warehouse technology has emerged as an increasingly popular and powerful
concept of applying information technology to turn huge islands of data into meaning
information for quick and effective business decisions.
The Data Warehouse technology is said to have embedded business intelligence in it
and can be termed as a practical approach for decision support systems.
2 What is Data Warehousing
A data warehouse is an integrated data store that contains the data residing in
organizations desperate data sources in a globally accepted way, which has
standard encoding structure and storage format. These data sources could be on
heterogeneous platform and have some information that may not be relevant at all to
be included in the data warehouse, as it doesn’t help the business users in analyzing
the business trends and/or making forecasting decisions.
A data warehouse is a copy of transactional data specifically structured for querying
and reporting. The form of the stored data has nothing to do with whether something
is a data warehouse. A data warehouse can be normalized or denormalized. It can
be a relational database, multidimensional database, flat file, hierarchical database,
object database, etc.
As compared to transactional system, a key advantage of using a data warehouse
apart from ease of data analysis is that it dramatically increases the query response
time. This important feature of the data warehouse is possible because it uses an
entirely different modeling approach for designing, known as dimensional modeling.
Other distinguishable features of a data warehouse are as follows:
• Contains historical data
The primary purpose of having a data warehouse in place is to analyze the business
trends. Therefore the data warehouse is designed to contain historical data as
compared to 3-6 month old data in OLTP systems. This data could be as long as 4-
10 years depending upon the needs of your business organization. After this period
data-warehouse-dimensional-
modeling-and-design-
150225083141-conversion-
gate01.doc
Page 1 of 15
is over, the data is purged from the data warehouse and the backup of the same
could be taken on stable media’s like magnetic tapes etc. for the purpose of future
references.
• No frequent updates
Since most of the time, the data warehouse information is used for reporting
purposes, the design approach should be in such a manner, which doesn’t support
frequent updates. Some times, there may be a business requirement where the
modeler may require few updates.
• Independent of all transactional data sources
To relieve the different transactional processing systems from the burden of frequent
querying and reporting operations, the data warehouse server should be organized
as an independent system, which should provide quick query response and hence
increase productivity.
• Subject-oriented
All the related data about a business process is stored and kept as a single set in a
meaningful and useful format, e.g. Sales, finance, marketing, customer etc.
• Non-volatile
Non-volatile means the data is loaded in the data warehouse on a scheduled basis
and henceforth accessed from the warehouse itself. The schedule time depends
upon the specific business requirements and could be daily, weekly, monthly etc.
It is to be noticed that data warehouse is not a software package, which can be
produced by a software vendor and purchased in the market. It is a set of software,
hardware and tools that has to be customized according to the needs of a particular
business requirement. Data warehousing is not just data in the data warehouse, but
also the architecture and tools to collect, query, analyze and present the information
to help business users make strategic business decisions, increase productivity and
keep a edge in the global competitive business environment.
2.1 Data Warehouse Terminology
Before taking an insight into the data warehouse world, it is very handy to have a
look and understand the following basic terms, which are frequently used in the data
warehousing literature.
2.1.1 Data Mart
Data marts are subsets or departmental warehouses, which contain the data about a
specific subject out of the whole set of subject oriented data warehouse. The
concept of data mart is similar to the concept of views in relational databases. They
are designed to meet the needs of a specific set of users of an organization.
Examples of data mart could be campaign management, marketing, finance, sales
etc.
data-warehouse-dimensional-
modeling-and-design-
150225083141-conversion-
gate01.doc
Page 2 of 15
There are two approaches to building a data warehouse and data mart. One is the
top down approach where the first step is to create the data warehouse and then
make the small sized data marts from them depending upon the needs of concerned
business organization. Second approach involves the design of a set of data marts,
simulating organization’s various processes, like sales, production etc. and then
building the data warehouse from the set of data marts. This approach is generally
termed as bottom up approach of building a data warehouse.
Data marts are less expensive and take less time for implementation with quick ROI.
They are scalable to full data warehouses and at times are summarized subsets of
more detailed, pre-existing data warehouses.
2.1.2 Metadata
Metadata describes the data that is contained in data warehouse. This includes the
data elements and the business logic. Apart from this basic definition, metadata also
contains the definition of extraction and transformation logic, required to extract the
data from disparate data sources and put it into the data warehouse. Metadata can
be equated to a data warehouse’s information map or directory, dictionary, or card
catalog.
2.1.3 Cube
Cubes are the fundamental unit for data storage and retrieval in an OLAP system.
Business measures can be stored for various dimensions in a cube. The concept of
cube is similar to the concept of table in relational database systems.
The dimensions of a cube are the perspectives from which the data can be viewed
and analyzed. It’s easy to visualize the two dimensions of a relational system-rows
and columns. But usually a cube has more than two dimensions. For example
Microsoft OLAP cube can have up to 64 dimensions.
OLAP cubes provide the capability to the end users to perform slicing and dicing
operations. Slicing a cube indicates that the end user can keep on extracting the
information for a particular dimension up to any level he wants, keeping all other
dimensions constant. For example a business user can see the information at
product segment, product series, product family or product level keeping all other
dimensions such as time, location etc. fixed. On the other hand dicing of cubes
indicates that while extracting the information for a particular dimension, the business
user can switch to other dimensions, and hence can make his business data analysis
even more effective.
Similar to the concept of views in relational database systems, there is a
methodology by which the data warehouse designer can extend the cubes and
combine their definitions to form a logical entity, which are already defined in an
OLAP system. These logical units are termed as virtual cubes. Thus the definition of
a virtual cube consists of the combination of definitions of at least one cube in an
OLAP system. They are derived from one or more other cubes.
data-warehouse-dimensional-
modeling-and-design-
150225083141-conversion-
gate01.doc
Page 3 of 15
2.1.4 Data Cleansing
A crucial factor in the success of a data warehousing project is the accuracy of
customer information. Therefore, data cleansing process plays a significant role in
such knowledge management projects as it improves the quality of the customer
data.
In a typical data cleansing tool, the various data cleansing operations can be
obtained using following processes:
• Formatting and enhancements
• Address verification
• Name parsing
• Duplication customer record check
Formatting is usually the first step in data cleansing. This process removes the junk
characters and formats the data for subsequent operations.
Data cleansing tools contain large databases for various countries to store the
information about their geographical hierarchy. Verification in data cleansing ensures
data integrity.
Parsing is the process in data cleansing which enables rearrangement of customer
information into a preferred and consistent format.
Matching process identifies duplicate records, thereby ensuring availability of unique
customer information.
2.1.5 Extraction Transformation and loading (ETL)
Extraction is the process of pulling the data from disparate data sources or legacy
systems and putting it into an intermediate stage before loading it into the data
warehouse. This intermediate stage is sometimes called as staging area.
Staging area generally reflects the whole enterprise data as a consolidated unit,
which is dispersed in heterogeneous data sources. The difference is that the
data in different data sources could be in various storage formats or encoding
structures, but staging area is a database, which is a replica of different data
sources and unites dispersed data in a uniform format, so that cleansing and
transformation processes can be carried out smoothly.
Transformation is the next step in the process of migrating the data from different
data sources to big data warehouse. It includes converting the data into a format
and presenting it in such a manner, which facilitates the easy understanding of
data and enhances the business user’s capability to carry out the business data
analysis and hence supports the process of quick and effective corporate
decision making. It is a good practice to put the transformed data into a second
intermediate stage and not onto actual data warehouse server to relieve it from
the burden of handling and managing large and complex data transformation
processes. This second intermediate stage is sometimes termed as operational
data store (ODS). This leaves the data warehouse server concentrate only to
handle large and complex queries without any performance hit.
data-warehouse-dimensional-
modeling-and-design-
150225083141-conversion-
gate01.doc
Page 4 of 15
Loading is the final process of migrating the data from data source to data
warehouse. Once the transformed data has been captured into operational data
store, the final task is to upload it into data warehouse server by converting the
data into a format, which compiles into the design methodology of the data
warehouse, which is known as dimensional data modeling. It is this design
methodology, which embodies the business intelligence into data warehouse and
hence prepares a full-fledged decision support system, which is then ready to be
used by various reporting tools to generate the business information for making
corporate wide effective business decisions.
2.1.6 Data Mining
Data mining predicts the future trends and behaviors, allowing businesses to make
proactive, knowledge driven decisions. Data mining is a mechanism, which uses
intelligent algorithms to discover patterns, clusters and models from data. These
patterns and hypotheses are then rendered in operational forms that are easy for
users to visualize and understand. It is about searching for patterns in data that are
relevant to the business. One way of looking at data mining is that input data goes in,
and patterns come out. The patterns may be descriptions of interesting features in
the data, or they may be predictions about the future.
Data mining is the process of analyzing business data in the data warehouse to find
unknown patterns or rules of information that can be used to tailor business
operations. For instance, data mining can find patterns in data to answer questions
like:
• What item purchased in a given transaction triggers the purchase of additional
related items?
• How do purchasing patterns change with store location?
• Did the same customer purchase related items at another time?
2.2 Issues in building a data warehouse
Before taking a data warehousing project into hand, it’s very essential to analyze the
risk factors involved into it. Studies have shown that 10 to 90 percent of data
warehousing projects end with scrap baskets. Some potential issues that could
evolve during the entire life cycle of a data warehousing project are discussed below.
• Data warehousing systems can complicate business processes
significantly.
Data warehousing, if unchecked, can foster the "institutionalization" of easily created
reports whose reason for being quickly is forgotten while people still toil to process
these reports. If the organization does not know how to throw out processes, data
warehousing can quickly add clutter to the business environment.
• Data warehousing can have a learning curve that may be too long.
data-warehouse-dimensional-
modeling-and-design-
150225083141-conversion-
gate01.doc
Page 5 of 15
Currently IT industry has shortage on data warehousing experts. A lot of research
work is going on the various aspects of data warehousing architecture. One of them
is data mining. There are limited numbers of people available who have worked with
the full data warehousing system project "life cycle". Despite the speed of the data
warehousing development effort, it takes time for an organization to figure how it can
change its business practices to get a substantial return on its data warehousing
investment.
2.3 Benefits of Data Warehousing
A well designed and implemented data warehouse can be used to:
• Understand business trends and make better forecasting decisions
• Bring better products to market in a more timely manner
• Analyze daily sales information and make quick decisions that can significantly
affect company's performance
Data warehousing can be a key differentiator in many different industries. At present,
some of the most popular data warehouse applications include:
• Sales and marketing analysis across all industries
• Inventory turn and product tracking in manufacturing
• Category management, vendor analysis, and marketing program effectiveness
analysis in retail
3 Data Warehouse Modeling Techniques
The most efficient way to build an effective data warehouse is to use a dimensional
model to design it. Dimensional modeling is a database design methodology that is
used to design data warehouses. Dimensional modeling for a data warehouse is
similar to entity relationship modeling (E-R modeling) for an RDBMS. Both modeling
techniques provide a methodology that facilitates the creation of effective well-
designed databases. They differ, however, in their approach to business questions
and design goals. An entity relationship model focuses on data integrity and
efficiency of data entry so that you can enter each piece of data, such as customer
address, only once. In contrast, a dimensional model focuses on business processes
and business questions.
This section first discusses the E-R modeling technique concisely and then discusses
why this approach is not suitable for designing a data warehouse.
3.1 What is Entity Relationship (E-R) Modeling
Entity Relation (E-R) model is developed to answers the following issues of
conventional Data Base Management System (DBMS).
(i) Redundancy of data
(ii) Lack of integration and
(iii) Lack of flexibility,
This modeling is based on the relational theory and abides by the 13 rules proposed
by E.F. Codd that a DBMS implementation must follow to be qualified as truly
data-warehouse-dimensional-
modeling-and-design-
150225083141-conversion-
gate01.doc
Page 6 of 15
relational. The data in E-R model is presented in a simple form of two-dimensional
tables.
3.2 E-R Modeling Approach
In E-R model the database is designed using:
3.2.1 Entities
These are the real life objects, which are being modeled. Entities are capable of
independent existence and can be uniquely identified.
3.2.2 Attributes
These are the properties/characteristics of the entities, which uniquely identify a
given entity in the real world.
3.2.3 Relationships
This represents the association among the entities and the way in which entities
interacts.
3.2.4 Normalization
Normalization is a process of decomposing the tables to prevent redundancy, insert
& update anomalies.
The design approach consists of following steps:
• Identify the Entities about which descriptive information is to be stored
• Identify the Attributes (i.e. the properties) of the entities, which meaningfully
describe the entities
• Define the relationship between entities. The functionality of the relationship can
be one-to-one (1:1), one-to-many (1:N) or many-to-many (N:M)
• The E-R model is then transformed to 1st
, 2nd
, 3rd or BCNF normal form
3.3 Limitations of E-R modeling for DW Designing
E-R modeling is a powerful technique to design and develop OLTP systems in
relational environment. However E-R model defeats the basic needs of data ware
housing, namely prompt and high-performance retrieval of data. E-R model is
characterized by:
• Highly normalized designed to prevent redundancy of data
• Frequent insertion and updated to support OLTP (On Line Transactional
Processing) concept
• Less indexing to support frequent inserts and updates
• As the design is supportive to OLTP system the data storage life is small. Also as
the insertion rate is high the amount of data is huge. So to keep the data volume
within manageable limits the data storage life is small
data-warehouse-dimensional-
modeling-and-design-
150225083141-conversion-
gate01.doc
Page 7 of 15
Data warehouse on the contrary has following features:
• Fast processing speed of queries
• Very minimal inserts and updates to support OLAP (On Line Analysis
Processing) concept
• The DW database uses indexing to make query processing faster
• The data stored is historical in nature as trend analysis is the main idea for
designing DW
Because of the above contradictions in the features, E-R model modeling approach
is not preferred for the design of data warehouse.
3.4 Dimensional Modeling Approach for DW Designing
As the features of E-R model don't support the design of DW, there is always a need
for a model to support the OLAP (On Line Analytical Processing) systems, the
concept on which the data warehouse design is based. The modeling that can be
used for the DW design:
• Should have a mechanism that allows low updates and should use indexing for
faster query processing
• Should store historical data for proper analysis of trends
• Should have less number of joints between tables for query performance
• Should create analysis-oriented database, which can be queried
These features are met by Dimensional Modeling technique, which is accepted
widely for the design of Data warehouse.
3.5 What is Dimensional Modeling (DM)
Dimensional Modeling is the favorite modeling technique for data warehouse
designing. It is a design technique, which represents the data in a standard
framework that supports prompt and high performance access of data. Like E-R
model, DM is also based on the concept of relational modeling with some restrictions.
The DM consists of a Fact table, having multiple parts Primary key, and several
Dimension tables having singled part Primary Key.
3.6 DM Terminology
3.6.1 Grain
Each record in a fact table is known as a grain.
3.6.2 Fact Table
The fact table is the central table in a star schema diagram (described in the next
section). It can consist of millions of rows and contains the additive or factual data
about a business that can help answer the business questions better. It brings
together data that would reside in multiple tables throughout the database in
traditional relational databases.
data-warehouse-dimensional-
modeling-and-design-
150225083141-conversion-
gate01.doc
Page 8 of 15
Fact table consists of two parts:
• Key Attributes: These are the multiple part Primary keys and are usually Foreign
Keys from Dimension tables
• Non-Key Attributes: These are the most useful attributes (Facts) in the fact table
and have numeric and additive values.
3.6.3 Dimension Table
Dimension tables are the secondary tables in a dimensional model. Dimension tables
have fewer rows than fact tables and contain descriptive information about the
business such as customers or product information. These tables enable the
business user to quickly drill down from the fact table to additional information in key
business areas.
Fact tables are the entry points in the data warehouse. Like Fact tables these tables
also have two parts:
• Key Attributes: These are the single part Primary key and usually surrogate keys
are used for this purpose.
• Non-Key Attributes: These are text like attributes and describes the nature and
characteristics of the dimension.
3.6.4 Star Schema
Star schema for data warehouse design consists of a central Fact table with detail
and summary data and only one foreign key from each dimension table. The
dimension table is highly denormalized and large in size. Dimension tables surround
the central fact table. Scope of using star schema for data warehouse designing is
summarized below:
• Advantages: This schema is easy to understand and has less number of physical
joins
• Disadvantages: Fat dimensional tables are difficult to maintain.
3.6.5 Snow Flake Schema
Snowflake schema for data warehouse design consists of dimension tables, which
are highly normalized by decomposing them into various hierarchical levels. Each
dimension table has one primary key at every level of hierarchy. The most granular
level of dimensional hierarchy is the entry point to the fact table.
• Advantage: Best performer when query involves aggregates.
• Disadvantage: Number of dimensional tables is large.
3.7 Approach for Dimensional Modeling
Following steps are followed for designing a data warehouse using DM:
data-warehouse-dimensional-
modeling-and-design-
150225083141-conversion-
gate01.doc
Page 9 of 15
1. Get the grain statement for the process to be modeled. This can be specific to the
users’ requirement.
Example: What is the sale of a particular product in a given region from a
specified sales channel in a specified time period?
2. Identify the Dimensions from the grain statement.
Example: Product, Region, Sales Channel, and Time Period.
3. Identify the Key and non-key attributes of the Dimensions.
Example: For the product dimension Product Key can be the Primary Key (Key
attribute). The non-key attributes can be the size of product, weight of product
etc.
4. Define and identify the dimensional hierarchy if required.
Example: The product dimension can have following hierarchy. Product
SegmentProduct LineProduct Type Product FamilyProduct
SeriesProduct
Similarly Region dimension can have the following hierarchy. Super
RegionRegionCountryStateDistrictCityLocality.
The hierarchy identified for a dimension can be stored in two different ways
depending upon the SCHEMA followed for design:
1. All the hierarchical level stored in a single dimension table with an additional
column for each row indicating the level of the hierarchy. In this case the
dimension table is denormalized and is very large size-wise. This design is
followed in Star schema design.
2. Dimensions can be normalized and each level can be stored in separate
table. Each level will have separate Primary key and the level below will have
the primary keys of the level above them. Usually the entry to the fact table is
made using the primary key of the lowest level of dimensional hierarchy. This
design is followed in snowflake schema.
5. Identify the Fact from the Grain statement.
Example: Sales figures in terms of quantity and revenue, cost of the product can
be identified as Facts.
Note: Numeric values, which can be calculated using two or more existing facts
should not be taken as fact. For example, Sales Margin, which can be calculated
using Revenue-Cost, should not be considered as Fact in FACT table.
Also at time fact table can store a snapshot of the event, which can later be
counted to give a numeric figure of organization’s interest. For example a survey
fact table of an automobile company can have a field “Vehicle Owned” which can
have values “Y” or “N” depending upon the whether respondent owns a vehicle
presently or not.
This field does not have a numeric additive value but a count query on this can
tell how many numbers of respondents have vehicle currently. These are known
as FACTLESS Fact table.
6. Identify the Key and Non Key attributes for the FACT table.
Note: Each record of the FACT table represents the grain statement.
data-warehouse-dimensional-
modeling-and-design-
150225083141-conversion-
gate01.doc
Page 10 of 15
Example: Primary Key (key attribute) can be the combination of primary keys of
Product, Region, Sales Channel and Time dimension. The Non Key attributes are
the actual facts like Sales Quantity, Sales Revenue, Cost or a factless fact as
cited above.
7. Identify the AGGREGATES.
Example: If the data in Sales fact table is stored at the date level (lowest level) of
time dimension and if
1. The frequency of report generated on month level is more OR
2. The number of reports generated on month level is more.
A separate FACT table/View/Materialized View can be created to store the
aggregated fact at month level for better query performance.
3.8 Why DM for Designing Data Warehouse
There are many advantages of using DM for designing a data warehouse:
1. As DM is designed from the users’ own perspective of a process the DM
model is easy to understand
2. DM model is predictable and has standard framework
3. As the design is symmetric every dimension table is equivalent and can be
thought of as asymmetric entry point into the Fact table. This design can
withstand the unexpected change in user behavior
4. New data element like new Fact in the fact table completely new dimension
and their attributes can be accommodated in the design
The design enhances the query speed due to aggregates.
3.9 Designing DM for E-R Modeling
As both DM and E-R techniques are based on the relational theory, existing E-R
model can be referred to for designing data warehouse using DM. A single E-R
diagram breaks into multiple Fact tables diagrams.
Following steps should be followed to analyze an E-R data model as DM:
1. Identify the various related processes represented by the E-R diagram and
separate them into discrete business process
2. Identify many-to-many relationship in E-R model, which contains numeric and
additive values. These can be designated as FACT table
3. Denormalized the remaining tables having single part key. These can be
designated as Dimension table and can be connected directly to the fact
table. In case same dimension table is joint to more than one Fact table these
dimension are designated as Conform dimension
data-warehouse-dimensional-
modeling-and-design-
150225083141-conversion-
gate01.doc
Page 11 of 15
4 What is OLAP
Relational databases store data in a two-dimensional format: tables of data
represented by rows and columns. Multi-dimensional analysis solutions commonly
referred to as On-Line Analytical Processing (OLAP) solutions, offer an extension to
the relational model to provide a multi-dimensional view of the data. For example, in
multi-dimensional analysis, data entities such as products, geographies, time
periods, store locations, promotions and sales channels may all represent different
dimensions. Multi-dimensional solutions provide the ability to:
• Analyze potentially large amounts of data with very fast response times
• "Slice and dice" through the data, and drill down or roll up through various
dimensions as defined by the data structure
• Quickly identify trends or problem areas that would otherwise be missed
The basic advantage of OLAP systems is that they can be used to study different
scenarios by the questions "what if?". If we take the example of a manufacturing
organization, a sample question would be, "what if the organization sends a
brochure, who has the details of upcoming products of the company, to those
customers who have made a purchase of related products more than once in the
current month? How would that affect the company revenue?" This unique feature
makes OLAP a great decision making tool that could help determine the best courses
of action for the company's business. OLAP and data warehouses complement each
other. The data warehouse stores and manages the data, while OLAP converts the
stored data into useful information. OLAP techniques may range from simple
navigation and browsing of the data (often referred to as slicing and dicing), to more
detailed analyses, such as time-series and complex modeling.
4.1 OLAP Features
OLAP applications are found in a wide variety of functional areas of an organization.
However, no matter what functions are served by an OLAP application, it must
always have the following elements.
4.1.1 Multidimensional views
Business models are multidimensional in nature. Take the example of a computer
manufacturing organization: several dimensions can be identified for the business of
that company; time, location, product, and customer etc. For example, sales of
different items can differ in time from quarter to quarter, or from year to year. The
time dimension, therefore, has several levels within it. The location, or geography
dimension, can also have multiple levels such as city, state, country etc. Similarly
the product dimension can have several levels, such as categories (computers,
printers, etc.), and more refined levels (printer cartridge, printer paper, etc.)
This aspect of OLAP applications provides the foundation to slice and dice the data,
as well as providing flexible access to information hidden in the data warehouse.
Using OLAP applications, managers should be able to analyze data across any
dimension, at any level of aggregation, with equal functionality and ease.
data-warehouse-dimensional-
modeling-and-design-
150225083141-conversion-
gate01.doc
Page 12 of 15
The multidimensional data views are usually referred to as data cubes. In reality
data cubes can have as many dimensions as the business model allows.
4.1.2 Calculation-intensive
While most OLAP applications do simple data aggregation along a hierarchy, some
of them may conduct more complex calculations, such as percentage of totals, and
allocations that use the hierarchies from the top down. It is important that an OLAP
application is designed in a way that allows for such complex calculations. It is these
calculations that add great benefits to the ultimate solution.
Trend analysis is another example of complex calculations that can be carried out
with OLAP applications. Such analyses involve algebraic equation and complex
algorithms, such as moving averages and percentage growth.
OLTP (On-Line Transaction processing) systems are used to collect and manage
data, while OLAP systems are used to create information from the collected data that
may lead to new knowledge. It is the ability to conduct complex calculations by the
OLAP applications that allows for successful transfer of the raw data to information,
and later to knowledge.
4.1.3 Time Intelligence
Time is a universal dimension for almost all OLAP applications. It is very difficult to
find a business model where time is not considered an integral part. Time is used to
compare and judge performance of a business process. As an example, sale of a
particular product line this month may be compared to its sale last month. Or, the
profit of a company in the last quarter may be judged against its profit in the same
quarter last year.
The time dimension is not always used similar to other dimensions. For example, a
manager may ask about the sales totals for the first two months of the year, but is not
likely to ask about the sales of the first two computers in the product line. An OLAP
system should be built to easily allow for concepts like "year to date" and "period over
period comparisons" to be defined.
4.2 Benefits of OLAP
Although it is possible to build an OLAP system using the software designed for
transaction processing or data collection, it is certainly not a very efficient use of
developer time. By using software specially designed for OLAP, developers can
deliver applications to business users faster, providing better service that in turn
allows the developers to build more applications.
Another advantage of using OLAP systems is that if such systems are separate from
the On-Line Transaction Processing (OLTP) systems that feed the data warehouse,
OLTP systems performance will improve due to the reduced network traffic and
elimination of long and complex queries to the OLTP database.
In a nutshell, OLAP enables the organization as a whole to respond more quickly to
market demands. This is possible because it provides the ability to model real
data-warehouse-dimensional-
modeling-and-design-
150225083141-conversion-
gate01.doc
Page 13 of 15
business problems, make better decisions for the conduct of the organization, and
use people resources more efficiently. Market responsiveness, in turn, often yields
improved revenue and profitability.
4.3 Types of OLAP System (Data Storage Format)
The core of the OLAP concept is that it provides a mechanism to store the data in a
cube and subsequently retrieve it as and when needed. Cube data and aggregation
can be stored in a variety of ways. OLAP services addresses these issues by
implementing the following:
• No Storage is allocated for empty cells (Cell is the smallest component or unit of
data in a cube)
• Data compression is applied to stored aggregations
• OLAP services provides a complex algorithm for optimizing storage and
performance needs
• Several storage modes are available providing the developer with flexibility of
where to keep the data and how to manage it
The OLAP manager provides three different ways to store the data in a cube:
• Multidimensional OLAP (MOLAP)
• Relational OLAP (ROLAP)
• Hybrid OLAP (HOLAP)
Each of these options provides certain benefits, depending upon the size of
underlying database system and how the data will be used.
4.3.1 Multidimensional OLAP (MOLAP)
Multidimensional OLAP or MOLAP is a high performance, multidimensional data
storage format. With MOLAP, data is stored on the OLAP server. MOLAP gives the
best query performance, because it is specifically optimized for multidimensional data
queries. Performance gains stem from the fact that the fact tables are compressed
with this option and bitmap indexing is used for them.
The MOLAP option stores the cube data and aggregates in multidimensional
structures. MOLAP storage is appropriate for small to medium-sized data sets where
copying all of the data to the multidimensional format would not require significant
loading time or utilize large amounts of disk space.
4.3.2 Relational OLAP (ROLAP)
Relational OLAP storage keeps the data that feeds the cubes in the original relational
tables. A separate set of relational tables is used to store and reference aggregation
data in this OLTP system. These tables are not downloaded to the OLAP server.
The tables that hold the aggregations of the data are called materialized views.
These tables store data aggregations as defined by the dimensions when the cube is
created.
data-warehouse-dimensional-
modeling-and-design-
150225083141-conversion-
gate01.doc
Page 14 of 15
With this option, aggregation tables have fields for each dimension and measure.
Each dimension column is indexed. A composite index is also created for all of the
dimension fields. Due to its nature ROLAP is ideal for large databases or legacy
data that is infrequently queried. The only drawback to these systems is that
generating reports from them or processing the cube data may affect users of the
operational database reducing the performance of their transaction processing.
4.3.3 Hybrid OLAP (HOLAP)
The OLAP server also supports a combination of MOLAP and ROLAP. This
combination is referred to as HOLAP, the original data is kept in its relational
database tables similar to ROLAP. Aggregations of the data are performed and
stored in a multidimensional format. An advantage of this system is that HOLAP
provides connectivity to large data sets in relational tables, while taking advantage of
the faster performance of the multidimensional aggregation storage. A disadvantage
of this option is that the amount of processing between the MOLAP and ROLAP
systems may affect its efficiency.
5 Abbreviations Used
Terms Description
DW Data Warehouse
DM Dimensional Modeling
E-R Model Entity Relation Model
OLTP On-Line Transaction Processing
OLAP On-Line Analytical Processing
6 References
Following are good references for learning the basic concepts of data warehousing
and dimensional modeling. The same has also been used as a primary reference for
the preparation of this document.
1. The Data Warehouse Life Cycle Tool Kit By Ralph Kimball
2. The Data Warehouse Tool Kit By Ralph Kimball
3. Microsoft OLAP Solutions
data-warehouse-dimensional-
modeling-and-design-
150225083141-conversion-
gate01.doc
Page 15 of 15

Contenu connexe

Tendances

Gulabs Ppt On Data Warehousing And Mining
Gulabs Ppt On Data Warehousing And MiningGulabs Ppt On Data Warehousing And Mining
Gulabs Ppt On Data Warehousing And Mining
gulab sharma
 
Warehouse components
Warehouse componentsWarehouse components
Warehouse components
ganblues
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
Shanthi Mukkavilli
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
pcherukumalla
 
DATA Warehousing & Data Mining
DATA Warehousing & Data MiningDATA Warehousing & Data Mining
DATA Warehousing & Data Mining
cpjcollege
 
PowerPoint Template
PowerPoint TemplatePowerPoint Template
PowerPoint Template
butest
 
04 Dimensional Analysis - v6
04 Dimensional Analysis - v604 Dimensional Analysis - v6
04 Dimensional Analysis - v6
Prithwis Mukerjee
 

Tendances (20)

Gulabs Ppt On Data Warehousing And Mining
Gulabs Ppt On Data Warehousing And MiningGulabs Ppt On Data Warehousing And Mining
Gulabs Ppt On Data Warehousing And Mining
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
Basic Introduction of Data Warehousing from Adiva Consulting
Basic Introduction of  Data Warehousing from Adiva ConsultingBasic Introduction of  Data Warehousing from Adiva Consulting
Basic Introduction of Data Warehousing from Adiva Consulting
 
Data Warehouse Project Report
Data Warehouse Project Report Data Warehouse Project Report
Data Warehouse Project Report
 
Warehouse components
Warehouse componentsWarehouse components
Warehouse components
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
 
Dimensional Modeling Basic Concept with Example
Dimensional Modeling Basic Concept with ExampleDimensional Modeling Basic Concept with Example
Dimensional Modeling Basic Concept with Example
 
Data warehouse
Data warehouse Data warehouse
Data warehouse
 
Seminar datawarehousing
Seminar datawarehousingSeminar datawarehousing
Seminar datawarehousing
 
Data warehouse system and its concepts
Data warehouse system and its conceptsData warehouse system and its concepts
Data warehouse system and its concepts
 
DATA Warehousing & Data Mining
DATA Warehousing & Data MiningDATA Warehousing & Data Mining
DATA Warehousing & Data Mining
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse concepts
 
PowerPoint Template
PowerPoint TemplatePowerPoint Template
PowerPoint Template
 
11666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect311666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect3
 
Business Intelligence Data Warehouse System
Business Intelligence Data Warehouse SystemBusiness Intelligence Data Warehouse System
Business Intelligence Data Warehouse System
 
04 Dimensional Analysis - v6
04 Dimensional Analysis - v604 Dimensional Analysis - v6
04 Dimensional Analysis - v6
 
Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)
 
2 data warehouse life cycle golfarelli
2 data warehouse life cycle golfarelli2 data warehouse life cycle golfarelli
2 data warehouse life cycle golfarelli
 

En vedette

Difference between ER-Modeling and Dimensional Modeling
Difference between ER-Modeling and Dimensional ModelingDifference between ER-Modeling and Dimensional Modeling
Difference between ER-Modeling and Dimensional Modeling
Abdul Aslam
 
Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modeling
vivekjv
 

En vedette (15)

Difference between ER-Modeling and Dimensional Modeling
Difference between ER-Modeling and Dimensional ModelingDifference between ER-Modeling and Dimensional Modeling
Difference between ER-Modeling and Dimensional Modeling
 
Business process modeling and analysis for data warehouse design
Business process modeling and analysis for data warehouse designBusiness process modeling and analysis for data warehouse design
Business process modeling and analysis for data warehouse design
 
Data Warehouse Back to Basics: Dimensional Modeling
Data Warehouse Back to Basics: Dimensional ModelingData Warehouse Back to Basics: Dimensional Modeling
Data Warehouse Back to Basics: Dimensional Modeling
 
Web analytics 101: Web Metrics
Web analytics 101: Web MetricsWeb analytics 101: Web Metrics
Web analytics 101: Web Metrics
 
Web Metrics vs Web Behavioral Analytics and Why You Need to Know the Difference
Web Metrics vs Web Behavioral Analytics and Why You Need to Know the DifferenceWeb Metrics vs Web Behavioral Analytics and Why You Need to Know the Difference
Web Metrics vs Web Behavioral Analytics and Why You Need to Know the Difference
 
World-Class Web Metrics by Dan Olsen
World-Class Web Metrics by Dan OlsenWorld-Class Web Metrics by Dan Olsen
World-Class Web Metrics by Dan Olsen
 
Dimensional Modeling
Dimensional ModelingDimensional Modeling
Dimensional Modeling
 
Business Metrics and Web Marketing
Business Metrics and Web MarketingBusiness Metrics and Web Marketing
Business Metrics and Web Marketing
 
Schema Design with MongoDB
Schema Design with MongoDBSchema Design with MongoDB
Schema Design with MongoDB
 
Data Visualization and Dashboard Design
Data Visualization and Dashboard DesignData Visualization and Dashboard Design
Data Visualization and Dashboard Design
 
Oltp vs olap
Oltp vs olapOltp vs olap
Oltp vs olap
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSE
 
MongoDB Schema Design: Four Real-World Examples
MongoDB Schema Design: Four Real-World ExamplesMongoDB Schema Design: Four Real-World Examples
MongoDB Schema Design: Four Real-World Examples
 
Multi dimensional model vs (1)
Multi dimensional model vs (1)Multi dimensional model vs (1)
Multi dimensional model vs (1)
 
Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modeling
 

Similaire à Data warehouse-dimensional-modeling-and-design

Dw hk-white paper
Dw hk-white paperDw hk-white paper
Dw hk-white paper
july12jana
 
Data warehouse
Data warehouseData warehouse
Data warehouse
MR Z
 
Informatica and datawarehouse Material
Informatica and datawarehouse MaterialInformatica and datawarehouse Material
Informatica and datawarehouse Material
obieefans
 

Similaire à Data warehouse-dimensional-modeling-and-design (20)

Oracle sql plsql & dw
Oracle sql plsql & dwOracle sql plsql & dw
Oracle sql plsql & dw
 
Data Warehouse
Data Warehouse Data Warehouse
Data Warehouse
 
Unit 1
Unit 1Unit 1
Unit 1
 
Dw hk-white paper
Dw hk-white paperDw hk-white paper
Dw hk-white paper
 
Data mining notes
Data mining notesData mining notes
Data mining notes
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
 
20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
Cognos datawarehouse
Cognos datawarehouseCognos datawarehouse
Cognos datawarehouse
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Top 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdfTop 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdf
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A Comparison
 
White Paper - How Data Works
White Paper - How Data WorksWhite Paper - How Data Works
White Paper - How Data Works
 
Informatica and datawarehouse Material
Informatica and datawarehouse MaterialInformatica and datawarehouse Material
Informatica and datawarehouse Material
 
Data Mining & Data Warehousing
Data Mining & Data WarehousingData Mining & Data Warehousing
Data Mining & Data Warehousing
 

Dernier

Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
gajnagarg
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
gajnagarg
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 

Dernier (20)

Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 

Data warehouse-dimensional-modeling-and-design

  • 1. Data Warehouse - Dimensional Modeling and Design data-warehouse-dimensional- modeling-and-design- 150225083141-conversion- gate01.doc Ver Rev. 1.0 i
  • 2. Table of Contents 1 Introduction............................................................................................................1 2 What is Data Warehousing....................................................................................1 2.1 DATA WAREHOUSE TERMINOLOGY.........................................................................2 2.1.1 Data Mart....................................................................................................2 2.1.2 Metadata.....................................................................................................3 2.1.3 Cube...........................................................................................................3 2.1.4 Data Cleansing............................................................................................4 2.1.5 Extraction Transformation and loading (ETL)..............................................4 2.1.6 Data Mining.................................................................................................5 2.2 ISSUES IN BUILDING A DATA WAREHOUSE................................................................5 2.3 BENEFITS OF DATA WAREHOUSING........................................................................6 3 Data Warehouse Modeling Techniques...............................................................6 3.1 WHAT IS ENTITY RELATIONSHIP (E-R) MODELING..................................................6 3.2 E-R MODELING APPROACH....................................................................................7 3.2.1 Entities........................................................................................................7 3.2.2 Attributes.....................................................................................................7 3.2.3 Relationships...............................................................................................7 3.2.4 Normalization..............................................................................................7 3.3 LIMITATIONS OF E-R MODELING FOR DW DESIGNING.............................................7 3.4 DIMENSIONAL MODELING APPROACH FOR DW DESIGNING......................................8 3.5 WHAT IS DIMENSIONAL MODELING (DM)................................................................8 3.6 DM TERMINOLOGY................................................................................................8 3.6.1 Grain...........................................................................................................8 3.6.2 Fact Table...................................................................................................8 3.6.3 Dimension Table.........................................................................................9 3.6.4 Star Schema...............................................................................................9 3.6.5 Snow Flake Schema...................................................................................9 3.7 APPROACH FOR DIMENSIONAL MODELING..............................................................9 3.8 WHY DM FOR DESIGNING DATA WAREHOUSE......................................................11 3.9 DESIGNING DM FOR E-R MODELING....................................................................11 4 What is OLAP.......................................................................................................12 4.1 OLAP FEATURES................................................................................................12 4.1.1 Multidimensional views..............................................................................12 4.1.2 Calculation-intensive.................................................................................13 4.1.3 Time Intelligence.......................................................................................13 4.2 BENEFITS OF OLAP............................................................................................13 4.3 TYPES OF OLAP SYSTEM (DATA STORAGE FORMAT)...........................................14 4.3.1 Multidimensional OLAP (MOLAP).............................................................14 4.3.2 Relational OLAP (ROLAP)........................................................................14 4.3.3 Hybrid OLAP (HOLAP)..............................................................................15 5 Abbreviations Used.............................................................................................15 6 References...........................................................................................................15 data-warehouse-dimensional- modeling-and-design- 150225083141-conversion- gate01.doc Ver Rev. 1.0 ii
  • 3. 1 Introduction For most organizations managing and using the enterprise-wide information to make effective business decisions is one of the most challenging tasks in today’s ever changing business environment. Business data is the key asset for any organization and these organizations have started to realize the need to make best use of this data for making timely business decisions. In a typical business environment thousands of transactions take place every day and collecting this information in such a way that can help business users to analyze and present it in a effective manner is a crucial requirement for organization’s growth. One of the most dramatic turnaround in database design, the dimensional Data Warehouse is a powerful database model and a prominent doorway, that enhances the capability of a business user to analyze huge, multidimensional sets of data. Data Warehouse technology has emerged as an increasingly popular and powerful concept of applying information technology to turn huge islands of data into meaning information for quick and effective business decisions. The Data Warehouse technology is said to have embedded business intelligence in it and can be termed as a practical approach for decision support systems. 2 What is Data Warehousing A data warehouse is an integrated data store that contains the data residing in organizations desperate data sources in a globally accepted way, which has standard encoding structure and storage format. These data sources could be on heterogeneous platform and have some information that may not be relevant at all to be included in the data warehouse, as it doesn’t help the business users in analyzing the business trends and/or making forecasting decisions. A data warehouse is a copy of transactional data specifically structured for querying and reporting. The form of the stored data has nothing to do with whether something is a data warehouse. A data warehouse can be normalized or denormalized. It can be a relational database, multidimensional database, flat file, hierarchical database, object database, etc. As compared to transactional system, a key advantage of using a data warehouse apart from ease of data analysis is that it dramatically increases the query response time. This important feature of the data warehouse is possible because it uses an entirely different modeling approach for designing, known as dimensional modeling. Other distinguishable features of a data warehouse are as follows: • Contains historical data The primary purpose of having a data warehouse in place is to analyze the business trends. Therefore the data warehouse is designed to contain historical data as compared to 3-6 month old data in OLTP systems. This data could be as long as 4- 10 years depending upon the needs of your business organization. After this period data-warehouse-dimensional- modeling-and-design- 150225083141-conversion- gate01.doc Page 1 of 15
  • 4. is over, the data is purged from the data warehouse and the backup of the same could be taken on stable media’s like magnetic tapes etc. for the purpose of future references. • No frequent updates Since most of the time, the data warehouse information is used for reporting purposes, the design approach should be in such a manner, which doesn’t support frequent updates. Some times, there may be a business requirement where the modeler may require few updates. • Independent of all transactional data sources To relieve the different transactional processing systems from the burden of frequent querying and reporting operations, the data warehouse server should be organized as an independent system, which should provide quick query response and hence increase productivity. • Subject-oriented All the related data about a business process is stored and kept as a single set in a meaningful and useful format, e.g. Sales, finance, marketing, customer etc. • Non-volatile Non-volatile means the data is loaded in the data warehouse on a scheduled basis and henceforth accessed from the warehouse itself. The schedule time depends upon the specific business requirements and could be daily, weekly, monthly etc. It is to be noticed that data warehouse is not a software package, which can be produced by a software vendor and purchased in the market. It is a set of software, hardware and tools that has to be customized according to the needs of a particular business requirement. Data warehousing is not just data in the data warehouse, but also the architecture and tools to collect, query, analyze and present the information to help business users make strategic business decisions, increase productivity and keep a edge in the global competitive business environment. 2.1 Data Warehouse Terminology Before taking an insight into the data warehouse world, it is very handy to have a look and understand the following basic terms, which are frequently used in the data warehousing literature. 2.1.1 Data Mart Data marts are subsets or departmental warehouses, which contain the data about a specific subject out of the whole set of subject oriented data warehouse. The concept of data mart is similar to the concept of views in relational databases. They are designed to meet the needs of a specific set of users of an organization. Examples of data mart could be campaign management, marketing, finance, sales etc. data-warehouse-dimensional- modeling-and-design- 150225083141-conversion- gate01.doc Page 2 of 15
  • 5. There are two approaches to building a data warehouse and data mart. One is the top down approach where the first step is to create the data warehouse and then make the small sized data marts from them depending upon the needs of concerned business organization. Second approach involves the design of a set of data marts, simulating organization’s various processes, like sales, production etc. and then building the data warehouse from the set of data marts. This approach is generally termed as bottom up approach of building a data warehouse. Data marts are less expensive and take less time for implementation with quick ROI. They are scalable to full data warehouses and at times are summarized subsets of more detailed, pre-existing data warehouses. 2.1.2 Metadata Metadata describes the data that is contained in data warehouse. This includes the data elements and the business logic. Apart from this basic definition, metadata also contains the definition of extraction and transformation logic, required to extract the data from disparate data sources and put it into the data warehouse. Metadata can be equated to a data warehouse’s information map or directory, dictionary, or card catalog. 2.1.3 Cube Cubes are the fundamental unit for data storage and retrieval in an OLAP system. Business measures can be stored for various dimensions in a cube. The concept of cube is similar to the concept of table in relational database systems. The dimensions of a cube are the perspectives from which the data can be viewed and analyzed. It’s easy to visualize the two dimensions of a relational system-rows and columns. But usually a cube has more than two dimensions. For example Microsoft OLAP cube can have up to 64 dimensions. OLAP cubes provide the capability to the end users to perform slicing and dicing operations. Slicing a cube indicates that the end user can keep on extracting the information for a particular dimension up to any level he wants, keeping all other dimensions constant. For example a business user can see the information at product segment, product series, product family or product level keeping all other dimensions such as time, location etc. fixed. On the other hand dicing of cubes indicates that while extracting the information for a particular dimension, the business user can switch to other dimensions, and hence can make his business data analysis even more effective. Similar to the concept of views in relational database systems, there is a methodology by which the data warehouse designer can extend the cubes and combine their definitions to form a logical entity, which are already defined in an OLAP system. These logical units are termed as virtual cubes. Thus the definition of a virtual cube consists of the combination of definitions of at least one cube in an OLAP system. They are derived from one or more other cubes. data-warehouse-dimensional- modeling-and-design- 150225083141-conversion- gate01.doc Page 3 of 15
  • 6. 2.1.4 Data Cleansing A crucial factor in the success of a data warehousing project is the accuracy of customer information. Therefore, data cleansing process plays a significant role in such knowledge management projects as it improves the quality of the customer data. In a typical data cleansing tool, the various data cleansing operations can be obtained using following processes: • Formatting and enhancements • Address verification • Name parsing • Duplication customer record check Formatting is usually the first step in data cleansing. This process removes the junk characters and formats the data for subsequent operations. Data cleansing tools contain large databases for various countries to store the information about their geographical hierarchy. Verification in data cleansing ensures data integrity. Parsing is the process in data cleansing which enables rearrangement of customer information into a preferred and consistent format. Matching process identifies duplicate records, thereby ensuring availability of unique customer information. 2.1.5 Extraction Transformation and loading (ETL) Extraction is the process of pulling the data from disparate data sources or legacy systems and putting it into an intermediate stage before loading it into the data warehouse. This intermediate stage is sometimes called as staging area. Staging area generally reflects the whole enterprise data as a consolidated unit, which is dispersed in heterogeneous data sources. The difference is that the data in different data sources could be in various storage formats or encoding structures, but staging area is a database, which is a replica of different data sources and unites dispersed data in a uniform format, so that cleansing and transformation processes can be carried out smoothly. Transformation is the next step in the process of migrating the data from different data sources to big data warehouse. It includes converting the data into a format and presenting it in such a manner, which facilitates the easy understanding of data and enhances the business user’s capability to carry out the business data analysis and hence supports the process of quick and effective corporate decision making. It is a good practice to put the transformed data into a second intermediate stage and not onto actual data warehouse server to relieve it from the burden of handling and managing large and complex data transformation processes. This second intermediate stage is sometimes termed as operational data store (ODS). This leaves the data warehouse server concentrate only to handle large and complex queries without any performance hit. data-warehouse-dimensional- modeling-and-design- 150225083141-conversion- gate01.doc Page 4 of 15
  • 7. Loading is the final process of migrating the data from data source to data warehouse. Once the transformed data has been captured into operational data store, the final task is to upload it into data warehouse server by converting the data into a format, which compiles into the design methodology of the data warehouse, which is known as dimensional data modeling. It is this design methodology, which embodies the business intelligence into data warehouse and hence prepares a full-fledged decision support system, which is then ready to be used by various reporting tools to generate the business information for making corporate wide effective business decisions. 2.1.6 Data Mining Data mining predicts the future trends and behaviors, allowing businesses to make proactive, knowledge driven decisions. Data mining is a mechanism, which uses intelligent algorithms to discover patterns, clusters and models from data. These patterns and hypotheses are then rendered in operational forms that are easy for users to visualize and understand. It is about searching for patterns in data that are relevant to the business. One way of looking at data mining is that input data goes in, and patterns come out. The patterns may be descriptions of interesting features in the data, or they may be predictions about the future. Data mining is the process of analyzing business data in the data warehouse to find unknown patterns or rules of information that can be used to tailor business operations. For instance, data mining can find patterns in data to answer questions like: • What item purchased in a given transaction triggers the purchase of additional related items? • How do purchasing patterns change with store location? • Did the same customer purchase related items at another time? 2.2 Issues in building a data warehouse Before taking a data warehousing project into hand, it’s very essential to analyze the risk factors involved into it. Studies have shown that 10 to 90 percent of data warehousing projects end with scrap baskets. Some potential issues that could evolve during the entire life cycle of a data warehousing project are discussed below. • Data warehousing systems can complicate business processes significantly. Data warehousing, if unchecked, can foster the "institutionalization" of easily created reports whose reason for being quickly is forgotten while people still toil to process these reports. If the organization does not know how to throw out processes, data warehousing can quickly add clutter to the business environment. • Data warehousing can have a learning curve that may be too long. data-warehouse-dimensional- modeling-and-design- 150225083141-conversion- gate01.doc Page 5 of 15
  • 8. Currently IT industry has shortage on data warehousing experts. A lot of research work is going on the various aspects of data warehousing architecture. One of them is data mining. There are limited numbers of people available who have worked with the full data warehousing system project "life cycle". Despite the speed of the data warehousing development effort, it takes time for an organization to figure how it can change its business practices to get a substantial return on its data warehousing investment. 2.3 Benefits of Data Warehousing A well designed and implemented data warehouse can be used to: • Understand business trends and make better forecasting decisions • Bring better products to market in a more timely manner • Analyze daily sales information and make quick decisions that can significantly affect company's performance Data warehousing can be a key differentiator in many different industries. At present, some of the most popular data warehouse applications include: • Sales and marketing analysis across all industries • Inventory turn and product tracking in manufacturing • Category management, vendor analysis, and marketing program effectiveness analysis in retail 3 Data Warehouse Modeling Techniques The most efficient way to build an effective data warehouse is to use a dimensional model to design it. Dimensional modeling is a database design methodology that is used to design data warehouses. Dimensional modeling for a data warehouse is similar to entity relationship modeling (E-R modeling) for an RDBMS. Both modeling techniques provide a methodology that facilitates the creation of effective well- designed databases. They differ, however, in their approach to business questions and design goals. An entity relationship model focuses on data integrity and efficiency of data entry so that you can enter each piece of data, such as customer address, only once. In contrast, a dimensional model focuses on business processes and business questions. This section first discusses the E-R modeling technique concisely and then discusses why this approach is not suitable for designing a data warehouse. 3.1 What is Entity Relationship (E-R) Modeling Entity Relation (E-R) model is developed to answers the following issues of conventional Data Base Management System (DBMS). (i) Redundancy of data (ii) Lack of integration and (iii) Lack of flexibility, This modeling is based on the relational theory and abides by the 13 rules proposed by E.F. Codd that a DBMS implementation must follow to be qualified as truly data-warehouse-dimensional- modeling-and-design- 150225083141-conversion- gate01.doc Page 6 of 15
  • 9. relational. The data in E-R model is presented in a simple form of two-dimensional tables. 3.2 E-R Modeling Approach In E-R model the database is designed using: 3.2.1 Entities These are the real life objects, which are being modeled. Entities are capable of independent existence and can be uniquely identified. 3.2.2 Attributes These are the properties/characteristics of the entities, which uniquely identify a given entity in the real world. 3.2.3 Relationships This represents the association among the entities and the way in which entities interacts. 3.2.4 Normalization Normalization is a process of decomposing the tables to prevent redundancy, insert & update anomalies. The design approach consists of following steps: • Identify the Entities about which descriptive information is to be stored • Identify the Attributes (i.e. the properties) of the entities, which meaningfully describe the entities • Define the relationship between entities. The functionality of the relationship can be one-to-one (1:1), one-to-many (1:N) or many-to-many (N:M) • The E-R model is then transformed to 1st , 2nd , 3rd or BCNF normal form 3.3 Limitations of E-R modeling for DW Designing E-R modeling is a powerful technique to design and develop OLTP systems in relational environment. However E-R model defeats the basic needs of data ware housing, namely prompt and high-performance retrieval of data. E-R model is characterized by: • Highly normalized designed to prevent redundancy of data • Frequent insertion and updated to support OLTP (On Line Transactional Processing) concept • Less indexing to support frequent inserts and updates • As the design is supportive to OLTP system the data storage life is small. Also as the insertion rate is high the amount of data is huge. So to keep the data volume within manageable limits the data storage life is small data-warehouse-dimensional- modeling-and-design- 150225083141-conversion- gate01.doc Page 7 of 15
  • 10. Data warehouse on the contrary has following features: • Fast processing speed of queries • Very minimal inserts and updates to support OLAP (On Line Analysis Processing) concept • The DW database uses indexing to make query processing faster • The data stored is historical in nature as trend analysis is the main idea for designing DW Because of the above contradictions in the features, E-R model modeling approach is not preferred for the design of data warehouse. 3.4 Dimensional Modeling Approach for DW Designing As the features of E-R model don't support the design of DW, there is always a need for a model to support the OLAP (On Line Analytical Processing) systems, the concept on which the data warehouse design is based. The modeling that can be used for the DW design: • Should have a mechanism that allows low updates and should use indexing for faster query processing • Should store historical data for proper analysis of trends • Should have less number of joints between tables for query performance • Should create analysis-oriented database, which can be queried These features are met by Dimensional Modeling technique, which is accepted widely for the design of Data warehouse. 3.5 What is Dimensional Modeling (DM) Dimensional Modeling is the favorite modeling technique for data warehouse designing. It is a design technique, which represents the data in a standard framework that supports prompt and high performance access of data. Like E-R model, DM is also based on the concept of relational modeling with some restrictions. The DM consists of a Fact table, having multiple parts Primary key, and several Dimension tables having singled part Primary Key. 3.6 DM Terminology 3.6.1 Grain Each record in a fact table is known as a grain. 3.6.2 Fact Table The fact table is the central table in a star schema diagram (described in the next section). It can consist of millions of rows and contains the additive or factual data about a business that can help answer the business questions better. It brings together data that would reside in multiple tables throughout the database in traditional relational databases. data-warehouse-dimensional- modeling-and-design- 150225083141-conversion- gate01.doc Page 8 of 15
  • 11. Fact table consists of two parts: • Key Attributes: These are the multiple part Primary keys and are usually Foreign Keys from Dimension tables • Non-Key Attributes: These are the most useful attributes (Facts) in the fact table and have numeric and additive values. 3.6.3 Dimension Table Dimension tables are the secondary tables in a dimensional model. Dimension tables have fewer rows than fact tables and contain descriptive information about the business such as customers or product information. These tables enable the business user to quickly drill down from the fact table to additional information in key business areas. Fact tables are the entry points in the data warehouse. Like Fact tables these tables also have two parts: • Key Attributes: These are the single part Primary key and usually surrogate keys are used for this purpose. • Non-Key Attributes: These are text like attributes and describes the nature and characteristics of the dimension. 3.6.4 Star Schema Star schema for data warehouse design consists of a central Fact table with detail and summary data and only one foreign key from each dimension table. The dimension table is highly denormalized and large in size. Dimension tables surround the central fact table. Scope of using star schema for data warehouse designing is summarized below: • Advantages: This schema is easy to understand and has less number of physical joins • Disadvantages: Fat dimensional tables are difficult to maintain. 3.6.5 Snow Flake Schema Snowflake schema for data warehouse design consists of dimension tables, which are highly normalized by decomposing them into various hierarchical levels. Each dimension table has one primary key at every level of hierarchy. The most granular level of dimensional hierarchy is the entry point to the fact table. • Advantage: Best performer when query involves aggregates. • Disadvantage: Number of dimensional tables is large. 3.7 Approach for Dimensional Modeling Following steps are followed for designing a data warehouse using DM: data-warehouse-dimensional- modeling-and-design- 150225083141-conversion- gate01.doc Page 9 of 15
  • 12. 1. Get the grain statement for the process to be modeled. This can be specific to the users’ requirement. Example: What is the sale of a particular product in a given region from a specified sales channel in a specified time period? 2. Identify the Dimensions from the grain statement. Example: Product, Region, Sales Channel, and Time Period. 3. Identify the Key and non-key attributes of the Dimensions. Example: For the product dimension Product Key can be the Primary Key (Key attribute). The non-key attributes can be the size of product, weight of product etc. 4. Define and identify the dimensional hierarchy if required. Example: The product dimension can have following hierarchy. Product SegmentProduct LineProduct Type Product FamilyProduct SeriesProduct Similarly Region dimension can have the following hierarchy. Super RegionRegionCountryStateDistrictCityLocality. The hierarchy identified for a dimension can be stored in two different ways depending upon the SCHEMA followed for design: 1. All the hierarchical level stored in a single dimension table with an additional column for each row indicating the level of the hierarchy. In this case the dimension table is denormalized and is very large size-wise. This design is followed in Star schema design. 2. Dimensions can be normalized and each level can be stored in separate table. Each level will have separate Primary key and the level below will have the primary keys of the level above them. Usually the entry to the fact table is made using the primary key of the lowest level of dimensional hierarchy. This design is followed in snowflake schema. 5. Identify the Fact from the Grain statement. Example: Sales figures in terms of quantity and revenue, cost of the product can be identified as Facts. Note: Numeric values, which can be calculated using two or more existing facts should not be taken as fact. For example, Sales Margin, which can be calculated using Revenue-Cost, should not be considered as Fact in FACT table. Also at time fact table can store a snapshot of the event, which can later be counted to give a numeric figure of organization’s interest. For example a survey fact table of an automobile company can have a field “Vehicle Owned” which can have values “Y” or “N” depending upon the whether respondent owns a vehicle presently or not. This field does not have a numeric additive value but a count query on this can tell how many numbers of respondents have vehicle currently. These are known as FACTLESS Fact table. 6. Identify the Key and Non Key attributes for the FACT table. Note: Each record of the FACT table represents the grain statement. data-warehouse-dimensional- modeling-and-design- 150225083141-conversion- gate01.doc Page 10 of 15
  • 13. Example: Primary Key (key attribute) can be the combination of primary keys of Product, Region, Sales Channel and Time dimension. The Non Key attributes are the actual facts like Sales Quantity, Sales Revenue, Cost or a factless fact as cited above. 7. Identify the AGGREGATES. Example: If the data in Sales fact table is stored at the date level (lowest level) of time dimension and if 1. The frequency of report generated on month level is more OR 2. The number of reports generated on month level is more. A separate FACT table/View/Materialized View can be created to store the aggregated fact at month level for better query performance. 3.8 Why DM for Designing Data Warehouse There are many advantages of using DM for designing a data warehouse: 1. As DM is designed from the users’ own perspective of a process the DM model is easy to understand 2. DM model is predictable and has standard framework 3. As the design is symmetric every dimension table is equivalent and can be thought of as asymmetric entry point into the Fact table. This design can withstand the unexpected change in user behavior 4. New data element like new Fact in the fact table completely new dimension and their attributes can be accommodated in the design The design enhances the query speed due to aggregates. 3.9 Designing DM for E-R Modeling As both DM and E-R techniques are based on the relational theory, existing E-R model can be referred to for designing data warehouse using DM. A single E-R diagram breaks into multiple Fact tables diagrams. Following steps should be followed to analyze an E-R data model as DM: 1. Identify the various related processes represented by the E-R diagram and separate them into discrete business process 2. Identify many-to-many relationship in E-R model, which contains numeric and additive values. These can be designated as FACT table 3. Denormalized the remaining tables having single part key. These can be designated as Dimension table and can be connected directly to the fact table. In case same dimension table is joint to more than one Fact table these dimension are designated as Conform dimension data-warehouse-dimensional- modeling-and-design- 150225083141-conversion- gate01.doc Page 11 of 15
  • 14. 4 What is OLAP Relational databases store data in a two-dimensional format: tables of data represented by rows and columns. Multi-dimensional analysis solutions commonly referred to as On-Line Analytical Processing (OLAP) solutions, offer an extension to the relational model to provide a multi-dimensional view of the data. For example, in multi-dimensional analysis, data entities such as products, geographies, time periods, store locations, promotions and sales channels may all represent different dimensions. Multi-dimensional solutions provide the ability to: • Analyze potentially large amounts of data with very fast response times • "Slice and dice" through the data, and drill down or roll up through various dimensions as defined by the data structure • Quickly identify trends or problem areas that would otherwise be missed The basic advantage of OLAP systems is that they can be used to study different scenarios by the questions "what if?". If we take the example of a manufacturing organization, a sample question would be, "what if the organization sends a brochure, who has the details of upcoming products of the company, to those customers who have made a purchase of related products more than once in the current month? How would that affect the company revenue?" This unique feature makes OLAP a great decision making tool that could help determine the best courses of action for the company's business. OLAP and data warehouses complement each other. The data warehouse stores and manages the data, while OLAP converts the stored data into useful information. OLAP techniques may range from simple navigation and browsing of the data (often referred to as slicing and dicing), to more detailed analyses, such as time-series and complex modeling. 4.1 OLAP Features OLAP applications are found in a wide variety of functional areas of an organization. However, no matter what functions are served by an OLAP application, it must always have the following elements. 4.1.1 Multidimensional views Business models are multidimensional in nature. Take the example of a computer manufacturing organization: several dimensions can be identified for the business of that company; time, location, product, and customer etc. For example, sales of different items can differ in time from quarter to quarter, or from year to year. The time dimension, therefore, has several levels within it. The location, or geography dimension, can also have multiple levels such as city, state, country etc. Similarly the product dimension can have several levels, such as categories (computers, printers, etc.), and more refined levels (printer cartridge, printer paper, etc.) This aspect of OLAP applications provides the foundation to slice and dice the data, as well as providing flexible access to information hidden in the data warehouse. Using OLAP applications, managers should be able to analyze data across any dimension, at any level of aggregation, with equal functionality and ease. data-warehouse-dimensional- modeling-and-design- 150225083141-conversion- gate01.doc Page 12 of 15
  • 15. The multidimensional data views are usually referred to as data cubes. In reality data cubes can have as many dimensions as the business model allows. 4.1.2 Calculation-intensive While most OLAP applications do simple data aggregation along a hierarchy, some of them may conduct more complex calculations, such as percentage of totals, and allocations that use the hierarchies from the top down. It is important that an OLAP application is designed in a way that allows for such complex calculations. It is these calculations that add great benefits to the ultimate solution. Trend analysis is another example of complex calculations that can be carried out with OLAP applications. Such analyses involve algebraic equation and complex algorithms, such as moving averages and percentage growth. OLTP (On-Line Transaction processing) systems are used to collect and manage data, while OLAP systems are used to create information from the collected data that may lead to new knowledge. It is the ability to conduct complex calculations by the OLAP applications that allows for successful transfer of the raw data to information, and later to knowledge. 4.1.3 Time Intelligence Time is a universal dimension for almost all OLAP applications. It is very difficult to find a business model where time is not considered an integral part. Time is used to compare and judge performance of a business process. As an example, sale of a particular product line this month may be compared to its sale last month. Or, the profit of a company in the last quarter may be judged against its profit in the same quarter last year. The time dimension is not always used similar to other dimensions. For example, a manager may ask about the sales totals for the first two months of the year, but is not likely to ask about the sales of the first two computers in the product line. An OLAP system should be built to easily allow for concepts like "year to date" and "period over period comparisons" to be defined. 4.2 Benefits of OLAP Although it is possible to build an OLAP system using the software designed for transaction processing or data collection, it is certainly not a very efficient use of developer time. By using software specially designed for OLAP, developers can deliver applications to business users faster, providing better service that in turn allows the developers to build more applications. Another advantage of using OLAP systems is that if such systems are separate from the On-Line Transaction Processing (OLTP) systems that feed the data warehouse, OLTP systems performance will improve due to the reduced network traffic and elimination of long and complex queries to the OLTP database. In a nutshell, OLAP enables the organization as a whole to respond more quickly to market demands. This is possible because it provides the ability to model real data-warehouse-dimensional- modeling-and-design- 150225083141-conversion- gate01.doc Page 13 of 15
  • 16. business problems, make better decisions for the conduct of the organization, and use people resources more efficiently. Market responsiveness, in turn, often yields improved revenue and profitability. 4.3 Types of OLAP System (Data Storage Format) The core of the OLAP concept is that it provides a mechanism to store the data in a cube and subsequently retrieve it as and when needed. Cube data and aggregation can be stored in a variety of ways. OLAP services addresses these issues by implementing the following: • No Storage is allocated for empty cells (Cell is the smallest component or unit of data in a cube) • Data compression is applied to stored aggregations • OLAP services provides a complex algorithm for optimizing storage and performance needs • Several storage modes are available providing the developer with flexibility of where to keep the data and how to manage it The OLAP manager provides three different ways to store the data in a cube: • Multidimensional OLAP (MOLAP) • Relational OLAP (ROLAP) • Hybrid OLAP (HOLAP) Each of these options provides certain benefits, depending upon the size of underlying database system and how the data will be used. 4.3.1 Multidimensional OLAP (MOLAP) Multidimensional OLAP or MOLAP is a high performance, multidimensional data storage format. With MOLAP, data is stored on the OLAP server. MOLAP gives the best query performance, because it is specifically optimized for multidimensional data queries. Performance gains stem from the fact that the fact tables are compressed with this option and bitmap indexing is used for them. The MOLAP option stores the cube data and aggregates in multidimensional structures. MOLAP storage is appropriate for small to medium-sized data sets where copying all of the data to the multidimensional format would not require significant loading time or utilize large amounts of disk space. 4.3.2 Relational OLAP (ROLAP) Relational OLAP storage keeps the data that feeds the cubes in the original relational tables. A separate set of relational tables is used to store and reference aggregation data in this OLTP system. These tables are not downloaded to the OLAP server. The tables that hold the aggregations of the data are called materialized views. These tables store data aggregations as defined by the dimensions when the cube is created. data-warehouse-dimensional- modeling-and-design- 150225083141-conversion- gate01.doc Page 14 of 15
  • 17. With this option, aggregation tables have fields for each dimension and measure. Each dimension column is indexed. A composite index is also created for all of the dimension fields. Due to its nature ROLAP is ideal for large databases or legacy data that is infrequently queried. The only drawback to these systems is that generating reports from them or processing the cube data may affect users of the operational database reducing the performance of their transaction processing. 4.3.3 Hybrid OLAP (HOLAP) The OLAP server also supports a combination of MOLAP and ROLAP. This combination is referred to as HOLAP, the original data is kept in its relational database tables similar to ROLAP. Aggregations of the data are performed and stored in a multidimensional format. An advantage of this system is that HOLAP provides connectivity to large data sets in relational tables, while taking advantage of the faster performance of the multidimensional aggregation storage. A disadvantage of this option is that the amount of processing between the MOLAP and ROLAP systems may affect its efficiency. 5 Abbreviations Used Terms Description DW Data Warehouse DM Dimensional Modeling E-R Model Entity Relation Model OLTP On-Line Transaction Processing OLAP On-Line Analytical Processing 6 References Following are good references for learning the basic concepts of data warehousing and dimensional modeling. The same has also been used as a primary reference for the preparation of this document. 1. The Data Warehouse Life Cycle Tool Kit By Ralph Kimball 2. The Data Warehouse Tool Kit By Ralph Kimball 3. Microsoft OLAP Solutions data-warehouse-dimensional- modeling-and-design- 150225083141-conversion- gate01.doc Page 15 of 15