Introduction to Data Warehousing
Data Warehouse
• Maintain historic data
• Analysis to get better understanding of business
• Better decision making
Definition: A data warehouse is a
• subject-oriented
• integrated
• time-varying
• non-volatile
collection of data that is used primarily in organizational decision making.
-- Bill Inmon, Building the Data Warehouse, 1996
Subject Oriented
• Data warehouse is organized around subjects such as sales, product, customer.
• It focuses on modeling and analysis of data for decision makers.
• Excludes data not useful in decision support process.
Integrated
• Data warehouse is constructed by integrating multiple heterogeneous sources.
• Data preprocessing is applied to ensure consistency.
[Diagram: RDBMS, legacy system and flat file sources pass through data processing and data transformation into the data warehouse.]
Non-volatile
• Mostly, data once recorded will not be updated.
• Data warehouse requires two operations in data accessing
  - Incremental loading of data
  - Access of data
Time Variant
• Provides information from a historical perspective, e.g. the past 5-10 years.
• Every key structure contains, either implicitly or explicitly, an element of time.
Why Data Warehouse?
Problem Statement:
• ABC Pvt Ltd is a company with branches at Mumbai, Delhi,
Chennai and Bangalore.
• The Sales Manager wants a quarterly sales report across the
branches.
• Each branch has a separate operational system where sales
transactions are recorded.
Why Data Warehouse?
[Diagram: the Sales Manager gets the quarterly sales figure for each branch (Mumbai, Delhi, Chennai, Bangalore) and manually calculates the sales figure across branches.]
What if he needs a daily sales report across the branches?
Why Data Warehouse?
Solution:
• Extract sales information from each database.
• Store the information in a common repository at a single site.
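As a rough illustration of this solution, the sketch below pulls sales rows from each branch into one in-memory repository and computes quarterly totals from it. The branch data, the field names and the `quarter_of` helper are all hypothetical; a real deployment would extract from each branch's operational database.

```python
from collections import defaultdict

# Hypothetical per-branch extracts: (date "YYYY-MM-DD", amount) pairs.
branch_sales = {
    "Mumbai": [("2012-01-15", 120.0), ("2012-04-02", 80.0)],
    "Delhi": [("2012-02-10", 200.0)],
    "Chennai": [("2012-03-21", 50.0), ("2012-05-05", 70.0)],
    "Bangalore": [("2012-06-30", 90.0)],
}


def quarter_of(date_str):
    """Return a label such as '2012-Q1' for an ISO date string."""
    year, month, _ = date_str.split("-")
    return f"{year}-Q{(int(month) - 1) // 3 + 1}"


# "Common repository": one list holding rows from every branch.
repository = [
    {"branch": branch, "date": date, "amount": amount}
    for branch, rows in branch_sales.items()
    for date, amount in rows
]

# Quarterly sales across branches, computed from the single repository.
quarterly_totals = defaultdict(float)
for row in repository:
    quarterly_totals[quarter_of(row["date"])] += row["amount"]

for quarter in sorted(quarterly_totals):
    print(quarter, quarterly_totals[quarter])
```

Because every branch lands in the same structure, switching from a quarterly to a daily report only changes the grouping key, not the extraction.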
Why Data Warehouse?
[Diagram: the Mumbai, Delhi, Chennai and Bangalore systems feed the Data Warehouse, which the Sales Manager accesses through query and analysis tools.]
Characteristics of Data Warehouse
• Relational / Multidimensional database
• Query and analysis rather than transaction
• Historical data from transactions
• Consolidates multiple data sources
• Separates query load from transactions
• Mostly non-volatile
• Large amount of data, in the order of TBs
When we say large - we mean it!
• Terabytes -- 10^12 bytes: Yahoo! – 300 Terabytes and growing
• Petabytes -- 10^15 bytes: Geographic Information Systems
• Exabytes -- 10^18 bytes: National Medical Records
• Zettabytes -- 10^21 bytes: Weather images
• Yottabytes -- 10^24 bytes: Intelligence Agency Videos
OLTP Vs Data Warehouse (OLAP)
                              OLTP         Data Warehouse (OLAP)
Indexes                       Few          Many
Data                          Normalized   Generally de-normalized
Joins                         Many         Some
Derived data and aggregates   Rare         Common
Data Warehouse Architecture
[Diagram: operational systems and flat files feed ETL (Extract, Transform and Load) into the Data Warehouse, which supplies the Sales, Inventory and Generic data marts used for analysis, data mining and reporting.]
ETL
ETL stands for Extract, Transform and Load
• Data is distributed across different sources
  – Flat files, Streaming Data, DB Systems, XML, JSON
• Data can be in different formats
  – CSV, Key Value Pairs
• Different units and representation
  – Country: IN or India
  – Date: 20 Nov 2010 or 20101120
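The representation problem above can be sketched in a few lines. This is a minimal illustration, assuming a tiny hypothetical country lookup and only the two date layouts named on the slide (day, month name, year and a compact YYYYMMDD string); the function names are made up for the example.

```python
from datetime import datetime

# Hypothetical lookup; a real warehouse would use a full reference table.
COUNTRY_CODES = {"india": "IN", "in": "IN"}

# The two date layouts from the slide. "%b" matches English month
# abbreviations under the default C locale.
DATE_FORMATS = ("%d %b %Y", "%Y%m%d")


def normalize_country(value):
    """Map 'IN' or 'India' (any case) to a single code."""
    return COUNTRY_CODES[value.strip().lower()]


def normalize_date(value):
    """Try each known layout and return an ISO 8601 date string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


print(normalize_country("India"), normalize_date("20 Nov 2010"))
```

Raising on an unknown layout, rather than guessing, keeps bad records visible so they can be handled during cleansing.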
ETL Functions
• Extract
  – Collect data from different sources
  – Parse data
  – Remove unwanted data
• Transform
  – Project
  – Generate surrogate keys
  – Encode data
  – Join data from different sources
  – Aggregate
• Load
ETL Steps
• The first step in the ETL process is mapping the data between source systems and the target database.
• The second step is cleansing the source data in the staging area.
• The third step is transforming the cleansed source data.
• The fourth step is loading into the target system.
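The four steps can be sketched end to end with Python's built-in `sqlite3` module standing in for the target database. Everything here is an assumption made for illustration: the source field names, the `MAPPING` table, the cleansing rule (drop rows with no amount) and the `sales` target table.

```python
import sqlite3

# Step 1, mapping: source field -> target column (names are hypothetical).
MAPPING = {"cust": "customer", "amt": "amount", "city": "city"}

source_rows = [
    {"cust": " Asha ", "amt": "100", "city": "delhi"},
    {"cust": "Ravi", "amt": "", "city": "Delhi"},  # missing amount
    {"cust": "Meena", "amt": "250", "city": "DELHI"},
]


def cleanse(rows):
    """Step 2: drop rows with a missing amount and trim stray whitespace."""
    return [
        {key: value.strip() for key, value in row.items()}
        for row in rows
        if row["amt"].strip()
    ]


def transform(rows):
    """Step 3: rename fields per MAPPING, cast amounts, standardize city."""
    out = []
    for row in rows:
        target = {MAPPING[key]: value for key, value in row.items()}
        target["amount"] = float(target["amount"])
        target["city"] = target["city"].title()
        out.append(target)
    return out


def load(rows, conn):
    """Step 4: insert the transformed rows into the target table."""
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL, city TEXT)")
    conn.executemany(
        "INSERT INTO sales VALUES (:customer, :amount, :city)", rows
    )


conn = sqlite3.connect(":memory:")
load(transform(cleanse(source_rows)), conn)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)
```

Keeping each step as its own function mirrors the staging idea: cleansing and transformation happen on intermediate rows before anything reaches the target table.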


Data before ETL Processing:



Data after ETL Processing:
ETL Glossary
Mapping:
Defining relationship between source and target objects.
Cleansing:
The process of resolving inconsistencies in source data.
Transformation:
The process of manipulating data. Any manipulation beyond
copying is a transformation. Examples include aggregating data and
integrating data from multiple sources.
Staging Area:

A place where data is processed before entering the
warehouse.
Dimension
• Categorizes the data. For example - time, location, etc.
• A dimension can have one or more attributes. For example - day, week and month are attributes of the time dimension.
• Role of dimensions in data warehousing:
  - Slice and dice
  - Filter by dimensions
Types of dimensions
• Conformed Dimension - A dimension that is shared across fact tables.
• Junk Dimension - A junk dimension is a convenient grouping of flags and indicators. For example, payment method, shipping method.
• Degenerate Dimension - A dimension key that has no attributes and hence does not have its own dimension table. For example, transaction number, invoice number. Values of these dimensions are mostly unique within a fact table.
• Role Playing Dimension - A dimension that plays different roles in fact tables depending on the context. For example, the Date dimension can be used for the ordered date, shipment date, and invoice date.
• Slowly Changing Dimensions - Dimensions that have data that changes slowly over time.
Types of Slowly Changing Dimension
• Type 1 - The Type 1 methodology overwrites old data with new data, and therefore does not track historical data at all.
• Type 2 - The Type 2 method tracks historical data by creating multiple records for a given value in the dimension table, with separate surrogate keys.
• Type 3 - The Type 3 method tracks changes using separate columns. Whereas Type 2 has unlimited history preservation, Type 3 has limited history preservation, as it is limited to the number of columns we designate for storing historical data.
• Type 4 - The Type 4 method is usually referred to as using "history tables", where one table keeps the current data, and an additional table is used to keep a record of all changes.
Type 1, 2 and 3 are commonly used.
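The difference between Type 1 and Type 2 is easiest to see on a single row. The sketch below is only illustrative: the customer dimension, its column names and the `start_date`/`end_date` validity columns are assumptions (the slide itself only requires separate surrogate keys for Type 2).

```python
from datetime import date


def apply_type1(dimension, natural_key, new_city):
    """Type 1: overwrite the attribute in place; no history is kept."""
    for row in dimension:
        if row["customer_id"] == natural_key:
            row["city"] = new_city


def apply_type2(dimension, natural_key, new_city, change_date):
    """Type 2: close the current row, then add a row with a new surrogate key."""
    for row in dimension:
        if row["customer_id"] == natural_key and row["end_date"] is None:
            row["end_date"] = change_date
    next_key = max(row["surrogate_key"] for row in dimension) + 1
    dimension.append({
        "surrogate_key": next_key,
        "customer_id": natural_key,
        "city": new_city,
        "start_date": change_date,
        "end_date": None,  # None marks the current version
    })


def new_dimension():
    """A hypothetical one-row customer dimension."""
    return [{
        "surrogate_key": 1, "customer_id": "C100", "city": "Delhi",
        "start_date": date(2012, 1, 1), "end_date": None,
    }]


type1_dim = new_dimension()
apply_type1(type1_dim, "C100", "Mumbai")

type2_dim = new_dimension()
apply_type2(type2_dim, "C100", "Mumbai", date(2012, 6, 1))

print(len(type1_dim), len(type2_dim))
```

After the change, the Type 1 table still has one row and has lost "Delhi", while the Type 2 table has two rows, so facts recorded before the move can still point at the old surrogate key.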
Facts
• Facts are values that can be examined and analyzed. For example - Page Views, Unique Users, Pieces Sold, Profit.
• Fact and measure are synonymous.
• Types of facts:
  – Additive - Measures that can be added across all dimensions.
  – Non Additive - Measures that cannot be added across any dimension.
  – Semi Additive - Measures that can be added across some dimensions but not others.
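A small worked example may help separate the three kinds of facts. The data below is hypothetical: daily account snapshots with a balance (semi-additive) and deposits (additive), plus revenue and profit pairs used to derive a margin (non-additive).

```python
# Hypothetical daily snapshot rows: (day, account, balance, deposits).
snapshots = [
    ("2012-01-01", "A", 100, 100),
    ("2012-01-01", "B", 200, 200),
    ("2012-01-02", "A", 150, 50),
    ("2012-01-02", "B", 200, 0),
]

# Additive: deposits can be summed across accounts and across days.
total_deposits = sum(row[3] for row in snapshots)

# Semi-additive: balances can be summed across accounts for one day...
balance_day2 = sum(row[2] for row in snapshots if row[0] == "2012-01-02")

# ...but summing balances across days counts the same money repeatedly,
# so a roll-up over time would use something else, such as the latest value.
naive_sum_over_time = sum(row[2] for row in snapshots)

# Non-additive: a ratio such as margin cannot be summed at all;
# it is recomputed from its additive parts.
sales = [(200, 50), (100, 10)]  # (revenue, profit)
overall_margin = sum(p for _, p in sales) / sum(r for r, _ in sales)

print(total_deposits, balance_day2, naive_sum_over_time, overall_margin)
```

Here the bank holds 350 on the second day, yet the naive sum over both days reports 650, which is why balances are treated as semi-additive.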
How to store data?
Facts and Dimensions:
1. Select the business process to model
2. Declare the grain of the business process
3. Choose the dimensions that apply to each fact table row
4. Identify the numeric facts that will populate each fact table
row
Dimension Table
• Contains attributes of dimensions, e.g. Month is an attribute of the Time dimension.
• Can also have foreign keys to another dimension table
• Usually identified by a unique integer primary key called a surrogate key
Fact Table
• Contains facts
• Foreign keys to dimension tables
• Primary key: usually a composite key of all the foreign keys
Types of schema used in data warehouse
• Star Schema
• Snowflake Schema
• Fact Constellation Schema
Star Schema
• Multi-dimensional data
• Dimension and fact tables
• A fact table with pointers to dimension tables
Star Schema
Snowflake Schema
• An extension of the star schema in which the dimension tables are partly or fully normalized.
• Dimension table hierarchies are broken down into simpler tables.
Snowflake Schema
Fact Constellation Schema
• A fact constellation schema allows dimension tables to be shared between fact tables.
• This schema is used mainly for aggregate fact tables, or where we want to split a fact table for better comprehension. For example, separate fact tables for daily, weekly and monthly reporting requirements.
Fact Constellation Schema

In this example, the dimension tables for time, item, and location are
shared between both the sales and shipping fact tables.
Operations on Data Warehouse
• Drill Down
• Roll Up
• Slice & Dice
• Pivoting
Drill Down
Product hierarchy: Category (e.g. Home Appliances), Sub Category (e.g. Kitchen Appliances), Product (e.g. Toaster)
Roll Up
Time hierarchy: Year, Fiscal Year, Quarter, Fiscal Quarter, Month, Fiscal Month, Fiscal Week, Day
Slice & Dice
[Diagram: the Product by Time data sliced on Product = Toaster.]
Pivoting
• Also called rotation
• Rotate on an axis
• Interchange rows and columns
[Diagram: axes Product, Time and Region.]
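The four operations can be tried out on a tiny cube held in a dictionary. This is a sketch under assumptions: the cell values, the dimension order (product, month, region) and the helper names `roll_up`, `slice_cube` and `pivot` are all invented for the example.

```python
from collections import defaultdict

# Hypothetical cells: (product, month, region) -> units sold.
cube = {
    ("Toaster", "Jan", "North"): 5,
    ("Toaster", "Feb", "North"): 7,
    ("Toaster", "Jan", "South"): 3,
    ("Kettle", "Jan", "North"): 4,
    ("Kettle", "Feb", "South"): 6,
}


def roll_up(cells, keep):
    """Aggregate away every dimension whose index is not listed in `keep`."""
    totals = defaultdict(int)
    for key, units in cells.items():
        totals[tuple(key[i] for i in keep)] += units
    return dict(totals)


def slice_cube(cells, axis, value):
    """Slice: fix one dimension to a single value."""
    return {key: units for key, units in cells.items() if key[axis] == value}


def pivot(table):
    """Pivot a two-dimensional table by swapping its row and column keys."""
    return {(col, row): units for (row, col), units in table.items()}


by_product = roll_up(cube, keep=(0,))          # roll up month and region
product_by_month = roll_up(cube, keep=(0, 1))  # drill down from by_product
toaster = slice_cube(cube, axis=0, value="Toaster")
month_by_product = pivot(product_by_month)

print(by_product)
print(month_by_product)
```

Drill down and roll up are the same aggregation run at different levels of detail: keeping more dimensions in `keep` drills down, keeping fewer rolls up.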
Advantages of Data Warehouse
• One consistent data store for reporting, forecasting, and analysis
• Easier and timely access to data
• Scalability
• Trend analysis and detection
• Drill down analysis
Disadvantages of Data Warehouse
• Preparation may be time consuming.
• High associated cost
Case Study: Why Data Warehouse
• G2G Courier Pvt. Ltd. is an established brand in the courier industry. It has its own network in the main cities and has subcontracted to various partners in rural areas across the country.
• The President of the company wants to look deep into the financial health of the company and different performance aspects.
Challenges
• Apart from G2G’s own transaction system, each partner has their own system, which makes the data very heterogeneous.
• Granularity of data in the various systems is also different, for example minute accuracy versus day accuracy.
• To do analysis on metrics like Revenue and Timely delivery across various geographical locations and partners, we need to have a unified system.
“Looks like we are doing good in South, is there any scope of further improvement???”
“We are getting a lot of complaints from the East, who exactly is the black sheep???”
Sales Information
Report: Revenue by region
Region   Revenue (lacs)   % Change
South    41               + 8.1
North    34               + 5.2
East     25               - 6.8
West     12               + 2.7

Report: Performance by partner
Partner   On Time Delivery Rate   No. of complaints
A         100 %                   0
B         98 %                    90
C         60 %                    521
Case Study: Data Warehouse Design
• ABC Pvt Ltd is a new company which produces stationery products, with its production unit located at Ludhiana.
• They have sales units at Delhi and Bangalore.
• The President of the company wants sales information.
Sales Information
Report: The number of units sold.
113

Report: The number of units sold over time
January   February   March   April
14        41         33      25
Report: The number of items sold for each product with time (months Jan to Apr; axes Product and Time)
Black Cartridge: 6, 17, 8
Long Notebook: 6, 16, 6
Short Notebook: 8, 25, 21
Sales Information
Report: The number of items sold in each City for each product with time
[Table: units sold per month (Jan to Apr) for each City (Delhi, Bangalore) and Item (Black Cartridge, Long Notebook, Short Notebook); axes Time and Product.]
[Table: the same report by Product Category (General Stationery, Ink & Toners) for each City and month; axes Time and Product Category.]
Identify sales Facts & Dimensions
• Facts – Units sold
• Dimensions – Product, Time, Region.
Fact Table
City_ID   Prod_ID   Time_Id   Units
1         589       1         3
1         1218      1         4
2         589       1         3
2         1218      1         4
1         589       2         16

Time dimension table
Time_Id   Month
1         January 2012
2         February 2012
Identify sales Facts & Dimensions
Region Dimension Table
City_ID   City        Region   Country
1         Delhi       North    India
2         Bangalore   South    India

Product Dimension Tables
Prod_ID   Product_Name      Product_Category_ID
589       Black Cartridge   2
590       Long Notebook     1
288       Short Notebook    1

Product_Category_ID   Product_Category
1                     General Stationery
2                     Ink & Toners
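To see how the fact and dimension tables work together, the sketch below loads the fact rows, the region dimension and the time dimension from this case study into an in-memory SQLite database and reports units sold per city and month. The table and column names in the SQL are assumptions chosen to match the slide headings; the product dimension is left out because the fact rows reference a Prod_ID (1218) that the product table shown does not list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region_dim (
    city_id INTEGER PRIMARY KEY, city TEXT, region TEXT, country TEXT
);
CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE sales_fact (
    city_id INTEGER REFERENCES region_dim(city_id),
    prod_id INTEGER,
    time_id INTEGER REFERENCES time_dim(time_id),
    units INTEGER
);
""")

# Rows copied from the case study tables.
conn.executemany(
    "INSERT INTO region_dim VALUES (?, ?, ?, ?)",
    [(1, "Delhi", "North", "India"), (2, "Bangalore", "South", "India")],
)
conn.executemany(
    "INSERT INTO time_dim VALUES (?, ?)",
    [(1, "January 2012"), (2, "February 2012")],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [(1, 589, 1, 3), (1, 1218, 1, 4), (2, 589, 1, 3),
     (2, 1218, 1, 4), (1, 589, 2, 16)],
)

# Join the fact table to its dimensions and aggregate the measure.
rows = conn.execute("""
    SELECT r.city, t.month, SUM(f.units)
    FROM sales_fact AS f
    JOIN region_dim AS r ON r.city_id = f.city_id
    JOIN time_dim AS t ON t.time_id = f.time_id
    GROUP BY r.city, t.time_id, t.month
    ORDER BY t.time_id, r.city
""").fetchall()

for city, month, units in rows:
    print(city, month, units)
```

The fact table stores only keys and the measure; the readable labels (city, month) come from the dimension tables at query time.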
Data warehouse model
[Diagram: the Sales Fact table linked to the Time, Product and Region dimensions, with Product linked to Product Category.]
Thank You

Introduction to Data Warehousing

  • 1. Introduction to Data Warehousing
  • 2. Data Warehouse
      - Maintain historic data
      - Analysis to get a better understanding of the business
      - Better decision making
      Definition: A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making.
      -- Bill Inmon, Building the Data Warehouse, 1996
  • 3. Subject Oriented
      - A data warehouse is organized around subjects such as sales, product, customer.
      - It focuses on modeling and analysis of data for decision makers.
      - It excludes data not useful in the decision support process.
  • 4. Integrated
      - A data warehouse is constructed by integrating multiple heterogeneous sources.
      - Data preprocessing is applied to ensure consistency.
      (Diagram: RDBMS, Legacy System and Flat File sources feed the Data Warehouse through Data Processing and Data Transformation.)
  • 5. Non-volatile
      - Mostly, data once recorded will not be updated.
      - A data warehouse requires two operations in data accessing:
          - Incremental loading of data (load)
          - Access of data (access)
  • 6. Time Variant
      - Provides information from a historical perspective, e.g. the past 5-10 years.
      - Every key structure contains, either implicitly or explicitly, an element of time.
  • 7. Why Data Warehouse?
      Problem Statement:
      - ABC Pvt Ltd is a company with branches at Mumbai, Delhi, Chennai and Bangalore.
      - The Sales Manager wants a quarterly sales report across the branches.
      - Each branch has a separate operational system where sales transactions are recorded.
  • 8. Why Data Warehouse?
      (Diagram: the Sales Manager gets the quarterly sales figure for each branch (Mumbai, Delhi, Chennai, Bangalore) and manually calculates the sales figure across branches.)
      What if he needs a daily sales report across the branches?
  • 9. Why Data Warehouse?
      Solution:
      - Extract sales information from each database.
      - Store the information in a common repository at a single site.
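The solution on slide 9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the branch records, the amounts and the helper `quarter_of` are hypothetical and do not come from the slides.

```python
from collections import defaultdict
from datetime import date

# Hypothetical extracts from each branch's operational system: (sale_date, amount).
BRANCH_SALES = {
    "Mumbai": [(date(2012, 1, 15), 120.0), (date(2012, 4, 2), 80.0)],
    "Delhi": [(date(2012, 2, 10), 200.0)],
    "Chennai": [(date(2012, 3, 30), 50.0)],
    "Bangalore": [(date(2012, 5, 5), 75.0)],
}


def quarter_of(day):
    """Return a label such as '2012-Q1' for a calendar date."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


# Store every branch's rows in one common repository at a single site.
repository = [
    (branch, day, amount)
    for branch, sales in BRANCH_SALES.items()
    for day, amount in sales
]

# The quarterly report across branches is now a single pass over the repository.
quarterly = defaultdict(float)
for _branch, day, amount in repository:
    quarterly[quarter_of(day)] += amount

print(dict(quarterly))
```

Once the rows sit in one place, a daily report is the same loop grouped by `day` instead of `quarter_of(day)`.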
  • 11. Characteristics of Data Warehouse
      - Relational / multidimensional database
      - Query and analysis rather than transactions
      - Historical data from transactions
      - Consolidates multiple data sources
      - Separates query load from transactions
      - Mostly non-volatile
      - Large amount of data, in the order of TBs
  • 12. When we say large - we mean it!
      - Terabytes -- 10^12 bytes: Yahoo!, 300 terabytes and growing
      - Petabytes -- 10^15 bytes: Geographic Information Systems
      - Exabytes -- 10^18 bytes: National Medical Records
      - Zettabytes -- 10^21 bytes: Weather images
      - Yottabytes -- 10^24 bytes: Intelligence Agency Videos
  • 13. OLTP vs Data Warehouse (OLAP)
                                      OLTP         Data Warehouse (OLAP)
        Indexes                       Few          Many
        Data                          Normalized   Generally de-normalized
        Joins                         Many         Some
        Derived data and aggregates   Rare         Common
  • 14. Data Warehouse Architecture
      (Diagram: Operational Systems and Flat Files -> ETL (Extract, Transform and Load) -> Data Warehouse -> Sales Data Mart, Generic Data Mart, Inventory Data Mart -> Analysis, Data Mining, Reporting.)
  • 15. ETL
      ETL stands for Extract, Transform and Load.
      - Data is distributed across different sources: flat files, streaming data, DB systems, XML, JSON
      - Data can be in different formats: CSV, key-value pairs
      - Different units and representations
          - Country: IN or India
          - Date: 20 Nov 2010 or 20101120
  • 16. ETL Functions
      - Extract
          - Collect data from different sources
          - Parse data
          - Remove unwanted data
      - Transform
          - Project
          - Generate surrogate keys
          - Encode data
          - Join data from different sources
          - Aggregate
      - Load
  • 17. ETL Steps
      - The first step in the ETL process is mapping the data between the source systems and the target database.
      - The second step is cleansing the source data in the staging area.
      - The third step is transforming the cleansed source data.
      - The fourth step is loading into the target system.
      (Figures: data before ETL processing; data after ETL processing.)
  • 18. ETL Glossary
      - Mapping: Defining the relationship between source and target objects.
      - Cleansing: The process of resolving inconsistencies in source data.
      - Transformation: The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include aggregating and integrating data from multiple sources.
      - Staging Area: A place where data is processed before entering the warehouse.
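The ETL steps and glossary above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the two source lists, the lookup `COUNTRY_NAMES`, and the functions `extract`, `cleanse`, `transform` and `load` are hypothetical, and a production pipeline would normally use a dedicated ETL tool.

```python
from datetime import datetime

# Hypothetical raw rows from two source systems with different representations.
SOURCE_A = [{"country": "IN", "date": "20 Nov 2010", "units": "3"}]
SOURCE_B = [
    {"country": "India", "date": "20101120", "units": "4"},
    {"country": "India", "date": "", "units": "7"},  # unusable: no date
]

# Mapping of source representations to the target representation.
COUNTRY_NAMES = {"IN": "India", "India": "India"}
DATE_FORMATS = ("%d %b %Y", "%Y%m%d")


def parse_date(text):
    """Try each known source format; return None if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def extract(*sources):
    """Collect rows from all sources."""
    for source in sources:
        yield from source


def cleanse(rows):
    """Staging step: resolve inconsistencies and drop unusable rows."""
    for row in rows:
        day = parse_date(row["date"])
        country = COUNTRY_NAMES.get(row["country"])
        if day is None or country is None:
            continue
        yield {"country": country, "date": day, "units": int(row["units"])}


def transform(rows):
    """Aggregate units per (country, date)."""
    totals = {}
    for row in rows:
        key = (row["country"], row["date"])
        totals[key] = totals.get(key, 0) + row["units"]
    return totals


def load(totals, warehouse):
    """Append aggregated rows to the target, generating surrogate keys."""
    for key, ((country, day), units) in enumerate(sorted(totals.items()), start=1):
        warehouse.append(
            {"id": key, "country": country, "date": day.isoformat(), "units": units}
        )
    return warehouse


warehouse = load(transform(cleanse(extract(SOURCE_A, SOURCE_B))), [])
print(warehouse)
```

Both usable rows describe the same country and date once cleansed, so they collapse into one warehouse row; the row without a date is removed in the staging step.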
  • 19. Dimension
      - Categorizes the data. For example: time, location, etc.
      - A dimension can have one or more attributes. For example: day, week and month are attributes of the time dimension.
      - Role of dimensions in data warehousing:
          - Slice and dice
          - Filter by dimensions
  • 20. Types of dimensions
      - Conformed Dimension - A dimension that is shared across fact tables.
      - Junk Dimension - A junk dimension is a convenient grouping of flags and indicators. For example, payment method, shipping method.
      - Degenerate Dimension - A dimension key that has no attributes and hence does not have its own dimension table. For example, transaction number, invoice number. Values of this dimension are mostly unique within a fact table.
      - Role Playing Dimension - A dimension that plays different roles in fact tables depending on the context. For example, the Date dimension can be used for the ordered date, shipment date, and invoice date.
      - Slowly Changing Dimensions - Dimensions that have data
  • 21. Types of Slowly Changing Dimension
      - Type 1 - The Type 1 methodology overwrites old data with new data, and therefore does not track historical data at all.
      - Type 2 - The Type 2 method tracks historical data by creating multiple records for a given value in the dimension table, with separate surrogate keys.
      - Type 3 - The Type 3 method tracks changes using separate columns. Whereas Type 2 has unlimited history preservation, Type 3 has limited history preservation, as it is limited to the number of columns we designate for storing historical data.
      - Type 4 - The Type 4 method is usually referred to as using "history tables", where one table keeps the current data and an additional table is used to keep a record of all changes.
      Types 1, 2 and 3 are commonly used.
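To make the difference between Types 1, 2 and 3 concrete, here is a small Python sketch applied to a hypothetical customer dimension. The column names (`sk`, `customer_id`, `city`, `previous_city`, `valid_from`, `valid_to`) are assumptions chosen for illustration; validity dates are one common way to mark the current Type 2 row, not something the slides prescribe.

```python
from datetime import date


def apply_type1(dimension, customer_id, new_city):
    """Type 1: overwrite in place; the old value is lost."""
    for row in dimension:
        if row["customer_id"] == customer_id:
            row["city"] = new_city


def apply_type2(dimension, customer_id, new_city, changed_on):
    """Type 2: close the current row and add a new row with a new surrogate key."""
    for row in dimension:
        if row["customer_id"] == customer_id and row["valid_to"] is None:
            row["valid_to"] = changed_on
    next_key = max(row["sk"] for row in dimension) + 1
    dimension.append({
        "sk": next_key, "customer_id": customer_id, "city": new_city,
        "valid_from": changed_on, "valid_to": None,
    })


def apply_type3(dimension, customer_id, new_city):
    """Type 3: keep only the previous value, in a separate column."""
    for row in dimension:
        if row["customer_id"] == customer_id:
            row["previous_city"] = row["city"]
            row["city"] = new_city


# The same change (customer C42 moves from Delhi to Mumbai) under each type.
type1_dim = [{"sk": 1, "customer_id": "C42", "city": "Delhi"}]
apply_type1(type1_dim, "C42", "Mumbai")

type2_dim = [{"sk": 1, "customer_id": "C42", "city": "Delhi",
              "valid_from": date(2012, 1, 1), "valid_to": None}]
apply_type2(type2_dim, "C42", "Mumbai", date(2012, 2, 1))

type3_dim = [{"sk": 1, "customer_id": "C42", "city": "Delhi", "previous_city": None}]
apply_type3(type3_dim, "C42", "Mumbai")

print(type1_dim, type2_dim, type3_dim, sep="\n")
```

After the change, the Type 1 table has no trace of Delhi, the Type 2 table holds two rows for the same customer, and the Type 3 table holds one row that remembers only the most recent previous city.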
  • 22. Facts
      - Facts are values that can be examined and analyzed. For example: page views, unique users, pieces sold, profit.
      - Fact and measure are synonymous.
      - Types of facts:
          - Additive - Measures that can be added across all dimensions.
          - Non Additive - Measures that cannot be added across any dimension.
          - Semi Additive - Measures that can be added across some dimensions but not others.
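The fact types can be illustrated with a short Python sketch. The rows and the choice of measures are hypothetical: units sold stands in for an additive fact and stock on hand for a semi-additive one (it can be summed across stores on one day, but not across days). A ratio such as a margin percentage would be a typical non-additive measure.

```python
# Hypothetical fact rows: (day, store, units_sold, stock_on_hand).
ROWS = [
    ("Mon", "Delhi", 5, 100),
    ("Mon", "Bangalore", 3, 80),
    ("Tue", "Delhi", 4, 96),
    ("Tue", "Bangalore", 6, 74),
]

# Additive: units sold can be summed across every dimension (days and stores).
total_units = sum(units for _day, _store, units, _stock in ROWS)

# Semi-additive: stock can be summed across stores within one day ...
stock_by_day = {}
for day, _store, _units, stock in ROWS:
    stock_by_day[day] = stock_by_day.get(day, 0) + stock

# ... but across time the meaningful figure is the latest snapshot, not the sum.
closing_stock = stock_by_day["Tue"]

print(total_units, stock_by_day, closing_stock)
```

Summing `stock_by_day` over both days would give 350, a number that corresponds to no real quantity of goods, which is why the time dimension is excluded for this measure.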
  • 23. How to store data?
      Facts and Dimensions:
      1. Select the business process to model
      2. Declare the grain of the business process
      3. Choose the dimensions that apply to each fact table row
      4. Identify the numeric facts that will populate each fact table row
  • 24. Dimension Table
      - Contains attributes of dimensions, e.g. Month is an attribute of the Time dimension.
      - Can also have foreign keys to another dimension table.
      - Usually identified by a unique integer primary key called a surrogate key.
  • 25. Fact Table
      - Contains facts
      - Foreign keys to dimension tables
      - Primary key: usually a composite key of all FKs
  • 26. Types of schema used in data warehouse
      - Star Schema
      - Snowflake Schema
      - Fact Constellation Schema
  • 27. Star Schema
      - Multi-dimensional data
      - Dimension and fact tables
      - A fact table with pointers to dimension tables
  • 29. Snowflake Schema
      - An extension of the star schema in which the dimension tables are partly or fully normalized.
      - Dimension table hierarchies are broken down into simpler tables.
  • 31. Fact Constellation Schema
      - A fact constellation schema allows dimension tables to be shared between fact tables.
      - This schema is used mainly for aggregate fact tables, or where we want to split a fact table for better comprehension. For example, a separate fact table for each of the daily, weekly and monthly reporting requirements.
  • 32. Fact Constellation Schema
      In this example, the dimension tables for time, item, and location are shared between the sales and shipping fact tables.
  • 33. Operations on Data Warehouse
      - Drill Down
      - Roll Up
      - Slice & Dice
      - Pivoting
  • 34. Drill Down
      (Diagram: along the Product axis, Product Category (e.g. Home Appliances) -> Sub Category (e.g. Kitchen Appliances) -> Product (e.g. Toaster); the other axis is Time.)
  • 35. Roll Up
      (Diagram: two time hierarchies. Calendar: Day -> Month -> Quarter -> Year. Fiscal: Day -> Fiscal Week -> Fiscal Month -> Fiscal Quarter -> Fiscal Year.)
  • 36. Slice & Dice
      (Diagram: Product and Time axes, with the slice Product = Toaster.)
  • 37. Pivoting
      - Also called rotation
      - Rotate on an axis
      - Interchange rows and columns
      (Diagram: axes Product, Time and Region.)
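The four operations from slides 33 to 37 can be sketched on a tiny in-memory cube. This is a minimal illustration in plain Python, not how an OLAP engine implements them: the cell values, the Kettle product and the helper `aggregate` are hypothetical, while the Toaster and Kitchen Appliances names come from the slides.

```python
from collections import defaultdict

# Hypothetical cube cells: (sub_category, product, quarter, month, units).
CELLS = [
    ("Kitchen Appliances", "Toaster", "Q1", "Jan", 5),
    ("Kitchen Appliances", "Toaster", "Q1", "Feb", 7),
    ("Kitchen Appliances", "Kettle", "Q1", "Jan", 2),
    ("Kitchen Appliances", "Kettle", "Q2", "Apr", 4),
]
SUB_CATEGORY, PRODUCT, QUARTER, MONTH, UNITS = range(5)


def aggregate(cells, *dims):
    """Sum units grouped by the chosen dimension columns."""
    totals = defaultdict(int)
    for cell in cells:
        totals[tuple(cell[d] for d in dims)] += cell[UNITS]
    return dict(totals)


# Roll up: move to coarser levels (sub category, quarter).
rolled_up = aggregate(CELLS, SUB_CATEGORY, QUARTER)

# Drill down: move to finer levels (product, month).
drilled_down = aggregate(CELLS, PRODUCT, MONTH)

# Slice: fix one dimension to a single value.
toaster_slice = [c for c in CELLS if c[PRODUCT] == "Toaster"]

# Dice: restrict two or more dimensions to chosen values.
diced = [c for c in CELLS if c[PRODUCT] in {"Toaster", "Kettle"} and c[MONTH] == "Jan"]

# Pivot: interchange rows and columns, here (product, month) -> (month, product).
pivoted = {(month, product): units for (product, month), units in drilled_down.items()}

print(rolled_up, drilled_down, pivoted, sep="\n")
```

Roll up and drill down are the same aggregation at different levels of a hierarchy, which is why one helper serves both.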
  • 38. Advantages of Data Warehouse
      - One consistent data store for reporting, forecasting, and analysis
      - Easier and timely access to data
      - Scalability
      - Trend analysis and detection
      - Drill down analysis
  • 39. Disadvantages of Data Warehouse
      - Preparation may be time consuming.
      - High associated cost.
  • 40. Case Study: Why Data Warehouse
      - G2G Courier Pvt. Ltd. is an established brand in the courier industry. It has its own network in the main cities and has subcontracted rural areas across the country to various partners.
      - The President of the company wants to look deep into the financial health of the company and different performance aspects.
  • 41. Challenges
      - Apart from G2G's own transaction system, each partner has its own system, which makes the data very heterogeneous.
      - The granularity of data in the various systems is also different, e.g. minute accuracy versus day accuracy.
      - To analyze metrics such as revenue and timely delivery across geographical locations and partners, we need a unified system.
  • 42. "Looks like we are doing well in the South; is there any scope for further improvement?"
      "We are getting a lot of complaints from the East; who exactly is the black sheep?"
  • 43. Sales Information
      Report: Revenue by region
        Region   Revenue (lacs)   % Change
        South    41               +8.1
        North    34               +5.2
        East     25               -6.8
        West     12               +2.7
      Report: Performance by partner
        Partner   On Time Delivery Rate   No. of complaints
        A         100 %                   0
        B         98 %                    90
        C         60 %                    521
  • 44. Case Study: Data Warehouse Design
      - ABC Pvt Ltd is a new company that produces stationery products, with its production unit located at Ludhiana.
      - They have sales units at Delhi and Bangalore.
      - The President of the company wants sales information.
  • 45. Sales Information
      Report: The number of units sold: 113
      Report: The number of units sold over time
        January   February   March   April
        14        41         33      25
      Report: The number of items sold for each product with time (months Jan to Apr; each row lists three monthly values)
        Black Cartridge   6, 17, 8
        Long notebook     6, 16, 6
        Short notebook    8, 25, 21
  • 46. Sales Information Report: The number of items sold in each City for each product with time City Item Delhi Jan Feb Mar Apr Black Cartridge 3 16 6 Short Notebook 4 16 6 Bangalore Black Cartridge 3 Time Long Notebook 3 10 7 Long Notebook 3 8 Short Notebook 4 9 Product 15 City Item Jan Feb Mar Apr Delhi General Stationary 7 12 Ink & Toners Bangalore General Stationary Ink & Toners 3 7 9 10 15 8 3 7 Time 32 Product Category
  • 47. Identify sales Facts & Dimensions
      - Facts: Units sold
      - Dimensions: Product, Time, Region
      Fact Table
        City_ID   Prod_ID   Time_Id   Units
        1         589       1         3
        1         1218      1         4
        2         589       1         3
        2         1218      1         4
        1         589       2         16
      Time dimension table
        Time_Id   Month
        1         January 2012
        2         February 2012
  • 48. Identify sales Facts & Dimensions
      Region Dimension Table
        City_ID   City        Region   Country
        1         Delhi       North    India
        2         Bangalore   South    India
      Product Dimension Tables
        Prod_ID   Product_Name      Product_Category_ID
        589       Black Cartridge   2
        590       Long Notebook     1
        288       Short Notebook    1

        Product_Category_ID   Product_Category
        1                     General Stationery
        2                     Ink & Toners
  • 49. Data warehouse model
      (Diagram: the Sales Fact table is linked to the Time, Product and Region dimensions; Product is linked to Product Category.)
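The model on slides 47 to 49 can be tried out with Python's built-in `sqlite3` module. This is a sketch under stated assumptions, not the deck's own implementation: the table and column names are lower-cased versions of the slide headers, only the five fact rows shown on slide 47 are loaded, and Prod_ID 1218 is assumed to be the Short Notebook (the product table on slide 48 lists it as 288, while the fact table uses 1218).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE region_dim (city_id INTEGER PRIMARY KEY, city TEXT, region TEXT, country TEXT);
CREATE TABLE product_category (product_category_id INTEGER PRIMARY KEY, product_category TEXT);
-- Product points to Product Category, so this dimension is snowflaked.
CREATE TABLE product_dim (
    prod_id INTEGER PRIMARY KEY,
    product_name TEXT,
    product_category_id INTEGER REFERENCES product_category(product_category_id));
-- The fact table holds foreign keys to the dimensions plus the measure.
CREATE TABLE sales_fact (
    city_id INTEGER REFERENCES region_dim(city_id),
    prod_id INTEGER REFERENCES product_dim(prod_id),
    time_id INTEGER REFERENCES time_dim(time_id),
    units INTEGER,
    PRIMARY KEY (city_id, prod_id, time_id));

INSERT INTO time_dim VALUES (1, 'January 2012'), (2, 'February 2012');
INSERT INTO region_dim VALUES (1, 'Delhi', 'North', 'India'), (2, 'Bangalore', 'South', 'India');
INSERT INTO product_category VALUES (1, 'General Stationery'), (2, 'Ink & Toners');
-- Assumption: 1218 stands for Short Notebook so that the fact rows resolve.
INSERT INTO product_dim VALUES
    (589, 'Black Cartridge', 2), (590, 'Long Notebook', 1), (1218, 'Short Notebook', 1);
INSERT INTO sales_fact VALUES
    (1, 589, 1, 3), (1, 1218, 1, 4), (2, 589, 1, 3), (2, 1218, 1, 4), (1, 589, 2, 16);
""")

# Units sold per month and product category: a roll up from product to category.
QUERY = """
SELECT t.month, c.product_category, SUM(f.units)
FROM sales_fact AS f
JOIN time_dim AS t ON t.time_id = f.time_id
JOIN product_dim AS p ON p.prod_id = f.prod_id
JOIN product_category AS c ON c.product_category_id = p.product_category_id
GROUP BY t.time_id, c.product_category_id
ORDER BY t.time_id, c.product_category_id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The composite primary key on `sales_fact` mirrors slide 25 (a key made of all foreign keys), and the join through `product_category` is the extra hop that a snowflaked dimension adds compared with a pure star schema.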

Editor's notes

  1. CRMERP change to something simple