SlideShare une entreprise Scribd logo
1  sur  56
Chanderprabhu Jain College of Higher Studies & School of Law
Plot No. OCF, Sector A-8, Narela, New Delhi – 110040
(Affiliated to Guru Gobind Singh Indraprastha University and
Approved by Govt of NCT of Delhi & Bar Council of India)
DATA WAREHOUSING
AND
DATA MINING
Priyanka Gautam
Asst Pof(IT) CPJCHS
2
0. Introduction
Data Warehousing,
OLAP and data mining:
what and why
(now)?
Relation to OLTP
A case study
demos, labs
3
Which are our
lowest/highest margin
customers ?
Which are our
lowest/highest margin
customers ?
Who are my customers
and what products
are they buying?
Who are my customers
and what products
are they buying?
Which customers
are most likely to go
to the competition ?
Which customers
are most likely to go
to the competition ?
What impact will
new products/services
have on revenue
and margins?
What impact will
new products/services
have on revenue
and margins?
What product prom-
-otions have the biggest
impact on revenue?
What product prom-
-otions have the biggest
impact on revenue?
What is the most
effective distribution
channel?
What is the most
effective distribution
channel?
A producer wants to know….
4
Data, Data everywhere
yet ... I can’t find the data I need
data is scattered over the
network
many versions, subtle
differences
I can’t get the data I need
need an expert to get the data
I can’t understand the data I
found
available data poorly documented
I can’t use the data I found
results are unexpected
data needs to be transformed
from one form to other
5
What is a Data Warehouse?
A single, complete and
consistent store of data
obtained from a variety
of different sources
made available to end
users in a what they
can understand and use
in a business context.
[Barry Devlin]
6
What is Data Warehousing?
A process of
transforming data into
information and
making it available to
users in a timely
enough manner to
make a difference
[Forrester Research, April
1996]Data
Information
7
Evolution
60’s: Batch reports
hard to find and analyze information
inflexible and expensive, reprogram every new
request
70’s: Terminal-based DSS and EIS (executive
information systems)
still inflexible, not integrated with desktop tools
80’s: Desktop data access and analysis tools
query tools, spreadsheets, GUIs
easier to use, but only access operational databases
90’s: Data warehousing with integrated OLAP
engines and tools
8
Warehouses are Very Large
Databases
35%
30%
25%
20%
15%
10%
5%
0%
5GB
5-9GB
10-19GB 50-99GB 250-499GB
20-49GB 100-249GB 500GB-1TB
Initial
Projected 2Q96
Source: META Group, Inc.
Respondents
9
Very Large Data Bases
Terabytes -- 10^12 bytes:
Petabytes -- 10^15 bytes:
Exabytes -- 10^18 bytes:
Zettabytes -- 10^21
bytes:
Zottabytes -- 10^24
bytes:
Walmart -- 24 Terabytes
Geographic Information
Systems
National Medical Records
Weather images
Intelligence Agency
Videos
10
Data Warehousing --
It is a process
Technique for assembling and
managing data from various
sources for the purpose of
answering business
questions. Thus making
decisions that were not
previous possible
A decision support database
maintained separately from
the organization’s operational
database
11
Data Warehouse
A data warehouse is a
subject-oriented
integrated
time-varying
non-volatile
collection of data that is used primarily in
organizational decision making.
-- Bill Inmon, Building the Data Warehouse 1996
12
Explorers, Farmers and Tourists
Explorers: Seek out the unknown and
previously unsuspected rewards hiding in
the detailed data
Farmers: Harvest information
from known access paths
Tourists: Browse information
harvested by farmers
13
Data Warehouse Architecture
Data Warehouse
Engine
Optimized Loader
Extraction
Cleansing
Analyze
Query
Metadata Repository
Relational
Databases
Legacy
Data
Purchased
Data
ERP
Systems
14
Data Warehouse for Decision
Support & OLAP
Putting Information technology to help the
knowledge worker make faster and better
decisions
Which of my customers are most likely to go
to the competition?
What product promotions have the biggest
impact on revenue?
How did the share price of software
companies correlate with profits over last 10
years?
15
Decision Support
Used to manage and control business
Data is historical or point-in-time
Optimized for inquiry rather than update
Use of the system is loosely defined and
can be ad-hoc
Used by managers and end-users to
understand the business and make
judgements
16
Application Areas
Industry Application
Finance Credit Card Analysis
Insurance Claims, Fraud Analysis
Telecommunication Call record analysis
Transport Logistics management
Consumer goods promotion analysis
Data Service providers Value added data
Utilities Power usage analysis
17
Data Mining in Use
The US Government uses Data Mining to
track fraud
A Supermarket becomes an information
broker
Basketball teams use it to track game
strategy
Cross Selling
Warranty Claims Routing
Holding on to Good Customers
Weeding out Bad Customers
18
Why Separate Data Warehouse?
Performance
Op dbs designed & tuned for known txs & workloads.
Complex OLAP queries would degrade perf. for op txs.
Special data organization, access & implementation
methods needed for multidimensional views & queries.
Function
Missing data: Decision support requires historical data, which op dbs do not typically maintain.
Data consolidation: Decision support requires consolidation (aggregation, summarization) of data
from many heterogeneous sources: op dbs, external sources.
Data quality: Different sources typically use inconsistent data representations, codes, and formats
which have to be reconciled.
19
Examples of Operational Data
Data Industry Usage Technology Volumes
Customer
File
All Track
Customer
Details
Legacy application, flat
files, main frames
Small-medium
Account
Balance
Finance Control
account
activities
Legacy applications,
hierarchical databases,
mainframe
Large
Point-of-
Sale data
Retail Generate
bills, manage
stock
ERP, Client/Server,
relational databases
Very Large
Call
Record
Telecomm-
unications
Billing Legacy application,
hierarchical database,
mainframe
Very Large
Production
Record
Manufact-
uring
Control
Production
ERP,
relational databases,
AS/400
Medium
20
Application-Orientation vs.
Subject-Orientation
Application-Orientation
Operationa
l Database
Loans
Credit
Card
Trust
Savings
Subject-Orientation
Data
Warehouse
Customer
Vendor
Product
Activity
21
OLTP vs. Data Warehouse
OLTP systems are tuned for known
transactions and workloads while
workload is not known a priori in a data
warehouse
Special data organization, access methods
and implementation methods are needed
to support data warehouse queries
(typically multidimensional queries)
e.g., average amount spent on phone calls
between 9AM-5PM in Pune during the month
of December
22
OLTP vs Data Warehouse
OLTP
Application
Oriented
Used to run
business
Detailed data
Current up to date
Isolated Data
Repetitive access
Clerical User
Warehouse (DSS)
Subject Oriented
Used to analyze
business
Summarized and
refined
Snapshot data
Integrated Data
Ad-hoc access
Knowledge User
(Manager)
23
OLTP vs Data Warehouse
OLTP
Performance Sensitive
Few Records accessed at
a time (tens)
Read/Update Access
No data redundancy
Database Size 100MB
-100 GB
Data Warehouse
Performance relaxed
Large volumes accessed
at a time(millions)
Mostly Read (Batch
Update)
Redundancy present
Database Size
100 GB - few terabytes
24
To summarize ...
OLTP Systems are
used to “run” a
business
The Data
Warehouse helps
to “optimize” the
business
25
Why Now?
Data is being produced
ERP provides clean data
The computing power is available
The computing power is affordable
The competitive pressures are strong
Commercial products are available
26
I. Data Warehouses:
Architecture, Design & Construction
DW Architecture
Loading, refreshing
Structuring/Modeling
DWs and Data Marts
Query Processing
demos, labs
27
Data Warehouse Architecture
Data Warehouse
Engine
Optimized Loader
Extraction
Cleansing
Analyze
Query
Metadata Repository
Relational
Databases
Legacy
Data
Purchased
Data
ERP
Systems
28
Components of the Warehouse
Data Extraction and Loading
The Warehouse
Analyze and Query -- OLAP Tools
Metadata
Data Mining tools
29
Data Quality - The Reality
Tempting to think creating a data
warehouse is simply extracting
operational data and entering into a
data warehouse
Nothing could be farther from the
truth
Warehouse data comes from
disparate questionable sources
30
Data Quality - The Reality
Legacy systems no longer documented
Outside sources with questionable quality
procedures
Production systems with no built in
integrity checks and no integration
Operational systems are usually designed to
solve a specific business problem and are
rarely developed to a a corporate plan
“And get it done quickly, we do not have time to
worry about corporate standards...”
31
Data Transformation Terms
Extracting
Capture of data from operational source in
“as is” status
Sources for data generally in legacy
mainframes in VSAM, IMS, IDMS, DB2; more
data today in relational databases on Unix
Conditioning
The conversion of data types from the source
to the target data store (warehouse) --
always a relational database
32
Data Extraction and Cleansing
Extract data from existing
operational and legacy data
Issues:
Sources of data for the warehouse
Data quality at the sources
Merging different data sources
Data Transformation
How to propagate updates (on the sources) to
the warehouse
Terabytes of data to be loaded
33
Data -- Heart of the Data
Warehouse
Heart of the data warehouse is the
data itself!
Single version of the truth
Corporate memory
Data is organized in a way that
represents business -- subject
orientation
34
Data Warehouse Structure
Subject Orientation -- customer,
product, policy, account etc... A
subject may be implemented as a
set of related tables. E.g.,
customer may be five tables
35
Data Warehouse Structure
base customer (1985-87)
custid, from date, to date, name, phone, dob
base customer (1988-90)
custid, from date, to date, name, credit rating,
employer
customer activity (1986-89) -- monthly
summary
customer activity detail (1987-89)
custid, activity date, amount, clerk id, order no
customer activity detail (1990-91)
custid, activity date, amount, line item no, order no
Time isTime is
part ofpart of
key ofkey of
each tableeach table
36
Schema Design
Database organization
must look like business
must be recognizable by business user
approachable by business user
Must be simple
Schema Types
Star Schema
Fact Constellation Schema
Snowflake schema
37
Star Schema
A single fact table and for each
dimension one dimension table
Does not capture hierarchies directly
T
i
m
e
p
r
o
d
c
u
s
t
c
i
t
y
f
a
c
t
date, custno, prodno, cityname, ...
38
De-normalization
Normalization in a data warehouse
may lead to lots of small tables
Can lead to excessive I/O’s since
many tables have to be accessed
De-normalization is the answer
especially since updates are rare
Data Warehouse vs. Data Marts
What comes first
40
From the Data Warehouse to Data
Marts
Departmentally
Structured
Individually
Structured
Data Warehouse
Organizationally
Structured
Less
More
History
Normalized
Detailed
Data
Information
41
Data Warehouse and Data Marts
OLAP
Data Mart
Lightly summarized
Departmentally structured
Organizationally structured
Atomic
Detailed Data Warehouse Data
42
Characteristics of the
Departmental Data Mart
OLAP
Small
Flexible
Customized by
Department
Source is
departmentally
structured data
warehouse
43
Data Mart Centric
Data Marts
Data Sources
Data Warehouse
II. On-Line Analytical Processing (OLAP)
Making Decision
Support Possible
45
Limitations of SQL
“A Freshman in
Business needs
a Ph.D. in SQL”
-- Ralph Kimball
46
Typical OLAP Queries
Write a multi-table join to compare sales for each
product line YTD this year vs. last year.
Repeat the above process to find the top 5
product contributors to margin.
Repeat the above process to find the sales of a
product line to new vs. existing customers.
Repeat the above process to find the customers
that have had negative sales growth.
47
* Reference: http://www.arborsoft.com/essbase/wht_ppr/coddTOC.html* Reference: http://www.arborsoft.com/essbase/wht_ppr/coddTOC.html
What Is OLAP?
Online Analytical Processing - coined by
EF Codd in 1994 paper contracted by
Arbor Software*
Generally synonymous with earlier terms such as
Decisions Support, Business Intelligence, Executive
Information System
OLAP = Multidimensional Database
MOLAP: Multidimensional OLAP (Arbor Essbase,
Oracle Express)
ROLAP: Relational OLAP (Informix MetaCube,
Microstrategy DSS Agent)
48
Strengths of OLAP
It is a powerful visualization paradigm
It provides fast, interactive response
times
It is good for analyzing time series
It can be useful to find some clusters and
outliers
Many vendors offer OLAP tools
49
A Visual Operation: Pivot (Rotate)
1010
4747
3030
1212
JuiceJuice
ColaCola
MilkMilk
CreamCream
NYNY
LALA
SFSF
3/1 3/2 3/3 3/43/1 3/2 3/3 3/4
DateDate
Month
Month
RegionRegion
ProductProduct
50
Nature of OLAP Analysis
Aggregation -- (total sales,
percent-to-total)
Comparison -- Budget vs.
Expenses
Ranking -- Top 10, quartile
analysis
Access to detailed and
aggregate data
Complex criteria
specification
Visualization
51
For a Successful Warehouse
From day one establish that warehousing
is a joint user/builder project
Establish that maintaining data quality will
be an ONGOING joint user/builder
responsibility
Train the users one step at a time
Consider doing a high level corporate data
model in no more than three weeks
From Larry Greenfield, http://pwp.starnetinc.com/larryg/index.html
52
For a Successful Warehouse
Look closely at the data extracting,
cleaning, and loading tools
Implement a user accessible automated
directory to information stored in the
warehouse
Determine a plan to test the integrity of
the data in the warehouse
From the start get warehouse users in the
habit of 'testing' complex queries
53
Data Warehouse
W.H. Inmon, Building the Data
Warehouse, Second Edition, John Wiley
and Sons, 1996
W.H. Inmon, J. D. Welch, Katherine L.
Glassey, Managing the Data Warehouse,
John Wiley and Sons, 1997
Barry Devlin, Data Warehouse from
Architecture to Implementation, Addison
Wesley Longman, Inc 1997
54
Data Warehouse
W.H. Inmon, John A. Zachman, Jonathan
G. Geiger, Data Stores Data Warehousing
and the Zachman Framework, McGraw Hill
Series on Data Warehousing and Data
Management, 1997
Ralph Kimball, The Data Warehouse
Toolkit, John Wiley and Sons, 1996
55
OLAP and DSS
Erik Thomsen, OLAP Solutions, John Wiley
and Sons 1997
Microsoft TechEd Transparencies from
Microsoft TechEd 98
Essbase Product Literature
Oracle Express Product Literature
Microsoft Plato Web Site
Microstrategy Web Site
56
Data Mining
Michael J.A. Berry and Gordon Linoff, Data
Mining Techniques, John Wiley and Sons
1997
Peter Adriaans and Dolf Zantinge, Data
Mining, Addison Wesley Longman Ltd.
1996
KDD Conferences

Contenu connexe

Tendances

Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Miningidnats
 
Traditional data warehouse vs data lake
Traditional data warehouse vs data lakeTraditional data warehouse vs data lake
Traditional data warehouse vs data lakeBHASKAR CHAUDHURY
 
Introduction to Data Vault Modeling
Introduction to Data Vault ModelingIntroduction to Data Vault Modeling
Introduction to Data Vault ModelingKent Graziano
 
Dimensional Modeling
Dimensional ModelingDimensional Modeling
Dimensional ModelingSunita Sahu
 
How to build a data dictionary
How to build a data dictionaryHow to build a data dictionary
How to build a data dictionaryPiotr Kononow
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookJames Serra
 
Introduction Data warehouse
Introduction Data warehouseIntroduction Data warehouse
Introduction Data warehouseAmin Choroomi
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data WarehousingEyad Manna
 
Introduction to data warehousing
Introduction to data warehousing   Introduction to data warehousing
Introduction to data warehousing Girish Dhareshwar
 
Data Warehouse Basic Guide
Data Warehouse Basic GuideData Warehouse Basic Guide
Data Warehouse Basic Guidethomasmary607
 
Warehousing dimension star-snowflake_schemas
Warehousing dimension star-snowflake_schemasWarehousing dimension star-snowflake_schemas
Warehousing dimension star-snowflake_schemasEric Matthews
 
Data mining
Data mining Data mining
Data mining AthiraR23
 
Online analytical processing
Online analytical processingOnline analytical processing
Online analytical processingSamraiz Tejani
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data WarehouseSOMASUNDARAM T
 

Tendances (20)

Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Mining
 
Traditional data warehouse vs data lake
Traditional data warehouse vs data lakeTraditional data warehouse vs data lake
Traditional data warehouse vs data lake
 
Introduction to Data Vault Modeling
Introduction to Data Vault ModelingIntroduction to Data Vault Modeling
Introduction to Data Vault Modeling
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Dimensional Modeling
Dimensional ModelingDimensional Modeling
Dimensional Modeling
 
How to build a data dictionary
How to build a data dictionaryHow to build a data dictionary
How to build a data dictionary
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Introduction Data warehouse
Introduction Data warehouseIntroduction Data warehouse
Introduction Data warehouse
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
Introduction to data warehousing
Introduction to data warehousing   Introduction to data warehousing
Introduction to data warehousing
 
Data warehouse logical design
Data warehouse logical designData warehouse logical design
Data warehouse logical design
 
Data Warehouse Basic Guide
Data Warehouse Basic GuideData Warehouse Basic Guide
Data Warehouse Basic Guide
 
Warehousing dimension star-snowflake_schemas
Warehousing dimension star-snowflake_schemasWarehousing dimension star-snowflake_schemas
Warehousing dimension star-snowflake_schemas
 
Data mining
Data mining Data mining
Data mining
 
Why Data Vault?
Why Data Vault? Why Data Vault?
Why Data Vault?
 
Online analytical processing
Online analytical processingOnline analytical processing
Online analytical processing
 
Data Vault Overview
Data Vault OverviewData Vault Overview
Data Vault Overview
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
 

Similaire à DATA Warehousing & Data Mining

UNIT - 1 : Part 1: Data Warehousing and Data Mining
UNIT - 1 : Part 1: Data Warehousing and Data MiningUNIT - 1 : Part 1: Data Warehousing and Data Mining
UNIT - 1 : Part 1: Data Warehousing and Data MiningNandakumar P
 
krithi-talk-impact.ppt
krithi-talk-impact.pptkrithi-talk-impact.ppt
krithi-talk-impact.pptKRISHNARAJ207
 
13500892 data-warehousing-and-data-mining
13500892 data-warehousing-and-data-mining13500892 data-warehousing-and-data-mining
13500892 data-warehousing-and-data-miningNgaire Taylor
 
Data Warehousing Datamining Concepts
Data Warehousing Datamining ConceptsData Warehousing Datamining Concepts
Data Warehousing Datamining Conceptsraulmisir
 
Gulabs Ppt On Data Warehousing And Mining
Gulabs Ppt On Data Warehousing And MiningGulabs Ppt On Data Warehousing And Mining
Gulabs Ppt On Data Warehousing And Mininggulab sharma
 
What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?RTTS
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousingwork
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Denodo
 
Overview of Business Intelligence
Overview of Business IntelligenceOverview of Business Intelligence
Overview of Business IntelligenceParthiv Dixit
 
Traditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overviewTraditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overviewNagaraj Yerram
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefitsRicky Barron
 
Dataware housing
Dataware housingDataware housing
Dataware housingwork
 
Business Intelligence Overview
Business Intelligence OverviewBusiness Intelligence Overview
Business Intelligence Overviewnetpeachteam
 
The Data Lake: Empowering Your Data Science Team
The Data Lake: Empowering Your Data Science TeamThe Data Lake: Empowering Your Data Science Team
The Data Lake: Empowering Your Data Science TeamSenturus
 

Similaire à DATA Warehousing & Data Mining (20)

UNIT - 1 : Part 1: Data Warehousing and Data Mining
UNIT - 1 : Part 1: Data Warehousing and Data MiningUNIT - 1 : Part 1: Data Warehousing and Data Mining
UNIT - 1 : Part 1: Data Warehousing and Data Mining
 
krithi-talk-impact.ppt
krithi-talk-impact.pptkrithi-talk-impact.ppt
krithi-talk-impact.ppt
 
krithi-talk-impact.ppt
krithi-talk-impact.pptkrithi-talk-impact.ppt
krithi-talk-impact.ppt
 
13500892 data-warehousing-and-data-mining
13500892 data-warehousing-and-data-mining13500892 data-warehousing-and-data-mining
13500892 data-warehousing-and-data-mining
 
DWM
DWMDWM
DWM
 
Data Warehousing Datamining Concepts
Data Warehousing Datamining ConceptsData Warehousing Datamining Concepts
Data Warehousing Datamining Concepts
 
Gulabs Ppt On Data Warehousing And Mining
Gulabs Ppt On Data Warehousing And MiningGulabs Ppt On Data Warehousing And Mining
Gulabs Ppt On Data Warehousing And Mining
 
IT Ready - DW: 1st Day
IT Ready - DW: 1st Day IT Ready - DW: 1st Day
IT Ready - DW: 1st Day
 
What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
 
Overview of Business Intelligence
Overview of Business IntelligenceOverview of Business Intelligence
Overview of Business Intelligence
 
Traditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overviewTraditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overview
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefits
 
CTP Data Warehouse
CTP Data WarehouseCTP Data Warehouse
CTP Data Warehouse
 
Msbi by quontra us
Msbi by quontra usMsbi by quontra us
Msbi by quontra us
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Dataware housing
Dataware housingDataware housing
Dataware housing
 
Business Intelligence Overview
Business Intelligence OverviewBusiness Intelligence Overview
Business Intelligence Overview
 
The Data Lake: Empowering Your Data Science Team
The Data Lake: Empowering Your Data Science TeamThe Data Lake: Empowering Your Data Science Team
The Data Lake: Empowering Your Data Science Team
 

Plus de cpjcollege

Tax Law (LLB-403)
Tax Law (LLB-403)Tax Law (LLB-403)
Tax Law (LLB-403)cpjcollege
 
Law and Emerging Technology (LLB -405)
 Law and Emerging Technology (LLB -405) Law and Emerging Technology (LLB -405)
Law and Emerging Technology (LLB -405)cpjcollege
 
Law of Crimes-I ( LLB -205)
 Law of Crimes-I  ( LLB -205)  Law of Crimes-I  ( LLB -205)
Law of Crimes-I ( LLB -205) cpjcollege
 
Socio-Legal Dimensions of Gender (LLB-507 & 509 )
Socio-Legal Dimensions of Gender (LLB-507 & 509 )Socio-Legal Dimensions of Gender (LLB-507 & 509 )
Socio-Legal Dimensions of Gender (LLB-507 & 509 )cpjcollege
 
Family Law-I ( LLB -201)
Family Law-I  ( LLB -201) Family Law-I  ( LLB -201)
Family Law-I ( LLB -201) cpjcollege
 
Alternative Dispute Resolution (ADR) [LLB -309]
Alternative Dispute Resolution (ADR) [LLB -309] Alternative Dispute Resolution (ADR) [LLB -309]
Alternative Dispute Resolution (ADR) [LLB -309] cpjcollege
 
Law of Evidence (LLB-303)
Law of Evidence  (LLB-303) Law of Evidence  (LLB-303)
Law of Evidence (LLB-303) cpjcollege
 
Environmental Studies and Environmental Laws (: LLB -301)
Environmental Studies and Environmental Laws (: LLB -301)Environmental Studies and Environmental Laws (: LLB -301)
Environmental Studies and Environmental Laws (: LLB -301)cpjcollege
 
Code of Civil Procedure (LLB -307)
 Code of Civil Procedure (LLB -307) Code of Civil Procedure (LLB -307)
Code of Civil Procedure (LLB -307)cpjcollege
 
Constitutional Law-I (LLB -203)
 Constitutional Law-I (LLB -203) Constitutional Law-I (LLB -203)
Constitutional Law-I (LLB -203)cpjcollege
 
Women and Law [LLB 409 (c)]
Women and Law [LLB 409 (c)]Women and Law [LLB 409 (c)]
Women and Law [LLB 409 (c)]cpjcollege
 
Corporate Law ( LLB- 305)
Corporate Law ( LLB- 305)Corporate Law ( LLB- 305)
Corporate Law ( LLB- 305)cpjcollege
 
Human Rights Law ( LLB -407)
 Human Rights Law ( LLB -407) Human Rights Law ( LLB -407)
Human Rights Law ( LLB -407)cpjcollege
 
Labour Law-I (LLB 401)
 Labour Law-I (LLB 401) Labour Law-I (LLB 401)
Labour Law-I (LLB 401)cpjcollege
 
Legal Ethics and Court Craft (LLB 501)
 Legal Ethics and Court Craft (LLB 501) Legal Ethics and Court Craft (LLB 501)
Legal Ethics and Court Craft (LLB 501)cpjcollege
 
Political Science-II (BALLB- 209)
Political Science-II (BALLB- 209)Political Science-II (BALLB- 209)
Political Science-II (BALLB- 209)cpjcollege
 
Health Care Law ( LLB 507 & LLB 509 )
Health Care Law ( LLB 507 & LLB 509 )Health Care Law ( LLB 507 & LLB 509 )
Health Care Law ( LLB 507 & LLB 509 )cpjcollege
 
Land and Real Estate Laws (LLB-505)
Land and Real Estate Laws (LLB-505)Land and Real Estate Laws (LLB-505)
Land and Real Estate Laws (LLB-505)cpjcollege
 
Business Environment and Ethical Practices (BBA LLB 213 )
Business Environment and Ethical Practices (BBA LLB 213 )Business Environment and Ethical Practices (BBA LLB 213 )
Business Environment and Ethical Practices (BBA LLB 213 )cpjcollege
 
HUMAN RESOURCE MANAGEMENT (BBA LLB215 )
HUMAN RESOURCE MANAGEMENT (BBA LLB215 )HUMAN RESOURCE MANAGEMENT (BBA LLB215 )
HUMAN RESOURCE MANAGEMENT (BBA LLB215 )cpjcollege
 

Plus de cpjcollege (20)

Tax Law (LLB-403)
Tax Law (LLB-403)Tax Law (LLB-403)
Tax Law (LLB-403)
 
Law and Emerging Technology (LLB -405)
 Law and Emerging Technology (LLB -405) Law and Emerging Technology (LLB -405)
Law and Emerging Technology (LLB -405)
 
Law of Crimes-I ( LLB -205)
 Law of Crimes-I  ( LLB -205)  Law of Crimes-I  ( LLB -205)
Law of Crimes-I ( LLB -205)
 
Socio-Legal Dimensions of Gender (LLB-507 & 509 )
Socio-Legal Dimensions of Gender (LLB-507 & 509 )Socio-Legal Dimensions of Gender (LLB-507 & 509 )
Socio-Legal Dimensions of Gender (LLB-507 & 509 )
 
Family Law-I ( LLB -201)
Family Law-I  ( LLB -201) Family Law-I  ( LLB -201)
Family Law-I ( LLB -201)
 
Alternative Dispute Resolution (ADR) [LLB -309]
Alternative Dispute Resolution (ADR) [LLB -309] Alternative Dispute Resolution (ADR) [LLB -309]
Alternative Dispute Resolution (ADR) [LLB -309]
 
Law of Evidence (LLB-303)
Law of Evidence  (LLB-303) Law of Evidence  (LLB-303)
Law of Evidence (LLB-303)
 
Environmental Studies and Environmental Laws (: LLB -301)
Environmental Studies and Environmental Laws (: LLB -301)Environmental Studies and Environmental Laws (: LLB -301)
Environmental Studies and Environmental Laws (: LLB -301)
 
Code of Civil Procedure (LLB -307)
 Code of Civil Procedure (LLB -307) Code of Civil Procedure (LLB -307)
Code of Civil Procedure (LLB -307)
 
Constitutional Law-I (LLB -203)
 Constitutional Law-I (LLB -203) Constitutional Law-I (LLB -203)
Constitutional Law-I (LLB -203)
 
Women and Law [LLB 409 (c)]
Women and Law [LLB 409 (c)]Women and Law [LLB 409 (c)]
Women and Law [LLB 409 (c)]
 
Corporate Law ( LLB- 305)
Corporate Law ( LLB- 305)Corporate Law ( LLB- 305)
Corporate Law ( LLB- 305)
 
Human Rights Law ( LLB -407)
 Human Rights Law ( LLB -407) Human Rights Law ( LLB -407)
Human Rights Law ( LLB -407)
 
Labour Law-I (LLB 401)
 Labour Law-I (LLB 401) Labour Law-I (LLB 401)
Labour Law-I (LLB 401)
 
Legal Ethics and Court Craft (LLB 501)
 Legal Ethics and Court Craft (LLB 501) Legal Ethics and Court Craft (LLB 501)
Legal Ethics and Court Craft (LLB 501)
 
Political Science-II (BALLB- 209)
Political Science-II (BALLB- 209)Political Science-II (BALLB- 209)
Political Science-II (BALLB- 209)
 
Health Care Law ( LLB 507 & LLB 509 )
Health Care Law ( LLB 507 & LLB 509 )Health Care Law ( LLB 507 & LLB 509 )
Health Care Law ( LLB 507 & LLB 509 )
 
Land and Real Estate Laws (LLB-505)
Land and Real Estate Laws (LLB-505)Land and Real Estate Laws (LLB-505)
Land and Real Estate Laws (LLB-505)
 
Business Environment and Ethical Practices (BBA LLB 213 )
Business Environment and Ethical Practices (BBA LLB 213 )Business Environment and Ethical Practices (BBA LLB 213 )
Business Environment and Ethical Practices (BBA LLB 213 )
 
HUMAN RESOURCE MANAGEMENT (BBA LLB215 )
HUMAN RESOURCE MANAGEMENT (BBA LLB215 )HUMAN RESOURCE MANAGEMENT (BBA LLB215 )
HUMAN RESOURCE MANAGEMENT (BBA LLB215 )
 

Dernier

Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...gajnagarg
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...gajnagarg
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...gajnagarg
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...gajnagarg
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 

Dernier (20)

Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 

DATA Warehousing & Data Mining

  • 1. Chanderprabhu Jain College of Higher Studies & School of Law Plot No. OCF, Sector A-8, Narela, New Delhi – 110040 (Affiliated to Guru Gobind Singh Indraprastha University and Approved by Govt of NCT of Delhi & Bar Council of India) DATA WAREHOUSING AND DATA MINING Priyanka Gautam Asst Pof(IT) CPJCHS
  • 2. 2 0. Introduction Data Warehousing, OLAP and data mining: what and why (now)? Relation to OLTP A case study demos, labs
  • 3. 3 Which are our lowest/highest margin customers ? Which are our lowest/highest margin customers ? Who are my customers and what products are they buying? Who are my customers and what products are they buying? Which customers are most likely to go to the competition ? Which customers are most likely to go to the competition ? What impact will new products/services have on revenue and margins? What impact will new products/services have on revenue and margins? What product prom- -otions have the biggest impact on revenue? What product prom- -otions have the biggest impact on revenue? What is the most effective distribution channel? What is the most effective distribution channel? A producer wants to know….
  • 4. 4 Data, Data everywhere yet ... I can’t find the data I need data is scattered over the network many versions, subtle differences I can’t get the data I need need an expert to get the data I can’t understand the data I found available data poorly documented I can’t use the data I found results are unexpected data needs to be transformed from one form to other
  • 5. 5 What is a Data Warehouse? A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a what they can understand and use in a business context. [Barry Devlin]
  • 6. 6 What is Data Warehousing? A process of transforming data into information and making it available to users in a timely enough manner to make a difference [Forrester Research, April 1996]Data Information
  • 7. 7 Evolution 60’s: Batch reports hard to find and analyze information inflexible and expensive, reprogram every new request 70’s: Terminal-based DSS and EIS (executive information systems) still inflexible, not integrated with desktop tools 80’s: Desktop data access and analysis tools query tools, spreadsheets, GUIs easier to use, but only access operational databases 90’s: Data warehousing with integrated OLAP engines and tools
  • 8. 8 Warehouses are Very Large Databases 35% 30% 25% 20% 15% 10% 5% 0% 5GB 5-9GB 10-19GB 50-99GB 250-499GB 20-49GB 100-249GB 500GB-1TB Initial Projected 2Q96 Source: META Group, Inc. Respondents
  • 9. 9 Very Large Data Bases Terabytes -- 10^12 bytes: Petabytes -- 10^15 bytes: Exabytes -- 10^18 bytes: Zettabytes -- 10^21 bytes: Zottabytes -- 10^24 bytes: Walmart -- 24 Terabytes Geographic Information Systems National Medical Records Weather images Intelligence Agency Videos
  • 10. 10 Data Warehousing -- It is a process Technique for assembling and managing data from various sources for the purpose of answering business questions. Thus making decisions that were not previous possible A decision support database maintained separately from the organization’s operational database
  • 11. 11 Data Warehouse A data warehouse is a subject-oriented integrated time-varying non-volatile collection of data that is used primarily in organizational decision making. -- Bill Inmon, Building the Data Warehouse 1996
  • 12. 12 Explorers, Farmers and Tourists Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data Farmers: Harvest information from known access paths Tourists: Browse information harvested by farmers
  • 13. 13 Data Warehouse Architecture Data Warehouse Engine Optimized Loader Extraction Cleansing Analyze Query Metadata Repository Relational Databases Legacy Data Purchased Data ERP Systems
  • 14. 14 Data Warehouse for Decision Support & OLAP Putting Information technology to help the knowledge worker make faster and better decisions Which of my customers are most likely to go to the competition? What product promotions have the biggest impact on revenue? How did the share price of software companies correlate with profits over last 10 years?
  • 15. 15 Decision Support Used to manage and control business Data is historical or point-in-time Optimized for inquiry rather than update Use of the system is loosely defined and can be ad-hoc Used by managers and end-users to understand the business and make judgements
  • 16. 16 Application Areas Industry Application Finance Credit Card Analysis Insurance Claims, Fraud Analysis Telecommunication Call record analysis Transport Logistics management Consumer goods promotion analysis Data Service providers Value added data Utilities Power usage analysis
  • 17. 17 Data Mining in Use The US Government uses Data Mining to track fraud A Supermarket becomes an information broker Basketball teams use it to track game strategy Cross Selling Warranty Claims Routing Holding on to Good Customers Weeding out Bad Customers
  • 18. 18 Why Separate Data Warehouse? Performance Op dbs designed & tuned for known txs & workloads. Complex OLAP queries would degrade perf. for op txs. Special data organization, access & implementation methods needed for multidimensional views & queries. Function Missing data: Decision support requires historical data, which op dbs do not typically maintain. Data consolidation: Decision support requires consolidation (aggregation, summarization) of data from many heterogeneous sources: op dbs, external sources. Data quality: Different sources typically use inconsistent data representations, codes, and formats which have to be reconciled.
  • 19. 19 Examples of Operational Data Data Industry Usage Technology Volumes Customer File All Track Customer Details Legacy application, flat files, main frames Small-medium Account Balance Finance Control account activities Legacy applications, hierarchical databases, mainframe Large Point-of- Sale data Retail Generate bills, manage stock ERP, Client/Server, relational databases Very Large Call Record Telecomm- unications Billing Legacy application, hierarchical database, mainframe Very Large Production Record Manufact- uring Control Production ERP, relational databases, AS/400 Medium
  • 21. 21 OLTP vs. Data Warehouse OLTP systems are tuned for known transactions and workloads while workload is not known a priori in a data warehouse Special data organization, access methods and implementation methods are needed to support data warehouse queries (typically multidimensional queries) e.g., average amount spent on phone calls between 9AM-5PM in Pune during the month of December
  • 22. 22 OLTP vs Data Warehouse OLTP Application Oriented Used to run business Detailed data Current up to date Isolated Data Repetitive access Clerical User Warehouse (DSS) Subject Oriented Used to analyze business Summarized and refined Snapshot data Integrated Data Ad-hoc access Knowledge User (Manager)
  • 23. 23 OLTP vs Data Warehouse OLTP Performance Sensitive Few Records accessed at a time (tens) Read/Update Access No data redundancy Database Size 100MB -100 GB Data Warehouse Performance relaxed Large volumes accessed at a time(millions) Mostly Read (Batch Update) Redundancy present Database Size 100 GB - few terabytes
  • 24. 24 To summarize ... OLTP Systems are used to “run” a business The Data Warehouse helps to “optimize” the business
  • 25. 25 Why Now? Data is being produced ERP provides clean data The computing power is available The computing power is affordable The competitive pressures are strong Commercial products are available
  • 26. 26 I. Data Warehouses: Architecture, Design & Construction DW Architecture Loading, refreshing Structuring/Modeling DWs and Data Marts Query Processing demos, labs
  • 27. 27 Data Warehouse Architecture Data Warehouse Engine Optimized Loader Extraction Cleansing Analyze Query Metadata Repository Relational Databases Legacy Data Purchased Data ERP Systems
  • 28. 28 Components of the Warehouse Data Extraction and Loading The Warehouse Analyze and Query -- OLAP Tools Metadata Data Mining tools
  • 29. 29 Data Quality - The Reality Tempting to think creating a data warehouse is simply extracting operational data and entering into a data warehouse Nothing could be farther from the truth Warehouse data comes from disparate questionable sources
  • 30. 30 Data Quality - The Reality Legacy systems no longer documented Outside sources with questionable quality procedures Production systems with no built in integrity checks and no integration Operational systems are usually designed to solve a specific business problem and are rarely developed to a a corporate plan “And get it done quickly, we do not have time to worry about corporate standards...”
  • 31. 31 Data Transformation Terms Extracting Capture of data from operational source in “as is” status Sources for data generally in legacy mainframes in VSAM, IMS, IDMS, DB2; more data today in relational databases on Unix Conditioning The conversion of data types from the source to the target data store (warehouse) -- always a relational database
  • 32. 32 Data Extraction and Cleansing Extract data from existing operational and legacy data Issues: Sources of data for the warehouse Data quality at the sources Merging different data sources Data Transformation How to propagate updates (on the sources) to the warehouse Terabytes of data to be loaded
  • 33. 33 Data -- Heart of the Data Warehouse Heart of the data warehouse is the data itself! Single version of the truth Corporate memory Data is organized in a way that represents business -- subject orientation
  • 34. 34 Data Warehouse Structure Subject Orientation -- customer, product, policy, account etc... A subject may be implemented as a set of related tables. E.g., customer may be five tables
  • 35. 35 Data Warehouse Structure base customer (1985-87) custid, from date, to date, name, phone, dob base customer (1988-90) custid, from date, to date, name, credit rating, employer customer activity (1986-89) -- monthly summary customer activity detail (1987-89) custid, activity date, amount, clerk id, order no customer activity detail (1990-91) custid, activity date, amount, line item no, order no Time isTime is part ofpart of key ofkey of each tableeach table
  • 36. 36 Schema Design Database organization must look like business must be recognizable by business user approachable by business user Must be simple Schema Types Star Schema Fact Constellation Schema Snowflake schema
  • 37. 37 Star Schema A single fact table and for each dimension one dimension table Does not capture hierarchies directly T i m e p r o d c u s t c i t y f a c t date, custno, prodno, cityname, ...
  • 38. 38 De-normalization Normalization in a data warehouse may lead to lots of small tables Can lead to excessive I/O’s since many tables have to be accessed De-normalization is the answer especially since updates are rare
  • 39. Data Warehouse vs. Data Marts What comes first
  • 40. 40 From the Data Warehouse to Data Marts Departmentally Structured Individually Structured Data Warehouse Organizationally Structured Less More History Normalized Detailed Data Information
  • 41. 41 Data Warehouse and Data Marts OLAP Data Mart Lightly summarized Departmentally structured Organizationally structured Atomic Detailed Data Warehouse Data
  • 42. 42 Characteristics of the Departmental Data Mart OLAP Small Flexible Customized by Department Source is departmentally structured data warehouse
  • 43. 43 Data Mart Centric Data Marts Data Sources Data Warehouse
  • 44. II. On-Line Analytical Processing (OLAP) Making Decision Support Possible
  • 45. 45 Limitations of SQL “A Freshman in Business needs a Ph.D. in SQL” -- Ralph Kimball
  • 46. 46 Typical OLAP Queries Write a multi-table join to compare sales for each product line YTD this year vs. last year. Repeat the above process to find the top 5 product contributors to margin. Repeat the above process to find the sales of a product line to new vs. existing customers. Repeat the above process to find the customers that have had negative sales growth.
  • 47. 47 * Reference: http://www.arborsoft.com/essbase/wht_ppr/coddTOC.html* Reference: http://www.arborsoft.com/essbase/wht_ppr/coddTOC.html What Is OLAP? Online Analytical Processing - coined by EF Codd in 1994 paper contracted by Arbor Software* Generally synonymous with earlier terms such as Decisions Support, Business Intelligence, Executive Information System OLAP = Multidimensional Database MOLAP: Multidimensional OLAP (Arbor Essbase, Oracle Express) ROLAP: Relational OLAP (Informix MetaCube, Microstrategy DSS Agent)
  • 48. 48 Strengths of OLAP It is a powerful visualization paradigm It provides fast, interactive response times It is good for analyzing time series It can be useful to find some clusters and outliers Many vendors offer OLAP tools
  • 49. 49 A Visual Operation: Pivot (Rotate) 1010 4747 3030 1212 JuiceJuice ColaCola MilkMilk CreamCream NYNY LALA SFSF 3/1 3/2 3/3 3/43/1 3/2 3/3 3/4 DateDate Month Month RegionRegion ProductProduct
  • 50. 50 Nature of OLAP Analysis Aggregation -- (total sales, percent-to-total) Comparison -- Budget vs. Expenses Ranking -- Top 10, quartile analysis Access to detailed and aggregate data Complex criteria specification Visualization
  • 51. 51 For a Successful Warehouse From day one establish that warehousing is a joint user/builder project Establish that maintaining data quality will be an ONGOING joint user/builder responsibility Train the users one step at a time Consider doing a high level corporate data model in no more than three weeks From Larry Greenfield, http://pwp.starnetinc.com/larryg/index.html
  • 52. 52 For a Successful Warehouse Look closely at the data extracting, cleaning, and loading tools Implement a user accessible automated directory to information stored in the warehouse Determine a plan to test the integrity of the data in the warehouse From the start get warehouse users in the habit of 'testing' complex queries
  • 53. 53 Data Warehouse W.H. Inmon, Building the Data Warehouse, Second Edition, John Wiley and Sons, 1996 W.H. Inmon, J. D. Welch, Katherine L. Glassey, Managing the Data Warehouse, John Wiley and Sons, 1997 Barry Devlin, Data Warehouse from Architecture to Implementation, Addison Wesley Longman, Inc 1997
  • 54. 54 Data Warehouse W.H. Inmon, John A. Zachman, Jonathan G. Geiger, Data Stores Data Warehousing and the Zachman Framework, McGraw Hill Series on Data Warehousing and Data Management, 1997 Ralph Kimball, The Data Warehouse Toolkit, John Wiley and Sons, 1996
  • 55. 55 OLAP and DSS Erik Thomsen, OLAP Solutions, John Wiley and Sons 1997 Microsoft TechEd Transparencies from Microsoft TechEd 98 Essbase Product Literature Oracle Express Product Literature Microsoft Plato Web Site Microstrategy Web Site
  • 56. 56 Data Mining Michael J.A. Berry and Gordon Linoff, Data Mining Techniques, John Wiley and Sons 1997 Peter Adriaans and Dolf Zantinge, Data Mining, Addison Wesley Longman Ltd. 1996 KDD Conferences