Shanu Sharma, CSE-ASET
DATA WAREHOUSE: THE BUILDING BLOCKS
TOPICS COVERED
• Definition of data warehouse
• Characteristics of data warehouse
• Data mart
• Components of data warehouse
• Metadata
• Applications of data warehouse
• OLTP vs. data warehouse
CONCEPT OF DATA WAREHOUSE
Take all the data you already have in the organization,
clean and transform it, and then provide useful strategic
information.
DEFINITION OF DATA WAREHOUSE
Bill Inmon, considered to be the father of data warehousing, stated (1996):
• “A DW is a subject-oriented, integrated, non-volatile, time-variant collection of data in support of decision-making.”
Sean Kelly said that data in the data warehouse is:
“Separate, available, integrated, time-stamped, subject-oriented, non-volatile, accessible.”
CHARACTERISTICS OF DATA WAREHOUSE
• Subject Oriented
• Integrated
• Time Variant
• Non Volatile
1. SUBJECT ORIENTED DATA
• In operational systems, data is stored by individual application or business process, for example data about an individual order or customer.
• In the banking industry, for example, the data sets for savings accounts and for checking accounts each contain data about that particular application.
• In a DW, data is stored by real-world business subjects or events, not by application.
In a DW, the subject is the organizing principle.
Subjects vary from enterprise to enterprise.
2. INTEGRATED DATA
• Data in the DW comes from several operational systems.
• Different data sets have different file formats.
Example: data for the subject Account comes from three different data sources, so there can be variations such as:
• Naming conventions may differ.
• Attributes of data items may differ. For example, the savings account number may be 8 bytes long but the checking account number only 6 bytes.
• Before moving the data into the data warehouse, you have to go through a process of transformation, consolidation, and integration of the source data.
• Items that typically need standardization:
  • Naming conventions
  • Codes
  • Data attributes
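The three standardization items above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the source field names, the gender codes, the zero-padding rule, and the helper names `standardize` and `integrate` do not come from the slides.

```python
# Hypothetical source records; field names, codes, and widths are invented for illustration.
savings_rows = [{"acct_no": "12345678", "cust_nm": "A. Rao", "sex": "M"}]
checking_rows = [{"AccountNumber": "654321", "CustomerName": "B. Singh", "gender": "2"}]

# Naming conventions: map each source's field names onto one warehouse name.
FIELD_MAP = {
    "savings": {"acct_no": "account_no", "cust_nm": "customer_name", "sex": "gender"},
    "checking": {"AccountNumber": "account_no", "CustomerName": "customer_name",
                 "gender": "gender"},
}

# Codes: map each source's gender codes onto one warehouse code set.
GENDER_CODES = {"M": "M", "F": "F", "1": "M", "2": "F"}

# Data attributes: one agreed length for account numbers (assumed to be 8).
ACCOUNT_NO_WIDTH = 8


def standardize(row, source):
    """Rename fields, recode values, and pad the account number for one source row."""
    out = {FIELD_MAP[source][name]: value for name, value in row.items()}
    out["gender"] = GENDER_CODES[out["gender"]]
    out["account_no"] = out["account_no"].zfill(ACCOUNT_NO_WIDTH)
    out["source"] = source
    return out


def integrate(savings, checking):
    """Combine both sources into one list of rows for the Account subject."""
    return ([standardize(r, "savings") for r in savings]
            + [standardize(r, "checking") for r in checking])


account_subject = integrate(savings_rows, checking_rows)
```

After integration, every row for the Account subject has the same field names, the same code set, and the same account number length, whichever source it came from.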
3. TIME VARIANT DATA
• In operational systems, the stored data contains current values. For example, in a savings account system, the balance is the customer's current balance.
• The data in the DW, however, is meant for analysis and decision making.
• Comparative analysis is one of the best techniques for business performance evaluation.
• Time is a critical factor in comparative analysis.
• Every data structure in the DW contains a time element.
• So the DW has to contain historical data as well as current values.
• Data is stored as snapshots over past and current periods.
The time-variant nature of the data in a data warehouse:
• Allows for analysis of the past
• Relates information to the present
• Enables forecasts for the future
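The contrast between current values and snapshots can be sketched as follows. This is only an illustration under assumed names (`operational_balance`, `warehouse_snapshots`, `take_snapshot`) and invented balances; it is not a description of any particular product.

```python
from datetime import date

# Operational system: one current value per account, overwritten on each change.
operational_balance = {}

# Warehouse: append-only list of (snapshot_date, account_id, balance) rows.
warehouse_snapshots = []


def post_balance(account_id, balance):
    """Operational update: the previous balance is lost."""
    operational_balance[account_id] = balance


def take_snapshot(snapshot_date):
    """Periodic load: copy the current values into the warehouse with a time element."""
    for account_id, balance in operational_balance.items():
        warehouse_snapshots.append((snapshot_date, account_id, balance))


def balance_history(account_id):
    """Return the (date, balance) snapshots stored for one account."""
    return [(d, b) for d, a, b in warehouse_snapshots if a == account_id]


post_balance("ACC-1", 500.0)
take_snapshot(date(2024, 1, 31))
post_balance("ACC-1", 650.0)
take_snapshot(date(2024, 2, 29))

history = balance_history("ACC-1")
change = history[-1][1] - history[0][1]  # month-over-month comparison
```

The operational store only knows the latest balance, while the warehouse keeps both period-end values, which is what makes the comparison possible.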
4. NON VOLATILE DATA
• Data from operational systems is moved into the DW at specific intervals.
• Not every business transaction updates the DW.
• Data in the DW is not deleted.
• Nor is data changed by individual transactions.
Subject Oriented
Organized along the lines of the subjects of the corporation. Typical subjects are customer, product, vendor and transaction.
Time-Variant
Every record in the data warehouse has some form of time variancy attached to it.
Non-Volatile
Refers to the inability of data to be updated. Every record in the data warehouse is time stamped in one form or another.
DATA GRANULARITY
Data granularity refers to the level of detail of the data in the data warehouse.
The lower the level of detail (that is, the closer the data is to individual transactions), the finer the data granularity.
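As a rough illustration of granularity, the sketch below rolls hypothetical daily rows (fine granularity) up to monthly totals (coarser granularity). The row layout and the function name `roll_up_to_month` are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical daily deposit rows: (date as YYYY-MM-DD, account, amount).
daily_rows = [
    ("2024-01-05", "ACC-1", 100.0),
    ("2024-01-20", "ACC-1", 50.0),
    ("2024-02-03", "ACC-1", 75.0),
]


def roll_up_to_month(rows):
    """Aggregate daily rows into one total per (month, account)."""
    totals = defaultdict(float)
    for day, account, amount in rows:
        month = day[:7]  # keep only YYYY-MM
        totals[(month, account)] += amount
    return dict(totals)


monthly_rows = roll_up_to_month(daily_rows)
```

The monthly table is smaller and faster to scan, but the individual deposits can no longer be recovered from it; that trade-off is the reason the level of granularity is a design decision.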
DATA WAREHOUSES AND DATA MARTS
• In 1998, Bill Inmon stated:
“The single most important issue facing the IT manager this year is whether to build the data warehouse first or the data mart first.”
How are they different?
• In any organization, there are basically two approaches to managing data for analysis.
1. Top Down Approach
The centralized data warehouse feeds the dependent data marts, which may be designed on a dimensional data model.
In this approach, data in the data warehouse is stored at the lowest level of granularity, based on a normalized data model.
Advantages:
• An enterprise view of data
• Not a union of disparate data marts
• Centralized rules and control
Disadvantages:
• Slow approach
• High exposure to risk of failure
2. Bottom Up Approach
In this approach, data marts are created first to provide analytical capability for specific business subjects, based on a dimensional data model.
These data marts are then joined or unioned by conforming the dimensions to create a DW.
Advantages:
• Faster and easier implementation
• Less risk of failure
• Allows the project team to learn and grow
Disadvantages:
• Redundant data in every data mart
• Inconsistent data
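Joining data marts through conformed dimensions can be sketched as below. The two marts, their measures, and the function name `drill_across` are hypothetical; the only point being illustrated is that both marts use the same customer keys, so their facts can be combined.

```python
# Conformed dimension shared by both marts (hypothetical keys and names).
customer_dim = {1: "A. Rao", 2: "B. Singh"}

# Fact rows of two independently built data marts, both keyed by customer_key.
sales_mart = [(1, 1200.0), (2, 300.0), (1, 150.0)]  # (customer_key, sale_amount)
claims_mart = [(1, 1), (2, 3)]                      # (customer_key, claim_count)


def drill_across(dim, sales, claims):
    """Combine measures from both marts per member of the shared dimension."""
    report = {key: {"customer": name, "sales": 0.0, "claims": 0}
              for key, name in dim.items()}
    for key, amount in sales:
        report[key]["sales"] += amount
    for key, count in claims:
        report[key]["claims"] += count
    return report


enterprise_view = drill_across(customer_dim, sales_mart, claims_mart)
```

If each mart had defined its own customer keys, this combination would need a mapping step first, which is one way the "inconsistent data" disadvantage shows up in practice.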
DW: BUILDING BLOCKS OR COMPONENTS
1. SOURCE DATA COMPONENT
• Production data
Comes from the various operational systems of the enterprise.
• Internal data
Such as private documents, customer profiles, and departmental databases.
• External data
Statistical data produced by external agencies, used for comparing performance against other organizations.
• Archived data
In every operational system, old data is periodically stored in archive files or on disk storage. This data is also required, as the data warehouse keeps historical snapshots of data.
2. DATA STAGING COMPONENT
After data is extracted from the sources, it has to be prepared: changed, converted, and made ready in a suitable format.
• Three major functions make the data ready:
  • Extract
  • Transform
  • Load
• The staging area provides a place and a set of functions to:
  • Clean
  • Change
  • Combine
  • Convert
Different techniques are used for extracting data from different data sources.
Data transformation includes:
• Data cleaning: correction of misspellings, resolution of conflicts, providing default values for missing data elements, removal of duplicates.
• Standardization of data: standardizing data types and field lengths; semantic standardization such as resolving synonyms and homonyms.
• Sorting, merging, etc.
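A minimal sketch of the transformation steps listed above, assuming small hypothetical lookup tables for misspellings and synonyms and an assumed default value for a missing city. The record layout and function names are invented for illustration.

```python
# Hypothetical lookup tables; a real project would maintain these as reference data.
SPELLING_FIXES = {"Dehli": "Delhi", "Mumbay": "Mumbai"}
SYNONYMS = {"client": "customer", "cust": "customer"}
DEFAULT_CITY = "UNKNOWN"


def clean(record):
    """Fix misspellings, fill a default for a missing city, resolve synonyms, fix types."""
    city = (record.get("city") or DEFAULT_CITY).strip()
    role = record["role"].strip().lower()
    return {
        "id": int(record["id"]),               # standardize the data type
        "city": SPELLING_FIXES.get(city, city),
        "role": SYNONYMS.get(role, role),
    }


def transform(records):
    """Clean every record, drop duplicates by id (first one wins), and sort by id."""
    seen = set()
    cleaned = []
    for record in map(clean, records):
        if record["id"] not in seen:
            seen.add(record["id"])
            cleaned.append(record)
    return sorted(cleaned, key=lambda r: r["id"])


extracted = [
    {"id": "2", "city": "Dehli", "role": "Client"},
    {"id": "1", "city": None, "role": "customer"},
    {"id": "2", "city": "Delhi", "role": "cust"},
]
staged = transform(extracted)
```

Keeping the first record for a duplicate id is just one possible conflict-resolution rule; the right rule depends on which source is trusted for which field.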
Data Loading: Data Movement to the Data Warehouse
3. DATA STORAGE COMPONENT
• Separate repository
• Data structured for efficient processing
• Updated at specific intervals
• Read-only
4. INFORMATION DELIVERY COMPONENT
• It includes various methods of delivering information, depending on the type of user. For example:
  • Ad hoc or predefined reports for novice and casual users.
  • Statistical analysis for business analysts.
• It also provides information to data mining applications.
METADATA COMPONENT
• Metadata is the data about the data in the data warehouse.
• It contains the answers to questions about the data in the data warehouse.
• It serves as a directory of the contents of the data warehouse.
TYPES OF METADATA
• Operational metadata
Contains information about the operational data sources, such as field lengths and data types.
• Extraction and transformation metadata
Contains information such as extraction frequencies and extraction methods.
• End-user metadata
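To make the first two metadata types concrete, here is a small sketch of what such entries might look like as records. The class names, fields, and sample values are assumptions chosen to mirror the examples on the slide (field lengths, data types, extraction frequencies, extraction methods); they are not a standard schema.

```python
from dataclasses import dataclass


@dataclass
class OperationalMetadata:
    source_system: str
    field_name: str
    data_type: str
    field_length: int


@dataclass
class ExtractionMetadata:
    source_system: str
    extraction_method: str
    extraction_frequency: str


# Hypothetical catalog acting as a directory of the warehouse contents.
catalog = {
    "operational": [
        OperationalMetadata("savings", "acct_no", "CHAR", 8),
        OperationalMetadata("checking", "AccountNumber", "CHAR", 6),
    ],
    "extraction": [
        ExtractionMetadata("savings", "full dump", "daily"),
        ExtractionMetadata("checking", "incremental", "hourly"),
    ],
}


def describe(source_system):
    """Answer: which fields come from this source, and how are they extracted?"""
    source_fields = [m for m in catalog["operational"] if m.source_system == source_system]
    source_extracts = [m for m in catalog["extraction"] if m.source_system == source_system]
    return source_fields, source_extracts


checking_fields, checking_extracts = describe("checking")
```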
TYPES & TYPICAL APPLICATIONS OF DWH
APPLICATION AREAS
Industry                  Application
Finance                   Credit card analysis
Insurance                 Claims, fraud analysis
Telecommunication         Call record analysis
Transport                 Logistics management
Consumer goods            Promotion analysis
Data service providers    Value-added data
Utilities                 Power usage analysis
TYPICAL APPLICATIONS
The impact on the organization's core business is to streamline operations and maximize profitability.
• Fraud detection
• Profitability analysis
• Direct mail/database marketing
• Credit risk prediction
• Yield management
• Inventory management
TYPICAL APPLICATIONS
Fraud detection
• Works by observing usage patterns in the data.
• People have typical purchase patterns.
• Fraud shows up as deviation from those patterns.
• Certain cities are notorious for fraud.
• Certain items are typically bought with stolen cards.
• Similar behavior is seen with stolen phone cards.
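One simple way to express "deviation from a typical purchase pattern" is a standard-deviation test against the customer's own history. The sketch below is an assumption about how such a check could look, not the method used by any particular system; the threshold of 3 and the sample amounts are arbitrary.

```python
from statistics import mean, stdev


def is_unusual(history, amount, threshold=3.0):
    """Flag an amount more than `threshold` standard deviations from the customer's mean."""
    if len(history) < 2:
        return False  # not enough history to define a pattern
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > threshold


# Hypothetical purchase amounts for one card holder.
past_purchases = [42.0, 38.5, 51.0, 45.25, 40.0]

flag_large = is_unusual(past_purchases, 900.0)   # far outside the usual range
flag_normal = is_unusual(past_purchases, 47.0)   # within the usual range
```

A check like this needs the historical purchases of each customer, which is exactly what the time-variant, non-volatile warehouse retains and the operational system does not.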
TYPICAL APPLICATIONS
Profitability analysis
• Banks know whether they are profitable overall.
• They don't know which customers are profitable.
• Typically more than 50% are NOT profitable.
• But which ones?
• Balance is not enough; transactional behavior is the key.
• Enables restructuring of products and pricing strategies.
• Lifetime profitability models (next 3-5 years).
TYPICAL APPLICATIONS
Direct mail marketing
• Targeted marketing.
• Offer the high-bandwidth package NOT to all users, only to likely buyers.
• Identify them from call detail records of web surfing.
• Saves marketing expense, saving pennies.
• Know your customers better.
TYPICAL APPLICATIONS
Credit risk prediction
• Who should get a loan?
• Quantitative decision making, NOT subjective.
• Different interest rates for different customers.
• Do not subsidize bad customers at the expense of good ones.
TYPICAL APPLICATIONS
Yield management
• Works for fixed-inventory businesses.
• Item prices vary from customer to customer.
• Examples: airlines, hotels, etc.
• The price of (say) an air ticket depends on:
  • How far in advance was the ticket bought?
  • How many seats were vacant?
  • How profitable is the customer?
  • Is the ticket one-way or return?
RECENT APPLICATION
Agriculture systems
• Agricultural and related data has been collected for decades.
• Decision making is based on expert judgment.
• Lack of integration results in underutilization of the data.
• What is required, in what amount, and when?
DATA WAREHOUSE VS. OLTP
OLTP (On-Line Transaction Processing)
SELECT tx_date, balance
FROM tx_table
WHERE account_ID = 23876;
DATA WAREHOUSE VS. OLTP
DWH
SELECT balance, age, sal, gender
FROM customer_table, tx_table
WHERE age BETWEEN 30 AND 40
  AND Education = 'graduate'
  AND customer_table.CustID = tx_table.Customer_ID;
DATA WAREHOUSE VS. OLTP
OLTP                                  DWH
Primary key used                      Primary key NOT used
No concept of primary index           Primary index used
Few rows returned                     Many rows returned
May use a single table                Uses multiple tables
High selectivity of query             Low selectivity of query
Indexing on primary key (unique)      Indexing on primary index (non-unique)
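The difference in selectivity between the two query styles can be tried out with Python's built-in `sqlite3` module. The table and column names follow the slide's queries; the schema details and the generated rows are assumptions made so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_table (CustID INTEGER PRIMARY KEY, age INTEGER, sal REAL,
                             gender TEXT, Education TEXT);
CREATE TABLE tx_table (account_ID INTEGER, Customer_ID INTEGER,
                       tx_date TEXT, balance REAL);
""")

# Hypothetical data: 100 customers aged 20..69, one transaction row each.
for i in range(100):
    age = 20 + i % 50
    education = "graduate" if i % 2 == 0 else "undergraduate"
    conn.execute("INSERT INTO customer_table VALUES (?, ?, ?, ?, ?)",
                 (i, age, 1000.0 + i, "F" if i % 3 else "M", education))
    conn.execute("INSERT INTO tx_table VALUES (?, ?, ?, ?)",
                 (23800 + i, i, "2024-01-31", 500.0 + i))

# OLTP style: one account, one table, highly selective.
oltp_rows = conn.execute(
    "SELECT tx_date, balance FROM tx_table WHERE account_ID = ?", (23876,)
).fetchall()

# DWH style: a join across tables filtered on descriptive attributes.
dwh_rows = conn.execute("""
    SELECT balance, age, sal, gender
    FROM customer_table
    JOIN tx_table ON customer_table.CustID = tx_table.Customer_ID
    WHERE age BETWEEN 30 AND 40 AND Education = 'graduate'
""").fetchall()
```

On this toy data the OLTP query touches a single account, while the DWH query returns a row for every customer in the age and education band.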
COMPARISON OF RESPONSE TIMES
• On-line analytical processing (OLAP) queries must be executed in a small number of seconds.
  • This often requires denormalization and/or sampling.
• Complex query scripts and large list selections can generally be executed in a small number of minutes.
• Sophisticated clustering algorithms (e.g., data mining) can generally be executed in a small number of hours (even for hundreds of thousands of customers).
DATA WAREHOUSE FOR DECISION SUPPORT & OLAP
• Putting information technology to work to help the knowledge worker make faster and better decisions:
  • Which of my customers are most likely to go to the competition?
  • Which product promotions have the biggest impact on revenue?
  • How did the share price of software companies correlate with profits over the last 10 years?

Contenu connexe

Tendances

Classification of data mart
Classification of data martClassification of data mart
Classification of data martkhush_boo31
 
Executive information sysytem
Executive  information sysytemExecutive  information sysytem
Executive information sysytemHimanshu Sahu
 
Data Warehouse Architectures
Data Warehouse ArchitecturesData Warehouse Architectures
Data Warehouse ArchitecturesTheju Paul
 
Introduction to structured query language (sql)
Introduction to structured query language (sql)Introduction to structured query language (sql)
Introduction to structured query language (sql)Dhani Ahmad
 
Components Of Executive Information System
Components Of Executive Information SystemComponents Of Executive Information System
Components Of Executive Information SystemTheju Paul
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousingwork
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing conceptspcherukumalla
 
Business Intelligence and decision support system
Business Intelligence and decision support system Business Intelligence and decision support system
Business Intelligence and decision support system Shrihari Shrihari
 
Dimensional model | | Fact Tables | | Types
Dimensional model | | Fact Tables | | TypesDimensional model | | Fact Tables | | Types
Dimensional model | | Fact Tables | | Typesumair saeed
 
Data warehousing Demo PPTS | Over View | Introduction
Data warehousing Demo PPTS | Over View | Introduction Data warehousing Demo PPTS | Over View | Introduction
Data warehousing Demo PPTS | Over View | Introduction Kernel Training
 
Data warehouse project on retail store
Data warehouse project on retail storeData warehouse project on retail store
Data warehouse project on retail storeSiddharth Chaudhary
 
MIS: Business Intelligence
MIS: Business IntelligenceMIS: Business Intelligence
MIS: Business IntelligenceJonathan Coleman
 
Presentation on Business Intelligence (BI)
Presentation on Business Intelligence (BI)Presentation on Business Intelligence (BI)
Presentation on Business Intelligence (BI)AkashBorse2
 
Components of information system
Components of information systemComponents of information system
Components of information systemSofia Priyadarshini
 

Tendances (20)

Classification of data mart
Classification of data martClassification of data mart
Classification of data mart
 
Executive information sysytem
Executive  information sysytemExecutive  information sysytem
Executive information sysytem
 
Data Warehouse Architectures
Data Warehouse ArchitecturesData Warehouse Architectures
Data Warehouse Architectures
 
Introduction to structured query language (sql)
Introduction to structured query language (sql)Introduction to structured query language (sql)
Introduction to structured query language (sql)
 
Components Of Executive Information System
Components Of Executive Information SystemComponents Of Executive Information System
Components Of Executive Information System
 
Information system
Information systemInformation system
Information system
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing concepts
 
Business Intelligence and decision support system
Business Intelligence and decision support system Business Intelligence and decision support system
Business Intelligence and decision support system
 
ETL Testing Overview
ETL Testing OverviewETL Testing Overview
ETL Testing Overview
 
Dimensional model | | Fact Tables | | Types
Dimensional model | | Fact Tables | | TypesDimensional model | | Fact Tables | | Types
Dimensional model | | Fact Tables | | Types
 
Data warehousing Demo PPTS | Over View | Introduction
Data warehousing Demo PPTS | Over View | Introduction Data warehousing Demo PPTS | Over View | Introduction
Data warehousing Demo PPTS | Over View | Introduction
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
 
Data warehouse project on retail store
Data warehouse project on retail storeData warehouse project on retail store
Data warehouse project on retail store
 
MIS: Business Intelligence
MIS: Business IntelligenceMIS: Business Intelligence
MIS: Business Intelligence
 
Presentation on Business Intelligence (BI)
Presentation on Business Intelligence (BI)Presentation on Business Intelligence (BI)
Presentation on Business Intelligence (BI)
 
Components of information system
Components of information systemComponents of information system
Components of information system
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
OLTP vs OLAP
OLTP vs OLAPOLTP vs OLAP
OLTP vs OLAP
 
ERP module
ERP moduleERP module
ERP module
 

Similaire à Dwdm 2(data warehouse)

Data warehouse
Data warehouseData warehouse
Data warehouseMR Z
 
The Data Warehouse Essays
The Data Warehouse EssaysThe Data Warehouse Essays
The Data Warehouse EssaysMelissa Moore
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSINGKing Julian
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data WarehouseSOMASUNDARAM T
 
Dataware housing
Dataware housingDataware housing
Dataware housingwork
 
Data mining & data warehousing
Data mining & data warehousingData mining & data warehousing
Data mining & data warehousingShubha Brota Raha
 
Data warehouse
Data warehouseData warehouse
Data warehouseRajThakuri
 
Data miningvs datawarehouse
Data miningvs datawarehouseData miningvs datawarehouse
Data miningvs datawarehouseSuman Astani
 
Business Intelligence Industry Perspective Session I
Business Intelligence   Industry Perspective Session IBusiness Intelligence   Industry Perspective Session I
Business Intelligence Industry Perspective Session IPrithwis Mukerjee
 
Datawarehouse Overview
Datawarehouse OverviewDatawarehouse Overview
Datawarehouse Overviewashok kumar
 
Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Harish Chand
 
dw_concepts_2_day_course.ppt
dw_concepts_2_day_course.pptdw_concepts_2_day_course.ppt
dw_concepts_2_day_course.pptDougSchoemaker
 
20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.pptPalaniKumarR2
 
Business Intelligence: Data Warehouses
Business Intelligence: Data WarehousesBusiness Intelligence: Data Warehouses
Business Intelligence: Data WarehousesMichael Lamont
 
Types Of Sap Hana Models
Types Of Sap Hana ModelsTypes Of Sap Hana Models
Types Of Sap Hana ModelsAshley Thomas
 

Similaire à Dwdm 2(data warehouse) (20)

Business Analytics
 Business Analytics  Business Analytics
Business Analytics
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
The Data Warehouse Essays
The Data Warehouse EssaysThe Data Warehouse Essays
The Data Warehouse Essays
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
 
Dataware housing
Dataware housingDataware housing
Dataware housing
 
Data mining & data warehousing
Data mining & data warehousingData mining & data warehousing
Data mining & data warehousing
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
IT Ready - DW: 1st Day
IT Ready - DW: 1st Day IT Ready - DW: 1st Day
IT Ready - DW: 1st Day
 
Data miningvs datawarehouse
Data miningvs datawarehouseData miningvs datawarehouse
Data miningvs datawarehouse
 
Oracle sql plsql & dw
Oracle sql plsql & dwOracle sql plsql & dw
Oracle sql plsql & dw
 
Unit 1
Unit 1Unit 1
Unit 1
 
Business Intelligence Industry Perspective Session I
Business Intelligence   Industry Perspective Session IBusiness Intelligence   Industry Perspective Session I
Business Intelligence Industry Perspective Session I
 
Datawarehouse Overview
Datawarehouse OverviewDatawarehouse Overview
Datawarehouse Overview
 
Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)
 
dw_concepts_2_day_course.ppt
dw_concepts_2_day_course.pptdw_concepts_2_day_course.ppt
dw_concepts_2_day_course.ppt
 
20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt
 
Business Intelligence: Data Warehouses
Business Intelligence: Data WarehousesBusiness Intelligence: Data Warehouses
Business Intelligence: Data Warehouses
 
ITReady DW Day2
ITReady DW Day2ITReady DW Day2
ITReady DW Day2
 
Types Of Sap Hana Models
Types Of Sap Hana ModelsTypes Of Sap Hana Models
Types Of Sap Hana Models
 

Dernier

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Dernier (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Dwdm 2(data warehouse)

  • 1. Shanu Sharma, CSE-ASET DATA WAREHOUSE- THE BUILDING BLOCKS
  • 2. Shanu Sharma, CSE-ASET TOPICS COVERED  Definition of Data warehouse  Characteristics of Data Warehouse  Data mart  Components of data warehouse  Meta data  Applications of Data warehouse  OLTP v/s Data Warehouse
  • 3. Shanu Sharma, CSE-ASET CONCEPT OF DATA WAREHOUSE Take all the data you already have in the organization, clean and transform it, and then provide useful strategic information.
  • 4. Shanu Sharma, CSE-ASET DEFINITION OF DATA WAREHOUSE (1996 )Bill Inmon considered to be the father of data warehousing stated.  “A DW is a subject-oriented, integrated, non-volatile, time-variant collection of data in favor of decision- making”. Sean Kelly said Data in the data warehouse is “Separate available, integrated, time-stamped, subject- oriented, non-volatile, accessible”
  • 5. Shanu Sharma, CSE-ASET CHARACTERISTICS OF DATA WAREHOUSE Subject Oriented Integrated Time Variant Non Volatile
  • 6. Shanu Sharma, CSE-ASET 1. SUBJECT ORIENTED DATA  In operational systems data is stored by individual applications or business process. Like data about individual order , customer etc.  For example in banking industry data sets for saving or checking accounts contain data about that particular application.  But in DW data is stored by real world business objectives or events not by the applications.
  • 7. Shanu Sharma, CSE-ASET In DW subject is the organization method Subjects vary with enterprise
  • 8. Shanu Sharma, CSE-ASET 2. INTEGRATED DATA  Data in DW comes from several operational systems.  Different datasets have different file formats. Example: Data for subject Account comes from 3 different data sources. So variations could be there, like: Naming conventions could be different. Attributes for data items could be different. Like: Saving account no. could be of 8 bytes long but only 6 bytes for checking accounts.
  • 9. Shanu Sharma, CSE-ASET  Before moving the data into the data warehouse, you have to go through a process of transformation, consolidation, and integration of the source data.  Here are some of the items that would need standardization:  Naming conventions  Codes  Data attributes
  • 11. TIME VARIANT DATA
      • In operational systems, the stored data contains current values. In a savings account system, for example, the balance is the customer's current balance.
      • The data in the DW, however, is meant for analysis and decision making.
      • Comparative analysis is one of the best techniques for business performance evaluation.
      • Time is a critical factor for comparative analysis.
      • Every data structure in the DW contains a time element.
  • 12. So, the DW has to contain historical data as well as current values. Data is stored as snapshots over past and current periods. The time-variant nature of the data in a data warehouse:
      • Allows for analysis of the past
      • Relates information to the present
      • Enables forecasts for the future
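A minimal sketch of snapshot storage, assuming a hypothetical in-memory list in place of a real warehouse table. The account number, dates, and balances are made-up values for illustration.

```python
from datetime import date

snapshots = []  # stands in for a warehouse table: rows are appended, never updated


def load_snapshot(as_of, balances):
    """Append one time-stamped row per account for the given snapshot date."""
    for account_id, balance in balances.items():
        snapshots.append({"as_of": as_of, "account_id": account_id, "balance": balance})


def balance_on(account_id, as_of):
    """Return the balance from the latest snapshot on or before `as_of`, or None."""
    rows = [r for r in snapshots
            if r["account_id"] == account_id and r["as_of"] <= as_of]
    if not rows:
        return None
    return max(rows, key=lambda r: r["as_of"])["balance"]


load_snapshot(date(2023, 1, 31), {23876: 500.0})
load_snapshot(date(2023, 2, 28), {23876: 650.0})

# Comparative analysis: change in balance between two periods.
change = balance_on(23876, date(2023, 2, 28)) - balance_on(23876, date(2023, 1, 31))
```

Because every row carries its snapshot date, the January value is still available after the February load, which an operational system holding only the current balance could not provide.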
  • 13. NON VOLATILE DATA
      • Data from operational systems is moved into the DW at specific intervals.
      • Not every business transaction updates the DW.
      • Data is not deleted from the DW.
      • Nor is data changed by individual transactions.
  • 14. Subject Oriented: Organized along the lines of the subjects of the corporation. Typical subjects are customer, product, vendor and transaction. Time-Variant: Every record in the data warehouse has some form of time variancy attached to it. Non-Volatile: Refers to the inability of data to be updated. Every record in the data warehouse is time stamped in one form or another.
  • 15. DATA GRANULARITY: Data granularity refers to the level of detail of the data in the data warehouse. The lower the level of detail, the finer the data granularity.
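To make granularity concrete, the sketch below rolls hypothetical daily transaction amounts (fine granularity) up to monthly totals (coarser granularity). The dates and amounts are invented for illustration.

```python
from collections import defaultdict

# Fine-grained data: one row per transaction day (ISO date string, amount).
daily = [
    ("2023-01-03", 120.0),
    ("2023-01-17", 80.0),
    ("2023-02-02", 50.0),
]


def roll_up_to_month(rows):
    """Aggregate day-level rows into month-level totals keyed by 'YYYY-MM'."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[day[:7]] += amount  # first 7 characters are the year and month
    return dict(totals)


monthly = roll_up_to_month(daily)
```

Fine-grained data can always be summarized this way, but the daily rows cannot be recovered from the monthly totals alone.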
  • 16. DATA WAREHOUSES AND DATA MARTS: In 1998 Bill Inmon stated, “The single most important issue facing the IT manager this year is whether to build the data warehouse first or the data mart first.” How are they different?
  • 18. In any organization, there are basically two approaches to managing data for analysis.
      1. Top Down Approach: The centralized data warehouse feeds the dependent data marts, which may be designed based on a dimensional data model. In this approach, data in the data warehouse is stored at the lowest level of granularity, based on a normalized data model.
  • 19. Advantages:
          • An enterprise view of data
          • Not a union of disparate data marts
          • Centralized rules and control
      Disadvantages:
          • Slow approach
          • High exposure to risk of failure
  • 20. 2. Bottom Up Approach: In this approach, data marts are created first to provide analytical capability for specific business subjects, based on a dimensional data model. These data marts are then joined or unioned by conforming the dimensions to create a DW.
      Advantages:
          • Faster and easier implementation
          • Less risk of failure
          • Allows the project team to learn and grow
      Disadvantages:
          • Redundant data in every data mart
          • Inconsistent data
  • 21. DW: BUILDING BLOCKS OR COMPONENTS
  • 22. 1. SOURCE DATA COMPONENT
      • Production data: comes from the various operational systems of the enterprise.
      • Internal data: private documents, customer profiles, departmental databases, etc.
      • External data: statistics produced by external agencies, used for comparing performance against other organizations.
      • Archived data: in every operational system, old data is periodically stored in archive files or on disk storage. This data is also required, because the data warehouse keeps historical snapshots of data.
  • 23. 2. DATA STAGING COMPONENT
      • After data is extracted, it has to be prepared: data extracted from the sources needs to be changed, converted, and made ready in a suitable format.
      • Three major functions make the data ready: Extract, Transform, Load.
      • The staging area provides a place and a set of functions to clean, change, combine, and convert the data.
  • 24. Different techniques are used for extracting data from different data sources. Data transformation includes:
      • Data cleaning: correcting misspellings, resolving conflicts, providing default values for missing data elements, removing duplicates, etc.
      • Standardization of data: standardizing data types and field lengths.
      • Semantic standardization: resolving synonyms and homonyms.
      • Sorting, merging, etc.
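A few of these transformation steps can be sketched as a single staging function. The record layout, the spelling table, and the "UNKNOWN" default are assumptions made for this example, not part of the original slides.

```python
# Hypothetical lookup of known misspellings and their corrections.
SPELLING_FIXES = {"Dehli": "Delhi"}


def transform(records, default_city="UNKNOWN"):
    """Clean extracted customer records: dedupe, fix spellings, fill defaults, sort."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        if rec["cust_id"] in seen_ids:       # remove duplicates (first record wins)
            continue
        seen_ids.add(rec["cust_id"])
        city = (rec.get("city") or "").strip()
        city = SPELLING_FIXES.get(city, city)  # correct known misspellings
        city = city or default_city            # default value for missing data
        cleaned.append({"cust_id": rec["cust_id"],
                        "name": rec["name"].strip(),
                        "city": city})
    return sorted(cleaned, key=lambda r: r["cust_id"])  # sort before loading


raw = [
    {"cust_id": 2, "name": " Asha ", "city": "Dehli"},
    {"cust_id": 1, "name": "Ravi", "city": None},
    {"cust_id": 2, "name": "Asha", "city": "Delhi"},
]
result = transform(raw)
```

Keeping the first record for a duplicated key is one possible conflict-resolution rule; a real staging area would choose the rule per source and per field.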
  • 25. Data Loading: Data Movement to the Data Warehouse
  • 26. 3. DATA STORAGE COMPONENTS
      • Separate repository
      • Data structured for efficient processing
      • Updated at specific intervals
      • Read-only
  • 27. 4. INFORMATION DELIVERY COMPONENT
      • It includes various methods of delivering information, depending on the type of user. For example:
          • Ad hoc or predefined reports for novice and casual users.
          • Statistical analysis for business analysts.
      • It also provides information to data mining applications.
  • 29. METADATA COMPONENT
      • Metadata is the data about the data in the data warehouse.
      • Metadata in a data warehouse contains the answers to questions about the data in the data warehouse.
      • It serves as a directory of the contents of the data warehouse.
  • 30. TYPES OF METADATA
      • Operational Metadata: contains information about the operational data sources, such as field lengths and data types.
      • Extraction and Transformation Metadata: contains extraction frequencies, extraction methods, etc.
      • End-User Metadata
  • 31. TYPES & TYPICAL APPLICATIONS OF DWH
  • 32. APPLICATION AREAS
      Industry                  Application
      Finance                   Credit card analysis
      Insurance                 Claims, fraud analysis
      Telecommunication         Call record analysis
      Transport                 Logistics management
      Consumer goods            Promotion analysis
      Data service providers    Value added data
      Utilities                 Power usage analysis
  • 33. TYPICAL APPLICATIONS: Impact on the organization's core business is to streamline and maximize profitability.
      • Fraud detection.
      • Profitability analysis.
      • Direct mail/database marketing.
      • Credit risk prediction.
      • Yield management.
      • Inventory management.
  • 34. TYPICAL APPLICATIONS: Fraud detection
      • By observing data usage patterns.
      • People have typical purchase patterns.
      • Deviation from patterns.
      • Certain cities notorious for fraud.
      • Certain items bought by stolen cards.
      • Similar behavior for stolen phone cards.
  • 35. TYPICAL APPLICATIONS: Profitability Analysis
      • Banks know if they are profitable or not.
      • They don't know which customers are profitable.
      • Typically more than 50% are NOT profitable.
      • They don't know which ones.
      • Balance is not enough; transactional behavior is the key.
      • Restructure products and pricing strategies.
      • Life-time profitability models (next 3-5 years).
  • 36. TYPICAL APPLICATIONS: Direct mail marketing
      • Targeted marketing.
      • Offering high bandwidth package NOT to all users.
      • Know from call detail records of web surfing.
      • Saves marketing expense, saving pennies.
      • Knowing your customers better.
  • 37. TYPICAL APPLICATIONS: Credit risk prediction
      • Who should get a loan?
      • Quantitative decision making, NOT subjective.
      • Different interest rates for different customers.
      • Do not subsidize bad customers at the expense of good ones.
  • 38. TYPICAL APPLICATIONS: Yield Management
      • Works for fixed inventory businesses.
      • Item prices vary for varying customers.
      • Example: airlines, hotels, etc.
      • The price of (say) an air ticket depends on:
          • How far in advance was the ticket bought?
          • How many seats were vacant?
          • How profitable is the customer?
          • Is the ticket one-way or return?
  • 39. RECENT APPLICATION: Agriculture Systems
      • Agricultural and related data collected for decades.
      • Decision making based on expert judgment.
      • Lack of integration results in underutilization.
      • What is required, in what amount, and when?
  • 40. DATA WAREHOUSE VS. OLTP: OLTP (On Line Transaction Processing)
      SELECT tx_date, balance
      FROM tx_table
      WHERE account_ID = 23876;
  • 41. DATA WAREHOUSE VS. OLTP: DWH
      SELECT balance, age, sal, gender
      FROM customer_table, tx_table
      WHERE age BETWEEN 30 AND 40
        AND Education = 'graduate'
        AND customer_table.CustID = tx_table.Customer_ID;
  • 42. DATA WAREHOUSE VS. OLTP
      OLTP                                DWH
      Primary key used                    Primary key NOT used
      No concept of primary index         Primary index used
      Few rows returned                   Many rows returned
      May use a single table              Uses multiple tables
      High selectivity of query           Low selectivity of query
      Indexing on primary key (unique)    Indexing on primary index (non-unique)
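The contrast between the two query styles can be tried out with Python's built-in sqlite3 module. This is a sketch only: the table and column names follow the two queries on the slides, but the column types and all sample rows are assumptions made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_table (
    CustID INTEGER PRIMARY KEY, age INTEGER, sal REAL, gender TEXT, Education TEXT);
CREATE TABLE tx_table (
    account_ID INTEGER, Customer_ID INTEGER, tx_date TEXT, balance REAL);
-- Hypothetical sample rows
INSERT INTO customer_table VALUES
    (1, 35, 50000, 'F', 'graduate'),
    (2, 38, 42000, 'M', 'graduate'),
    (3, 52, 61000, 'M', 'graduate');
INSERT INTO tx_table VALUES
    (23876, 1, '2023-01-05', 900.0),
    (23876, 1, '2023-01-09', 750.0),
    (11111, 2, '2023-01-07', 300.0),
    (22222, 3, '2023-01-08', 1200.0);
""")

# OLTP style: one table, lookup by a specific account, few rows returned.
oltp_rows = conn.execute(
    "SELECT tx_date, balance FROM tx_table WHERE account_ID = ? ORDER BY tx_date",
    (23876,),
).fetchall()

# DWH style: join of two tables, filter on non-key attributes, many rows returned.
dwh_rows = conn.execute("""
    SELECT t.balance, c.age, c.sal, c.gender
    FROM customer_table AS c
    JOIN tx_table AS t ON c.CustID = t.Customer_ID
    WHERE c.age BETWEEN 30 AND 40 AND c.Education = 'graduate'
""").fetchall()
```

On real data the OLTP query touches a handful of rows for one account, while the DWH query scans a large share of both tables, which is the selectivity difference listed in the comparison.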
  • 43. COMPARISON OF RESPONSE TIMES
      • On-line analytical processing (OLAP) queries must be executed in a small number of seconds.
          • Often requires denormalization and/or sampling.
      • Complex query scripts and large list selections can generally be executed in a small number of minutes.
      • Sophisticated clustering algorithms (e.g., data mining) can generally be executed in a small number of hours (even for hundreds of thousands of customers).
  • 44. DATA WAREHOUSE FOR DECISION SUPPORT & OLAP
      • Putting information technology to work to help the knowledge worker make faster and better decisions:
          • Which of my customers are most likely to go to the competition?
          • What product promotions have the biggest impact on revenue?
          • How did the share price of software companies correlate with profits over the last 10 years?