Data Warehousing
represented by
Murli
Data Warehousing
•Aims of information technology:
• To help workers in their everyday business activity
and improve their productivity – clerical data
processing tasks
• To help knowledge workers (executives,
managers, analysts) make faster and better decisions
– decision support systems
•Two types of applications:
• Operational applications
• Analytical applications
Data Warehousing (Contd..)
•In most organizations, data about specific parts of the
business is there: lots and lots of data, somewhere, in
some form.
•Data is available but not information, and not the
right information at the right time.
•There is a need to
• bring together information
• off-load decision support applications from the on-line
transaction system
Data Warehouse
•“A data warehouse is a subject-oriented, integrated, time-variant,
and nonvolatile collection of data in support of
management’s decision-making process.” --- W. H. Inmon
•Collection of data that is used primarily in organizational
decision making
•A decision support database that is maintained separately
from the organization’s operational database
Data Warehouse - Subject Oriented
•Data that gives information about a particular subject.
•Data for modeling and analysis.
•Provides a simple and concise view around particular
subject issues by excluding data that are not useful in the
decision support process.
Data Warehouse – Integrated
•It is constructed by integrating multiple, heterogeneous
data sources.
•Data cleaning and data integration techniques are
applied.
•When data is moved to the warehouse, it is converted.
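As a rough illustration of the conversion step, the sketch below maps two differently encoded sources onto one warehouse convention. The source names, field names, encodings and the target convention are all assumptions made for this example; none of them come from the slides.

```python
# Hypothetical records from two operational sources that encode the same facts differently.
crm_rows = [{"cust": "C1", "gender": "M", "height_cm": 180.0}]
billing_rows = [{"customer_id": "C2", "sex": "female", "height_in": 65.0}]

# One warehouse-wide convention (assumed): gender as "M"/"F", height in centimetres.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def from_crm(row):
    """Convert a CRM-style row to the warehouse convention."""
    return {
        "customer_id": row["cust"],
        "gender": GENDER_CODES[row["gender"].lower()],
        "height_cm": row["height_cm"],
    }


def from_billing(row):
    """Convert a billing-style row: rename fields, recode gender, inches -> cm."""
    return {
        "customer_id": row["customer_id"],
        "gender": GENDER_CODES[row["sex"].lower()],
        "height_cm": round(row["height_in"] * 2.54, 1),
    }


# After conversion, rows from both sources share one schema and one encoding.
integrated = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
```

Each source gets its own small converter, so adding a third source would not touch the existing ones.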
Data Warehouse - Time Variant
•Data in a data warehouse is stable once loaded.
•It holds historical as well as current data.
•Every key structure in the data warehouse
contains an element of time, explicitly or implicitly
•But the key of operational data may or may not
contain a time element.
Data Warehouse - Non-Volatile
• A physically separate store of data transformed from
the operational environment.
• No updates or deletes on historical data.
•Operational update of data does not occur in the data
warehouse
•New data is appended
• Only two data operations are needed: initial loading of data and access of data.
Data modifications & schema design
• A data warehouse is updated on a regular
basis by the ETL process (run nightly or weekly)
using bulk data modification techniques.
• Data warehouses often use denormalized or
partially denormalized schemas (such as a star
schema) to optimize query performance.
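A minimal sketch of a periodic ETL run with a bulk load, using Python's built-in `sqlite3` module as a stand-in warehouse. The extract format, the `sales_fact` table and the rule of skipping unparseable amounts are assumptions for illustration only.

```python
import sqlite3

# Hypothetical nightly extract from an operational system: (order_id, day, amount as text).
extracted = [(1, "2024-01-01", " 12.50"), (2, "2024-01-01", "7"), (3, "2024-01-02", "n/a")]


def transform(rows):
    """Keep rows whose amount parses as a number; return cleaned tuples."""
    cleaned = []
    for order_id, day, amount in rows:
        try:
            cleaned.append((order_id, day, float(amount.strip())))
        except ValueError:
            continue  # a real pipeline would log or quarantine the rejected row
    return cleaned


warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales_fact (order_id INTEGER, day TEXT, amount REAL)")

# Bulk load: one executemany call inside one transaction rather than row-by-row commits.
with warehouse:
    warehouse.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", transform(extracted))

loaded = warehouse.execute("SELECT COUNT(*), SUM(amount) FROM sales_fact").fetchone()
```

Here two of the three extracted rows survive the transform step and are loaded together.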
Why Separate Data Warehouse?
•Separate & historical data are needed for decision support.
•Complex decision-support queries.
•Missing data.
•Data consolidation.
•Data quality.
Advantages of Data Warehousing
•High query performance
•Queries not visible outside warehouse
•Local processing at sources unaffected
•Can operate when sources unavailable
•Can query data not stored in a DBMS
•Extra information at warehouse
• Modify, summarize (store aggregates)
• Add historical information
Decision Support System
• Information technology to help knowledge workers
(executives, managers, analysts) make faster and
better decisions
• OLAP is one element of a decision support system
• Data mining is a powerful, high-performance data
analysis tool for decision support.
Three-Tier Decision Support Systems
•Warehouse database server
• Almost always a relational DBMS, rarely flat files
•OLAP servers
• Relational OLAP (ROLAP): extended relational DBMS that
maps operations on multidimensional data to standard
relational operators
• Multidimensional OLAP (MOLAP): special-purpose server
that directly implements multidimensional data and operations
•Clients
• Query and reporting tools
• Analysis tools
• Data mining tools
The Complete Decision Support System
[Figure: information sources (operational DBs, semistructured sources) are extracted, transformed, loaded and refreshed into the data warehouse server and data marts (Tier 1); the warehouse serves the OLAP servers, e.g. MOLAP and ROLAP (Tier 2); these serve the clients (Tier 3): OLAP, query/reporting and data mining tools.]
Data Sources
•Data sources are often the operational systems,
providing the lowest level of data.
•Data sources are designed for operational use, not for
decision support, and the data reflect this fact.
•Multiple data sources are often from different systems,
run on a wide range of hardware and much of the
software is built in-house or highly customized.
•Multiple data sources introduce a large number of
issues -- semantic conflicts.
Creating and Maintaining a Warehouse
•A data warehouse needs several tools that automate or support tasks
such as:
• Data extraction from different external data sources,
operational databases, files of standard applications
• Data cleaning (finding and resolving inconsistency in the
source data)
• inconsistent field lengths, inconsistent descriptions, inconsistent value
assignments, missing entries and violation of integrity constraints.
• optional fields in data entry are significant sources of inconsistent data.
• Integration and transformation of data (between different data
formats, languages, etc.)
• Data loading (loading the data into the data warehouse)
• checking integrity constraints, sorting, summarizing, etc.
• Data replication (replicating source database into the data
warehouse)
• used to incrementally refresh a warehouse when sources change
• Data refreshment
• propagating updates on source data to the data stored in the warehouse
• Periodically or immediately
• Data archiving
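The cleaning task in the list above can be sketched as a set of checks over incoming rows. The rows, the list of valid country codes and the three rules below are hypothetical; a real cleaning tool would cover many more cases and would also repair the data rather than only report it.

```python
# Hypothetical source rows; None marks an optional field left empty at data entry.
rows = [
    {"id": 1, "country": "DE", "phone": "030-1234"},
    {"id": 2, "country": "Germany", "phone": None},
    {"id": 2, "country": "FR", "phone": "01-5678"},
]

VALID_COUNTRIES = {"DE", "FR"}  # assumed reference list of allowed codes


def find_problems(rows):
    """Return (row id, problem) pairs for three kinds of inconsistency."""
    problems = []
    seen_ids = set()
    for row in rows:
        if row["phone"] is None:
            problems.append((row["id"], "missing entry: phone"))
        if row["country"] not in VALID_COUNTRIES:
            problems.append((row["id"], "inconsistent value: country"))
        if row["id"] in seen_ids:
            problems.append((row["id"], "integrity violation: duplicate key"))
        seen_ids.add(row["id"])
    return problems


problems = find_problems(rows)
```

The checks correspond to missing entries, inconsistent value assignments and integrity-constraint violations named in the list.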
The Data Warehousing Models
•Enterprise Warehouse
• collects all the information about subjects spanning entire
organization
•Data Mart
• a subset of corporate-wide data that is of value to a specific
group of users
• its scope is confined to specific, selected groups, such as
marketing data mart
• Independent Vs. Dependent (directly from warehouse) data
mart
•Virtual warehouse
• a set of views over operational databases
• only some summary views are materialized
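To make the difference between a virtual view and a materialized summary concrete, here is a small sketch using SQLite. The `orders` table and its columns are assumptions; SQLite has no materialized views, so a plain table copy stands in for one.

```python
import sqlite3

# Stand-in for an operational database (table and column names are assumptions).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "north", 10.0), (2, "north", 5.0), (3, "south", 7.0)],
)

# Virtual warehouse: a view, recomputed from the operational table on every query.
db.execute(
    "CREATE VIEW sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
)

# Materialized summary: a stored copy of the view's result at this moment.
db.execute("CREATE TABLE sales_by_region_mat AS SELECT * FROM sales_by_region")

# A later operational update is visible through the view but not in the stored copy.
db.execute("INSERT INTO orders VALUES (4, 'south', 3.0)")
virtual = dict(db.execute("SELECT region, total FROM sales_by_region"))
materialized = dict(db.execute("SELECT region, total FROM sales_by_region_mat"))
```

The view always reflects the operational data but pays the aggregation cost on each query; the stored copy is cheap to read but needs a refresh step.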
Physical Structure of Data Warehouse
•There are three basic architectures for constructing a
data warehouse:
• Centralized
• Federated (distributed)
• Tiered (distributed)
•The data warehouse is distributed for: load
balancing, scalability and higher availability
Physical Structure of Data Warehouse (Contd..)
•The logical data warehouse is only virtual
•The central data warehouse is physical
•There exist local data marts on different tiers
which store copies or summarization of the
previous tier.
Data Processing Models
•There are two basic data processing models:
• OLTP (On-Line Transaction Processing)
• Describes processing at operational sites
• aim is reliable and efficient processing of a large number
of transactions and ensuring data consistency.
• OLAP (On-Line Analytical Processing)
• Describes processing at warehouse
• aim is efficient multidimensional processing of large data
volumes.
OLTP vs. OLAP
                     OLTP                            OLAP
users                clerk, IT professional          knowledge worker
function             day-to-day operations           decision support
DB design            application-oriented            subject-oriented
data                 current, up-to-date,            historical, summarized,
                     detailed, flat relational,      multidimensional,
                     isolated                        integrated, consolidated
usage                repetitive                      ad-hoc
access               read/write,                     lots of scans
                     index/hash on primary key
unit of work         short, simple transaction       complex query
# records accessed   tens                            millions
# users              thousands                       hundreds
DB size              100MB-GB                        100GB-TB
metric               transaction throughput          query throughput, response time
OLAP
•Main goal: support ad-hoc but complex querying
performed by business analysts
•Interactive process of creating, managing, analyzing
and reporting on data
•Extends spreadsheet-like analysis to work with huge
amounts of data in a data warehouse
•Data exploration and aggregation in various ways
•Typical applications include assessing the effectiveness
of a marketing campaign, forecasting product sales, and
spotting trends
OLAP (Contd..)
•Allows a sophisticated user to analyse data using complex,
multi-dimensional views
•Places key performance indicators (measures) into context
(dimensions)
• Measures are pre-aggregated
• Data retrieval is significantly faster
•The proposed cube is made available to business analysts,
who can browse the data using a variety of tools for ad hoc,
interactive analytical processing
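The idea of pre-aggregated measures can be sketched as follows: totals are computed once for every combination of dimensions, so a later query becomes a lookup. The fact rows, the dimension names and the dictionary-based "cube" are assumptions for illustration; real OLAP servers use far more compact structures.

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical fact rows: two dimensions (month, region) and one measure (sales).
facts = [
    {"month": "Jan", "region": "north", "sales": 10},
    {"month": "Jan", "region": "south", "sales": 4},
    {"month": "Feb", "region": "north", "sales": 6},
]
DIMENSIONS = ("month", "region")


def build_cube(facts):
    """Pre-aggregate the measure for every subset of the dimensions."""
    cube = {}
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            totals = defaultdict(int)
            for row in facts:
                totals[tuple(row[d] for d in dims)] += row["sales"]
            cube[dims] = dict(totals)
    return cube


cube = build_cube(facts)

# Answering a query is now a dictionary lookup instead of a scan over the facts.
january_total = cube[("month",)][("Jan",)]
grand_total = cube[()][()]
```

The cost is paid up front and in storage: with n dimensions there are 2^n groupings to maintain.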
OLAP Server Architectures
•Relational OLAP (ROLAP):
• Use relational or extended-relational DBMS to store and manage
warehouse data and OLAP middleware to support missing pieces
• Include optimization of DBMS backend, implementation of
aggregation navigation logic, and additional tools and services
• Greater scalability
• schema design: Star, Snowflake, Fact Constellation
•Multidimensional OLAP (MOLAP):
• Array based multidimensional storage engine (sparse matrix
techniques)
• Fast indexing to pre-computed summarized data
• Schema design: Cube
•Hybrid OLAP (HOLAP):
• User flexibility - low level: relational, high level: array
ROLAP
•Special schema design: snowflake
•Special indexes: bitmap, multi-table join
•Proven technology (relational models, DBMS)
• Tend to outperform specialized MDDB especially
on large data sets
•Products
• IBM DB2, Oracle, Sybase IQ, RedBrick, Informix
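The bitmap indexes mentioned above can be sketched in a few lines. This is a toy version, assuming small in-memory columns and Python integers as bit vectors; it is not how any of the listed products implement them.

```python
# Hypothetical dimension columns of a fact table, one value per row position.
region = ["north", "south", "north", "east", "south", "north"]
item_type = ["food", "food", "toys", "food", "toys", "food"]


def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i has that value."""
    index = {}
    for position, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << position)
    return index


def matching_rows(bitmap):
    """Decode a bitmap back into a list of row positions."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]


region_index = build_bitmap_index(region)
type_index = build_bitmap_index(item_type)

# region = 'north' AND type = 'food' becomes a single bitwise AND of two bitmaps.
hits = matching_rows(region_index["north"] & type_index["food"])
```

Bitmaps suit low-cardinality columns such as region or type, where each distinct value gets one compact bit vector.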
Measures and Dimensions
•Measures: key performance indicators that you want to
evaluate
• Typically numerical, including volume, sales and cost
• A rule of thumb: if a number makes business sense
when aggregated, then it is a measure
• Examples
• Aggregate daily volume to month, quarter and year
• Aggregating telephone numbers would not make sense,
so they are not measures
• Affects what should be stored in the data warehouse
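The first example, aggregating daily volume to month, quarter and year, might look like the sketch below. The daily figures and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily volumes (the measure), keyed by calendar day.
daily_volume = {
    date(2024, 1, 15): 100,
    date(2024, 2, 3): 40,
    date(2024, 4, 20): 60,
}


def roll_up(daily, level):
    """Sum the daily measure to 'month', 'quarter' or 'year'."""
    totals = defaultdict(int)
    for day, volume in daily.items():
        if level == "month":
            key = (day.year, day.month)
        elif level == "quarter":
            key = (day.year, (day.month - 1) // 3 + 1)
        elif level == "year":
            key = (day.year,)
        else:
            raise ValueError(f"unknown level: {level}")
        totals[key] += volume
    return dict(totals)


by_month = roll_up(daily_volume, "month")
by_quarter = roll_up(daily_volume, "quarter")
by_year = roll_up(daily_volume, "year")
```

Volume passes the rule of thumb because each of these sums is a meaningful business number; summing telephone numbers the same way would not be.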
Measures and Dimensions (Contd..)
•Dimensions: categories of data analysis
• Typical dimensions include product, time,
region
• A rule of thumb: when a report is requested
“by” something, that something is usually a
dimension
• Example
• Sales report: view sales by month, by region
• Two dimensions needed are time and region
Conceptual Modeling
•Star schema.
•Snowflake schema.
•Fact constellations, or Galaxy schema.
Star Schema
•Fact table
•Dimension tables
•Measures
•A single fact table and for each dimension one dimension table
•Does not capture hierarchies directly
Example - Star Schema
•Sales Fact Table: time_key, item_key, branch_key, location_key, units_sold, dollars_sold, avg_sales
• Measures: units_sold, dollars_sold, avg_sales
•Dimension table time: time_key, day, day_of_the_week, month, quarter, year
•Dimension table item: item_key, item_name, brand, type, supplier_type
•Dimension table branch: branch_key, branch_name, branch_type
•Dimension table location: location_key, street, city, province_or_street, country
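A cut-down version of the star schema can be built in SQLite to show what a typical query looks like. Only two of the four dimensions are included, the tables are renamed `time_dim` and `item_dim`, and all rows are invented for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER,
                       quarter INTEGER, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    units_sold INTEGER,
    dollars_sold REAL
);
INSERT INTO time_dim VALUES (1, 5, 1, 1, 2024), (2, 9, 4, 2, 2024);
INSERT INTO item_dim VALUES (10, 'pen', 'Acme', 'stationery'), (11, 'ball', 'Zed', 'toys');
INSERT INTO sales_fact VALUES (1, 10, 3, 6.0), (1, 11, 1, 4.0), (2, 10, 2, 4.0);
""")

# A typical star join: the fact table joined to each dimension it is sliced by.
rows = db.execute("""
    SELECT t.quarter, i.type, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON t.time_key = f.time_key
    JOIN item_dim AS i ON i.item_key = f.item_key
    GROUP BY t.quarter, i.type
    ORDER BY t.quarter, i.type
""").fetchall()
```

Every dimension is one join away from the fact table, which keeps such queries short.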
Dimension Hierarchies
[Figure: example hierarchy over the attributes store, sType, city and region]
•snowflake schema
•constellations
Example of Snowflake Schema
•Represent dimensional hierarchy directly by normalizing tables.
•Easy to maintain and saves storage
•Sales Fact Table: time_key, item_key, branch_key, location_key, units_sold, dollars_sold, avg_sales
• Measures: units_sold, dollars_sold, avg_sales
•Dimension table time: time_key, day, day_of_the_week, month, quarter, year
•Dimension table item: item_key, item_name, brand, type, supplier_key
•Dimension table supplier: supplier_key, supplier_type
•Dimension table branch: branch_key, branch_name, branch_type
•Dimension table location: location_key, street, city_key
•Dimension table city: city_key, city, province_or_street, country
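The normalized location dimension can be sketched in SQLite to show the effect on queries. Only the location and city tables and one measure are included, and the rows are made up.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city(city_key));
CREATE TABLE sales_fact (location_key INTEGER REFERENCES location(location_key),
                         dollars_sold REAL);
INSERT INTO city VALUES (1, 'Vancouver', 'Canada'), (2, 'Chicago', 'USA');
INSERT INTO location VALUES (100, '1 Main St', 1), (101, '2 Oak Ave', 1), (102, '3 Elm Rd', 2);
INSERT INTO sales_fact VALUES (100, 5.0), (101, 7.0), (102, 2.0);
""")

# Rolling up from street level to country needs one extra join through the city table.
by_country = dict(db.execute("""
    SELECT c.country, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN location AS l ON l.location_key = f.location_key
    JOIN city AS c ON c.city_key = l.city_key
    GROUP BY c.country
"""))
```

City and country are stored once per city rather than once per street address, at the price of the additional join.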
Example of Fact Constellation
•Multiple fact tables that share many dimension tables
•Sales Fact Table: time_key, item_key, branch_key, location_key, units_sold, dollars_sold, avg_sales
•Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location, dollars_cost, units_shipped
•Dimension table time: time_key, day, day_of_the_week, month, quarter, year
•Dimension table item: item_key, item_name, brand, type, supplier_type
•Dimension table branch: branch_key, branch_name, branch_type
•Dimension table location: location_key, street, city, province_or_street, country
•Dimension table shipper: shipper_key, shipper_name, location_key, shipper_type
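One practical consequence of two fact tables sharing a dimension is sketched below: each fact table is aggregated on its own and the results are combined through the shared item key. The tables are reduced to a single dimension and the rows are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE sales_fact (item_key INTEGER REFERENCES item(item_key), units_sold INTEGER);
CREATE TABLE shipping_fact (item_key INTEGER REFERENCES item(item_key), units_shipped INTEGER);
INSERT INTO item VALUES (1, 'pen'), (2, 'ball');
INSERT INTO sales_fact VALUES (1, 3), (1, 2), (2, 4);
INSERT INTO shipping_fact VALUES (1, 10), (2, 1), (2, 2);
""")

# Aggregate each fact table separately, then combine on the shared dimension key.
# Joining the two fact tables directly would multiply rows and inflate the sums.
rows = db.execute("""
    SELECT i.item_name, s.sold, h.shipped
    FROM item AS i
    JOIN (SELECT item_key, SUM(units_sold) AS sold FROM sales_fact GROUP BY item_key) AS s
      ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(units_shipped) AS shipped FROM shipping_fact GROUP BY item_key) AS h
      ON h.item_key = i.item_key
    ORDER BY i.item_name
""").fetchall()
```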
Aggregates
• Add up amounts for day 1
In SQL: SELECT sum(amt) FROM SALE
WHERE date = 1
Aggregates (Contd..)
• Add up amounts by day
In SQL: SELECT date, sum(amt) FROM SALE
GROUP BY date
• Add up amounts by day, product
In SQL: SELECT date, prodId, sum(amt) FROM SALE
GROUP BY date, prodId
• Going from the grouping by day to the grouping by day and product is a drill-down; the reverse is a rollup
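The queries can be tried out against a small SALE table in SQLite. The rows and the `storeId` column are assumptions chosen so the sums are easy to check by hand, and the last query also selects `prodId` so that each group is labeled.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE SALE (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
# Hypothetical rows, chosen only to make the three queries easy to check by hand.
db.executemany(
    "INSERT INTO SALE VALUES (?, ?, ?, ?)",
    [("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
     ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4)],
)

# Total for one day.
day_one = db.execute("SELECT sum(amt) FROM SALE WHERE date = 1").fetchone()[0]

# Rolled up to one row per day.
by_day = db.execute(
    "SELECT date, sum(amt) FROM SALE GROUP BY date ORDER BY date"
).fetchall()

# Drilled down to one row per day and product.
by_day_product = db.execute(
    "SELECT date, prodId, sum(amt) FROM SALE GROUP BY date, prodId ORDER BY date, prodId"
).fetchall()
```

Summing the per-product rows of a day gives back that day's total, which is the rollup direction.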
Points to be noticed about ROLAP
•Defines complex, multi-dimensional data with simple model
•Reduces the number of joins a query has to process
•Allows the data warehouse to evolve with relatively low
maintenance
•Can contain both detailed and summarized data.
•ROLAP is based on familiar, proven, and already selected
technologies.
•BUT: it depends on SQL for multi-dimensional manipulation and calculations.
Thank You

Data warehouse introduction

  • 2. Data Warehousing •Aims of information technology: • To help workers in their everyday business activity and improve their productivity – clerical data processing tasks • To help knowledge Employee (executives, managers, analysts) make faster and better decisions – decision support systems •Two types of applications: • Operational applications • Analytical applications
  • 3. •In most organizations, data about specific parts of business is there - lots and lots of data, somewhere, in some form. •Data is available but not information -- and not the right information at the right time. •There is a need to • bring together information . • off-load decision support applications from the on-line transaction system Data Warehousing (Contd..)
  • 4. Data Warehouse •“A data warehouse is a subject-oriented, integrated, time- variant, and nonvolatile collection of data in support of management’s decision-making process.” --- W. H. Inmon •Collection of data that is used primarily in organizational decision making •A decision support database that is maintained separately from the organization’s operational database
  • 5. Data Warehouse - Subject Oriented •Data that gives information about a particular subject. •Data for Model& Analysis. •Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
  • 6. Data Warehouse – Integrated •It Constructed by integrating multiple, heterogeneous data sources. •Data cleaning and data integration techniques are applied. •When data is moved to the warehouse, it is converted - •
  • 7. Data Warehouse - Time Variant •Data is stable in a data warehouse. •Its adds historical as well as current data. •Every key structure in the data warehouse - Contains an element of time, explicitly or implicitly •But the key of operational data may or may not contain “time element”.
  • 8. Data Warehouse - Non-Volatile • A physically separate store of data transformed from the operational environment. • No update & delete on historical data . •Operational update of data does not occur in the data warehouse •Appended • Initial loading of data and access of data.
  • 9. Data modifications & schema design • A data warehouse is updated on a regular basis by the ETL process (run nightly or weekly) using bulk data modification techniques. • Data warehouses often use denormalized or partially denormalized schemas (such as a star schema) to optimize query performance.
  • 10. Why Separate Data Warehouse? •Separate & historical data are needed for decision support. •Complex decision . •Missing Data. •Data consolidation . •Data quality.
  • 11. Advantages of Data Warehousing •High query performance •Queries not visible outside warehouse •Local processing at sources unaffected •Can operate when sources unavailable •Can query data not stored in a DBMS •Extra information at warehouse • Modify, summarize (store aggregates) • Add historical information
  • 12. Decision Support System • Information technology to help knowledge employees (executives, managers, analysts) make faster and better decisions • OLAP is an element of decision support system • Data mining is a powerful, high-performance data analysis tool for decision support.
  • 13. Three-Tier Decision Support Systems•Warehouse database server • Almost always a relational DBMS, rarely flat files •OLAP servers • Relational OLAP (ROLAP): extended relational DBMS that maps operations on multidimensional data to standard relational operators • Multidimensional OLAP (MOLAP): special-purpose server that directly implements multidimensional data and operations •Clients • Query and reporting tools • Analysis tools • Data mining tools
  • 14. The Complete Decision Support System Information Sources Data Warehouse Server (Tier 1) OLAP Servers (Tier 2) Clients (Tier 3) Operational DB’s Semistructured Sources extract transform load refresh etc. Data Marts Data Warehouse e.g., MOLAP e.g., ROLAP serve OLAP Query/Reporting Data Mining serve serve
  • 15. Data Sources •Data sources are often the operational systems, providing the lowest level of data. •Data sources are designed for operational use, not for decision support, and the data reflect this fact. •Multiple data sources are often from different systems, run on a wide range of hardware and much of the software is built in-house or highly customized. •Multiple data sources introduce a large number of issues -- semantic conflicts.
  • 16. Creating and Maintaining a Warehouse •Data warehouse needs several tools that automate or support tasks such as: • Data extraction from different external data sources, operational databases, files of standard applications • Data cleaning (finding and resolving inconsistency in the source data) • inconsistent field lengths, inconsistent descriptions, inconsistent value assignments, missing entries and violation of integrity constraints. • optional fields in data entry are significant sources of inconsistent data.
  • 17. • Integration and transformation of data (between different data formats, languages, etc.) • Data loading (loading the data into the data warehouse) • checking integrity constraints, sorting, summarizing, etc. • Data replication (replicating source database into the data warehouse) • used to incrementally refresh a warehouse when sources change • Data refreshment • propagating updates on source data to the data stored in the warehouse • Periodically or immediately • Data archiving Creating and Maintaining a Warehouse
  • 18. The Data Warehousing Models •Enterprise Warehouse • collects all the information about subjects spanning entire organization •Data Mart • a subset of corporate-wide data that is of value to a specific group of users • its scope is confined to specific, selected groups, such as marketing data mart • Independent Vs. Dependent (directly from warehouse) data mart •Virtual warehouse • a set of views over operational databases • only some summary views are materialized
  • 19. Physical Structure of Data Warehouse •There are three basic architectures for constructing a data warehouse: • Centralized • Distributed • Federated • Tiered •The data warehouse is distributed for: load balancing, scalability and higher availability
  • 20. The logical data warehouse is only virtual •The central data warehouse is physical •There exist local data marts on different tiers which store copies or summarization of the previous tier. Physical Structure of Data Warehouse (Contd..)
  • 21. Data Processing Models •There are two basic data processing models: • OLTP (On-Line Transaction Processing) • Describes processing at operational sites • aim is reliable and efficient processing of a large number of transactions and ensuring data consistency. • OLAP (On-Line Analytical Processing) • Describes processing at warehouse • aim is efficient multidimensional processing of large data volumes.
  • 22. OLTP vs. OLAP • OLTP OLAP •users Clerk, IT professional Knowledge worker •Function day to day operations decision support •DB design application-oriented subject-oriented •data current, up-to-date historical, summarized • detailed, flat relational multidimensional • isolated integrated, consolidated •usage repetitive ad-hoc •access read/write, lots of scans • index/hash on prim. key •unit of work short, simple transaction complex query •# records accessed tens millions •#users thousands hundreds •DB size 100MB-GB 100GB-TB •metric transaction throughput query throughput, response
  • 23. OLAP •Main goal: support ad-hoc but complex querying performed by business analysts •Interactive process of creating, managing, analyzing and reporting on data •Extends spreadsheet-like analysis to work with huge amounts of data in a data warehouse •Data exploration and aggregation in various ways •Typical applications include accessing the effectiveness of a marketing campaign, product sales forecasting, spot trends
  • 24. •Allows a sophisticated user to analyse data using complex, multi-dimensional views •Place key performance indicators (measures) into context (dimensions) • Measures are pre-aggregated • Data retrieval is significantly faster •The proposed cube is made available to business analysts who can browse the data using a variety of tools, making ad hoc interatctive and analytical processing OLAP (Contd..)
  • 25. OLAP Server Architectures •Relational OLAP (ROLAP): • Use relational or extended-relational DBMS to store and manage warehouse data and OLAP middleware to support missing pieces • Include optimization of DBMS backend, implementation of aggregation navigation logic, and additional tools and services • Greater scalability • Schema design: Star, Snowflake, Fact Constellation •Multidimensional OLAP (MOLAP): • Array-based multidimensional storage engine (sparse matrix techniques) • Fast indexing to pre-computed summarized data • Schema design: Cube •Hybrid OLAP (HOLAP): • User flexibility - low level: relational, high level: array
  • 26. ROLAP •Special schema design: snowflake •Special indexes: bitmap, multi-table join •Proven technology (relational models, DBMS) • Tend to outperform specialized MDDB, especially on large data sets •Products • IBM DB2, Oracle, Sybase IQ, RedBrick, Informix
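The bitmap index mentioned above can be sketched in a few lines. This is a conceptual model using Python integers as bit vectors, not the on-disk format of any of the listed products; the two columns and their values are made up for the example.

```python
# Two hypothetical low-cardinality columns of the same five-row table.
regions = ["east", "west", "east", "north", "west"]
products = ["tv", "tv", "radio", "tv", "radio"]


def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i has that value."""
    index = {}
    for pos, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << pos)
    return index


def matching_rows(bitmap):
    """Return the row positions whose bit is set in the bitmap."""
    return [pos for pos in range(bitmap.bit_length()) if (bitmap >> pos) & 1]


region_idx = build_bitmap_index(regions)
product_idx = build_bitmap_index(products)

# A conjunctive predicate (region = 'west' AND product = 'radio')
# becomes a single bitwise AND of two bitmaps.
hits = matching_rows(region_idx["west"] & product_idx["radio"])
print(hits)
```

Bitmaps suit warehouse dimensions because the columns have few distinct values and the data is rarely updated in place.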
  • 27. Measures and Dimensions •Measures: key performance indicators that you want to evaluate • Typically numerical, including volume, sales and cost • A rule of thumb: if a number makes business sense when aggregated, then it is a measure • Examples • Aggregate daily volume to month, quarter and year • Aggregating telephone numbers would not make sense, so they are not measures • Affects what should be stored in the data warehouse
  • 28. Measures and Dimensions (Contd..) •Dimensions: categories of data analysis • Typical dimensions include product, time, region • A rule of thumb: when a report is requested “by” something, that something is usually a dimension • Example • Sales report: view sales by month, by region • Two dimensions needed are time and region
  • 29. Conceptual Modeling •Star schema. •Snowflake schema. •Fact constellations, or Galaxy schema.
  • 30. Star
  • 31. Star Schema •Fact table •Dimension tables •Measures •A single fact table and for each dimension one dimension table •Does not capture hierarchies directly
  • 32. Example - Star Schema (figure) •Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales •time: time_key, day, day_of_the_week, month, quarter, year •item: item_key, item_name, brand, type, supplier_type •branch: branch_key, branch_name, branch_type •location: location_key, street, city, province_or_street, country
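A reduced version of this star schema can be sketched in SQLite through Python's `sqlite3` module. Only two of the four dimensions and two of the three measures are kept to stay short. The `_dim` and `_fact` table suffixes, the column types, and all inserted rows are assumptions for the example; the column names follow the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one per dimension, denormalized.
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY,
    day INTEGER, month INTEGER, quarter INTEGER, year INTEGER
);
CREATE TABLE item_dim (
    item_key INTEGER PRIMARY KEY,
    item_name TEXT, brand TEXT, type TEXT
);
-- Single fact table: foreign keys to the dimensions plus the measures.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    units_sold INTEGER,
    dollars_sold REAL
);
-- Illustrative rows only.
INSERT INTO time_dim VALUES (1, 15, 1, 1, 2024), (2, 20, 4, 2, 2024);
INSERT INTO item_dim VALUES (10, 'Radio X', 'Acme', 'radio'),
                            (11, 'TV Y', 'Acme', 'tv');
INSERT INTO sales_fact VALUES (1, 10, 3, 90.0), (1, 11, 1, 400.0),
                              (2, 10, 2, 60.0);
""")

# Star join: the fact table joins directly to each dimension it needs.
star_join = conn.execute("""
    SELECT t.quarter, i.type, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN time_dim t ON t.time_key = f.time_key
    JOIN item_dim i ON i.item_key = f.item_key
    GROUP BY t.quarter, i.type
    ORDER BY t.quarter, i.type
""").fetchall()
print(star_join)
```

Every dimension is exactly one join away from the fact table, which is what gives the schema its star shape.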
  • 34. Example of Snowflake Schema (figure) •Represents the dimensional hierarchy directly by normalizing tables. •Easy to maintain and saves storage •Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales •time: time_key, day, day_of_the_week, month, quarter, year •item: item_key, item_name, brand, type, supplier_key •supplier: supplier_key, supplier_type •branch: branch_key, branch_name, branch_type •location: location_key, street, city_key •city: city_key, city, province_or_street, country
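The normalization step that turns a star dimension into a snowflake can be sketched for the location hierarchy. The table and column names follow the figure, while the reduced fact table, the column types, and all rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The city attributes are split out of location into their own table,
-- so each city and country is stored once.
CREATE TABLE city (
    city_key INTEGER PRIMARY KEY,
    city TEXT, country TEXT
);
CREATE TABLE location (
    location_key INTEGER PRIMARY KEY,
    street TEXT,
    city_key INTEGER REFERENCES city(city_key)
);
-- Reduced fact table: one dimension key and one measure.
CREATE TABLE sales_fact (
    location_key INTEGER REFERENCES location(location_key),
    dollars_sold REAL
);
-- Illustrative rows only.
INSERT INTO city VALUES (1, 'Pune', 'India'), (2, 'Lyon', 'France');
INSERT INTO location VALUES (100, 'Main St', 1), (101, 'High St', 1),
                            (102, 'Rue A', 2);
INSERT INTO sales_fact VALUES (100, 10.0), (101, 15.0), (102, 7.0), (100, 5.0);
""")

# Rolling up to country now takes two joins (fact -> location -> city)
# instead of the single join a star schema would need.
by_country = conn.execute("""
    SELECT c.country, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN location l ON l.location_key = f.location_key
    JOIN city c ON c.city_key = l.city_key
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()
print(by_country)
```

This is the trade-off behind the slide's claim: less redundancy and easier maintenance in exchange for longer join paths at query time.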
  • 35. Example of Fact Constellation (figure) •Multiple fact tables that share many dimension tables •Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales •Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location; measures: dollars_cost, units_shipped •time: time_key, day, day_of_the_week, month, quarter, year •item: item_key, item_name, brand, type, supplier_type •branch: branch_key, branch_name, branch_type •location: location_key, street, city, province_or_street, country •shipper: shipper_key, shipper_name, location_key, shipper_type
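How two fact tables share a dimension can be sketched with a cut-down constellation. Only the item dimension and one measure per fact table are kept. The `_dim` and `_fact` suffixes, the column types, and all rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One shared dimension table.
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
-- Two fact tables that both reference it.
CREATE TABLE sales_fact (
    item_key INTEGER REFERENCES item_dim(item_key),
    units_sold INTEGER
);
CREATE TABLE shipping_fact (
    item_key INTEGER REFERENCES item_dim(item_key),
    units_shipped INTEGER
);
-- Illustrative rows only.
INSERT INTO item_dim VALUES (1, 'radio'), (2, 'tv');
INSERT INTO sales_fact VALUES (1, 3), (1, 2), (2, 1);
INSERT INTO shipping_fact VALUES (1, 4), (2, 1), (2, 1);
""")

# Each fact table is aggregated on its own and the results are lined up
# on the shared dimension. Joining the two fact tables directly would
# multiply rows and inflate both sums.
drill_across = conn.execute("""
    SELECT i.item_name,
           (SELECT SUM(s.units_sold) FROM sales_fact s
             WHERE s.item_key = i.item_key),
           (SELECT SUM(sh.units_shipped) FROM shipping_fact sh
             WHERE sh.item_key = i.item_key)
    FROM item_dim i
    ORDER BY i.item_name
""").fetchall()
print(drill_across)
```

Because both fact tables use the same `item_key`, measures from different business processes can be compared item by item.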
  • 36. Aggregates • Add up amounts for day 1 In SQL: SELECT sum(amt) FROM SALE WHERE date = 1 (result shown in the slide's figure: 81)
  • 37. Aggregates (Contd..) • Add up amounts by day In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date
  • 38. Aggregates (Contd..) • Add up amounts by day, product In SQL: SELECT date, prodId, sum(amt) FROM SALE GROUP BY date, prodId • Going from the by-day totals to the by-day, by-product totals is a drill-down; going back is a rollup
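The three aggregate queries from these slides can be run end to end with Python's `sqlite3` module. The table name SALE and the columns `date`, `prodId`, and `amt` come from the slides; the column types and the six sample rows are assumptions chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALE (prodId TEXT, date INTEGER, amt INTEGER)")
# Illustrative rows only.
conn.executemany(
    "INSERT INTO SALE VALUES (?, ?, ?)",
    [("p1", 1, 12), ("p2", 1, 11), ("p1", 1, 50),
     ("p2", 1, 8), ("p1", 2, 44), ("p1", 2, 4)],
)

# Add up amounts for day 1.
day1_total = conn.execute(
    "SELECT sum(amt) FROM SALE WHERE date = 1"
).fetchone()[0]

# Add up amounts by day (the rolled-up view).
by_day = conn.execute(
    "SELECT date, sum(amt) FROM SALE GROUP BY date ORDER BY date"
).fetchall()

# Add up amounts by day and product (the drilled-down view).
by_day_product = conn.execute(
    "SELECT date, prodId, sum(amt) FROM SALE "
    "GROUP BY date, prodId ORDER BY date, prodId"
).fetchall()

print(day1_total, by_day, by_day_product)
```

Adding a column to the GROUP BY list drills down to finer detail, and removing one rolls up; the finer totals always sum back to the coarser ones.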
  • 39. Points to note about ROLAP •Defines complex, multi-dimensional data with a simple model •Reduces the number of joins a query has to process •Allows the data warehouse to evolve with relatively low maintenance •Can contain both detailed and summarized data. •ROLAP is based on familiar, proven, and already selected technologies. •But: multi-dimensional manipulation and calculations have to be expressed in SQL.

Editor's notes

  1. Data modifications: The end users of a data warehouse do not directly update the data warehouse.