SlideShare a Scribd company logo
1 of 2
Download to read offline
Data Mining Lecture 2
What is Data Warehouse?
Defined in many different ways, but not
rigorously
 A decision support database that is
maintained separately from the organisation’s
operational database
 Support information processing by providing
a solid platform of consolidated, historical
data for analysis
Definition by Inmon
 “A data warehouse is a subject-oriented,
integrated, time-variant, and non-volatile
collection of data in support of management’s
decision-making process”
Data warehousing
 The process of constructing and using data
warehouses
Data Warehouse—Subject-Oriented
 Organised around major subjects, such as
customer, product, sales
 Focusing on the modelling and analysis of
data for decision makers, not on daily
operations or transaction processing
 Provide a simple and concise view around
particular subject issues by excluding data
that are not useful in the decision support
process
Data Warehouse—Integrated
Constructed by integrating multiple,
heterogeneous data sources
 relational databases, flat files, on-line
transaction records
Data cleaning and data integration
techniques are applied
 Ensure consistency in naming conventions,
encoding structures, attribute measures, etc.
among different data sources
o E.g., Hotel price: currency, tax,
breakfast covered, etc.
 When data is moved to the warehouse, it is
converted
Data Warehouse—Time Variant
The time horizon for the data warehouse is
significantly
longer than that of operational systems
 Operational database: current value data
 Data warehouse data: provide information
from a historical perspective (e.g., past 5-10
years)
Every key structure in the data warehouse
 Contains an element of time, explicitly or
implicitly
 But the key of operational data may or may
not contain “time element”
Data Warehouse—Non-Volatile
- Physically separate stores of data
transformed from the operational
environment
- Operational update of data does not
occur in the data warehouse
environment
 Does not require transaction processing,
recovery, and concurrency control
mechanisms
 Requires only two operations in data
accessing: initial loading of data and access of
data
Data Warehouse vs. Heterogeneous
DB
Traditional heterogeneous DB integration
- Build wrappers/mediators on top of
heterogeneous databases
- Query driven approach
o When a query is posed to a client site,
a meta-dictionary is used to translate
the query into queries appropriate for
individual heterogeneous sites
involved, and the results are
integrated into a global answer set
o Complex information filtering,
compete for resources
Data warehouse
- update-driven, high performance
- Information from heterogeneous sources is
integrated in advance and stored in
warehouses for direct access and analysis
Data Warehouse vs. Operational
DB
OLTP (On-Line Transaction Processing)
 Major task of traditional relational DB
 Day-to-day operations: purchasing, inventory,
banking, manufacturing, payroll, registration,
accounting, etc.
OLAP (On-Line Analytical Processing)
 Major task of data warehouse system
 Data analysis and decision making
Distinct features (OLTP vs. OLAP)
 User and system orientation: customer vs.
market
 Data contents: current, detailed vs. historical,
consolidated
 Database design: ER + application vs. star +
subject
 View: current, local vs. evolutionary,
integrated
 Access patterns: update vs. read-only but
complex queries
OLTP vs. OLAP
Why Separate Data Warehouse?
High performance for both systems
 DBMS— tuned for OLTP
-access methods, indexing,
concurrency control, recovery
 Warehouse—tuned for OLAP
-complex OLAP queries,
multidimensional view, consolidation
Different functions and different data
 Missing data: Decision support requires
historical data which operational DBs do not
typically maintain
 Data consolidation: DS requires consolidation
(aggregation, summarisation) of data from
heterogeneous sources
 Data quality: different sources typically use
inconsistent data representations, codes and
formats which have to be reconciled

More Related Content

What's hot

Prescriptive analytics
Prescriptive analyticsPrescriptive analytics
Prescriptive analyticsIpsita Kulari
 
Lesson 2 network database system
Lesson 2 network database systemLesson 2 network database system
Lesson 2 network database systemGiO Friginal
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundationshktripathy
 
Data pre processing
Data pre processingData pre processing
Data pre processingjunnubabu
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLRamakant Soni
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceMahir Haque
 
05 Clustering in Data Mining
05 Clustering in Data Mining05 Clustering in Data Mining
05 Clustering in Data MiningValerii Klymchuk
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data scienceAjay Ohri
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
introduction to data mining tutorial
introduction to data mining tutorial introduction to data mining tutorial
introduction to data mining tutorial Salah Amean
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big datahktripathy
 
3 tier data warehouse
3 tier data warehouse3 tier data warehouse
3 tier data warehouseJ M
 
Multidimentional data model
Multidimentional data modelMultidimentional data model
Multidimentional data modeljagdish_93
 

What's hot (20)

Data mining notes
Data mining notesData mining notes
Data mining notes
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
Prescriptive analytics
Prescriptive analyticsPrescriptive analytics
Prescriptive analytics
 
Oltp vs olap
Oltp vs olapOltp vs olap
Oltp vs olap
 
Data Mining Technique - SEMMA
Data Mining Technique - SEMMAData Mining Technique - SEMMA
Data Mining Technique - SEMMA
 
Lesson 2 network database system
Lesson 2 network database systemLesson 2 network database system
Lesson 2 network database system
 
Data cube
Data cubeData cube
Data cube
 
OLAP
OLAPOLAP
OLAP
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundations
 
Data pre processing
Data pre processingData pre processing
Data pre processing
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
05 Clustering in Data Mining
05 Clustering in Data Mining05 Clustering in Data Mining
05 Clustering in Data Mining
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data science
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
introduction to data mining tutorial
introduction to data mining tutorial introduction to data mining tutorial
introduction to data mining tutorial
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
 
3 tier data warehouse
3 tier data warehouse3 tier data warehouse
3 tier data warehouse
 
Multidimentional data model
Multidimentional data modelMultidimentional data model
Multidimentional data model
 

Similar to Data Mining Lecture 2 - What is Data Warehouse

11666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect311666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect3ambujm
 
11667 Bitt I 2008 Lect4
11667 Bitt I 2008 Lect411667 Bitt I 2008 Lect4
11667 Bitt I 2008 Lect4ambujm
 
1.4 data warehouse
1.4 data warehouse1.4 data warehouse
1.4 data warehouseKrish_ver2
 
data warehousing
data warehousingdata warehousing
data warehousing143sohil
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousingsumit621
 
Introduction to data warehouse dmbi
Introduction to data warehouse dmbiIntroduction to data warehouse dmbi
Introduction to data warehouse dmbiShaishavShah8
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data WarehouseSOMASUNDARAM T
 
20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.pptPalaniKumarR2
 
20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.pptSumathiG8
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSINGKing Julian
 
20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.pptSamPrem3
 
Dataware house multidimensionalmodelling
Dataware house multidimensionalmodellingDataware house multidimensionalmodelling
Dataware house multidimensionalmodellingmeghu123
 

Similar to Data Mining Lecture 2 - What is Data Warehouse (20)

11666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect311666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect3
 
11667 Bitt I 2008 Lect4
11667 Bitt I 2008 Lect411667 Bitt I 2008 Lect4
11667 Bitt I 2008 Lect4
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
1.4 data warehouse
1.4 data warehouse1.4 data warehouse
1.4 data warehouse
 
Chpt2.ppt
Chpt2.pptChpt2.ppt
Chpt2.ppt
 
data warehousing
data warehousingdata warehousing
data warehousing
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
Introduction to data warehouse dmbi
Introduction to data warehouse dmbiIntroduction to data warehouse dmbi
Introduction to data warehouse dmbi
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
 
2. olap warehouse
2. olap warehouse2. olap warehouse
2. olap warehouse
 
Chapter 2
Chapter 2Chapter 2
Chapter 2
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
OLAP technology
OLAP technologyOLAP technology
OLAP technology
 
20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data Warehouse
Data Warehouse Data Warehouse
Data Warehouse
 
20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt
 
Dataware house multidimensionalmodelling
Dataware house multidimensionalmodellingDataware house multidimensionalmodelling
Dataware house multidimensionalmodelling
 

Recently uploaded

ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationRosabel UA
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Millenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxMillenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxJanEmmanBrigoli
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxVanesaIglesias10
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Dust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSEDust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSEaurabinda banchhor
 

Recently uploaded (20)

ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translation
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Millenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxMillenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptx
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Dust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSEDust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSE
 

Data Mining Lecture 2 - What is Data Warehouse

  • 1. Data Mining Lecture 2 What is Data Warehouse? Defined in many different ways, but not rigorously  A decision support database that is maintained separately from the organisation’s operational database  Support information processing by providing a solid platform of consolidated, historical data for analysis Definition by Inmon  “A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process” Data warehousing  The process of constructing and using data warehouses Data Warehouse—Subject-Oriented  Organised around major subjects, such as customer, product, sales  Focusing on the modelling and analysis of data for decision makers, not on daily operations or transaction processing  Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process Data Warehouse—Integrated Constructed by integrating multiple, heterogeneous data sources  relational databases, flat files, on-line transaction records Data cleaning and data integration techniques are applied  Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources o E.g., Hotel price: currency, tax, breakfast covered, etc.  When data is moved to the warehouse, it is converted Data Warehouse—Time Variant The time horizon for the data warehouse is significantly longer than that of operational systems  Operational database: current value data  Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years) Every key structure in the data warehouse  Contains an element of time, explicitly or implicitly  But the key of operational data may or may not contain “time element” Data Warehouse—Non-Volatile - Physically separate stores of data transformed from the operational environment - Operational update of data does not occur in the data warehouse environment  Does not require transaction processing, recovery, and concurrency control mechanisms  Requires only two operations in data accessing: initial loading of data and access of data Data Warehouse vs. Heterogeneous DB Traditional heterogeneous DB integration - Build wrappers/mediators on top of heterogeneous databases - Query driven approach o When a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for individual heterogeneous sites involved, and the results are integrated into a global answer set o Complex information filtering, compete for resources Data warehouse - update-driven, high performance - Information from heterogeneous sources is integrated in advance and stored in warehouses for direct access and analysis
  • 2. Data Warehouse vs. Operational DB OLTP (On-Line Transaction Processing)  Major task of traditional relational DB  Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc. OLAP (On-Line Analytical Processing)  Major task of data warehouse system  Data analysis and decision making Distinct features (OLTP vs. OLAP)  User and system orientation: customer vs. market  Data contents: current, detailed vs. historical, consolidated  Database design: ER + application vs. star + subject  View: current, local vs. evolutionary, integrated  Access patterns: update vs. read-only but complex queries OLTP vs. OLAP Why Separate Data Warehouse? High performance for both systems  DBMS— tuned for OLTP -access methods, indexing, concurrency control, recovery  Warehouse—tuned for OLAP -complex OLAP queries, multidimensional view, consolidation Different functions and different data  Missing data: Decision support requires historical data which operational DBs do not typically maintain  Data consolidation: DS requires consolidation (aggregation, summarisation) of data from heterogeneous sources  Data quality: different sources typically use inconsistent data representations, codes and formats which have to be reconciled