DATA WAREHOUSING AND DATA MINING
INTRODUCTION TO DATA WAREHOUSING
NAME- DEBADITYA GHOSH
UNIVERSITY ROLL- 10900121235
COURSE NAME- DATA WAREHOUSING AND DATA MINING
COURSE CODE- PEC-IT602B
WHAT IS DATA WAREHOUSING?
Data warehousing is defined in many different ways, none of them rigorous:
• A decision support database that is maintained separately from the organization’s
operational database.
• A store that supports information processing by providing a solid platform of
consolidated, historical data for analysis.
• A single, complete and consistent store of data obtained from a variety of different
sources made available to end users in a way they can understand and use in a business
context. [Barry Devlin]
• Alternatively, a process of transforming data into information and making it available
to users in a timely enough manner to make a difference.
WHY DATA WAREHOUSING?
The railway reservation system has been operational for over a decade, and a
large amount of data is generated each day on train bookings. Much of this
data is probably archived for audit purposes. This archived operational data
can be used effectively for tactical and strategic management of the railways.
For example, analyzing the reservation data makes it possible to find
traffic patterns in various sectors and use them to add or remove bogies
(coaches) on certain trains, to decide on the mix of classes of accommodation,
and so on. Building a data warehouse is an effective way to support this analysis.
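As a rough illustration of the analysis described above, the following Python sketch aggregates archived booking records per train and suggests where coaches could be added or removed. The record layout, the sample figures, and the 90% and 50% occupancy thresholds are hypothetical, chosen only for this example.

```python
from collections import defaultdict

# Hypothetical archived booking records:
# (sector, train, seats_booked, seats_available)
bookings = [
    ("East", "T1", 950, 1000),
    ("East", "T1", 980, 1000),
    ("South", "T2", 400, 1000),
    ("South", "T2", 450, 1000),
    ("West", "T3", 700, 1000),
]


def occupancy_by_train(records):
    """Return overall occupancy (booked / available) per (sector, train)."""
    booked = defaultdict(int)
    available = defaultdict(int)
    for sector, train, seats_booked, seats_available in records:
        booked[(sector, train)] += seats_booked
        available[(sector, train)] += seats_available
    return {key: booked[key] / available[key] for key in booked}


def suggest_changes(occupancy, high=0.9, low=0.5):
    """Map each train to a suggestion based on assumed thresholds."""
    suggestions = {}
    for key, ratio in occupancy.items():
        if ratio > high:
            suggestions[key] = "add coaches"
        elif ratio < low:
            suggestions[key] = "remove coaches"
        else:
            suggestions[key] = "no change"
    return suggestions


suggestions = suggest_changes(occupancy_by_train(bookings))
```

In a real warehouse the same aggregation would typically run as a query over years of history rather than over an in-memory list.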
SOME APPLICATIONS OF DATA WAREHOUSING
Retail
• Customer loyalty
• Market planning
Financial
• Risk management
• Fraud detection
Airlines
• Route profitability
• Yield management
Manufacturing
• Cost reduction
• Logistics management
Utilities
• Asset management
• Resource management
Government
• Manpower planning
• Cost control
DATABASE VS DATA WAREHOUSING
A database is a collection of related information stored in a structured
form, as tables, so that data is easy to insert, delete and manipulate.
A database consists of tables that contain attributes. A data warehouse,
by contrast, is a database system optimized for reporting and analysis.
It generally combines data from many different databases across the
entire enterprise. Once data has entered the data warehouse, it is not
updated in place: it can only be loaded, refreshed and accessed for
queries.
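The contrast can be sketched in a few lines of Python using the standard `sqlite3` module. An in-memory SQLite database stands in for both systems purely for illustration; the `account` and `sales_fact` tables and their columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Operational table: holds the current state and is modified row by row.
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO account VALUES (1, 100.0)")
conn.execute("UPDATE account SET balance = balance - 30.0 WHERE id = 1")

# Warehouse-style table: rows are loaded in bulk and afterwards only queried.
conn.execute("CREATE TABLE sales_fact (year INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [
        (2022, "East", 120.0),
        (2022, "West", 80.0),
        (2023, "East", 150.0),
        (2023, "West", 90.0),
    ],
)

# Operational access: look up the current value of one row.
current_balance = conn.execute(
    "SELECT balance FROM account WHERE id = 1"
).fetchone()[0]

# Analytical access: summarize many historical rows.
yearly_totals = conn.execute(
    "SELECT year, SUM(amount) FROM sales_fact GROUP BY year ORDER BY year"
).fetchall()
```

The operational query touches a single current row, while the analytical query scans and aggregates history, which is the workload a warehouse is optimized for.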
STRATEGIC INFORMATION
Who needs strategic information in an enterprise?
What exactly do we mean by strategic information?
The executives and managers who are responsible for keeping the
enterprise competitive need information to make proper decisions:
to formulate business strategies, establish goals, set
objectives, and monitor results.
CHARACTERISTICS OF STRATEGIC INFORMATION
• INTEGRATED
Must have a single, enterprise-wide view.
• DATA INTEGRITY
Information must be accurate and must conform to business rules.
• ACCESSIBLE
Easily accessible with intuitive access paths, and responsive for analysis.
• CREDIBLE
Every business factor must have one and only one value.
• TIMELY
Information must be available within the stipulated time frame.
MILESTONES OF DATA WAREHOUSING
• 1983: Teradata introduces a database management system (DBMS) designed for decision-support
systems.
• 1988: The article "An Architecture for a Business and Information System", introducing the term
“business data warehouse”, is published by Barry Devlin and Paul Murphy in the IBM Systems Journal.
• 1990: Red Brick Systems introduces Red Brick Warehouse, a DBMS specifically for data warehousing.
• 1991: Bill Inmon publishes his book Building the Data Warehouse.
• 1991: Prism Solutions introduces Prism Warehouse Manager, software for developing a data
warehouse.
• 1995: The Data Warehousing Institute, a premier institution that promotes data warehousing, is
founded.
• 1996: Ralph Kimball publishes a seminal book, The Data Warehouse Toolkit.
• 1997: Oracle 8, with support for star schema queries, is released.
FORMS OF DATA WAREHOUSING
A data warehouse is a
• subject-oriented
• integrated
• time-variant
• nonvolatile
collection of data that is used primarily in organizational decision
making.
DATA WAREHOUSE - SUBJECT-ORIENTED
Organized around major subjects, such as customer,
product and sales. Focuses on the modeling and analysis of
data for decision makers, not on daily operations or
transaction processing. Provides a simple and concise view
of particular subject issues by excluding data that is
not useful in the decision support process.
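A minimal sketch of the idea, assuming a hypothetical operational order record: only the fields that matter for the "sales" subject are carried over, and fields that serve daily processing are left out. All field names are invented for this example.

```python
# Hypothetical operational order records, including fields that only
# matter for day-to-day transaction processing.
orders = [
    {"order_id": 1, "customer": "C1", "product": "P1", "amount": 250.0,
     "cashier_terminal": "T07", "receipt_printed": True},
    {"order_id": 2, "customer": "C2", "product": "P1", "amount": 100.0,
     "cashier_terminal": "T02", "receipt_printed": False},
]

# Fields assumed to be relevant to the "sales" subject.
SALES_SUBJECT_FIELDS = ("customer", "product", "amount")


def to_sales_subject(records):
    """Keep only the fields useful for analysing the sales subject."""
    return [{field: record[field] for field in SALES_SUBJECT_FIELDS}
            for record in records]


sales_view = to_sales_subject(orders)
```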
DATA WAREHOUSE - INTEGRATED
Constructed by integrating multiple, heterogeneous data sources: relational
databases, flat files, on-line transaction records.
Data cleaning and data integration techniques are applied to
ensure consistency in naming conventions, encoding structures, attribute
measures, etc. among different data sources.
E.g., hotel price: currency, tax, etc. may differ between sources.
When data is moved to the warehouse, it is converted to a common form.
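The hotel price example can be sketched as a small conversion step. This is only an illustration: the exchange rates are made-up values, and the choice of USD with tax included as the warehouse convention is an assumption for this example.

```python
# Hypothetical conversion rates to the warehouse currency (USD).
# Illustrative values only, not real quotes.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10, "INR": 0.012}


def to_warehouse_price(amount, currency, tax_included, tax_rate):
    """Convert a source price to the assumed convention: USD, tax included."""
    gross = amount if tax_included else amount * (1 + tax_rate)
    return round(gross * RATES_TO_USD[currency], 2)


# Three sources quoting hotel prices under different conventions.
source_prices = [
    {"hotel": "A", "amount": 100.0, "currency": "EUR",
     "tax_included": True, "tax_rate": 0.0},
    {"hotel": "B", "amount": 5000.0, "currency": "INR",
     "tax_included": False, "tax_rate": 0.12},
    {"hotel": "C", "amount": 80.0, "currency": "USD",
     "tax_included": False, "tax_rate": 0.10},
]

warehouse_prices = {
    row["hotel"]: to_warehouse_price(
        row["amount"], row["currency"], row["tax_included"], row["tax_rate"]
    )
    for row in source_prices
}
```

After conversion, prices from all three sources are directly comparable, which is the point of integrating before loading.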
DATA WAREHOUSE – TIME VARIANT
The time horizon for the data warehouse is significantly longer than that of
operational systems.
• Operational database: current-value data.
• Data warehouse: information from a historical perspective
(e.g., the past 5-10 years).
Every key structure in the data warehouse contains an element of time,
explicitly or implicitly, whereas the key of operational data may or may
not contain a time element.
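The difference in key structure can be sketched as follows. The two dictionaries and the `record_price` helper are hypothetical: the operational store is keyed by product only, while the warehouse key also carries a date.

```python
from datetime import date

# Operational store: key is the product only, so it holds the current value.
operational_price = {}

# Warehouse store: key includes a time element, so history is preserved.
warehouse_price = {}


def record_price(product_id, price, snapshot_date):
    """Record a price in both stores (hypothetical helper)."""
    operational_price[product_id] = price                   # overwrites
    warehouse_price[(product_id, snapshot_date)] = price    # accumulates


record_price("P1", 10.0, date(2022, 1, 1))
record_price("P1", 12.0, date(2023, 1, 1))

# Price history for P1, ordered by date.
history = sorted(
    (snapshot, price)
    for (product_id, snapshot), price in warehouse_price.items()
    if product_id == "P1"
)
```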
DATA WAREHOUSE – NONVOLATILE
A physically separate store of data transformed from the operational
environment.
Operational update of data does not occur in the data warehouse
environment.
Does not require transaction processing, recovery, and
concurrency control mechanisms.
Requires only two operations in data accessing: initial loading of
data and access of data.
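The load-and-read access pattern described above can be sketched as a store that exposes only those two operations. The `NonvolatileStore` class is a hypothetical illustration, not how any particular warehouse product is implemented.

```python
class NonvolatileStore:
    """Hypothetical append-only store exposing only load and read access."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Append a batch of rows; existing rows are never modified."""
        self._rows.extend(dict(row) for row in rows)

    def query(self, predicate):
        """Return copies of the rows for which predicate(row) is true."""
        return [dict(row) for row in self._rows if predicate(row)]


store = NonvolatileStore()
store.load([
    {"year": 2022, "amount": 200.0},
    {"year": 2023, "amount": 240.0},
])

result = store.query(lambda row: row["year"] == 2023)

# Changing a query result does not change the stored data,
# because query returns copies.
result[0]["amount"] = 0.0
stored_amount = store.query(lambda row: row["year"] == 2023)[0]["amount"]
```

Because there is no update or delete method, the sketch needs no locking or rollback logic, which mirrors why the warehouse can do without concurrency control and recovery mechanisms.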
