2. Data Warehouse
• Pool of data to support decision making.
• Structured to be available in a ready-to-use form
• Subject Oriented
• Integrated
• Time-variant
• Nonvolatile
• Additional characteristics like
1. Web based
2. Relational/multidimensional
3. Client/server
4. Real time
5. Includes metadata
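To make "time-variant" and "nonvolatile" concrete, here is a minimal sketch. The `CustomerHistory` class and its sample data are hypothetical, invented only for illustration: changes are appended as dated rows and never overwritten, so the store can report what was true on a given date.

```python
from datetime import date


class CustomerHistory:
    """Hypothetical in-memory store used only to illustrate the two properties."""

    def __init__(self):
        self._rows = []  # each row: (customer_id, city, valid_from)

    def record(self, customer_id, city, valid_from):
        # Nonvolatile: a change adds a new dated row; older rows are never edited.
        self._rows.append((customer_id, city, valid_from))

    def as_of(self, customer_id, when):
        # Time-variant: answer "what was true on this date?" from the history.
        candidates = [
            row for row in self._rows
            if row[0] == customer_id and row[2] <= when
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda row: row[2])[1]


history = CustomerHistory()
history.record(1, "Pune", date(2020, 1, 1))
history.record(1, "Mumbai", date(2022, 6, 1))
```

Calling `history.as_of(1, date(2021, 1, 1))` looks up the value that was valid in 2021, even though a later value has since been recorded.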
3. Types of Data warehouse
DATA Mart
• Dependent
– Created from warehouse
– Replicated
• Functional subset of warehouse
• Independent
– Scaled down, less expensive version of data warehouse
– Designed for a department or SBU
– Organization may have multiple data marts
• Difficult to integrate
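A dependent data mart can be pictured as a replicated functional subset of the warehouse. The sketch below assumes an in-memory SQLite database; the table names `warehouse_sales` and `mart_marketing` and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table holding data for every department.
conn.execute(
    "CREATE TABLE warehouse_sales (region TEXT, department TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("east", "marketing", 120.0),
        ("west", "marketing", 80.0),
        ("east", "finance", 300.0),
    ],
)

# Dependent data mart: a copy of only the rows one department needs,
# created directly from the warehouse.
conn.execute(
    "CREATE TABLE mart_marketing AS "
    "SELECT region, amount FROM warehouse_sales WHERE department = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT region, amount FROM mart_marketing ORDER BY region"
).fetchall()
```

An independent data mart would instead be loaded straight from the source systems, which is one reason several independent marts are difficult to integrate later.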
4. Types of Data warehouse (continued)
• Operational Data Stores: Provide a fairly
recent form of customer information file (CIF)
• Enterprise Data Warehouses: Used across the
enterprise for decision support
• Metadata: Describe the structure of and
meaning of data, contributing to their
effective use.
5. Data warehousing process overview
Major components
• Data sources
• Data extraction
• Data loading
• Comprehensive database
• Metadata
• Middleware tools
7. Data Warehousing Architectures
• May have one or more tiers
– Determined by warehouse, data acquisition (back
end), and client (front end)
• One tier, where all run on same platform, is rare
• Two tier usually combines DSS engine (client) with
warehouse
– More economical
• Three tier separates these functional parts
9. Architecture considerations
• Which DBMS to use?
• Parallel processing
• Partitioning
• Which data migration tools should be used?
• What tools for data retrieval and analysis?
11. Architecture Selection Factors
• Information interdependence
• Senior management information needs
• Urgency for a DW
• Nature of end user tasks
• Constraints on resources
• Strategic view
• Compatibility with existing systems
• Ability of in-house IT staff
• Technical and Political factors
13. Data Integration, Extraction And Load
process
1. DATA INTEGRATION
Comprises three major processes
• Data access: Ability to access and extract data
from any data source
• Data federation: Integration of business views
across multiple data stores
• Change capture: Based on the identification,
capture, and delivery of the changes made to
enterprise data sources.
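One simple way to picture change capture is to compare two snapshots of a source. This is only a sketch under that assumption (log-based and trigger-based capture also exist), and the function name `capture_changes` and the sample records are hypothetical.

```python
def capture_changes(old, new):
    """Compare two snapshots (dicts keyed by record id) and report the changes."""
    inserted = {key: new[key] for key in new.keys() - old.keys()}
    deleted = {key: old[key] for key in old.keys() - new.keys()}
    updated = {
        key: new[key]
        for key in new.keys() & old.keys()
        if new[key] != old[key]
    }
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


# Hypothetical snapshots of a customer source taken at two points in time.
yesterday = {1: "Pune", 2: "Delhi", 3: "Chennai"}
today = {1: "Mumbai", 2: "Delhi", 4: "Kolkata"}

changes = capture_changes(yesterday, today)
```

Only the identified changes then need to be delivered to the warehouse, rather than the whole source.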
14. 2. Extraction, Transformation And Load (ETL)
• Is an integral component of any data-centric
project.
• ETL consists of:
Extraction: Reading data from all relevant sources
Transformation: Converting the extracted data into a
form in which it can be placed in a data warehouse or
another database
Load: Inserting the data into the data warehouse.
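The three ETL steps can be sketched as three small functions. This is a minimal illustration, not a real ETL tool: the two source lists, the `sales` table, and the cleaning rules are all assumptions made for the example.

```python
import sqlite3


def extract(sources):
    # Extraction: read records from every relevant source.
    for source in sources:
        yield from source


def transform(record):
    # Transformation: tidy names and convert amounts to a common numeric form.
    return (record["name"].strip().title(), round(float(record["amount"]), 2))


def load(conn, rows):
    # Load: insert the transformed rows into the warehouse table.
    conn.executemany("INSERT INTO sales (customer, amount) VALUES (?, ?)", rows)
    conn.commit()


# Hypothetical sources with inconsistent formatting.
legacy_system = [{"name": "  alice ", "amount": "10.50"}]
packaged_app = [{"name": "BOB", "amount": 7}]

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer TEXT, amount REAL)")

load(warehouse, [transform(r) for r in extract([legacy_system, packaged_app])])

loaded = warehouse.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
```

Keeping the steps separate means a new source only has to be added to the list passed to `extract`, while the transformation and load logic stay unchanged.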
15. ETL Process
[Figure: data flows from the sources (packaged application, legacy system, other internal applications) through the Extract, Transform, Cleanse, and Load steps, using a transient data source, into the data warehouse and data mart.]
16. Benefits of Data Warehouse
• Allows extensive analysis in numerous ways.
• A consolidated view of corporate data.
• Better and more timely information.
• Enhance system performance.
• Simplification of data access.
• Enhance business knowledge, enhance
customer service and satisfaction, facilitate
decision making.
17. Assignment
• Data warehousing vendors?
• Data warehousing case study found on the
internet.
18. Data Warehouse development
Approaches
The Inmon Model: The EDW Approach
• Emphasizes top-down development
• Employing established database development
methodologies and tools
The Kimball Model: The Data Mart Approach
• Plan big, build small
• Subject oriented or department oriented
• Focus on the requests of a specific department.
20. Successful Implementation of Data
warehouse
• Establishment of service-level agreements and data-refresh
requirements.
• Identification of data sources and their governance
policies.
• Data quality planning & model designing.
• ETL tool selection.
• Relational database software and platform selection.
• Data transport and data conversion.
• Reconciliation process
• End-user support
21. Issues in implementation of data
warehouse
• Starting with the wrong sponsorship chain.
• Setting expectations that you cannot meet and
frustrating executives at the moment of truth.
• Engaging in politically naive behavior.
• Loading the warehouse with information just
because it is available.
• Believing that data warehousing database design
is the same as transactional database design.
22. Issues in implementation of data
warehouse (continued)
• Choosing a data warehouse manager who is
technology oriented rather than user oriented.
• Focusing on traditional internal record-oriented
data and ignoring the value of external data and of
text, images, and, perhaps, sound and video.
• Delivering data with overlapping and confusing
definitions.
• Believing promises of performance, capacity, and
scalability.
• Believing that your problems are over when the
data warehouse is up and running.
23. Risks in Data Warehouse Projects
• No mission or objective
• Quality of source data
unknown
• Skills not in place
• Inadequate budget
• Lack of supporting software
• Source data not understood
• Weak sponsor
• Users not computer literate
• Geographically distributed
environment
• Unrealistic user expectations
• Architectural and design risks
• Scope creep and changing
requirements
• Vendors out of control
• Multiple platforms
• Key people leaving project
• Loss of the sponsor
• Too much new technology
• Having to fix an operational
system
• Team geography and
language culture
24. Massive Data Warehouse And
Scalability
• A data warehouse needs to be scalable.
• Good scalability means that the cost of queries and
other data access functions grows (ideally) linearly
with the size of the warehouse.
• Specialized methods have been developed to
create scalable data warehouses.
• Scalability is difficult to achieve when managing
hundreds of terabytes.
25. Issues pertaining to scalability
• The amount of data in warehouse.
• How quickly the warehouse is expected to
grow.
• The number of concurrent users.
• The complexity of user queries.
26. Real-Time Data warehousing
• Also known as active data warehousing.
• The process of loading and providing data via the
data warehouse as they become available.
• Evolved from the EDW (Enterprise Data Warehousing)
concept.
• Allows information-based decision making at
one's fingertips.
• Positively affects almost all aspects of customer
service, SCM, and logistics.
27. Comparison between Traditional And
Active Data Warehousing Environment
Traditional Data Warehouse Environment
• Strategic decisions only
• Results sometimes hard to measure
• Moderate user concurrency
• Highly restrictive reporting used to confirm or check existing processes and patterns
• Power users, knowledge workers, internal users
Active Data Warehouse Environment
• Strategic and tactical decisions
• Results measured with operations
• High number of users accessing simultaneously
• Flexible ad hoc reporting, as well as machine-assisted modeling to discover new hypotheses
• Operational staff, call centers, external users
28. Data Warehouse Administration
• Because of its huge size, a data warehouse requires
strong monitoring.
• A data warehouse administrator (DWA) should
possess the following skills:
1. Should be familiar with high-performance software,
hardware, and networking technologies.
2. Should be familiar with the decision-making process.
3. Should keep the existing requirements and
capabilities of the data warehouse stable.
4. Must possess excellent communication skills.
29. Data Warehouse Security issues
• Security and privacy of information are a significant
concern.
• Companies must create effective and flexible
security procedures.
• Effective security in a data warehouse focuses on:
1. Establishing effective corporate and security policies and
procedures.
2. Implementing logical security procedures and techniques to
restrict access.
3. Limiting physical access to the data center environment.
4. Establishing an effective internal control review process with
an emphasis on security and privacy.
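As a rough illustration of point 2 (logical procedures that restrict access), the sketch below checks a role against the tables it may read. The roles, table names, and the `can_read` helper are hypothetical; a real warehouse would normally rely on the access controls built into its DBMS.

```python
# Hypothetical mapping from role to the warehouse tables that role may read.
PERMISSIONS = {
    "analyst": {"sales_summary"},
    "dwa": {"sales_summary", "customer_detail"},
}


def can_read(role, table):
    # Unknown roles get an empty permission set, so access is denied by default.
    return table in PERMISSIONS.get(role, set())
```

Denying by default means that a role missing from the mapping cannot read anything until access is granted explicitly.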