Data warehouse and data mining.pptx
1. DATA WAREHOUSING AND DATA MINING
INTRODUCTION TO DATA WAREHOUSING
NAME- DEBADITYA GHOSH
UNIVERSITY ROLL- 10900121235
COURSE NAME- DATA WAREHOUSING AND DATA MINING
COURSE CODE- PEC-IT602B
2. WHAT IS DATA WAREHOUSING?
Defined in many different ways, but not rigorously
• A decision support database that is maintained separately from the organization’s
operational database.
• Supports information processing by providing a solid platform of consolidated,
historical data for analysis.
A single, complete and consistent store of data obtained from a variety of different
sources, made available to end users in a way they can understand and use in a business
context. [Barry Devlin]
Alternatively,
A process of transforming data into information and making it available to users in a
timely enough manner to make a difference.
3. WHY DATA WAREHOUSING?
The railway reservation system has been operational for over a decade, and
a large amount of data is generated each day on train bookings. Much of this
data is probably archived for audit purposes. This archived operational data
can be used effectively for tactical and strategic management of the railways.
For example, by analyzing the reservation data it would be possible to find
traffic patterns in various sectors and use them to add or remove bogies in
certain trains, to decide on the mix of various classes of accommodation,
etc. For this analysis, building a data warehouse is an effective solution.
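The kind of analysis described above can be sketched in a few lines of Python. This is only an illustration: the sector names, class names, seat counts, and the 95% threshold are all made-up assumptions, not data from any real reservation system.

```python
from collections import defaultdict

# Hypothetical archived booking records:
# (sector, travel_class, seats_booked, seats_available)
bookings = [
    ("Howrah-Delhi", "Sleeper", 720, 720),
    ("Howrah-Delhi", "AC 3-Tier", 180, 256),
    ("Howrah-Puri", "Sleeper", 400, 720),
    ("Howrah-Puri", "AC 3-Tier", 250, 256),
]

def occupancy_by_sector_and_class(records):
    """Return {(sector, travel_class): occupancy ratio} summed over all records."""
    booked = defaultdict(int)
    available = defaultdict(int)
    for sector, travel_class, seats_booked, seats_available in records:
        key = (sector, travel_class)
        booked[key] += seats_booked
        available[key] += seats_available
    return {key: booked[key] / available[key] for key in booked}

occupancy = occupancy_by_sector_and_class(bookings)

# Sector/class combinations that are nearly full (assumed threshold: 95%)
# are candidates for extra bogies or a different class mix.
nearly_full = sorted(key for key, ratio in occupancy.items() if ratio >= 0.95)
print(nearly_full)
```

In a real warehouse this aggregation would run over years of history rather than four rows, which is why a separate analytical store is useful.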
5. DATABASE VS DATA WAREHOUSING
A database is a collection of related information stored in a structured
form, as tables, so that insertion, deletion and manipulation of data
are easier. A database consists of tables that contain attributes.
A data warehouse, by contrast, is a database system optimized
for reporting and analysis. It generally combines data from
many different databases across the entire enterprise. Once data has
entered the data warehouse, it is only loaded, refreshed
and accessed for queries.
6. STRATEGIC INFORMATION
Who needs strategic information in an enterprise?
What exactly do we mean by strategic information?
The executives and managers who are responsible for keeping the
enterprise competitive need information to make proper decisions. They need
information to formulate business strategies, establish goals, set
objectives, and monitor results.
7. CHARACTERISTICS OF STRATEGIC INFORMATION
INTEGRATED
Must have a single, enterprise-wide view.
DATA INTEGRITY
Information must be accurate and must conform to business rules.
ACCESSIBLE
Easily accessible with intuitive access paths, and responsive for analysis.
CREDIBLE
Every business factor must have one and only one value.
TIMELY
Information must be available within the stipulated time frame.
8. MILESTONES OF DATA WAREHOUSING
1983: Teradata introduces a database management system (DBMS) designed for decision-support
systems.
1988: The article "An Architecture for a Business and Information System", introducing the term
“business data warehouse”, is published by Barry Devlin and Paul Murphy in the IBM Systems Journal.
1990: Red Brick Systems introduces Red Brick Warehouse, a DBMS specifically for data warehousing.
1991: Bill Inmon publishes his book Building the Data Warehouse.
1991: Prism Solutions introduces Prism Warehouse Manager software for developing a data
warehouse.
1995: The Data Warehousing Institute, a premier institution that promotes data warehousing, is
founded.
1996: Ralph Kimball publishes a seminal book, The Data Warehouse Toolkit.
1997: Oracle 8, with support for star schema queries, is released.
9. FORMS OF DATA WAREHOUSING
A data warehouse is a
• subject-oriented
• integrated
• time-variant
• nonvolatile
collection of data that is used primarily in organizational decision
making.
10. DATA WAREHOUSE - SUBJECT-ORIENTED
Organized around major subjects, such as customer,
product, and sales. Focuses on the modeling and analysis of
data for decision makers, not on daily operations or
transaction processing. Provides a simple and concise view
of particular subject issues by excluding data that are
not useful in the decision support process.
11. DATA WAREHOUSE - INTEGRATED
Constructed by integrating multiple, heterogeneous data sources: relational
databases, flat files, and online transaction records.
Data cleaning and data integration techniques are applied.
Ensures consistency in naming conventions, encoding structures, attribute
measures, etc. among different data sources.
E.g., hotel price: currency, tax, etc.
When data is moved to the warehouse, it is converted.
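The hotel price example can be made concrete with a small sketch. Everything here is an assumption for illustration: the exchange rates are invented, and the target convention (prices in USD with tax excluded) is just one possible choice a warehouse might standardize on.

```python
# Hypothetical exchange rates to a common currency (USD).
# A real system would read these from a maintained reference table.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10, "INR": 0.012}

def to_warehouse_price(amount, currency, tax_included, tax_rate):
    """Convert a source price to the assumed warehouse convention: USD, tax excluded."""
    net = amount / (1 + tax_rate) if tax_included else amount
    return round(net * RATES_TO_USD[currency], 2)

# Two source systems that record the "same" attribute differently.
source_rows = [
    {"hotel": "A", "price": 110.0, "currency": "USD", "tax_included": True, "tax_rate": 0.10},
    {"hotel": "B", "price": 100.0, "currency": "EUR", "tax_included": False, "tax_rate": 0.10},
]

converted = [
    (row["hotel"],
     to_warehouse_price(row["price"], row["currency"], row["tax_included"], row["tax_rate"]))
    for row in source_rows
]
print(converted)
```

After conversion, both prices use the same currency and tax treatment, so they can be compared directly in analysis.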
12. DATA WAREHOUSE – TIME VARIANT
The time horizon for the data warehouse is significantly longer than that of
operational systems.
Operational database: current value data.
Data warehouse data: provides information from a historical perspective
(e.g., past 5-10 years).
Every key structure in the data warehouse
contains an element of time, explicitly or implicitly, but the key of
operational data may or may not contain a time element.
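The difference between the two key structures can be illustrated with a toy sketch. The account identifier, dates, and balances are hypothetical, and plain dictionaries stand in for real tables.

```python
from datetime import date

# Operational store: keyed by account only, so it holds just the current value.
operational = {"ACC-1": 500}
operational["ACC-1"] = 650  # an update overwrites the old balance

# Warehouse store: the key includes a time element (snapshot date),
# so earlier values are kept alongside later ones.
warehouse = {}
warehouse[("ACC-1", date(2023, 1, 31))] = 500
warehouse[("ACC-1", date(2023, 2, 28))] = 650

def history(store, account):
    """Return (snapshot_date, value) pairs for one account, oldest first."""
    return sorted((d, v) for (acc, d), v in store.items() if acc == account)

print(operational["ACC-1"])
print(history(warehouse, "ACC-1"))
```

The operational store can only answer "what is the balance now", while the time-keyed store can also answer "what was it at the end of January".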
13. DATA WAREHOUSE – NONVOLATILE
A physically separate store of data transformed from the operational
environment.
Operational update of data does not occur in the data warehouse
environment.
Does not require transaction processing, recovery, and
concurrency control mechanisms.
Requires only two operations in
data accessing: initial loading of data and access of data.
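A minimal sketch of a store that offers only these two operations is shown below. The class name `NonvolatileStore` and its methods are hypothetical, chosen for illustration; real warehouse products enforce this behavior differently.

```python
class NonvolatileStore:
    """Toy store exposing only two operations: loading data and accessing it."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Append a batch of rows; existing rows are never modified or deleted."""
        self._rows.extend(dict(row) for row in rows)

    def query(self, predicate):
        """Return copies of the rows matching predicate (read-only access)."""
        return [dict(row) for row in self._rows if predicate(row)]

store = NonvolatileStore()
store.load([{"product": "tea", "qty": 3}, {"product": "coffee", "qty": 5}])

result = store.query(lambda row: row["qty"] > 4)
result[0]["qty"] = 0  # changing a query result does not touch the stored data

print(store.query(lambda row: row["product"] == "coffee"))
```

Because there is no update or delete method, the sketch needs no locking or rollback logic, which mirrors the point that transaction processing and concurrency control are not required.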