Data warehousing and Data mining
1. Data Mining and
Data Warehousing Techniques
Presented to : Muhammad Faisal
Presented by:
Faizan Saleem
Pireh Pirzada
Ahmed Hassan
Muhammad Usman
BSE-4 | DATABASE MANAGEMENT SYSTEM
2. Topics
Why do we need data warehouses and
data mining?
What are data warehouses and data
mining?
History of data warehouses and data
mining
Techniques of data warehouses and
data mining
3. Why We Need Data Mining and
Warehousing
Problem Scenario
Solution
Needs of Data warehouses and Data Mining
6. Problem Scenario 1
ABC Pvt Ltd is a company with
branches at Karachi, Lahore,
Peshawar and Islamabad.
The Sales Manager wants a quarterly
sales report.
Each branch has a separate
operational system.
10. Problem Scenario 2
A supermarket has a huge
operational database. Whenever
executives want a report, the OLTP
system becomes slow and data entry
operators have to wait for some time.
12. Solutions for Shopping Mart
Extract the data needed for analysis from the
operational database and store it in the warehouse.
Refresh the warehouse at regular intervals so that it
contains up-to-date information for analysis.
The warehouse will contain data with a historical
perspective.
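The extract-and-refresh steps above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the deck's own method: the table names sales and sales_history, their columns, and the helper refresh_warehouse are all hypothetical, and an incremental load keyed on the highest sale_id is only one of several possible refresh strategies.

```python
import sqlite3

def refresh_warehouse(operational, warehouse):
    """Copy sales rows not yet in the warehouse; return how many were added."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS sales_history "
        "(sale_id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL)"
    )
    # The highest sale_id already loaded marks where the last refresh stopped.
    (last_id,) = warehouse.execute(
        "SELECT COALESCE(MAX(sale_id), 0) FROM sales_history"
    ).fetchone()
    new_rows = operational.execute(
        "SELECT sale_id, sale_date, amount FROM sales WHERE sale_id > ?",
        (last_id,),
    ).fetchall()
    warehouse.executemany("INSERT INTO sales_history VALUES (?, ?, ?)", new_rows)
    warehouse.commit()
    return len(new_rows)

# Hypothetical operational (OLTP) database holding two sales.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL)"
)
oltp.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "2024-01-05", 120.0), (2, "2024-01-06", 80.0)],
)
dw = sqlite3.connect(":memory:")

first_load = refresh_warehouse(oltp, dw)    # initial extract
oltp.execute("INSERT INTO sales VALUES (3, '2024-01-07', 45.5)")
second_load = refresh_warehouse(oltp, dw)   # periodic refresh picks up new rows only
total_rows = dw.execute("SELECT COUNT(*) FROM sales_history").fetchone()[0]
print(first_load, second_load, total_rows)
```

Because reports then run against sales_history, analytical queries no longer compete with data entry on the operational tables.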
14. Need for Data Warehousing
Industry has huge amounts of operational data.
Knowledge workers want to turn this data into
useful information.
They use this information to support
strategic decision making.
15. Need for Data Warehousing
It is a platform for consolidated historical data
for analysis.
It stores data of good quality so that knowledge
workers can make correct decisions.
16. Need for Data Warehousing
From a business perspective
It is the latest marketing weapon.
Helps to keep customers by learning more
about their needs.
A valuable tool in today’s competitive, fast-evolving
world.
17. Why Mine Data? Commercial Viewpoint
Lots of data is being collected and warehoused
Web data, e-commerce
Purchases at department/ grocery stores
Bank/Credit Card
transactions
Computers have become cheaper and more powerful
Competitive Pressure is Strong
Provide better, customized services for an edge (e.g.
in Customer Relationship Management)
18. Why Mine Data? Scientific Viewpoint
Data collected and stored at enormous speeds
(GB/hour)
Remote sensors on a satellite
telescopes scanning the skies
Microarrays generating gene expression data
Scientific simulations generating terabytes of
data
19. What is Data Mining and Warehousing?
Definition Data Warehouse
Data Warehouse Uses
Definition Data Mining
Data Mining Uses
Data Warehousing Versus Data Mining
Examples
20. What is Data Warehousing?
Data warehousing can be
said to be the process of
centralizing or
aggregating data from
multiple sources into one
common repository.
A process of transforming data
into information and making it
available to users in a timely
enough manner to make a
difference.
Data → Information
21. Data Warehousing Uses
Reporting and data analysis.
Data warehouses store current as well as historical
data and are used to create trend reports for
senior management, such as annual and
quarterly comparisons.
23. What is Data Mining?
Data mining is the process
of discovering new
information, in terms of
patterns or rules, from
vast amounts of data,
using methods at the
intersection of artificial
intelligence, machine
learning, statistics, and
database systems.
24. What is Data Mining?
Extract information and transform it into an
understandable structure.
Uses past data to analyze the outcome of a particular
problem or situation.
25. Data Mining Uses
Companies use it to decide on marketing strategies for their products.
They can use data to compare and contrast themselves with
competitors.
Data mining turns data into real-time analysis
that can be used to:
increase sales,
promote new products,
or drop products that add no value to the company.
26. Data Mining works with Warehouse
Data
Data Warehousing provides
the Enterprise with a memory
Data Mining provides
the Enterprise with
intelligence
27. Data Warehousing vs. Data Mining
Data Warehousing
Occurs before any data
mining process.
Data warehousing is the
process of compiling and
organizing data into one
common database.
Data Mining
Relies on data
warehousing data to
detect meaningful
patterns.
Data mining is the
process of extracting
meaningful data from
that database.
28. Example of data mining
Credit Card Fraud.
Collecting data on shoppers to find patterns
in their shopping habits.
A great example of data warehousing that
everyone can relate to is what Facebook does.
29. History of Data Mining and
Warehousing
Data Warehouse History
Data Mining History
30. History of Data warehouse
1960s — General Mills and Dartmouth College, in a joint
research project, develop the
terms dimensions and facts.
1970s — ACNielsen and IRI provide dimensional data
marts for retail sales.
1970s — Bill Inmon begins to define and discuss the
term: Data Warehouse
31. History of Data warehouse
1975 — Sperry Univac introduces MAPPER (MAintain,
Prepare, and Produce Executive Reports), a database
management and reporting system that includes the
world's first 4GL.
32. History of Data warehouse
1983 — Teradata introduces a database management
system specifically designed for decision support.
1983 — At Sperry Corporation, Martyn Richard Jones defines
the Sperry Information Center approach, which, while
not a true DW in the Inmon sense, did contain
many of the characteristics of DW structures.
33. History of Data warehouse
1984 — Metaphor Computer Systems releases Data
Interpretation System (DIS). DIS was a
hardware/software package and GUI for business users
to create a database management and analytic system.
34. History of Data warehouse
1988 — Barry Devlin and Paul Murphy publish the article
in IBM Systems Journal where they introduce the term
"business data warehouse".
1990 — Red Brick Systems, founded by Ralph Kimball,
introduces Red Brick Warehouse, a database
management system specifically for data warehousing.
1991 — Prism Solutions, founded by Bill Inmon,
introduces Prism Warehouse Manager, software for
developing a data warehouse.
35. History of Data warehouse
1992 — Bill Inmon publishes the book Building the Data
Warehouse.
1995 — The Data Warehousing Institute, a for-profit
organization that promotes data warehousing, is
founded.
36. History of Data warehouse
1996 — Ralph Kimball publishes the book The Data
Warehouse Toolkit.
2000 — Daniel Linstedt releases the Data Vault, enabling
real-time auditable data warehouses.
37. Brief History Of Data Mining
The term "Data mining" was introduced in the 1990s.
Data mining can be traced back to classical statistics,
artificial intelligence, and machine learning.
Statistics are the foundation of most technologies on
which data mining is built. All of these are used to study
data and data relationships.
38. Brief History Of Data Mining
Artificial intelligence, or AI, which is built upon
heuristics as opposed to statistics, attempts to
apply human-thought-like processing to statistical
problems. AI concepts were adopted for the RDBMS
query processor.
39. Brief History Of Data Mining
Machine learning is the union of statistics
and AI. It could be considered an
evolution of AI, because it blends AI
heuristics with advanced statistical
analysis.
41. Processes Used in Data Mining
Data mining is done by two kinds of methods:
• Prediction Methods
• Description Methods
42. How it works
Data mining involves six common tasks
o Classification [Predictive]
o Clustering [Descriptive]
o Association Rule Discovery [Descriptive]
o Sequential Pattern Discovery [Descriptive]
o Regression [Predictive]
o Deviation Detection [Predictive]
43. Anomaly detection
What is Anomaly Detection ?
Types of Anomaly Detection:
• Unsupervised anomaly detection
• Supervised anomaly detection
• Semi-supervised anomaly detection
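As a rough illustration of the unsupervised case, where no record is labeled as normal or abnormal in advance, the sketch below flags values that lie far from the mean in units of standard deviation. The z-score rule, the threshold of 2.0, the function name zscore_anomalies and the daily_sales figures are all assumptions chosen for the example; they are not taken from the slides.

```python
from statistics import mean, stdev

def zscore_anomalies(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds `threshold` standard deviations."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical daily sales counts; one day is clearly unusual.
daily_sales = [102, 98, 101, 97, 103, 99, 100, 250]
anomalies = zscore_anomalies(daily_sales)
print(anomalies)
```

A supervised variant would instead train a classifier on records already labeled normal or abnormal, and a semi-supervised variant would model only the normal records and flag whatever fits that model poorly.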
44. Association rule learning
What is Association rule learning
The examples:
• In super Market
• Inventory Management
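For the supermarket example, a minimal sketch of association rule discovery might count how often pairs of items appear in the same basket (support) and how often the second item appears given the first (confidence). The baskets, the min_support value of 0.5 and the helper pair_rules are hypothetical; real tools usually rely on algorithms such as Apriori and handle itemsets larger than pairs.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.5):
    """Return {(a, b): confidence of a -> b} for item pairs meeting min_support."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair
        for basket in baskets
        for pair in combinations(sorted(set(basket)), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        if count / n >= min_support:                 # support of the pair {a, b}
            rules[(a, b)] = count / item_counts[a]   # confidence of a -> b
            rules[(b, a)] = count / item_counts[b]   # confidence of b -> a
    return rules

# Hypothetical point-of-sale baskets.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "jam"],
]
rules = pair_rules(baskets)
for (a, b), confidence in sorted(rules.items()):
    print(f"{a} -> {b}: confidence {confidence:.2f}")
```

In this toy data every basket containing milk also contains bread, so the rule milk -> bread has confidence 1.0, which is the kind of finding a store might use for shelf placement or inventory planning.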
45. Classification
What is it ?
Given a collection of records (training set )
Find a model for class attribute as a function of the values
of other attributes
Goal: previously unseen records should be assigned a class
as accurately as possible.
Example:
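One way to make this concrete is a k-nearest-neighbour classifier, sketched below. It is only one of many classification techniques and is not named in the slides. The training records, the two numeric attributes, the class labels and the function knn_classify are hypothetical.

```python
from collections import Counter
from math import dist

def knn_classify(training_set, record, k=3):
    """Assign `record` the majority class among its k nearest training records."""
    nearest = sorted(training_set, key=lambda row: dist(row[0], record))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training set: (attribute values, class attribute).
training_set = [
    ((1.0, 1.0), "low_risk"),
    ((1.5, 2.0), "low_risk"),
    ((2.0, 1.0), "low_risk"),
    ((6.0, 5.0), "high_risk"),
    ((7.0, 7.0), "high_risk"),
    ((8.0, 6.0), "high_risk"),
]

# Previously unseen records are assigned a class from the model.
prediction_a = knn_classify(training_set, (6.5, 6.0))
prediction_b = knn_classify(training_set, (1.2, 1.5))
print(prediction_a, prediction_b)
```

Here the "model" is simply the stored training set, and the class of a new record is a function of how close its attribute values are to those of known records.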
47. Sequential Pattern
Discovery
What is it?
Example:
In point-of-sale transaction sequences,
Computer Bookstore:
(Intro_To_Visual_C) (C++_Primer) -->
(Perl_for_dummies,Tcl_Tk)
Athletic Apparel Store:
(Shoes) (Racket, Racketball) --> (Sports_Jacket)
(A B) (C) (D E)
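The notation above can be read as an ordered list of itemsets. The following sketch, under that assumption, checks whether a customer's purchase history contains a pattern in order and measures the pattern's support across customers. The customer histories and the helpers contains_pattern and support are hypothetical, and the sketch only evaluates a given pattern rather than discovering patterns.

```python
def contains_pattern(sequence, pattern):
    """True if the itemsets of `pattern` occur in `sequence` in the same order."""
    position = 0
    for itemset in pattern:
        # Advance to the next transaction that includes every item of this itemset.
        while position < len(sequence) and not set(itemset) <= set(sequence[position]):
            position += 1
        if position == len(sequence):
            return False
        position += 1  # later itemsets must match strictly later transactions
    return True

def support(sequences, pattern):
    """Fraction of customer sequences that contain the pattern."""
    return sum(contains_pattern(s, pattern) for s in sequences) / len(sequences)

# Hypothetical purchase histories: one list of transactions per customer.
customers = [
    [{"Shoes"}, {"Racket", "Racketball"}, {"Sports_Jacket"}],
    [{"Shoes"}, {"Socks"}, {"Sports_Jacket"}],
    [{"Racket"}, {"Shoes"}],
    [{"Shoes", "Socks"}, {"Racket", "Racketball"}, {"Water_Bottle"}, {"Sports_Jacket"}],
]

full_pattern = [{"Shoes"}, {"Racket", "Racketball"}, {"Sports_Jacket"}]
short_pattern = [{"Shoes"}, {"Sports_Jacket"}]
full_support = support(customers, full_pattern)
short_support = support(customers, short_pattern)
print(full_support, short_support)
```

A pattern-discovery algorithm would generate candidate patterns and keep those whose support exceeds a chosen minimum.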
48. Regression
What is it ?
Example:
PageRank as used by Google
• Page structure implicitly holds importance of a page
• Important pages are linked to by important pages
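The idea that important pages are linked to by important pages can be sketched as a simplified power iteration. This is an illustrative approximation, not Google's production algorithm: the damping factor of 0.85, the iteration count, the link graph and the function pagerank are assumptions, and every link target is assumed to appear as a key in the graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: `links` maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its importance on to the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link structure of a small site.
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

The page that nothing links to ends up with the lowest score, while the page that every other page links to ends up with the highest.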
49. Applications Of Data Mining
Data Mining Applications in Sales/Marketing
Data Mining Applications in Banking / Finance
Data Mining Applications in Health Care and Insurance
Data Mining Applications in Transportation
Data Mining Applications in Medicine
50. Data Mining Applications in
Sales/Marketing
Enables businesses to understand the hidden patterns
inside historical purchasing transaction data
Market basket analysis
Identify customer behavior
51. Data Mining Applications
in Banking / Finance
Credit card fraud detection
Identify customer loyalty
Identify stock trading rules
Identify users by method of payment/transaction
52. Data Mining Applications
in Health Care and Insurance
Claims analysis
Forecast which customers will buy new policies
Detect risky customers
Detect fraudulent behavior
56. Star Schema
Star schema is the simplest form of a dimensional model, in
which data is organized into facts and dimensions.
A star schema is diagrammed by surrounding each fact with
its associated dimensions.
The resulting diagram resembles a star.
Star schemas are optimized for querying large data sets and
are used in data warehouses and data marts to support
OLAP cubes, business intelligence and analytic applications,
and queries.
57. Elements of star schema
Dimension tables
A dimension contains reference information
about the fact, such as date, product, or
customer.
Denormalized, decoded and cleaned set of
descriptive data elements
Geography dimension tables describe
location data, such as country, state, or city
Employee dimension tables describe
employees, such as salespeople
58. Fact Tables
A fact is an event that is counted or measured,
such as a sale or login.
Contains foreign keys referencing dimension
records
Contain either additive or semi-additive
measures for analysis
60. Example
Each dimension table has a primary key on its Id column, relating
to one of the columns (viewed as rows in the example schema) of
the Fact_Sales table's three-column (compound) primary key
(Date_Id, Store_Id, Product_Id).
The non-primary key Units_Sold column of the fact table in this
example represents a measure or metric that can be used in
calculations and analysis.
The non-primary key columns of the dimension tables represent
additional attributes of the dimensions (such as the Year of the
Dim_Date dimension).
For example, the following query answers how many TV sets have
been sold, for each brand and country, in 1997:
SELECT P.Brand, S.Country, SUM(F.Units_Sold)
FROM Fact_Sales F
INNER JOIN Dim_Date D ON F.Date_Id = D.Id
INNER JOIN Dim_Store S ON F.Store_Id = S.Id
INNER JOIN Dim_Product P ON F.Product_Id = P.Id
WHERE D.YEAR = 1997
AND P.Product_Category = 'tv'
GROUP BY P.Brand, S.Country
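To try the query, the star schema can be built in an in-memory SQLite database from Python. The table and key names follow the example above. The sample rows, the brand and country values, the Full_Date column and the added ORDER BY clause (used only to make the output order predictable) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one primary key plus descriptive attributes.
CREATE TABLE Dim_Date    (Id INTEGER PRIMARY KEY, Full_Date TEXT, Year INTEGER);
CREATE TABLE Dim_Store   (Id INTEGER PRIMARY KEY, Country TEXT);
CREATE TABLE Dim_Product (Id INTEGER PRIMARY KEY, Brand TEXT, Product_Category TEXT);

-- Fact table: foreign keys to each dimension plus the Units_Sold measure.
CREATE TABLE Fact_Sales (
    Date_Id    INTEGER REFERENCES Dim_Date(Id),
    Store_Id   INTEGER REFERENCES Dim_Store(Id),
    Product_Id INTEGER REFERENCES Dim_Product(Id),
    Units_Sold INTEGER,
    PRIMARY KEY (Date_Id, Store_Id, Product_Id)
);

-- Hypothetical sample data.
INSERT INTO Dim_Date VALUES (1, '1997-03-01', 1997), (2, '1998-03-01', 1998);
INSERT INTO Dim_Store VALUES (1, 'Pakistan'), (2, 'Germany');
INSERT INTO Dim_Product VALUES
    (1, 'BrandA', 'tv'), (2, 'BrandB', 'tv'), (3, 'BrandA', 'radio');
INSERT INTO Fact_Sales VALUES
    (1, 1, 1, 10), (1, 2, 1, 4), (1, 1, 2, 7), (1, 1, 3, 99), (2, 1, 1, 50);
""")

rows = conn.execute("""
    SELECT P.Brand, S.Country, SUM(F.Units_Sold)
    FROM Fact_Sales F
    INNER JOIN Dim_Date D ON F.Date_Id = D.Id
    INNER JOIN Dim_Store S ON F.Store_Id = S.Id
    INNER JOIN Dim_Product P ON F.Product_Id = P.Id
    WHERE D.Year = 1997
      AND P.Product_Category = 'tv'
    GROUP BY P.Brand, S.Country
    ORDER BY P.Brand, S.Country
""").fetchall()
for brand, country, units in rows:
    print(brand, country, units)
```

The radio sale and the 1998 sale are filtered out by the WHERE clause, so only the 1997 TV sales are summed per brand and country.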
61. Snowflake Schema vs. Star Schema
Ease of maintenance/change:
Snowflake schema: No redundancy, hence easier to maintain and change.
Star schema: Has redundant data, hence less easy to maintain and change.
Ease of use:
Snowflake schema: More complex queries, hence less easy to understand.
Star schema: Less complex queries, easy to understand.
Query performance:
Snowflake schema: More foreign keys, hence longer query execution time.
Star schema: Fewer foreign keys, hence shorter query execution time.
Normalization:
Snowflake schema: Has normalized tables.
Star schema: Has denormalized tables.
62. Snowflake Schema vs. Star Schema (continued)
Type of data warehouse:
Snowflake schema: Good for a data warehouse core, to simplify complex relationships (many:many).
Star schema: Good for data marts with simple relationships (1:1 or 1:many).
Joins:
Snowflake schema: Higher number of joins.
Star schema: Fewer joins.
Dimension table:
Snowflake schema: May have more than one dimension table for each dimension.
Star schema: Contains only a single dimension table for each dimension.
When to use:
Snowflake schema: When the dimension table is relatively big, snowflaking is better as it reduces space.
Star schema: When the dimension table contains fewer rows, a star schema is suitable.
Statistics: e.g. regression analysis, standard distribution, standard deviation, etc.
Machine learning attempts to let computer programs learn about the data they study, so that programs make different decisions based on the qualities of the studied data, using statistics for fundamental concepts and adding more advanced AI heuristics and algorithms to achieve its goals.
Data mining, in many ways, is fundamentally the adaptation of machine learning techniques to business applications. Data mining is best described as the union of historical and recent developments in statistics, AI, and machine learning. These techniques are then used together to study data and find previously hidden trends or patterns within it.
Data Mining Applications in Sales/Marketing
Data mining enables businesses to understand the hidden patterns inside historical purchasing transaction data, thus helping them plan and launch new marketing campaigns in a prompt and cost-effective way. The following illustrates several data mining applications in sales and marketing.
Data mining is used for market basket analysis to provide information on which product combinations were purchased together, when they were bought, and in what sequence. This information helps businesses promote their most profitable products and maximize profit. In addition, it encourages customers to purchase related products that they may have missed or overlooked.
Retail companies use data mining to identify customers' buying behavior patterns.
Data Mining Applications in Banking / Finance
Several data mining techniques, e.g. distributed data mining, have been researched, modeled, and developed to help detect credit card fraud.
Data mining is used to identify customer loyalty by analyzing data on customers' purchasing activities, such as the frequency of purchases in a period of time, the total monetary value of all purchases, and the date of the last purchase. After analyzing those dimensions, a relative measure is generated for each customer. The higher the score, the more loyal the customer is.
Data mining is applied to help banks retain credit card customers. By analyzing past data, data mining can help banks predict which customers are likely to change their credit card affiliation, so they can plan and launch special offers to retain those customers.
Credit card spending by customer groups can be identified by using data mining.
Hidden correlations between different financial indicators can be discovered by using data mining.
From historical market data, data mining makes it possible to identify stock trading rules.
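The loyalty measure built from purchase frequency, total monetary value and date of last purchase can be sketched as follows. The notes do not say how the three dimensions are combined, so the equal weighting, the normalization against the best customer, the recency formula, the sample purchases and the function loyalty_scores are all assumptions for illustration.

```python
from datetime import date

def loyalty_scores(purchases, today):
    """Score each customer from purchase frequency, total spend and recency."""
    stats = {}
    for customer, when, amount in purchases:
        entry = stats.setdefault(
            customer, {"frequency": 0, "monetary": 0.0, "last": when}
        )
        entry["frequency"] += 1
        entry["monetary"] += amount
        entry["last"] = max(entry["last"], when)

    max_frequency = max(entry["frequency"] for entry in stats.values())
    max_monetary = max(entry["monetary"] for entry in stats.values())
    scores = {}
    for customer, entry in stats.items():
        days_since_last = (today - entry["last"]).days
        recency = 1.0 / (1 + days_since_last)  # closer to 1.0 means more recent
        # Assumed weighting: the three dimensions count equally.
        scores[customer] = (
            entry["frequency"] / max_frequency
            + entry["monetary"] / max_monetary
            + recency
        ) / 3
    return scores

# Hypothetical purchase records: (customer, purchase date, amount).
purchases = [
    ("alice", date(2024, 3, 1), 50.0),
    ("alice", date(2024, 3, 20), 70.0),
    ("alice", date(2024, 3, 30), 30.0),
    ("bob", date(2024, 1, 10), 40.0),
]
scores = loyalty_scores(purchases, today=date(2024, 3, 31))
for customer, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{customer}: {score:.3f}")
```

The customer who buys more often, spends more and bought more recently receives the higher relative score.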
Data Mining Applications in Health Care and Insurance
The growth of the insurance industry depends entirely on the ability to convert data into knowledge, information, or intelligence about customers, competitors, and markets. Data mining has been applied in the insurance industry only lately, but it has brought tremendous competitive advantages to the companies that have implemented it successfully. The data mining applications in the insurance industry are listed below:
Data mining is applied in claims analysis, such as identifying which medical procedures are claimed together.
Data mining makes it possible to forecast which customers will potentially purchase new policies.
Data mining allows insurance companies to detect risky customers' behavior patterns.
Data mining helps detect fraudulent behavior.
Data Mining Applications in Transportation
Data mining helps determine the distribution schedules among warehouses and outlets and analyze loading patterns.
Data Mining Applications in Medicine
Data mining makes it possible to characterize patient activities to anticipate incoming office visits.
Data mining helps identify the patterns of successful medical therapies for different illnesses.