This document introduces data mining. It defines data mining as the process of extracting useful information from large databases. It discusses technologies used in data mining like statistics and machine learning. It also covers data mining models and tasks such as classification, regression, clustering, and forecasting. Finally, it provides an overview of the data mining process and examples of data mining tools.
Data mining introduction
1. Introduction to Data Mining
BY: BASMA GAMAL
RESEARCHER AT COMPUTER SCIENCE, MINA UNIVERSITY
2. Outline
What is Data Mining?
Technologies used in data mining
Database Processing vs. Data Mining Processing
Data Mining Models and Tasks
Patterns in Data Mining
Types of Data
Data Mining Tools
3. What is Data Mining?
Data mining is the process of extracting useful information from large databases.
Data mining is also called knowledge discovery, knowledge extraction, data/pattern analysis,
information harvesting, etc.
The extracted information or knowledge can be used for any of the following applications:
o Market Analysis
o Fraud Detection
o Customer Retention
o Production Control
o Science Exploration
4. Technologies used in data mining
Statistics
•It uses mathematical analysis to represent, model, and summarize empirical
data or real-world observations.
•Statistical analysis involves a collection of methods, applicable to large amounts of data, used to
draw conclusions and report trends.
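As a minimal sketch of summarizing empirical data, the following uses Python's standard `statistics` module. The `daily_sales` figures and the `summarize` helper are hypothetical, invented only for illustration.

```python
import statistics

# Hypothetical daily sales figures (illustrative values only).
daily_sales = [120, 135, 128, 150, 142, 160, 155]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


summary = summarize(daily_sales)
print(summary)
```

A summary like this is usually the first step before any modelling: it shows the typical value and how widely the observations spread around it.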
Machine learning
•Arthur Samuel defined machine learning as a field of study that gives computers the ability to
learn without being explicitly programmed.
•When new data is entered into the computer, machine learning algorithms adapt (grow or change)
in response to it. An algorithm is constructed to predict data from the available
database (predictive analysis).
5. Database Processing vs. Data Mining Processing
Database processing
Query
◦ Well defined
◦ SQL
Data
– Operational data
Output
– Precise
– Subset of database
Data mining processing
Query
◦ Poorly defined
◦ No precise query language
Data
– Not operational data
Output
– Fuzzy
– Not a subset of database
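The contrast can be sketched with Python's built-in `sqlite3` module. The `orders` table, its rows, and the "big spender" rule are all hypothetical; the averaging rule in particular is only one possible judgement call, not a standard method.

```python
import sqlite3

# Hypothetical operational data held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 20.0), ("bob", 250.0), ("ann", 35.0), ("cid", 300.0)],
)

# Database processing: a well-defined SQL query whose answer is a
# precise subset of the stored rows.
large_orders = conn.execute(
    "SELECT customer, amount FROM orders WHERE amount > 100 ORDER BY amount"
).fetchall()

# Data mining style question: "which customers look like big spenders?"
# The answer is a derived label per customer that is stored nowhere in
# the table, and the cut-off (above-average total) is an assumption.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
).fetchall()
average_total = sum(total for _, total in totals) / len(totals)
segments = {
    customer: "big spender" if total > average_total else "regular"
    for customer, total in totals
}
conn.close()

print(large_orders)
print(segments)
```

The first result could be checked row by row against the table; the second depends on how "big spender" is defined, which is what makes the output fuzzy.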
7. Patterns in Data Mining
1. Association
Associations or correlations are sought among the items or objects in relational
databases, transactional databases, or any other information repositories.
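One common way to quantify an association between items is through support and confidence. These two measures are not defined in the slides and are introduced here as an assumption; the `transactions` data is hypothetical.

```python
# Hypothetical market-basket transactions (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
```

Here bread and milk appear together in half of the transactions, and two out of the three baskets containing bread also contain milk.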
2. Classification
•The goal of classification is to construct a model from historical
data that can accurately predict the class of new data.
It maps the data into predefined groups or classes and searches for
new patterns.
For example:
Predicting the weather on a particular day, where the outcome is categorized as sunny, rainy, or cloudy.
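The weather example can be sketched with a 1-nearest-neighbour classifier, which is only one of many possible classification techniques and is chosen here for brevity. The `history` records and their two features (humidity %, cloud cover %) are hypothetical.

```python
import math

# Hypothetical historical observations: (humidity %, cloud cover %) -> class.
history = [
    ((30, 10), "sunny"),
    ((35, 20), "sunny"),
    ((85, 90), "rainy"),
    ((90, 95), "rainy"),
    ((60, 70), "cloudy"),
    ((65, 75), "cloudy"),
]


def classify(features, history):
    """Return the class of the closest historical observation (1-nearest neighbour)."""
    nearest = min(history, key=lambda record: math.dist(features, record[0]))
    return nearest[1]


print(classify((32, 15), history))
print(classify((88, 92), history))
```

The classes are fixed in advance (sunny, rainy, cloudy); the model only decides which of them a new day falls into.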
8. 3. Regression
Regression creates predictive models. Regression analysis is used to make predictions from existing
data by fitting formulas to it.
Regression is very useful for predicting unknown information on the basis of previously
known information.
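A minimal sketch of regression, assuming a simple straight-line model fitted by ordinary least squares. The `ad_spend` and `sales` values are hypothetical and were chosen to lie exactly on the line y = 2x + 1.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical known data: advertising spend vs. sales.
ad_spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]

slope, intercept = fit_line(ad_spend, sales)
predicted = slope * 5 + intercept  # prediction for an unseen spend of 5
print(slope, intercept, predicted)
```

Unlike classification, the predicted value is a number on a continuous scale rather than one of a fixed set of classes.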
4. Cluster analysis
It is the process of partitioning a set of data into a set of meaningful subclasses, called clusters.
It is used to place data elements into related groups without advance knowledge of
the group definitions.
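A minimal sketch of clustering, assuming a one-dimensional k-means procedure (one of several clustering methods; the slides do not name a specific one). The `ages` data and the starting centers are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Partition numbers into clusters around the given starting centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest center.
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return clusters, centers


# Hypothetical customer ages with no predefined groups.
ages = [18, 21, 22, 45, 48, 50]
clusters, centers = kmeans_1d(ages, centers=[20, 40])
print(clusters)
print(centers)
```

No labels are supplied: the two groups emerge from the data alone, which is what distinguishes clustering from classification.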
5. Forecasting
Forecasting is concerned with the discovery of knowledge or information patterns in data that
can lead to reasonable predictions about the future.
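As a minimal sketch of forecasting, the following predicts the next value of a series as the average of its most recent observations. A moving average is only one simple forecasting approach, chosen here as an assumption; the `monthly_demand` series is hypothetical.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    return sum(series[-window:]) / window


# Hypothetical monthly demand figures.
monthly_demand = [100, 110, 120, 130, 140]
forecast = moving_average_forecast(monthly_demand)
print(forecast)
```

Because it averages past values, this kind of forecast lags behind a steadily rising series; methods that model the trend explicitly would predict a higher value here.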
10. Business understanding:
•In this phase, business and data-mining goals are established.
•Understand business and client objectives.
•Using the business objectives and the current scenario, define your data mining goals.
Data understanding:
In this phase, a sanity check is performed on the data to check whether it is
appropriate for the data mining goals.
11. Data preparation:
In this phase, the data is made production ready.
The data preparation process can consume about 90% of the project's time.
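Two typical preparation steps, removing duplicate records and filling in missing values, can be sketched as follows. The `raw_rows` records, the `prepare` helper, and the choice to fill missing ages with the mean are all assumptions made for illustration.

```python
# Hypothetical raw records with one duplicate and one missing value.
raw_rows = [
    {"customer": "ann", "age": 34},
    {"customer": "bob", "age": None},
    {"customer": "ann", "age": 34},
    {"customer": "cid", "age": 40},
]


def prepare(rows):
    """Drop exact duplicate rows, then fill missing ages with the mean age."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["customer"], row["age"])
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))  # copy so the input is not modified
    known = [r["age"] for r in unique if r["age"] is not None]
    mean_age = sum(known) / len(known)
    for r in unique:
        if r["age"] is None:
            r["age"] = mean_age
    return unique


cleaned = prepare(raw_rows)
print(cleaned)
```

Real projects involve many more steps of this kind (type conversion, outlier handling, joining sources), which is part of why this phase takes so long.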
Modelling:
In this phase, mathematical models are used to determine data patterns.
Evaluation:
In this phase, patterns identified are evaluated against the business objectives.
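One way to make this comparison concrete is to score a model's predictions and check the score against a target derived from the business objective. The labels below and the `TARGET_ACCURACY` threshold are hypothetical, and accuracy is only one possible measure.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the actual outcomes."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical held-out outcomes and model predictions.
actual = ["churn", "stay", "stay", "churn", "stay"]
predicted = ["churn", "stay", "churn", "churn", "stay"]

TARGET_ACCURACY = 0.75  # hypothetical threshold set by the business objective

score = accuracy(predicted, actual)
meets_objective = score >= TARGET_ACCURACY
print(score, meets_objective)
```

If the score falls short of the target, the usual next step is to return to an earlier phase, such as data preparation or modelling, rather than to deploy.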
13. Types of Data
Data mining can be performed on the following types of data:
Relational databases
Data warehouses
Advanced DB and information repositories
Object-oriented and object-relational databases
Transactional and Spatial databases
Heterogeneous and legacy databases
Multimedia and streaming databases
Text databases
Text mining and Web mining
14. Data Mining Tools
The following are two popular data mining tools widely used in industry:
R language is an open-source tool for statistical computing and graphics. R offers a wide variety of
statistical and graphical techniques, including classical statistical tests, time-series analysis, and classification.
It offers effective data handling and storage facilities.
Oracle Data Mining, popularly known as ODM, is a module of the Oracle Advanced Analytics
option for Oracle Database. This data mining tool allows data analysts to generate detailed insights and make
predictions. It helps predict customer behavior, develop customer profiles, and identify
cross-selling opportunities.