Lecture2 (1).ppt

Introduction to Data Mining
• What is Data Mining?
• Related technologies
• Data Mining techniques
• Data Mining Goals
• Stages of data mining process
• Knowledge representation methods
• Applications

What is Data Mining?
• Data mining is the process of extracting information from huge
sets of data to identify patterns, trends, and useful data that
allow a business to make data-driven decisions.
• Data mining is the act of automatically searching large stores
of information to find trends and patterns that go beyond simple
analysis procedures.
• Data Mining is a process used by organizations to extract
specific data from huge databases to solve business problems.
It primarily turns raw data into useful information.
• Data mining uses complex mathematical algorithms to segment
the data and evaluate the probability of future events. Data
mining is also called Knowledge Discovery in Databases (KDD).

Related Technologies
Data mining is related to many concepts. We briefly
introduce each concept and indicate how it is related to
data mining.
• Machine Learning
• DBMS
• OLAP
• Statistics

Machine Learning
• Machine learning is the area of AI that examines how to write programs that
can learn.
• In data mining, machine learning is often used for prediction or classification.
• Applications that typically use machine learning techniques include speech
recognition, training moving robots, classification of astronomical structures,
and game playing.
• When machine learning is applied to data mining tasks, a model is used to
represent the data (such as a graphical structure like a neural network or a
decision tree).
• During the learning process, a sample of the database is used to train the
system to properly perform the desired task.
• Then the system is applied to the general database to actually perform the
task.
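
The train-then-apply workflow described above can be sketched in a few lines of Python. This is a minimal illustration, not a method from the lecture: the salary records, the `train_stump` helper, and the single-threshold model (a one-level decision tree) are all assumptions made for the example.

```python
# Hypothetical "database": (salary_in_thousands, is_senior) records.
# A record is labeled senior when the salary is at least 90.
database = [(salary, salary >= 90) for salary in range(40, 140, 2)]

def train_stump(sample):
    """Return the salary threshold that misclassifies the fewest sample rows."""
    best_threshold, best_errors = None, len(sample) + 1
    for threshold, _ in sample:
        errors = sum((salary >= threshold) != label for salary, label in sample)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

# Learning step: train on a sample of the database (every fourth record).
sample = database[::4]
threshold = train_stump(sample)

# Application step: apply the learned model to the general database.
predictions = [salary >= threshold for salary, _ in database]
correct = sum(pred == label for pred, (_, label) in zip(predictions, database))
accuracy = correct / len(database)

print(threshold, accuracy)
```

Because the model only sees the sample, the learned threshold need not match the true boundary exactly, so accuracy on the full database can be below 100%.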

Machine Learning
• Machine learning algorithms are divided into two types:
1. Unsupervised Learning
2. Supervised Learning
1. Unsupervised Machine Learning:
Unsupervised learning does not depend on labeled training data sets to
predict the results. Instead, it applies techniques such as clustering and
association directly to the data.
2. Supervised Machine Learning:
Supervised learning is a learning process in which we teach or train the
machine using data that is well labeled, meaning that some data is already
marked with the correct responses. After that, the machine is provided with
new sets of data, and the model learned from the training data is used to
produce a result for them.
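
The contrast between the two types can be illustrated with a small sketch. The data, the nearest-centroid classifier (supervised), and the two-cluster k-means routine (unsupervised) are assumptions chosen for brevity, not techniques prescribed by the lecture.

```python
# Hypothetical 1-D data: customer spend values with known labels.
labeled = [(1.0, "low"), (2.0, "low"), (3.0, "low"),
           (10.0, "high"), (11.0, "high"), (12.0, "high")]
unlabeled = [value for value, _ in labeled]

def mean(values):
    return sum(values) / len(values)

# Supervised: the labels are used to compute one centroid per class.
def fit_centroids(rows):
    groups = {}
    for value, label in rows:
        groups.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in groups.items()}

def classify(centroids, value):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

centroids = fit_centroids(labeled)
prediction = classify(centroids, 9.0)  # new, unseen value

# Unsupervised: no labels; k-means with k=2 finds the groups by itself.
def two_means(values, iterations=10):
    centers = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = two_means(unlabeled)
print(prediction, centers, clusters)
```

The supervised model returns a named class for a new value, while the clustering step only returns unnamed groups that an analyst must interpret.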

OLAP
• OLAP stands for Online Analytical Processing.
• OLAP systems are targeted to provide more complex query
results than traditional OLTP (online transaction processing)
or database systems.
• OLAP is performed on data warehouses or data marts. The
primary goal of OLAP is to support the ad hoc querying needed
by decision support systems (DSS).
• The multidimensional view of data is fundamental to OLAP
applications.
• OLAP tools can be classified as ROLAP or MOLAP.
• ROLAP: Relational OLAP
• MOLAP: Multidimensional OLAP

OLAP operations
There are several types of OLAP operations supported by OLAP tools:
• A simple query may look at a single cell within the cube [Figure (a)].
• Slice: Look at a subcube to get more specific information. This is performed
by selecting on one dimension. As seen in Figure (c), this is looking at a
portion of the cube.
• Dice: Look at a subcube by selecting on two or more dimensions. This can be
performed by a slice on one dimension and then rotating the cube to select
on a second dimension, as shown in Figure (d).
• Roll up (dimension reduction, aggregation): Roll up allows the user to ask
questions that move up an aggregation hierarchy. Figure (b) represents a roll
up from (a).
• Drill down: Drill down allows a user to get more detailed fact information by
navigating lower in the aggregation hierarchy. Figure (a) represents a drill
down from (b).
• Visualization: Visualization allows the OLAP users to actually "see" results of
an operation.
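
The operations above can be sketched on a small in-memory cube. This is a rough illustration only: the sales figures, the dimension names, and the `select` and `roll_up` helpers are hypothetical and do not correspond to any particular OLAP tool.

```python
from collections import defaultdict

# Hypothetical sales cube: (product, region, quarter) -> units sold.
DIMENSIONS = ("product", "region", "quarter")
cube = {
    ("laptop", "east", "Q1"): 10,
    ("laptop", "east", "Q2"): 12,
    ("laptop", "west", "Q1"): 7,
    ("phone", "east", "Q1"): 20,
    ("phone", "west", "Q1"): 15,
    ("phone", "west", "Q2"): 18,
}

def select(cube, **criteria):
    """Keep cells whose coordinates match every dimension=allowed_values pair."""
    def matches(key):
        return all(key[DIMENSIONS.index(dim)] in allowed
                   for dim, allowed in criteria.items())
    return {key: value for key, value in cube.items() if matches(key)}

def roll_up(cube, keep):
    """Aggregate (sum) away every dimension not listed in keep."""
    totals = defaultdict(int)
    for key, value in cube.items():
        reduced = tuple(key[DIMENSIONS.index(dim)] for dim in keep)
        totals[reduced] += value
    return dict(totals)

single_cell = cube[("phone", "west", "Q2")]                    # simple query
q1_slice = select(cube, quarter={"Q1"})                        # slice: one dimension
east_q1_dice = select(cube, region={"east"}, quarter={"Q1"})   # dice: two dimensions
by_product = roll_up(cube, keep=("product",))                  # roll up
by_product_region = roll_up(cube, keep=("product", "region"))  # drill down one level

print(by_product)
```

Drill down is expressed here as rolling up to a finer level of the same hierarchy, which keeps more dimensions and therefore more detail.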

DBMS
• A database is a collection of data usually associated with some
organization or enterprise.
• Schema
– e.g. (ID,Name,Address,Salary,JobNo) may be the schema for a
personnel database.
• A database management system (DBMS) is the software used to access a
database.
• A data model is used to describe the data, attributes, and relationships
among them.
– ER Model.

DBMS
• Transaction
• Query:
SELECT Name
FROM T
WHERE Salary > 100000
• A major difference between data mining queries and those of database
systems is the output.
• Basic database queries always output either a subset of the database or
aggregates of the data. A data mining query outputs a KDD object.
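
The query above can be run against an in-memory SQLite database to see both kinds of basic output, a subset and an aggregate. The table contents are invented for illustration; only the schema and the `SELECT` statement come from the slides, with an `ORDER BY` added so the output order is predictable.

```python
import sqlite3

# In-memory database using the personnel schema from the slides.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T (ID INTEGER, Name TEXT, Address TEXT, Salary REAL, JobNo INTEGER)"
)
# Hypothetical rows, made up for this example.
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ana", "12 Elm St", 120000, 10),
        (2, "Ben", "4 Oak Ave", 85000, 20),
        (3, "Chen", "9 Pine Rd", 150000, 10),
    ],
)

# Output type 1: a subset of the database.
subset = [row[0] for row in conn.execute(
    "SELECT Name FROM T WHERE Salary > 100000 ORDER BY Name"
)]

# Output type 2: an aggregate of the data.
average_salary = conn.execute("SELECT AVG(Salary) FROM T").fetchone()[0]

conn.close()
print(subset, average_salary)
```

Neither result is a learned model or pattern, which is the distinction the slide draws between database queries and data mining queries.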

Statistics
• Simple statistical concepts such as determining a data distribution and
calculating a mean and a variance can be viewed as data mining techniques.
• Statistical inference: Generalizing a model created from a sample of the
data to the entire dataset.
• Exploratory Data Analysis:
– Data can actually drive the creation of the model
– Opposite of traditional statistical view.
• Statistics research has produced many of the proposed data mining
algorithms.
• The difference between data mining and statistics is that data mining is
targeted at business users rather than statisticians.
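
As a small illustration of the mean, variance, and inference ideas above, the sketch below estimates a dataset's mean from a sample using Python's standard `statistics` module. The dataset and the sampling scheme (every tenth value) are assumptions made for the example.

```python
import statistics

# Hypothetical full dataset: the integers 1 to 100.
dataset = list(range(1, 101))

# A sample of the data: every tenth value (1, 11, ..., 91).
sample = dataset[::10]

# Simple statistics computed on the sample only.
sample_mean = statistics.mean(sample)
sample_variance = statistics.variance(sample)  # sample variance (n - 1 divisor)

# Statistical inference: use the sample mean as an estimate for the whole dataset,
# then compare it with the value computed from all the data.
dataset_mean = statistics.mean(dataset)
estimate_error = abs(dataset_mean - sample_mean)

print(sample_mean, sample_variance, dataset_mean, estimate_error)
```

The estimate differs from the true mean because the sample is small and not random; a larger or randomly drawn sample would usually give a closer estimate.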

Goals of Data Mining
• Data mining is one of the most useful techniques that help
entrepreneurs, researchers, and individuals to extract valuable
information from huge sets of data.
• Data mining stores and manages the data in a multidimensional
database system.
• Data mining provides data access to business analysts and
information technology professionals.
• Data mining analyzes the data with application software.
• Data mining presents the data in a useful format, such as a
graph or table.
