Knowledge Discovery & Representation

1. Introduction
• Knowledge discovery is the process of automatically searching large
volumes of data for patterns that can be considered knowledge about the data.
• It can be categorized according to:
1) what kind of data is searched;
2) in what form the result of the search is represented.
• Knowledge discovery developed out of the data mining domain and is closely
related to it in both methodology and terminology.
• Knowledge representation is a formalism for representing, at a minimum, the
data, information, and knowledge in an application.
• Knowledge can be represented either as programs in an imperative language or
as rules in a declarative language.
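The contrast between the two representations can be sketched in a few lines of Python. The shipping and discount facts, the field names, and both function names are hypothetical and chosen only for illustration. The first form buries the knowledge in control flow; the second keeps it as data (a list of condition and conclusion pairs) that a small generic interpreter applies.

```python
# Hypothetical knowledge: "orders over 100 ship free; members get a discount".

# 1) Imperative form: the knowledge is embedded in the program's control flow.
def benefits_imperative(order):
    result = []
    if order["total"] > 100:
        result.append("free_shipping")
    if order["member"]:
        result.append("discount")
    return result

# 2) Declarative form: the knowledge is a list of (condition, conclusion)
#    rules, applied by a generic interpreter that knows nothing about orders.
RULES = [
    (lambda o: o["total"] > 100, "free_shipping"),
    (lambda o: o["member"], "discount"),
]

def benefits_declarative(order, rules=RULES):
    return [conclusion for condition, conclusion in rules if condition(order)]

order = {"total": 120, "member": False}
print(benefits_imperative(order))
print(benefits_declarative(order))
```

In the declarative version, adding or changing knowledge means editing the rule list rather than the code that evaluates it.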

2. Knowledge Discovery
• It is also known as Knowledge Discovery in Databases (KDD).
[Diagram: Data → Knowledge discovery process → Useful information]
• The process requires much elapsed time.

3. Data Mining
• Data mining involves many different algorithms to accomplish different
tasks.
• Data mining algorithms can be characterized as consisting of three parts:
  • Model: the purpose of the algorithm is to fit a model to the data.
  • Preference: some criteria must be used to prefer one model over another.
  • Search: all algorithms require some technique to search the data.
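The three parts can be made concrete with a toy example. This is only a sketch under assumed data: the (value, label) pairs and the threshold model are hypothetical, picked because each part fits in a few lines.

```python
# Toy data (hypothetical): (value, label) pairs.
data = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]

# Model: predict label 1 when the value exceeds a threshold, else 0.
def predict(threshold, value):
    return 1 if value > threshold else 0

# Preference: a model with fewer misclassified items is preferred.
def errors(threshold):
    return sum(predict(threshold, v) != label for v, label in data)

# Search: try every observed value as a candidate threshold and keep the
# one the preference criterion ranks best.
def fit():
    candidates = sorted(v for v, _ in data)
    return min(candidates, key=errors)

best = fit()
print(best, errors(best))
```

Swapping any one part (a different model family, a different error measure, a smarter search) gives a different algorithm while the overall structure stays the same.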

4. Classification of Data Mining

5. Working of Data Mining
• Data mining provides a link between separate transaction and analytical systems.
• Data mining software analyzes relationships and patterns in stored transaction data
based on user queries.
• Generally, four types of relationships are sought: classes, clusters, associations,
and sequential patterns.
[Diagram: the elements of data mining]
1) Extract, transform, and load transaction data.
2) Store and manage the data.
3) Provide data access to business analysts & IT professionals.
4) Analyze the data by application software.
5) Present the data in a useful format.

6. Clustering
WHAT IS A CLUSTER?
• A cluster is a collection of objects that are “similar” to one another
and “dissimilar” to the objects belonging to other clusters.
WHAT IS CLUSTERING?
• The process of organizing objects into groups whose members are
similar in some way.
• Distance-based clustering and conceptual clustering are two of
the types of clustering.
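To illustrate the distance-based idea, the sketch below treats two objects as "similar" when their Euclidean distance is under a threshold and groups objects that are linked by a chain of such neighbours. This single-linkage style grouping, the function names, the sample points, and the threshold are all assumptions made for illustration; they are not taken from the slides.

```python
import math

def distance(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.dist(a, b)

def group_by_distance(points, threshold):
    """Group points connected by a chain of neighbours closer than threshold."""
    groups = []
    for p in points:
        near, far = [], []
        for g in groups:
            # A group is "near" if any of its members is close to p.
            close = any(distance(p, q) < threshold for q in g)
            (near if close else far).append(g)
        # p joins, and thereby merges, every group it is close to.
        merged = [p] + [q for g in near for q in g]
        groups = far + [merged]
    return groups

points = [(0, 0), (0, 1), (5, 5), (6, 5), (1, 0)]
print(group_by_distance(points, threshold=1.5))
```

The choice of distance function carries the notion of similarity: replacing `distance` changes which objects end up in the same group.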

Possible applications of Clustering
• Marketing
• Biology
• Libraries
• World Wide Web

Problems of clustering
• Clustering techniques can't address all requirements adequately.
• A large number of data items can lead to high time complexity.
• The result can be interpreted in different ways.
• If an obvious distance measure does not exist, defining one is not easy.

Classification of Clustering Algorithms
• Exclusive
• Overlapping
• Hierarchical
• Probabilistic

K-means Clustering
[Figure: Clustering on the “mouse” data set; panels: original data, K-means clustering]
• K-means is an iterative clustering algorithm in which items are moved
among sets of clusters until the desired set is reached.
• This definition assumes that each ‘tuple’ has only one numeric value,
as opposed to a ‘tuple’ with many attribute values.

K-means algorithm
Input:
• D = {t1, t2, ..., tn} // set of elements
• k // number of desired clusters
Output:
• K // set of clusters
Assign initial values for means m1, m2, ..., mk;
Repeat
    Assign each item ti to the cluster which has the closest mean;
    Calculate the new mean for each cluster;
Until the desired set of clusters is reached;
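The pseudocode can be turned into a short runnable sketch for single numeric values. Several choices here are assumptions, not part of the slides: the function name `kmeans_1d`, stopping when no mean changes, breaking ties in favour of the lower-numbered cluster, leaving the mean of an empty cluster unchanged, and the `max_iter` safety cap.

```python
def kmeans_1d(items, initial_means, max_iter=100):
    """K-means for single numeric values.

    Returns (means, clusters, history), where history holds the
    (means, clusters) pair used in each iteration.
    """
    means = list(initial_means)
    history = []
    for _ in range(max_iter):
        # Assignment step: each item goes to the cluster with the closest
        # mean. Ties go to the lower-numbered cluster (assumption).
        clusters = [[] for _ in means]
        for t in items:
            nearest = min(range(len(means)), key=lambda i: abs(t - means[i]))
            clusters[nearest].append(t)
        history.append((list(means), clusters))
        # Update step: recompute each mean; an empty cluster keeps its mean.
        new_means = [sum(c) / len(c) if c else m
                     for c, m in zip(clusters, means)]
        if new_means == means:  # no mean moved, so assignments are stable
            break
        means = new_means
    return means, clusters, history

# Hypothetical usage with four values and k = 2.
means, clusters, history = kmeans_1d([1, 2, 9, 10], [1, 2])
print(means, clusters)
```

Returning the per-iteration history makes it easy to print a trace of means and clusters, one row per pass through the loop.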

---Example---
Input:
k = 2
D = {2,4,10,12,3,20,30,11,25}
Output:
m1     m2     K1                  K2
2      4      {2,3}               {4,10,12,20,30,11,25}
2.5    16     {2,3,4}             {10,12,20,30,11,25}
3      18     {2,3,4,10}          {12,20,30,11,25}
4.75   19.6   {2,3,4,10,11,12}    {20,30,25}
7      25     {2,3,4,10,11,12}    {20,30,25}
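As a check on the second row of the output, the new means are the averages of the clusters formed in the first row:

```latex
m_1 = \frac{2 + 3}{2} = 2.5, \qquad
m_2 = \frac{4 + 10 + 12 + 20 + 30 + 11 + 25}{7} = \frac{112}{7} = 16
```

The last two rows contain the same clusters, so recomputing the means changes nothing further and the iteration stops there.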
