My dm ppt

Clustering is a method of finding similar objects
that together form a group, i.e., the members of a
particular group have properties most similar to
each other.
In terms of graph theory, clustering is a
method of identifying the degree to which nodes
tend to cluster together, so that each group
comes close to forming a clique.
When we walk from one node to another, the next
node is more likely to lie in the same cluster than in
another.

Clustering over a graph is considered in two settings:
one on an unweighted graph and the other on a weighted
(possibly directed) graph. In both, the random walk is
defined as follows.
On an unweighted graph: Start at a vertex,
choose an outgoing edge uniformly at
random, walk along that edge, and repeat.
On a weighted graph: Start at a vertex u,
choose an incident edge e with weight $w_e$ with
probability
$w_e / \sum_d w_d$
where d ranges over the edges incident to u,
walk along that edge, and repeat.
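The weighted rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming integer edge weights; the function name `transition_probabilities` and the example vertex with neighbours a, b and c are hypothetical and not part of the slides.

```python
from fractions import Fraction

def transition_probabilities(weights):
    """Map each neighbour of u to the probability of stepping to it.

    `weights` maps neighbour -> integer weight of the edge joining it to u,
    so each probability is w_e divided by the sum of all incident weights.
    """
    total = sum(weights.values())
    return {v: Fraction(w, total) for v, w in weights.items()}

# Hypothetical vertex u with three incident edges of weight 1, 2 and 5.
probs = transition_probabilities({"a": 1, "b": 2, "c": 5})

# The unweighted case is the special case where every weight is 1.
uniform = transition_probabilities({"a": 1, "b": 1, "c": 1})
```

With all weights equal, the rule reduces to choosing an edge uniformly at random, so the unweighted walk needs no separate implementation.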

MCL Algorithm
It was introduced by Stijn Marinus van Dongen in the
year 2000.
The Markov Cluster (MCL) algorithm is based on
random walks modelled as Markov chains,
which in turn are described by a transition
probability matrix.
The basic idea of the clustering is that a random
walk that enters a cluster is unlikely to leave the cluster
until many of its vertices have been visited.
In short, the basic idea here is flow simulation.

[Figure: a graph with two marked nodes, u and w.]
Suppose you start at u. What is the probability that you are at w after 3 steps?
Let $v_u$ be the vector that is 0 everywhere except at index u, where it is 1.
At step 0, $v_u[w]$ gives the probability that you are at node w.

After 1 step, $(T_G v_u)[w]$ gives the probability that you are at w, where $T_G$ is the transition probability matrix of the graph G.
After k steps, the probability that you are at w is:
$(T_G^k v_u)[w]$
In other words, $T_G^k v_u$ is a vector giving the probability of being at each node after taking k steps starting from u.
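The k-step probability can be computed by applying the transition matrix k times to the start vector. The sketch below assumes a column-stochastic convention (column j holds the probabilities of leaving node j) and uses a hypothetical 4-node graph, a triangle 0-1-2 with a pendant node 3 attached to node 2; the helper names are illustrative only.

```python
from fractions import Fraction

def column_stochastic(adj):
    """Divide each column by its sum so column j gives the steps out of node j."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[Fraction(adj[i][j], col_sums[j]) for j in range(n)] for i in range(n)]

def mat_vec(m, v):
    """Ordinary matrix-vector product."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def distribution_after(t, u, k):
    """Return T^k v_u, where v_u is 1 at index u and 0 elsewhere."""
    v = [Fraction(int(i == u)) for i in range(len(t))]
    for _ in range(k):
        v = mat_vec(t, v)
    return v

# Hypothetical graph: edges 0-1, 0-2, 1-2 and 2-3 (unweighted, no self loops).
adjacency = [[0, 1, 1, 0],
             [1, 0, 1, 0],
             [1, 1, 0, 1],
             [0, 0, 1, 0]]
T = column_stochastic(adjacency)
after_3 = distribution_after(T, u=0, k=3)  # after_3[w] is P(at w after 3 steps)
```

Exact fractions are used so that the entries of the result always sum to exactly 1, which is a quick sanity check on the matrix.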

The MCL algorithm can be viewed in two ways:
• K-path clustering.
• Random walks.
Here, we discuss the simulation of random
walks on graphs to find clusters in them.
According to van Dongen:
• The number of u-v paths of length k is larger if u and v
are in the same dense cluster, and smaller if they
belong to different clusters.
• A random walk on the graph will not leave a dense
cluster until many of its vertices have been visited.
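The path-count observation can be illustrated with powers of the adjacency matrix: entry [u][v] of the k-th power counts the walks of length k from u to v (walks may revisit vertices). The graph below, two triangles joined by a single bridge edge, is a hypothetical example chosen for illustration.

```python
def mat_mul(a, b):
    """Ordinary matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def walk_counts(adj, k):
    """Return the k-th power of adj (k >= 1); entry [u][v] counts length-k walks."""
    result = adj
    for _ in range(k - 1):
        result = mat_mul(result, adj)
    return result

# Hypothetical graph: triangle {0, 1, 2}, triangle {3, 4, 5}, bridge edge 2-3.
adjacency = [[0, 1, 1, 0, 0, 0],
             [1, 0, 1, 0, 0, 0],
             [1, 1, 0, 1, 0, 0],
             [0, 0, 1, 0, 1, 1],
             [0, 0, 0, 1, 0, 1],
             [0, 0, 0, 1, 1, 0]]
counts = walk_counts(adjacency, 3)
same_cluster = counts[0][1]    # nodes 0 and 1 share a triangle
other_cluster = counts[0][4]   # nodes 0 and 4 lie in different triangles
```

In this small graph there are three walks of length 3 between nodes 0 and 1 but only one between nodes 0 and 4, in line with the statement above.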

Random walks therefore help to find where the
flow is going, and hence where the clusters lie, which
makes them well suited to clustering.
MCL relies on the Markov property: the probability
of the next state depends only on the current
state and not on the past. The process may
change state or remain in the same state, depending on the
probability distribution.
The number of higher-length paths in G is large
for pairs of vertices lying in the same dense cluster
and small for pairs of vertices belonging to different
clusters.
The two basic operations in MCL are:
• Expansion
• Inflation
Expansion operator: the expansion operator lets
flow spread along longer paths; it is responsible
for allowing flow to connect different regions of
the graph.
• Expansion is done by taking the normal matrix
product of the stochastic matrix with itself.
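A minimal sketch of the expansion operator, assuming a column-stochastic matrix stored as a list of rows. The function name `expand` and the 3-node path graph with self loops are hypothetical choices for illustration.

```python
from fractions import Fraction as F

def mat_mul(a, b):
    """Ordinary (row by column) product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expand(m, power=2):
    """Raise the column-stochastic matrix m to the given power (power >= 1)."""
    result = m
    for _ in range(power - 1):
        result = mat_mul(result, m)
    return result

# Hypothetical path graph 0 - 1 - 2 with self loops, columns already normalised.
M = [[F(1, 2), F(1, 3), F(0)],
     [F(1, 2), F(1, 3), F(1, 2)],
     [F(0),    F(1, 3), F(1, 2)]]
expanded = expand(M)
```

Before expansion there is no flow from node 0 to node 2 (`M[2][0]` is 0); after squaring, `expanded[2][0]` is positive, which is the sense in which expansion connects different regions. The product of two column-stochastic matrices is again column-stochastic, so no rescaling is needed after this step.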

Inflation operator: the inflation operator is
responsible for eliminating weak regions; it
both strengthens strong currents and weakens
weak currents.
• Inflation is done by taking the Hadamard (entrywise) power of
the matrix and then normalizing (rescaling) it, such
that the resulting matrix is stochastic again.
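A minimal sketch of the inflation operator under the same column-stochastic convention. The function name `inflate` and the 2x2 matrix are hypothetical; r is the inflation parameter.

```python
from fractions import Fraction as F

def inflate(m, r=2):
    """Raise every entry to the power r, then rescale each column to sum to 1."""
    n = len(m)
    powered = [[m[i][j] ** r for j in range(n)] for i in range(n)]
    col_sums = [sum(powered[i][j] for i in range(n)) for j in range(n)]
    return [[powered[i][j] / col_sums[j] for j in range(n)] for i in range(n)]

# Hypothetical column-stochastic matrix: column 0 is uneven, column 1 is even.
M = [[F(3, 4), F(1, 2)],
     [F(1, 4), F(1, 2)]]
inflated = inflate(M)
```

In column 0 the stronger entry grows from 3/4 to 9/10 while the weaker one shrinks from 1/4 to 1/10; the evenly split column 1 is left unchanged. Larger values of r sharpen this contrast.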
The algorithm builds on the observation that flow is easier within dense
regions than across sparse boundaries, but on larger data
and in the long run this effect disappears.
o A walker starts from some arbitrary point.
o He successively visits new vertices by arbitrarily
selecting one of the outgoing edges.
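The walker described above can be simulated directly. This is a sketch on a hypothetical graph (two triangles joined by one bridge edge); the function name `random_walk` and the fixed seed are assumptions made so that the run is repeatable.

```python
import random

def random_walk(neighbours, start, steps, rng):
    """Return the list of vertices visited, choosing each next edge uniformly."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choice(neighbours[path[-1]]))
    return path

# Hypothetical graph: triangle {0, 1, 2}, triangle {3, 4, 5}, bridge edge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
         3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}

walk = random_walk(graph, start=0, steps=20, rng=random.Random(0))
# Count the steps on which the walker crossed the bridge between the triangles.
crossings = sum(1 for a, b in zip(walk, walk[1:]) if {a, b} == {2, 3})
```

Only one of the seven edges crosses between the two triangles, so a short walk tends to spend runs of consecutive steps inside one triangle before crossing.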

[Figure: clustering over a graph. Different node colors show different clusters and their links to other clusters.]

Steps of the MCL Algorithm
• The input is an undirected graph.
• Create the associated (adjacency) matrix.
• Add self loops to avoid getting stuck in
local minima (this is an optional step).
• Normalize the matrix so that each column sums to 1.
• Perform the expansion operation by taking the
nth power of the matrix (this is normal matrix
multiplication; for n = 2, simply squaring the matrix).
• Perform the inflation operation: first take the
Hadamard power of the matrix with inflation parameter r, then rescale it so
that its columns sum to 1.
• Repeat the expansion and inflation operations,
in that order, until a steady state is reached.
• Once an idempotent matrix is obtained, interpret the
resulting matrix in order to find the
clusters.
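The steps above can be put together as a short sketch. This is an illustrative outline in plain Python, not a reference implementation: the function names, the convergence tolerance `tol`, the iteration cap `max_iter` and the threshold `eps` are all assumptions, and expansion is fixed at power 2.

```python
def mat_mul(a, b):
    """Ordinary matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def normalize_columns(m):
    """Rescale each column to sum to 1 (assumes no column is all zeros)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]

def mcl(adj, r=2, self_loops=True, max_iter=100, tol=1e-9):
    """Alternate expansion (squaring) and inflation (power r) until stable."""
    n = len(adj)
    m = [[float(x) for x in row] for row in adj]
    if self_loops:
        for i in range(n):
            m[i][i] = 1.0
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = mat_mul(m, m)
        inflated = normalize_columns([[x ** r for x in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    return m

def clusters(m, eps=1e-6):
    """Group each attractor (non-zero diagonal) with the columns it attracts."""
    found = set()
    for i, row in enumerate(m):
        if row[i] > eps:
            found.add(frozenset(j for j, x in enumerate(row) if x > eps))
    return sorted(sorted(c) for c in found)

# Hypothetical input: triangle {0, 1, 2}, triangle {3, 4, 5}, bridge edge 2-3.
bridged = [[0, 1, 1, 0, 0, 0],
           [1, 0, 1, 0, 0, 0],
           [1, 1, 0, 1, 0, 0],
           [0, 0, 1, 0, 1, 1],
           [0, 0, 0, 1, 0, 1],
           [0, 0, 0, 1, 1, 0]]
groups = clusters(mcl(bridged))
```

The cluster-reading rule used here (rows with a non-zero diagonal entry are attractors, and the non-zero columns of such a row are the nodes it attracts) is one common way to interpret the limit matrix; the thresholds would need tuning for real data.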

Example:
Step 1: Take an undirected graph with four nodes (1, 2, 3, 4) as input.
In the following example we expand the matrix by a
power of 2 and inflate it with a power of 2.
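Adjacency matrix of the graph (rows and columns ordered as nodes 1 to 4):
0 1 1 1
1 0 0 1
1 0 0 0
1 1 0 0
Matrix after adding self loops (1s on the diagonal):
1 1 1 1
1 1 0 1
1 0 1 0
1 1 0 1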
Normalizing the matrix so that each column sums to 1 (it will no longer be symmetric):
1/4 1/3 1/2 1/3
1/4 1/3 0 1/3
1/4 0 1/2 0
1/4 1/3 0 1/3
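The normalized matrix above can be reproduced from the adjacency matrix of the example graph. The sketch below uses exact fractions; the helper names are illustrative.

```python
from fractions import Fraction

# Adjacency matrix of the example graph (nodes 1 to 4 are indices 0 to 3).
adjacency = [[0, 1, 1, 1],
             [1, 0, 0, 1],
             [1, 0, 0, 0],
             [1, 1, 0, 0]]

def add_self_loops(adj):
    """Return a copy of adj with 1s on the diagonal."""
    return [[1 if i == j else x for j, x in enumerate(row)]
            for i, row in enumerate(adj)]

def normalize_columns(m):
    """Divide each entry by its column sum so every column sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[Fraction(m[i][j], sums[j]) for j in range(n)] for i in range(n)]

with_loops = add_self_loops(adjacency)
normalized = normalize_columns(with_loops)
```

Node 1 has four non-zero entries in its column, so each becomes 1/4; nodes 2 and 4 have three, and node 3 has two, which gives the 1/3 and 1/2 entries.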
Perform the expansion operation by taking the nth power of the matrix (here n = 2, so the normalized matrix is multiplied by itself):
1/4 1/3 1/2 1/3
1/4 1/3 0 1/3
1/4 0 1/2 0
1/4 1/3 0 1/3
×
1/4 1/3 1/2 1/3
1/4 1/3 0 1/3
1/4 0 1/2 0
1/4 1/3 0 1/3
=
.35 .31 .38 .31
.23 .31 .13 .31
.19 .08 .38 .08
.23 .31 .13 .31
Expansion operation completed.
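The expansion result above can be checked with exact arithmetic. The sketch squares the normalized matrix using fractions; the decimals shown on the slide are these values rounded to two places.

```python
from fractions import Fraction as F

# Normalized matrix of the example (columns sum to 1).
M = [[F(1, 4), F(1, 3), F(1, 2), F(1, 3)],
     [F(1, 4), F(1, 3), F(0),    F(1, 3)],
     [F(1, 4), F(0),    F(1, 2), F(0)],
     [F(1, 4), F(1, 3), F(0),    F(1, 3)]]

def mat_mul(a, b):
    """Ordinary matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

squared = mat_mul(M, M)
first_row = squared[0]  # exact values behind .35 .31 .38 .31
```

For instance, the top-left entry is 17/48 (about .35) and the entry below it is 11/48 (about .23).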
Perform the inflation operation. First take the Hadamard (entrywise) square of the expanded matrix:
.13 .09 .14 .09
.05 .09 .02 .09
.04 .01 .14 .01
.05 .09 .02 .09
Then rescale each column so that it sums to 1. Inflation operation completed:
.47 .33 .45 .33
.20 .33 .05 .33
.13 .02 .45 .02
.20 .33 .05 .33
Repeat expansion and inflation of the matrix until a steady state is reached. The further resulting matrices are:
.70 .33 .49 .33
.12 .33 .01 .33
.05 .02 .49 --
.12 .33 .01 .33

.94 .33 .50 .33
.03 .33 -- .33
.01 -- .50 --
.03 .33 -- .33

1 .33 .50 .33
-- .33 -- .33
-- -- .50 --
-- .33 -- .33
Attractors and the elements they attract are swept
together into the same cluster.
In this case the clusters are {1}, {2,4} and {3}.
