International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print),
ISSN 0976 - 6375(Online), Volume 5, Issue 5, May (2014), pp. 109-119 © IAEME
A DISTANCE BASED CLUSTERING ALGORITHM
1 Neeti Arora, 2 Dr. Mahesh Motwani

1 Research student, Rajiv Gandhi Technical University, Bhopal, India
2 Department of Computer Engineering, Rajiv Gandhi Technical University, Bhopal, India
ABSTRACT
Clustering is an unsupervised data mining technique used to determine the objects that are
similar in characteristics and group them together. K-means is a widely used partitional clustering
algorithm, but its performance depends strongly on the initial guess of centers (centroids), and the
final cluster centroids may not be the optimal ones. It is therefore important for K-means to have a
good choice of initial centroids. We have developed a clustering algorithm based on a distance
criterion to select a good set of initial centroids. Once a point d is selected as an initial centroid, the
proposed algorithm averages groups of data points so that points near d are not selected as subsequent
initial centroids. These initial centroids are given as input to the K-means technique, leading to a
clustering algorithm that results in better clustering than the K-means partitional clustering
algorithm, the agglomerative hierarchical clustering algorithm and the hierarchical partitioning
clustering algorithm.
Keywords: Initial centroids; Recall; Precision; Partitional clustering; Agglomerative hierarchical
clustering; Hierarchical partitioning clustering.
I. INTRODUCTION
Clustering is a data mining technique that groups data objects into classes or clusters, so
that objects within a cluster have high similarity to one another (intra-class similarity) but are very
dissimilar to objects in other clusters (low inter-class similarity). There are many clustering
methods available, and each of them may give a different grouping of a dataset. These methods are
divided into two basic types: hierarchical and partitional clustering [1]. Hierarchical clustering either
merges smaller clusters into larger ones (agglomerative) or splits larger clusters into smaller ones
(divisive). Agglomerative clustering starts with one point in each cluster and at each step merges the
two clusters that are most similar in characteristics. The process is repeated till the required number
of clusters is formed. Divisive clustering starts by considering all data points in one cluster and splits
them until the required number of clusters is formed. The agglomerative and divisive clustering
methods further differ in the rule by which it is decided which two small clusters are merged
or which large cluster is split.
Partitional clustering attempts to directly decompose the data set into a set of disjoint clusters.
Each object is a member of the cluster with which it is most similar. K-means [2, 3] is one of the most
widely used partitional clustering algorithms. The algorithm clusters the n data points into k groups,
where k is provided as an input parameter. It defines k centroids, one for each cluster; for this, k data
points are selected at random from the dataset D as initial centroids. The next step is to take each point
in the dataset and assign it to the nearest centroid. Euclidean distance is generally used to determine
the distance between data points and the centroids. When all the points have been assigned to
clusters, the mean of each cluster is calculated to give the new centroid of that cluster. All the data
points are then reassigned to clusters based on their proximity to these means, the cluster means are
recomputed, and the process begins again. In due course, a situation is reached where the cluster
centers do not change anymore; this is the convergence criterion for clustering.
Algorithm K-means:
Input: Dataset D of n data points {d1, d2,…,dn} and k the number of clusters
Output: Set of k Clusters
1. Choose k data points at random from D as initial centroids.
2. Repeat
Assign each data point in D to the group that has the closest centroid.
Recalculate the positions of the k centroids by taking the mean.
Until the centroids no longer move.
The K-means algorithm has the drawback that it produces different clusterings depending
on the initialization process, which randomly generates the initial centers. Thus the performance of
K-means depends on the initial guess of centers, as different runs of K-means on the same input data
might produce different results [4].
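To make this sensitivity concrete, the loop above can be sketched in a few lines (an illustrative Python implementation, not the Matlab built-in used later in the paper; different random seeds may converge to different final clusterings):

```python
import numpy as np

def kmeans(D, k, seed=None, max_iter=100):
    """Basic K-means: random initial centroids, then alternate
    assignment and centroid-update steps until centroids stop moving."""
    rng = np.random.default_rng(seed)
    centroids = D[rng.choice(len(D), k, replace=False)]  # step 1: random init
    for _ in range(max_iter):
        # assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(D[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each centroid as the mean of its assigned points
        new = np.array([D[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):  # convergence: centroids unchanged
            break
        centroids = new
    return labels, centroids
```

The `seed` parameter makes the dependence on initialization explicit: varying it varies the randomly chosen starting centroids, which is exactly the weakness the proposed algorithm targets.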
Researchers [5, 6] have found that hierarchical clustering algorithms are better than partitional
clustering algorithms in terms of clustering quality. But the computational complexity of hierarchical
clustering algorithms is higher than that of partitional clustering algorithms [7, 8]. The hybrid
hierarchical partitioning algorithms [9] combine the advantages of the partitional and hierarchical
clustering. Bisecting K-Means [10] is a hybrid hierarchical partitioning algorithm that produces the
required k clusters by performing a sequence of k − 1 repeated bisections. In this approach, the input
dataset is first clustered into two groups using K-means (2-way), and then one of these groups is
selected and bisected further. This process continues until the desired number of clusters is found. It
is shown in [10] that the hierarchical partitioning algorithm produces better clustering than the
traditional clustering algorithms.
We have developed a new clustering algorithm, the Distance Based Clustering Algorithm (DBCA),
that unlike K-means does not rely on random generation of the initial centers and therefore does not
produce different results for the same input data. The proposed algorithm produces better quality
clusters than the partitional clustering algorithm, the agglomerative hierarchical clustering algorithm
and the hierarchical partitioning clustering algorithm.
II. RELATED WORK
The cluster center initialization algorithm CCIA [4] is based on the observation that some of the
patterns are very similar to each other and therefore have the same cluster membership irrespective
of the choice of initial cluster centers. Also, an individual attribute may provide some information
about the initial cluster centers. The experimental results show the improved performance of the
proposed algorithm.
A technique for refining initial centroids for the K-means algorithm is presented in [11]. The
proposed method refines the initial points by analyzing the distribution of the data and the probability
of data density. The method operates over small subsamples of a given database, hence requiring only
a small proportion of the total memory needed to store the full database. The clustering results
obtained with the refined initial centroids are superior to those obtained with randomly chosen initial
centroids in the K-means algorithm.
The global K-means [12] clustering algorithm incrementally adds one cluster center at a time and
performs n executions of the K-means algorithm from appropriate initial positions, where n is the
size of the data set. An improved version of K-means proposed in [13] evaluates the distance between
every pair of data points and then finds those data points which are similar. This method finally
chooses the initial centroids according to the data points found, and produces better clusters than the
original K-means algorithm.
The optimized K-means algorithm proposed in [14] spreads the initial centroids in the feature
space so that the distances among them are as large as possible. An efficient enhanced K-means
method for assigning data points to clusters is proposed in [15]. The original K-means algorithm has
high time complexity because each iteration computes the distances between the data points and all
the centroids. The method in [15] uses a distance function based on a heuristic to reduce the number
of distance calculations, which improves the execution time of the K-means algorithm. The initial
centroids are determined randomly as in the K-means algorithm, so this method also produces
different clusters in every run; the clustering results are thus the same as those of K-means, but the
time complexity is lower. A new algorithm for finding the initial centroids is proposed in [16] by
embedding hierarchical clustering into K-means clustering.
The algorithm in [17] computes initial cluster centers for the K-means algorithm by partitioning
the data set in a cell using a cutting plane that divides the cell into two smaller cells. The plane is
perpendicular to the data axis with the highest variance and is designed to reduce the sum of squared
errors of the two cells as much as possible while keeping the two cells as far apart as possible. The
cells are partitioned one at a time until the number of cells equals the predefined number of clusters
k. The centers of the k cells then become the initial cluster centers for the K-means algorithm.
A clustering approach inspired by the process of placing pillars to make a house stable is proposed
in [18]. This technique chooses each initial centroid using the accumulated distance between each
data point and all previously chosen centroids: the data point with the maximum accumulated
distance is selected next.
III. PROPOSED ALGORITHM
We have developed an algorithm, the Distance Based Clustering Algorithm (DBCA), that finds
a good set of initial centroids from the given dataset. The initial centroids are selected such that the
performance of K-means clustering is improved. The proposed algorithm produces better quality
clusters than the partitional clustering algorithm, the agglomerative hierarchical clustering
algorithm and the hierarchical partitioning clustering algorithm.
A. Basic Concept
The K-means algorithm performs random generation of the initial centroids, which may place
some initial centroids very close to each other; as a result, K-means may get trapped in a local
optimum. The technique proposed in this paper selects a good set of initial centroids in such a way
that they are spread out within the data space and are as far as possible from each other. Fig. 1
illustrates the selection of four initial centroids C1, C2, C3 and C4 with the proposed technique. As is
evident, there are four clusters in the data space. The proposed algorithm tries to determine the four
groups that are evident in the given dataset. The algorithm then defines one initial centroid in each of
these four groups by taking the average of all the points of that group. The refined initial centroid
condition keeps C1, C2, C3 and C4 distributed as far as possible from each other and makes K-means
converge to a better local minimum.
Fig. 1. Illustration of selecting a good set of initial centroids.
B. Distance Based Clustering
Suppose the n data points in the given dataset D are to be clustered into k clusters. In the
Distance Based Clustering Algorithm (DBCA), we calculate the distance of each data point di, i = 1
to n, in the dataset D from all other data points and store these distances in a distance matrix DM. We
then calculate the total distance of each data point di from all other data points: the total distance for
a point di is the sum of all elements in the row of DM corresponding to di. These sums are stored in a
sum-of-distances vector SD. The vector SD is sorted in decreasing order of total distance values. Let
P-SD be the vector of data points corresponding to the sorted vector SD, i.e. P-SD[1] is the data
point whose sum of distances (available in SD[1]) from all other data points is maximum, and P-SD[2]
is the data point whose sum of distances (available in SD[2]) from all other data points is second
highest. In general, P-SD[i] is the data point whose sum of distances from all other data points is
SD[i].
Now we define a variable x as shown in Eq. (1).
x = floor (n/k) (1)
Here, floor(n/k) returns the largest integer not greater than n/k.
We now find the average av of the first x points of the vector P-SD and make this average av
the first initial centroid, placing it in the set S of initial centroids. We then discard these first x points
from the vector P-SD, find the average av of the next x points, make it the second initial centroid, and
add it to S. This process is repeated till k initial centroids are defined. These k initial centroids are
then used in the K-means process as a substitute for the k random initial centroids: K-means is
invoked to cluster the dataset D into k clusters using the initial centroids available in set S.
Algorithm DBCA:
Input: Dataset D of n data points {d1, d2, …, dn} and k the number of clusters.
Output: Initial k cluster centroids and the set of k clusters.
1. Find the distance matrix DM from D;
2. Calculate the sum-of-distances vector SD from DM;
3. SD = Sort(SD, decreasing); // sort SD in decreasing order
4. P-SD = data points corresponding to the sorted vector SD;
5. ck = 1; // ck counts the initial centroids defined so far
6. x = floor(n/k);
7. sum = 0;
8. for i = 1 to x // average of the first x points of P-SD
       sum = sum + P-SD[i];
   end-for
9. av = sum / x;
10. S = {av}; // S is the set of initial centroids
11. ck = ck + 1;
12. while (ck <= k)
    {
        sum = 0;
        delete the first x points from P-SD; // discard the points already used
        for i = 1 to x // average of the next x points of P-SD
            sum = sum + P-SD[i];
        end-for
        av = sum / x; // next initial centroid
        S = S UNION {av};
        ck = ck + 1;
    } end-while
13. K-means(D, S, k);
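The centroid-selection steps above can be sketched as follows (a hypothetical NumPy implementation of the description, not the authors' code; the final K-means call is omitted):

```python
import numpy as np

def dbca_initial_centroids(D, k):
    """Select k initial centroids by the distance-based rule:
    sort points by their total distance to all other points,
    then average successive blocks of floor(n/k) points."""
    n = len(D)
    # distance matrix DM: pairwise Euclidean distances
    diff = D[:, None, :] - D[None, :, :]
    DM = np.sqrt((diff ** 2).sum(axis=2))
    SD = DM.sum(axis=1)            # total distance of each point (row sums)
    order = np.argsort(-SD)        # P-SD: indices sorted by decreasing SD
    x = n // k                     # block size, floor(n/k)
    # each initial centroid is the mean of the next x points of P-SD
    return np.array([D[order[i * x:(i + 1) * x]].mean(axis=0)
                     for i in range(k)])
```

The returned array would then be passed as the fixed initial centroids to a K-means routine, replacing its random initialization.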
IV. EXPERIMENTAL RESULTS
The experiments are conducted on synthetic and real data points to compute different sets of
clusters using the partitional clustering algorithm, the agglomerative hierarchical clustering algorithm,
the hierarchical partitioning clustering algorithm and the proposed DBCA algorithm. The experiments
are performed in Matlab on a Core i5 processor with a speed of 2.5 GHz and 4 GB of RAM. The
quality of the clustering achieved with DBCA is compared with the quality of the clustering achieved
with:
1) Partitional clustering technique [1, 2, 3]. K-means is the partitional clustering technique
available as a built-in function in Matlab [19].
2) Hierarchical clustering technique [1, 10]. ClusterData is the agglomerative hierarchical
clustering technique available as a built-in function in Matlab [19]. Single linkage is the
default option used in ClusterData to create the hierarchical cluster tree.
3) Hierarchical partitioning technique [10]. CLUTO [20] is a software package for clustering
datasets and for analyzing the characteristics of the various clusters. CLUTO contains both
partitional and hierarchical clustering algorithms. The repeated bisections method available in
CLUTO is a hierarchical partitioning algorithm that produces the required k clusters by
performing a sequence of k − 1 repeated bisections. In this approach, the input dataset is first
clustered into two groups, and then one of these groups is selected and bisected further. This
process continues until the desired number of clusters is found. This is effectively the bisecting
K-means divisive clustering algorithm and is the default option in CLUTO, referred to here as
Cluto-rb.
Recall and Precision [21] have been used to evaluate the quality of the clustering achieved. Recall
is defined as the proportion of data points that have been correctly put into a cluster among all the
relevant points that should have been in that cluster. Precision is defined as the proportion of data
points that have been correctly put into a cluster among all the points put into that cluster.
Recall = data points correctly put into a cluster / all the points that should have been in that cluster
Precision = data points correctly put into a cluster / total points put into that cluster
The comparison of the quality of the clustering achieved with DBCA and the other algorithms is
made in terms of the percentage accuracy of grouping objects into the correct clusters. The recall
accuracy is calculated by dividing the number of relevant data points correctly put in a particular
group (by running the clustering algorithm) by the total number of data points that should ideally
have been in that group. The precision accuracy is calculated by dividing the number of relevant data
points correctly put in a particular group by the total number of data points put in that group.
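Under these definitions, the per-cluster accuracies can be computed as sketched below (an illustrative Python helper; matching each found cluster to the true class contributing most of its points is an assumption, since the paper does not spell out the matching rule):

```python
from collections import Counter

def cluster_recall_precision(assigned, truth):
    """Per-cluster recall and precision. Each found cluster is matched
    to the true class contributing the most points to it (an assumed
    matching rule, not stated in the paper)."""
    results = {}
    for c in set(assigned):
        members = [i for i, a in enumerate(assigned) if a == c]
        # majority true class within this cluster, and how many points it covers
        majority, correct = Counter(truth[i] for i in members).most_common(1)[0]
        relevant = sum(1 for t in truth if t == majority)
        results[c] = (correct / relevant,       # recall
                      correct / len(members))   # precision
    return results
```

Averaging the per-cluster values then gives the average recall and average precision figures reported in the tables.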
A. Experiments on Synthetic Datasets R2400, R12000 and R24000
We have created synthetic datasets by generating random data points in 2 dimensions. We
have generated three synthetic datasets with 2400, 12000 and 24000 records in 2 dimensions, named
R2400, R12000 and R24000 respectively. The random data points are generated and distributed
among 12 groups, with an equal number of data points in each group. The points of the i-th group
(i = 1, …, 12) lie in the rectangle from [0, 600(i − 1) + 100] to [200, 600(i − 1) + 300]; for example,
the first group ranges from [0, 100] to [200, 300], the second from [0, 700] to [200, 900], and the
twelfth from [0, 6700] to [200, 6900].
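A generator for such datasets might look like the following (an illustrative Python sketch; the paper's exact random generator is not specified):

```python
import numpy as np

def make_synthetic(n, k=12, seed=0):
    """Generate n random 2-D points split equally among k groups:
    group i spans x in [0, 200] and y in [600*i + 100, 600*i + 300],
    matching the ranges described in the text (i counted from 0)."""
    rng = np.random.default_rng(seed)
    pts, labels = [], []
    per = n // k  # equal number of points per group
    for i in range(k):
        x = rng.uniform(0, 200, per)
        y = rng.uniform(600 * i + 100, 600 * i + 300, per)
        pts.append(np.column_stack([x, y]))
        labels += [i] * per
    return np.vstack(pts), np.array(labels)
```

Calling `make_synthetic(2400)`, `make_synthetic(12000)` and `make_synthetic(24000)` would produce datasets with the same structure as R2400, R12000 and R24000.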
The recall and precision for creating twelve clusters using K-means, ClusterData, Cluto-rb and
the DBCA algorithm are found for each dataset to see how correctly the data points are put into
clusters. The results of clustering using these algorithms are shown in Table I. The graph plotted for
average recall in percentage using these techniques is shown in Fig. 2, and the graph plotted for
average precision in percentage is shown in Fig. 3. As can be seen from the results, the DBCA
algorithm performs 100% correct clustering and is far better than the K-means and Cluto-rb
algorithms.
TABLE I: Average Recall and Average Precision for synthetically generated datasets

Algorithm      R2400                R12000               R24000
               Recall   Precision   Recall   Precision   Recall   Precision
               (%)      (%)         (%)      (%)         (%)      (%)
DBCA           100      100         100      100         100      100
K-means        78.21    91.67       75.13    75          65.83    65.83
ClusterData    100      100         100      100         100      100
Cluto-rb       22.67    26.03       21.63    25.92       18.5     17.83

Fig. 2. Average Recall for synthetic datasets.
Fig. 3. Average Precision for synthetic datasets.

R15 dataset
R15 [22] is a 2-dimensional shape dataset with 600 vectors. These 600 vectors belong to 15
classes with 40 vectors per class. The 15 clusters of 40 points each are positioned in two rings (7
clusters in each ring) around a central cluster. The dataset is downloaded from the website [23].
We have formed 15 clusters of this dataset using K-means, ClusterData, Cluto-rb and the DBCA
algorithm. The average recall and precision using these algorithms are determined and shown in
Table II. The graph plotted for average recall in percentage using these techniques is shown in Fig. 2
and the graph plotted for average precision in percentage is shown in Fig. 3. The results show that
the recall and precision of the DBCA algorithm are better than those of K-means, ClusterData and
Cluto-rb. Hence the DBCA algorithm produces better quality clusters.
TABLE II: Average Recall and Average Precision for the R15 shape dataset

Algorithm      Average Recall (%)   Average Precision (%)
DBCA           39.87                99.67
K-means        32.47                78.88
ClusterData    28.93                74.36
Cluto-rb       25.80                62.77
B. Experiments on real web datasets
We have applied the proposed DBCA algorithm to web datasets of text and images. We first
processed and described the image datasets in terms of features that are inherent in the images
themselves. The features extracted from the images are smaller in size than the original image data
and are used in place of the images themselves for mining. These are represented in the form of
real-valued components called feature vectors. Color moments are used as the property in the feature
extraction technique [24].
K-means, ClusterData, Cluto-rb and the DBCA algorithm are applied to cluster the real
datasets. The image datasets are clustered over the extracted features. The recall and precision of the
clusters formed using these techniques are calculated to determine and compare the quality of
clustering.
The datasets and experimental results achieved using K-means, ClusterData, Cluto-rb and
DBCA are described below.
TABLE III: Average Recall and Average Precision for real datasets

Algorithm      Corel (Wang)          Wine
               Recall   Precision    Recall   Precision
               (%)      (%)          (%)      (%)
DBCA           100      100          41.67    72
K-means        60.7     86.71        33.67    61
ClusterData    100      100          25.37    47
Cluto-rb       33.29    27.91        39.67    69
Corel (Wang) Dataset
The Corel (Wang) database consists of 700 images that are a subset of the 1,000 images of the
Corel stock photo database, which have been manually selected and which form 10 classes of 100
images each. This database is downloaded from website [25] and contains 1025 features per image.
We have formed 10 clusters of these images using K-means, ClusterData, Cluto-rb and the DBCA
algorithm. The average recall and precision using these algorithms are determined and shown in
Table III. The graph plotted for average recall in percentage using these techniques is shown in Fig. 4,
and the graph plotted for average precision in percentage is shown in Fig. 5. The results show that the
recall and precision of DBCA are better than those of K-means, ClusterData and Cluto-rb. Hence the
DBCA algorithm produces better quality clusters.
Wine Dataset
The wine recognition dataset [26] results from the chemical analysis of wines grown in the
same region in Italy but derived from three different cultivars. The analysis determined the quantities
of 13 constituents found in each of the three classes of wines. The dataset consists of 178 instances;
the numbers of instances in class 1, class 2 and class 3 are 59, 71 and 48 respectively. The wine
dataset is downloaded from the website [27].
We have formed 3 clusters of this dataset using K-means, ClusterData, Cluto-rb and the DBCA
algorithm. The average recall and precision using these algorithms are determined and shown in
Table III. The graph plotted for average recall in percentage using these techniques is shown in Fig. 4
and the graph plotted for average precision in percentage is shown in Fig. 5. The results show that the
recall and precision of DBCA are better than those of K-means, ClusterData and Cluto-rb. Hence the
DBCA algorithm produces better quality clusters.
Fig. 4. Average Recall for real datasets.
Fig. 5. Average Precision for real datasets.
V. CONCLUSION
The quality of clustering of the K-means algorithm depends on the initialization of k data points.
These k initial centroids are chosen randomly by the K-means algorithm, and this is the major
weakness of K-means. The clustering algorithm proposed in this paper is based on the assumption
that good clusters are formed when the initial k centroids are chosen to be as far as possible from
each other. The DBCA algorithm proposed here is based on computing the total distance of a point
from all other points. The algorithm has been tested on both synthetic datasets and real datasets from
the web. The experimental results show the effectiveness of the idea, and the algorithm produces
better quality clusters than the partitional clustering algorithm, the agglomerative hierarchical
clustering algorithm and the hierarchical partitioning clustering algorithm.
REFERENCES
[1] Margaret Dunham, Data Mining – Introductory and advanced concepts, Pearson Education,
2006.
[2] J. B. MacQueen, “Some Methods for Classification and Analysis of Multivariate
Observations,” in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and
Probability, Berkeley, University of California Press, 1967, pp. 281-297.
[3] S. Lloyd, “Least Squares Quantization in PCM,” IEEE Transactions on Information Theory,
Vol. 28(2), pp. 129-137, 1982.
[4] S. S. Khan and A. Ahmad, “Cluster center initialization algorithm for k-means algorithm,”
Pattern Recognition Letters, Vol. 25(11), pp. 1293-1302, 2004.
[5] Anil K. Jain and Richard C. Dubes, Algorithm for Clustering in Data, Prentice Hall, 1988.
[6] B. Larsen and C. Aone, “Fast and Effective Text Mining using Linear-Time Document
Clustering,” in Proceedings of the 5th ACM SIGKDD International Conference on Knowledge
Discovery and Data, San Diego, CA, USA, 1999, pp. 16-22.
[7] Rui Xu and Donald Wunsch, “Survey of clustering algorithms,” IEEE Transactions on Neural
Networks, Vol. 16(3), pp. 645-678, 2005.
[8] M. Steinbach, G. Karypis and Vipin Kumar, “A comparison of document clustering
techniques,” KDD Workshop on Text Mining, Boston, MA, 2000, pp. 109-111.
[9] Ying Zhao and George Karypis, “Evaluation of Hierarchical Clustering Algorithms for
Document Datasets,” in Proceedings of the 11th International Conference on Information and
Knowledge Management, New York, USA, ACM press, 2002, pp. 515-524.
[10] Keerthiram Murugesan and Jun Zhang, “Hybrid hierarchical clustering: An experimental
analysis,” Technical Report, No. CMIDA-HiPSCCS 001-11, Department of Computer
Science, University of Kentucky, Lexington, KY, 2011.
[11] P. S. Bradley and U. M. Fayyad, “Refining initial points for K-Means clustering,” in
Proceedings of the 15th International Conference on Machine Learning, Morgan Kaufmann,
San Francisco, CA, 1998, pp. 91-99.
[12] A. Likas, N. Vlassis and J. J. Verbeek, “The global k-means clustering algorithm,” Journal of
Pattern Recognition, Vol. 36(2), pp. 451-461, 2003.
[13] Fang Yuan, Zeng-Hui Meng, Hong-Xia Zhang and Chun-Ru Dong, “A new algorithm to Get
the Initial Centroids,” in Proceedings of the 13th International Conference on Machine
Learning and Cybernetics, Shanghai, 2004, pp. 26-29.
[14] A. R. Barakbah and A. Helen, “Optimized K-means: An algorithm of initial centroids
optimization for K-means,” in Proceedings of Soft Computing, Intelligent Systems and
Information Technology (SIIT), Surabaya, Indonesia, 2005, pp. 263-266.
[15] A. M. Fahim, A. M. Salem, F. A. Torkey and M. A. Ramadan, “An efficient enhanced k-means
clustering algorithm,” Journal of Zhejiang University Science, Vol. 7(10), pp 1626-1633,
2006.
[16] A. R. Barakbah and K. Arai, “Hierarchical K-means: An algorithm for centroids initialization
for K-means,” Reports of the faculty of Science & Engineering, Saga University, Japan, Vol.
36(1), 2007.
[17] S. Deelers and S. Auwatanamongkol, “Engineering k-means algorithm with initial cluster
centers derived from data partitioning along the data axis with the highest variance,” in
Proceedings of the World Academy of Science, Engineering and Technology, Vol. 26, 2007,
pp. 323-328.
[18] A. R. Barakbah and Y. Kiyoki, “A Pillar Algorithm for K-means optimization by distance
maximization for initial centroid designation,” IEEE Symposium on Computational
Intelligence and Data Mining (CIDM), Nashville, Tennessee, 2009, pp. 61-68.
[19] http://www.mathworks.in/products/matlab/
[20] George Karypis, “Cluto: A Clustering Toolkit. Release 2.1.1,” Technical Report, No. 02-017,
University of Minnesota, Department of Computer Science, Minneapolis, MN 55455.
www.cs.umn.edu/~karypis/cluto, 2003.
[21] Gerald Kowalski, Information Retrieval Systems – Theory and Implementation, Kluwer
Academic Publishers, 1997.
[22] C. J. Veenman, M. J. T. Reinders and E. Backer, “A maximum variance cluster algorithm,”
IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24(9), pp. 1273-1280,
2002.
[23] http://cs.joensuu.fi/sipu/datasets/
[24] M. Stricker and M. Orengo, “Similarity of color images,” in Proceedings of Storage and
Retrieval for Image and Video Databases III (SPIE), San Jose, CA, USA, 1995, pp. 381-392.
[25] http://users.dsic.upv.es/~rparedes/english/research/rise/MiPRCV-WP6-RF/
[26] S. Aeberhard, D. Coomans and O. de Vel, “Comparison of Classifiers in High Dimensional
Settings,” Technical Report, Dept. of Computer Science and Dept. of Mathematics and
Statistics, James Cook University of North Queensland, Vol. 92-02, 1992.
[27] http://archive.ics.uci.edu/ml/datasets/Wine
AUTHORS DETAILS
NEETI ARORA was born in Bhopal on 25th November 1982. She received her master's degree
in computer applications from Rajiv Gandhi Technical University, Madhya Pradesh, India in 2007.
She served as a lecturer in the computer applications department at Laxmi Narayan College of
Technology for 3 years. Currently she is a PhD student at Rajiv Gandhi Technical University,
Madhya Pradesh, India. She has published 2 research papers in international journals. Her research
interests are Data Mining and Database Management Systems.
Dr. MAHESH MOTWANI was born in Bhopal on 4th May 1969. He received his Bachelor’s degree in
Computer Engineering from MANIT Bhopal, India in 1990, his Master’s degree in Computer
Engineering from I.I.T. Delhi in 1998 and his PhD in Computer Engineering from Rajiv Gandhi
Technical University, Madhya Pradesh, India in 2007. He has about 21 years of teaching
experience. Before joining teaching in 1994, he served as a scientist at the National
Informatics Centre, Government of India. He headed the Computer Science and Engineering
department at Government Engineering College, Bhopal for 7 years and the Information
Technology department at Rajiv Gandhi Technical University, Madhya Pradesh, India for 3 years.
He is currently Deputy Secretary in the Technical Board at Rajiv Gandhi Technical University,
Madhya Pradesh, India. He has published around 27 research papers in national and
international journals and 20 papers in national and international conferences, and is the
author of a book on Cryptography and Network Security. His research interests are Data Mining,
Database Management Systems, Network Security, Computer Graphics and Image Processing. He is a
life member of ISTE and a senior member of CSI.

 
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
 
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
 
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
 
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
 
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
 
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
 
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENTA MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
 

Dernier

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

Dernier (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

50120140505013

These methods are divided into two basic types: hierarchical and partitional clustering [1]. Hierarchical clustering either merges smaller clusters into larger ones (agglomerative) or splits larger clusters into smaller ones (divisive). Agglomerative clustering starts with one point in each cluster and at each step selects and merges the two clusters that are most similar in characteristics. The process is repeated till the required number of clusters is formed. Divisive clustering starts by considering all data points in one cluster and splits them until the required number of clusters is formed. The agglomerative and divisive clustering methods further differ in the rule by which it is decided which two small clusters are merged
or which large cluster is split. Partitional clustering attempts to directly decompose the data set into a set of disjoint clusters. Each object is a member of the cluster with which it is most similar. K-means [2, 3] is one of the most widely used partitional clustering algorithms. The algorithm clusters the n data points into k groups, where k is provided as an input parameter. It defines k centroids, one for each cluster. For this, k data points are selected at random from D as initial centroids. The next step is to take each point belonging to the given data set and assign it to the nearest centroid. Euclidean distance is generally used to determine the distance between data points and the centroids. When all the points have been assigned to some cluster, the mean of each cluster is calculated to find the new centroid of each cluster. The algorithm then reassigns all the data points to clusters based upon their proximity to the cluster means. The cluster means are then recomputed and the process begins again. In due course, a situation is reached where the cluster centers do not change anymore; this becomes the convergence criterion for the clustering.

Algorithm K-means:
Input: Dataset D of n data points {d1, d2, ..., dn} and k, the number of clusters
Output: Set of k clusters
1. Choose k data points at random from D as initial centroids.
2. Repeat
       Assign each data point in D to the group that has the closest centroid.
       Recalculate the positions of the k centroids by taking the mean.
   Until the centroids no longer move.

The K-means algorithm has the drawback that it results in different clusterings depending on the initialization process, which randomly generates the initial centers.
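The procedure above can be sketched as a short illustrative Python implementation (the function and variable names are ours, not from the paper):

```python
import numpy as np

def k_means(D, k, max_iter=100, seed=0):
    """Illustrative K-means: random initial centroids, then alternate
    assignment and mean-recomputation until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    # Step 1: choose k data points at random from D as initial centroids.
    centroids = D[rng.choice(len(D), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(D[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of the points assigned to it.
        new_centroids = np.array(
            [D[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
             for j in range(k)])
        # Convergence criterion: the cluster centers do not change anymore.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

Because the initial centroids are drawn at random, different seeds can yield different final clusterings, which is exactly the drawback discussed above.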
Thus the performance of K-means depends on the initial guess of centers, as different runs of K-means on the same input data might produce different results [4]. Researchers [5, 6] have found that hierarchical clustering algorithms are better than partitional clustering algorithms in terms of clustering quality, but the computational complexity of hierarchical clustering algorithms is higher than that of partitional clustering algorithms [7, 8]. Hybrid hierarchical partitioning algorithms [9] combine the advantages of partitional and hierarchical clustering. Bisecting K-means [10] is a hybrid hierarchical partitioning algorithm that produces the required k clusters by performing a sequence of k − 1 repeated bisections. In this approach, the input dataset is first clustered into two groups using 2-way K-means, and then one of these groups is selected and bisected further. This process continues until the desired number of clusters is found. It is shown in [10] that the hierarchical partitioning algorithm produces better clustering than the traditional clustering algorithms. We have developed a new clustering algorithm, the Distance Based Clustering Algorithm (DBCA), that unlike K-means does not perform random generation of the initial centers and does not produce different results for the same input data. The proposed algorithm produces better quality clusters than the partitional clustering algorithm, the agglomerative hierarchical clustering algorithm and the hierarchical partitioning clustering algorithm.

II. RELATED WORK

The cluster center initialization algorithm CCIA [4] is based on the observation that some patterns are very similar to each other and therefore have the same cluster membership irrespective of the choice of initial cluster centers. Also, an individual attribute may provide some information about an initial cluster center. The experimental results show the improved performance of the proposed algorithm.
A technique for refining initial centroids for the K-means algorithm is presented in [11]. The proposed method refines the initial points by analyzing the distribution of the data and the probability of data density. The method operates over small subsamples of a given database, hence requiring a small proportion of the total memory needed to store the full database. The clustering results obtained with refined initial centroids are superior to those obtained with randomly chosen initial centroids in the K-means algorithm. The global K-means [12] clustering algorithm incrementally adds one cluster center at a time and performs n executions of the K-means algorithm from appropriate initial positions, where n is the size of the data set. An improved version of K-means proposed in [13] evaluates the distance between every pair of data points and then finds those data points which are similar. It finally chooses the initial centroids according to these data points. This method produces better clusters than the original K-means algorithm. The optimized K-means algorithm proposed in [14] spreads the initial centroids in the feature space so that the distances among them are as large as possible. An efficient enhanced K-means method is proposed in [15] for assigning data points to clusters. The original K-means algorithm has high time complexity because each iteration computes the distances between the data points and all the centroids. The method in [15] uses a distance function based on a heuristic to reduce the number of distance calculations, which improves the execution time of the K-means algorithm. Initial centroids are determined randomly as in the K-means algorithm, so this method also produces different clusters in every run, like K-means.
Thus the clustering results are the same as those of the K-means algorithm, but the time complexity of this algorithm is lower than that of K-means. A new algorithm for finding the initial centroids is proposed in [16] by embedding hierarchical clustering into K-means clustering. The algorithm in [17] computes initial cluster centers for the K-means algorithm by partitioning the data set in a cell using a cutting plane that divides the cell into two smaller cells. The plane is perpendicular to the data axis with the highest variance and is designed to reduce the sum of squared errors of the two cells as much as possible, while at the same time keeping the two cells as far apart as possible. The cells are partitioned one at a time until the number of cells equals the predefined number of clusters, k. In this method the centers of the k cells become the initial cluster centers for the K-means algorithm. A clustering approach inspired by the process of placing pillars in order to make a stable house is proposed in [18]. This technique chooses the position of the initial centroids by using the farthest accumulated distance metric between each data point and all previous centroids; a data point which has the maximum distance is then selected.

III. PROPOSED ALGORITHM

We have developed the Distance Based Clustering Algorithm (DBCA), a clustering algorithm that finds a good set of initial centroids from the given dataset. The selection of initial centroids is done such that the performance of K-means clustering is improved. The proposed algorithm produces better quality clusters than the partitional clustering algorithm, the agglomerative hierarchical clustering algorithm and the hierarchical partitioning clustering algorithm.

A. Basic Concept

The K-means algorithm performs random generation of the initial centroids, which may result in placement of initial centroids very close to each other. Due to this, these initial centroids may be trapped in local optima.
The technique proposed in this paper selects a good set of initial centroids in such a way that they are spread out within the data space and are as far from each other as possible. Fig. 1 illustrates the selection of four initial centroids C1, C2, C3 and C4 with the proposed technique. As is evident, there are four clusters in the data space. The proposed algorithm tries to determine the four groups that are evident in the given dataset. The algorithm then defines one initial centroid in each of
these four groups by taking the average of all the points of that group. The refined initial centroid condition keeps C1, C2, C3 and C4 distributed as far as possible from each other and makes K-means converge to a better local minimum.

Fig. 1. Illustration of selecting a good set of initial centroids.

B. Distance Based Clustering

Let the n data points in the given dataset D be clustered into k clusters. In the Distance Based Clustering Algorithm (DBCA), we calculate the distance of each data point di, i = 1 to n, from all other data points and store these distances in a distance matrix DM. We then calculate the total distance of each data point di with respect to all other data points. The total distance for a point di is the sum of all elements in the row of DM corresponding to di. These sums are stored in a sum-of-distances vector SD. The vector SD is sorted in decreasing order of total distance values. Let P-SD be the vector of data points corresponding to the sorted vector SD, i.e. P-SD[1] is the data point whose sum of distances (available in SD[1]) from all other data points is maximum, and P-SD[2] is the data point whose sum of distances (available in SD[2]) from all other data points is second highest. In general, P-SD[i] is the data point whose sum of distances from all other data points is in SD[i]. Now we define a variable x as shown in Eq. (1).

x = floor(n/k)    (1)

Here, the floor(n/k) function maps the real number n/k to the largest previous integer, i.e. it returns the largest integer not greater than n/k. We now find the average av of the first x points of the vector P-SD and make this average av the first initial centroid. Put this initial centroid point av in the set S of initial centroids.
Now discard these first x points from the vector P-SD and find the average av of the next x points of P-SD. Make this average av the second initial centroid and put it in the set S of initial centroids. Then discard these x points from P-SD and find the average av of the next x points. Make this average av the third initial centroid and put it in the set S of initial centroids. This process is repeated till k initial centroids are defined. These k initial centroids are then used in the K-means process as a substitute for the k random initial centroids. K-means is now invoked for clustering the dataset D into k clusters using the initial centroids available in set S.
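The centroid-selection procedure described above can be sketched in Python as follows (an illustrative reading of the paper's steps; names such as `dbca_initial_centroids` are ours):

```python
import numpy as np

def dbca_initial_centroids(D, k):
    """Select k initial centroids as described above: order the points by
    decreasing total distance to all other points, then average successive
    groups of x = floor(n/k) points."""
    n = len(D)
    # Distance matrix DM: pairwise Euclidean distances between all points.
    DM = np.linalg.norm(D[:, None, :] - D[None, :, :], axis=2)
    # Sum-of-distances vector SD: row sums of DM.
    SD = DM.sum(axis=1)
    # P-SD: the data points ordered by decreasing total distance.
    P_SD = D[np.argsort(-SD)]
    # Eq. (1): group size x = floor(n/k).
    x = n // k
    # Each initial centroid is the average of the next x points of P-SD.
    return np.array([P_SD[i * x:(i + 1) * x].mean(axis=0) for i in range(k)])
```

The result is deterministic for a given dataset, so unlike random initialization the same input always yields the same initial centroids, which are then passed to K-means. Note that building the full distance matrix costs O(n²) time and memory.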
Algorithm DBCA:
Input: Dataset D of n data points {d1, d2, ..., dn} and k, the number of clusters.
Output: Initial k cluster centroids and the set of k clusters.
1. Find the distance matrix DM from D;
2. Calculate the sum vector SD from DM;
3. SD = Sort(SD, decreasing); // sort SD in decreasing order
4. P-SD = data points corresponding to the sorted vector SD;
5. ck = 1; // ck counts the number of initial centroids
6. x = floor(n/k);
7. sum = 0;
8. for i = 1 to x // average of the first x points corresponding to SD[i]
       sum = sum + P-SD[i];
   end-for
9. av = sum / x;
10. S = {av}; // S is the set of initial centroids
11. ck = ck + 1;
12. while (ck <= k)
        sum = 0;
        delete the first x points from P-SD; // discard the first x points
        for i = 1 to x // average of the next x points corresponding to SD[i]
            sum = sum + P-SD[i];
        end-for
        av = sum / x; // the next initial centroid
        S = S UNION {av};
        ck = ck + 1;
    end-while
13. K-means(D, S, k);

IV. EXPERIMENTAL RESULTS

The experiments are conducted on synthetic and real data points to compute different sets of clusters using the partitional clustering algorithm, the agglomerative hierarchical clustering algorithm, the hierarchical partitioning clustering algorithm and the proposed DBCA algorithm. The experiments are performed on a Core i5 processor with a speed of 2.5 GHz and 4 GB RAM using Matlab. The quality of the clustering achieved with DBCA is compared with the quality of the clustering achieved with:
1) The partitional clustering technique [1, 2, 3]. K-means is the partitional clustering technique available as a built-in function in Matlab [19].
2) The hierarchical clustering technique [1, 10].
ClusterData is the agglomerative hierarchical clustering technique available as a built-in function in Matlab [19]. Single linkage is the default option used in ClusterData to create the hierarchical cluster tree.
3) The hierarchical partitioning technique [10]. CLUTO [20] is a software package for clustering datasets and for analyzing the characteristics of the various clusters. CLUTO contains both
partitional and hierarchical clustering algorithms. The repeated bisections method available in CLUTO is a hierarchical partitioning algorithm that produces the required k clusters by performing a sequence of k − 1 repeated bisections. In this approach, the input dataset is first clustered into two groups, and then one of these groups is selected and bisected further. This process continues until the desired number of clusters is found. This is effectively the bisecting K-means divisive clustering algorithm and is the default option in CLUTO, named Cluto-rb.

Recall and precision [21] have been used to evaluate the quality of the clustering achieved. Recall is defined as the proportion of data points that have been correctly put into a cluster among all the relevant points that should have been in that cluster. Precision is defined as the proportion of data points that have been correctly put into a cluster among all the points put into that cluster.

Recall = data points correctly put into a cluster / all the points that should have been in that cluster
Precision = data points correctly put into a cluster / total points put into that cluster

The comparison of the quality of the clustering achieved with DBCA and the other algorithms is made in terms of the percentage accuracy of grouping of objects into correct clusters. The recall accuracy is calculated by dividing the number of relevant data points correctly put in a particular group (by running the clustering algorithm) by the total number of data points that should ideally have been in that group. The precision accuracy is calculated by dividing the number of relevant data points correctly put in a particular group (by running the clustering algorithm) by the total number of data points put in that group.
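These two definitions can be written directly as a small Python helper (a sketch for a single cluster; the function name is ours):

```python
def cluster_recall_precision(assigned, ideal):
    """Recall and precision for one cluster, per the definitions above.

    assigned: points the clustering algorithm put into the cluster.
    ideal:    points that should ideally have been in that cluster.
    """
    assigned, ideal = set(assigned), set(ideal)
    correct = len(assigned & ideal)      # points correctly put into the cluster
    recall = correct / len(ideal)        # correct / all that should be there
    precision = correct / len(assigned)  # correct / all that were put there
    return recall, precision
```

For example, if a cluster should contain points {1, 2, 3, 5} but the algorithm produced {1, 2, 3, 4}, three of the four placements are correct, so recall and precision are both 0.75.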
A. Experiments on Synthetic Datasets

R2400, R12000 and R24000 datasets: We created these synthetic datasets by generating random data points in 2 dimensions. Three datasets with 2400, 12000 and 24000 records were generated; let them be named R2400, R12000 and R24000 respectively. The random data points are distributed among 12 groups, with an equal number of points in each group. The range of points in group i (i = 1, ..., 12) is from [0, 100 + 600(i − 1)] to [200, 300 + 600(i − 1)]; thus the first group spans [0, 100] to [200, 300], the second [0, 700] to [200, 900], the third [0, 1300] to [200, 1500], and so on up to the twelfth, which spans [0, 6700] to [200, 6900]. The recall and precision for creating twelve clusters using K-means, ClusterData, Cluto-rb and the DBCA algorithm are computed for each dataset to see how correctly the data points are put into the clusters. The results of clustering using these algorithms are shown in Table I. The graph of average recall in percentage for these techniques is shown in Fig. 2 and the graph of average precision in percentage is shown in Fig. 3. As can be seen from the results, the DBCA algorithm performs 100% correct clustering and is far better than the K-means and Cluto-rb algorithms.
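Datasets with this structure can be reproduced with a sketch like the following (the uniform sampling and the fixed seed are our own illustrative choices; the paper does not specify its random generator):

```python
import random

def make_dataset(n_records, n_groups=12, seed=0):
    """Generate n_records 2-D points spread equally over n_groups.

    Group i (0-based) spans x in [0, 200] and y in [100 + 600*i, 300 + 600*i],
    matching the group ranges used for R2400, R12000 and R24000.
    """
    rng = random.Random(seed)
    per_group = n_records // n_groups
    points = []
    for i in range(n_groups):
        y_lo, y_hi = 100 + 600 * i, 300 + 600 * i
        for _ in range(per_group):
            points.append((rng.uniform(0, 200), rng.uniform(y_lo, y_hi)))
    return points

r2400 = make_dataset(2400)  # 12 groups of 200 points each
```

Because the groups are separated by a vertical gap of 400 units, any reasonable clustering of these points into 12 groups has an unambiguous ground truth against which recall and precision can be scored.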
TABLE I: Average Recall and Average Precision for synthetically generated datasets

Algorithm     R2400                       R12000                      R24000
              Recall (%)   Precision (%)  Recall (%)   Precision (%)  Recall (%)   Precision (%)
DBCA          100          100            100          100            100          100
K-means       78.21        91.67          75.13        75             65.83        65.83
ClusterData   100          100            100          100            100          100
Cluto-rb      22.67        26.03          21.63        25.92          18.5         17.83

Fig. 2. Average Recall for synthetic datasets.
Fig. 3. Average Precision for synthetic datasets.

R15 dataset: R15 [22] is a 2-dimensional shape dataset with 600 vectors. These 600 vectors belong to 15 classes with 40 vectors per class. The 15 clusters of 40 points each are positioned in two rings (7 clusters in each ring) around a central cluster. The dataset was downloaded from the website [23]. We formed 15 clusters of these vectors using the K-means, ClusterData, Cluto-rb and DBCA algorithms. The average recall and precision of these algorithms are shown in Table II. The graph of average recall in percentage for these techniques is shown in Fig. 2 and the graph of average precision in percentage is shown in Fig. 3. The results show that the recall and precision of the DBCA algorithm are better than those of K-means, ClusterData and Cluto-rb; hence the DBCA algorithm produces better quality clusters.
TABLE II: Average Recall and Average Precision for the R15 shape dataset

Algorithm     Average Recall (%)   Average Precision (%)
DBCA          39.87                99.67
K-means       32.47                78.88
ClusterData   28.93                74.36
Cluto-rb      25.80                62.77

B. Experiments on real web datasets

The proposed DBCA algorithm has been applied to web datasets of text and images. We first processed the image datasets and described them in terms of features inherent in the images themselves. The features extracted from the images are smaller in size than the original image data and are used in place of the images themselves for mining. They are represented as real-valued components called feature vectors. Color moments are used as the property in the feature extraction technique [24]. The K-means, ClusterData, Cluto-rb and DBCA algorithms are applied to cluster the real datasets; the image datasets are clustered over the extracted features. The recall and precision of the clusters formed using these techniques are calculated to determine and compare the quality of clustering. The datasets and the experimental results achieved using K-means, ClusterData, Cluto-rb and DBCA are described below.

TABLE III: Average Recall and Average Precision for real datasets

Algorithm     Corel (Wang)                Wine
              Recall (%)   Precision (%)  Recall (%)   Precision (%)
DBCA          100          100            41.67        72
K-means       60.7         86.71          33.67        61
ClusterData   100          100            25.37        47
Cluto-rb      33.29        27.91          39.67        69

Corel (Wang) Dataset: The Corel (Wang) database consists of 700 images that are a subset of the 1,000 images of the Corel stock photo database, which have been manually selected and which form 10 classes of 100 images each. This database was downloaded from website [25] and contains 1025 features per image.
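The color-moment features mentioned above are commonly taken to be the first three moments (mean, standard deviation, skewness) of each color channel, giving a 9-dimensional vector per image. A sketch under that assumption (our own naming; not necessarily the exact pipeline of [24]):

```python
import math

def color_moments(pixels):
    """First three color moments (mean, std, skewness) per RGB channel.

    pixels: list of (r, g, b) tuples; returns a 9-dimensional feature vector.
    """
    n = len(pixels)
    features = []
    for ch in range(3):
        vals = [p[ch] for p in pixels]
        mean = sum(vals) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / n)
        # signed cube root of the third central moment (skewness)
        third = sum((v - mean) ** 3 for v in vals) / n
        skew = math.copysign(abs(third) ** (1 / 3), third)
        features.extend([mean, std, skew])
    return features

fv = color_moments([(10, 20, 30), (30, 40, 50), (20, 30, 40)])
```

A compact vector of this kind is what the clustering algorithms operate on in place of the raw pixel data.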
We formed 10 clusters of these images using the K-means, ClusterData, Cluto-rb and DBCA algorithms. The average recall and precision of these algorithms are shown in Table III. The graph of average recall in percentage for these techniques is shown in Fig. 4 and the graph of average precision in percentage is shown in Fig. 5. The results show that the recall and precision of DBCA are as good as or better than those of K-means, ClusterData and Cluto-rb; hence the DBCA algorithm produces better quality clusters.

Wine Dataset: The wine recognition dataset [26] results from the chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three classes of wines. The dataset consists of 178 instances; the numbers of instances in classes 1, 2 and 3 are 59, 71 and 48 respectively. The wine dataset was downloaded from the website [27].
We formed 3 clusters of these instances using the K-means, ClusterData, Cluto-rb and DBCA algorithms. The average recall and precision of these algorithms are shown in Table III. The graph of average recall in percentage for these techniques is shown in Fig. 4 and the graph of average precision in percentage is shown in Fig. 5. The results show that the recall and precision of DBCA are better than those of K-means, ClusterData and Cluto-rb; hence the DBCA algorithm produces better quality clusters.

Fig. 4. Average Recall for real datasets.
Fig. 5. Average Precision for real datasets.

V. CONCLUSION

The quality of clustering of the K-means algorithm depends on the initialization of the k data points. These k initial centroids are chosen randomly by the K-means algorithm, and this is its major weakness. The clustering algorithm proposed in this paper is based on the assumption that good clusters are formed when the initial k centroids are chosen to be as far as possible from each other. The DBCA algorithm proposed here is based on computing the total distance of a node from all other nodes. The algorithm was tested on both synthetic datasets and real datasets from the web. The experimental results show the effectiveness of the idea: the algorithm produces better quality clusters than the partitional clustering algorithm, the agglomerative hierarchical clustering algorithm and the hierarchical partitioning clustering algorithm.
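The far-apart initialization idea summarized in the conclusion can be illustrated with a farthest-first sketch: start from the point with the largest total distance to all others, then repeatedly pick the point farthest from the centroids chosen so far. This is our own simplification for illustration, not the exact DBCA procedure:

```python
import math

def farthest_first_centroids(points, k):
    """Pick k initial centroids that are spread far apart.

    A simplified illustration of far-apart initialization; the paper's
    DBCA uses its own distance-based rule with averaging.
    """
    # seed: the point with the largest total distance to all other points
    first = max(points, key=lambda p: sum(math.dist(p, q) for q in points))
    centroids = [first]
    while len(centroids) < k:
        # next centroid: the point farthest from its nearest chosen centroid
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids

pts = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 20)]
cs = farthest_first_centroids(pts, 3)  # three mutually distant seeds
```

Seeds chosen this way can then be passed to a standard K-means routine in place of random initialization, which is the general strategy the paper's experiments evaluate.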
REFERENCES

[1] Margaret Dunham, Data Mining – Introductory and Advanced Concepts, Pearson Education, 2006.
[2] J. B. MacQueen, “Some Methods for Classification and Analysis of Multivariate Observations,” in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, 1967, pp. 281-297.
[3] S. Lloyd, “Least Squares Quantization in PCM,” IEEE Transactions on Information Theory, Vol. 28(2), pp. 129-137, 1982.
[4] S. S. Khan and A. Ahmad, “Cluster center initialization algorithm for k-means algorithm,” Pattern Recognition Letters, Vol. 25(11), pp. 1293-1302, 2004.
[5] Anil K. Jain and Richard C. Dubes, Algorithms for Clustering Data, Prentice Hall, 1988.
[6] B. Larsen and C. Aone, “Fast and Effective Text Mining using Linear-Time Document Clustering,” in Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 1999, pp. 16-22.
[7] Rui Xu and Donald Wunsch, “Survey of clustering algorithms,” IEEE Transactions on Neural Networks, Vol. 16(3), pp. 645-678, 2005.
[8] M. Steinbach, G. Karypis and Vipin Kumar, “A comparison of document clustering techniques,” KDD Workshop on Text Mining, Boston, MA, 2000, pp. 109-111.
[9] Ying Zhao and George Karypis, “Evaluation of Hierarchical Clustering Algorithms for Document Datasets,” in Proceedings of the 11th International Conference on Information and Knowledge Management, New York, USA, ACM Press, 2002, pp. 515-524.
[10] Keerthiram Murugesam and Jun Zhang, “Hybrid hierarchical clustering: An experimental analysis,” Technical Report No. CMIDA-HiPSCCS 001-11, Department of Computer Science, University of Kentucky, Lexington, KY, 2011.
[11] P. S. Bradley and U. M. Fayyad, “Refining initial points for K-means clustering,” in Proceedings of the 15th International Conference on Machine Learning, Morgan Kaufmann, San Francisco, CA, 1998, pp. 91-99.
[12] A. Likas, N. Vlassis and J. J. Verbeek, “The global k-means clustering algorithm,” Pattern Recognition, Vol. 36(2), pp. 451-461, 2003.
[13] Fang Yuan, Zeng-Hui Meng, Hong-Xia Zhang and Chun-Ru Dong, “A new algorithm to get the initial centroids,” in Proceedings of the 13th International Conference on Machine Learning and Cybernetics, Shanghai, 2004, pp. 26-29.
[14] A. R. Barakbah and A. Helen, “Optimized K-means: An algorithm of initial centroids optimization for K-means,” in Proceedings of Soft Computing, Intelligent Systems and Information Technology (SIIT), Surabaya, Indonesia, 2005, pp. 263-266.
[15] A. M. Fahim, A. M. Salem, F. A. Torkey and M. A. Ramadan, “An efficient enhanced k-means clustering algorithm,” Journal of Zhejiang University Science, Vol. 7(10), pp. 1626-1633, 2006.
[16] A. R. Barakbah and K. Arai, “Hierarchical K-means: An algorithm for centroids initialization for K-means,” Reports of the Faculty of Science & Engineering, Saga University, Japan, Vol. 36(1), 2007.
[17] S. Deelers and S. Auwatanamongkol, “Enhancing k-means algorithm with initial cluster centers derived from data partitioning along the data axis with the highest variance,” in Proceedings of the World Academy of Science, Engineering and Technology, Vol. 26, 2007, pp. 323-328.
[18] A. R. Barakbah and Y. Kiyoki, “A Pillar Algorithm for K-means optimization by distance maximization for initial centroid designation,” IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Nashville, Tennessee, 2009, pp. 61-68.
[19] http://www.mathworks.in/products/matlab/
[20] George Karypis, “CLUTO: A Clustering Toolkit, Release 2.1.1,” Technical Report No. 02-017, Department of Computer Science, University of Minnesota, Minneapolis, MN 55455, www.cs.umn.edu/~karypis/cluto, 2003.
[21] Gerald Kowalski, Information Retrieval Systems – Theory and Implementation, Kluwer Academic Publishers, 1997.
[22] C. J. Veenman, M. J. T. Reinders and E. Backer, “A maximum variance cluster algorithm,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24(9), pp. 1273-1280, 2002.
[23] http://cs.joensuu.fi/sipu/datasets/
[24] M. Stricker and M. Orengo, “Similarity of color images,” in Proceedings of Storage and Retrieval for Image and Video Databases III (SPIE), San Jose, CA, USA, 1995, pp. 381-392.
[25] http://users.dsic.upv.es/~rparedes/english/research/rise/MiPRCV-WP6-RF/
[26] S. Aeberhard, D. Coomans and O. de Vel, “Comparison of Classifiers in High Dimensional Settings,” Technical Report 92-02, Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland, 1992.
[27] http://archive.ics.uci.edu/ml/datasets/Wine

AUTHORS DETAILS

NEETI ARORA was born in Bhopal on 25th November 1982. She received her master's degree in computer applications from Rajiv Gandhi Technical University, Madhya Pradesh, India in 2007. She served as a lecturer in the computer applications department at Laxmi Narayan College of Technology for 3 years. Currently she is a PhD student at Rajiv Gandhi Technical University, Madhya Pradesh, India. She has published 2 research papers in international journals. Her research interests are Data Mining and Database Management Systems.

Dr. MAHESH MOTWANI was born in Bhopal on 4th May 1969.
He received his Bachelor's degree in Computer Engineering from MANIT Bhopal, India in 1990, his Master's degree in Computer Engineering from I.I.T. Delhi in 1998 and his PhD in Computer Engineering from Rajiv Gandhi Technical University, Madhya Pradesh, India in 2007. He has about 21 years of teaching experience. Before joining teaching in 1994 he served as a scientist in the National Informatics Centre, Government of India. He served as Head of the Computer Science and Engineering department at Government Engineering College, Bhopal for 7 years and as Head of the Information Technology department at Rajiv Gandhi Technical University, Madhya Pradesh, India for 3 years. Currently he is Deputy Secretary in the Technical Board at Rajiv Gandhi Technical University, Madhya Pradesh, India. He has published around 27 research papers in national and international journals and 20 papers in national and international conferences. He is also the author of a book on Cryptography and Network Security. His research interests are Data Mining, Database Management Systems, Network Security, Computer Graphics and Image Processing. He is a life member of ISTE and a senior member of CSI.