IJASCSE, Vol. 2, Issue 1, 2013 (Feb. 28)




Improved Performance of Unsupervised Method by Renovated K-Means
P. Ashok, Research Scholar, Bharathiar University, Coimbatore, Tamilnadu, India
Dr. G. M. Kadhar Nawaz, Department of Computer Application, Sona College of Technology, Salem, Tamilnadu, India
E. Elayaraja, Department of Computer Science, Periyar University, Salem-11, Tamilnadu, India
V. Vadivel, Department of Computer Science, Periyar University, Salem-11, Tamilnadu, India



Abstract: Clustering is a separation of data into groups of similar objects. Every group, called a cluster, consists of objects that are similar to one another and dissimilar to the objects of other groups. In this paper, the K-Means algorithm is implemented with three distance functions in order to identify the optimal distance function for clustering. The proposed K-Means algorithm is compared with K-Means, Static Weighted K-Means (SWK-Means) and Dynamic Weighted K-Means (DWK-Means) using the Davis Bouldin index, execution time and iteration count. Experimental results show that the proposed K-Means algorithm performs better on the Iris and Wine datasets than the other three clustering methods.

                          I. INTRODUCTION

A. Clustering

    Clustering [9] is a technique for grouping together a set of items having similar characteristics. Clustering can be considered the most important unsupervised learning problem. Like every other problem of this kind, it deals with finding a structure in a collection of unlabeled data. A loose definition of clustering could be "the process of organizing objects into groups whose members are similar in some way". A cluster is therefore a collection of objects which are "similar" to each other and "dissimilar" to the objects belonging to other clusters. We can show this with a simple graphical example.

                      Figure 1. Cluster analysis

    In this case, we easily identify the four clusters into which the data can be divided. The similarity criterion is distance: two or more objects belong to the same cluster if they are "close" according to a given distance (here, geometrical distance). This is called distance-based clustering. Another kind of clustering is conceptual clustering: two or more objects belong to the same cluster if it defines a concept common to all of those objects. In other words, objects are grouped according to their relevance to descriptive concepts, not according to simple similarity measures.

B. Goals of Clustering

    The goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. The main requirements that a clustering algorithm should satisfy are:

    - scalability,
    - dealing with different types of attributes,
    - discovering clusters with arbitrary shape,
    - ability to deal with noise and outliers,
    - insensitivity to the order of input records,
    - handling high dimensionality, and
    - interpretability and usability.

Clustering also has a number of problems. A few of them are listed below.

    - Current clustering techniques do not address all the requirements adequately (and concurrently).
    - Dealing with a large number of dimensions and a large number of data items can be problematic because of time complexity.
    - The effectiveness of the method depends on the definition of "distance" (for distance-based clustering).
    - If an obvious distance measure does not exist, we must "define" one, which is not always easy, especially in multi-dimensional spaces.

    A large number of techniques have been proposed for forming clusters from distance matrices. Hierarchical techniques, optimization techniques and mixture models are the most important types. We discuss the first two types here; mixture models are treated in a separate note that includes their use in classification and regression as well as clustering.


              Figure 2. Taxonomy of clustering approaches

    At a high level, we can divide clustering algorithms into the two broad classes described below.

C. Clustering Methods

    1) Hierarchical Clustering [3]

    We begin by assuming that each point is a cluster by itself. We then repeatedly merge nearby clusters, using some measure of how close two clusters are (e.g., the distance between their centroids) or how good a cluster the resulting group would be (e.g., the average distance of the points in the cluster from the resulting centroid).

    A hierarchical algorithm [8] yields a dendrogram representing the nested grouping of patterns and the similarity levels at which groupings change. The dendrogram can be broken at different levels to yield different clusterings of the data. Most hierarchical clustering algorithms are variants of the single-link, complete-link and minimum-variance algorithms, of which the single-link and complete-link algorithms are the most popular. These two algorithms differ in the way they characterize the similarity between a pair of clusters. In the single-link method, the distance between two clusters is the minimum of the distances between all pairs of patterns drawn from the two clusters (one pattern from the first cluster, the other from the second). In the complete-link algorithm, the distance between two clusters is the maximum of all pairwise distances between patterns in the two clusters. In either case, two clusters are merged to form a larger cluster based on a minimum-distance criterion. The complete-link algorithm produces tightly bound, compact clusters; the single-link algorithm, by contrast, suffers from a chaining effect and has a tendency to produce clusters that are straggly or elongated. The clusters obtained by the complete-link algorithm are therefore more compact than those obtained by the single-link algorithm.
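    To make the single-link/complete-link contrast concrete, the following minimal Python sketch (illustrative, not from the paper) computes both cluster-to-cluster distances for a pair of point sets:

```python
import numpy as np

def linkage_distances(A, B):
    """Single-link and complete-link distances between clusters A and B
    (rows are points): the min and max over all pairwise distances."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return d.min(), d.max()

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[3.0, 0.0], [9.0, 0.0]])
print(linkage_distances(A, B))  # (2.0, 9.0): closest pair vs. farthest pair
```

    Merging on the minimum pairwise distance (single-link) lets long chains of mutually close points grow into one cluster, which is exactly the chaining effect described above.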
    2) Partitional Clustering

    A partitional clustering [9] algorithm obtains a single partition of the data instead of a clustering structure such as the dendrogram produced by a hierarchical technique. Partitional methods have advantages in applications involving large data sets for which the construction of a dendrogram is computationally prohibitive. A problem accompanying the use of a partitional algorithm is the choice of the number of desired output clusters. The partitional technique usually produces clusters by optimizing a criterion function defined either locally (on a subset of the patterns) or globally (over all of the patterns). A combinatorial search of the set of possible labelings for an optimum value of the criterion is clearly computationally prohibitive; in practice, therefore, the algorithm is typically run multiple times with different starting states, and the best configuration obtained over all of the runs is used as the output clustering.

D. Applications of Clustering

    Clustering algorithms can be applied in many fields, such as the ones described below.

    - Marketing: finding groups of customers with similar behavior, given a large database of customer data containing their properties and past buying records.
    - Biology: classification of plants and animals given their features.
    - Libraries: book ordering.
    - Insurance: identifying groups of motor insurance policy holders with a high average claim cost; identifying frauds.
    - City-planning: identifying groups of houses according to their house type, value and geographical location.
    - Earthquake studies: clustering observed earthquake epicenters to identify dangerous zones.
    - WWW: document classification; clustering weblog data to discover groups of similar access patterns.

    The rest of this paper is organized as follows. Section II describes the three distance functions, the execution time measure, and the clustering algorithms studied and implemented: K-Means, Weighted K-Means and the proposed K-Means. Section III presents the experimental analysis conducted on various data sets from the UCI data repository, and section IV concludes the paper.

               II. PARTITIONAL CLUSTERING ALGORITHMS

A. K-Means Algorithm

    The K-Means [2] algorithm is an iterative procedure for clustering which requires an initial classification of the data. It computes the center of each cluster, then computes a new partition by assigning every object to the cluster whose center is closest to that object. This cycle is repeated for a given number of iterations or until the assignment has not changed during one iteration. The algorithm starts from a random set of cluster bases selected from the original dataset, and each element updates the nearest element of the base with the average of its attributes. K-Means is possibly the most commonly used clustering algorithm, and it is most effective for relatively small data sets. K-Means finds a locally optimal solution by minimizing a distance measure between each data point and its nearest cluster center. The basic K-Means algorithm is commonly measured by an intra-cluster or inter-cluster criterion. A typical intra-cluster criterion is the squared-error criterion; it is the most commonly used and is a good measure of the within-cluster variation across all the partitions. The process iterates through the following two steps:

    - assignment of data to representative centers based on minimum distance, and
    - computation of the new cluster centers.


    K-Means clustering is computationally efficient for large data sets with both numeric and categorical [3] attributes.

    1) Working Principle

    The K-Means [9] algorithm works as follows. First, it selects k of the objects, each of which initially represents a cluster mean or center. Each of the remaining objects is assigned to the cluster to which it is most similar, based on the distance between the object and the cluster mean. The algorithm then computes the new mean for each cluster. This process iterates until the criterion function converges. Typically, the Euclidean distance is used.

    Algorithm 1: K-Means
    1. Initialize the number of clusters k.
    2. Randomly select the centroids (c1, c2, ..., ck) from the given dataset.
    3. Compute the distance between the centroids and the objects using the Euclidean distance equation.
    4. Update the centroids.
    5. Stop the process when the new centroids are close to the old ones; otherwise, go to step 3.
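    A minimal Python/NumPy sketch of Algorithm 1 follows (illustrative; the variable names and the convergence tolerance are our own choices, not the paper's):

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    """Basic K-Means following Algorithm 1 (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Step 2: randomly select k objects of the dataset as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 3: Euclidean distance from every object to every centroid.
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)  # assign each object to its nearest centroid
        # Step 4: move each centroid to the mean of the objects assigned to it.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 5: stop once the new centroids are (almost) the old ones.
        prev, centroids = centroids, new_centroids
        if np.linalg.norm(centroids - prev) < tol:
            break
    return centroids, labels
```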
B. Weighted K-Means Clustering Algorithm

    The Weighted K-Means [9] algorithm is a clustering algorithm based on K-Means in which the calculation incorporates weights; it is the normal K-Means algorithm with weights added. Weighted K-Means attempts to decompose a set of objects into a set of disjoint clusters while taking into consideration the fact that the numerical attributes of objects in the set often do not come from independent, identical normal distributions. Weighted K-Means algorithms are iterative, use hill-climbing to find an optimal solution (clustering), and thus usually converge to a local minimum.

    First, the weights of the corresponding centroids in the data set are calculated; then the distance between each object and each centroid is calculated using the weights of the centroids. This method is called the Weighted K-Means algorithm.

    In the Weighted K-Means algorithm the weights can be classified into two types:

    - Dynamic weights: the weights are determined during execution and can change at runtime.
    - Static weights: the weights are not changed during the runtime.

    2) Working Principle

    The Weighted K-Means [9] algorithm works as follows. First, it selects K of the objects, each of which initially represents a cluster mean or center; when selecting the centroids, the weights are calculated using the Weighted K-Means procedure. Each of the remaining objects is assigned to the cluster to which it is most similar, based on the distance between the object and the centroids together with the weight of the corresponding centroid. The algorithm then computes the new mean for each cluster. This process iterates until the criterion function converges. Typically, the Euclidean distance is used in the clustering process; with the Euclidean distance, the dynamic weights can be calculated from the particular centroids.

    Calculate the sum S_j and the corresponding weight w_j for j = 1, 2, ..., n, where w_j is the weight vector corresponding to centroid c_j. Using these quantities, the distance between the objects and the weighted centroids can be calculated.

    The weighted K-Means clustering procedure is given in algorithm 2 below. In static weighted K-Means the weight is fixed at the constant 1.5, while in dynamic weighted K-Means the weight is calculated by the equation above.

    Algorithm 2: Weighted K-Means
    Steps:
    1. Initialize the number of clusters k.
    2. Randomly select the centroids (c1, ..., ck) from the data set.
    3. Calculate the weights w_j of the corresponding centroids (c_j).
    4. Calculate the sum S_j and the weight w_j for j = 1, 2, ..., n, where w_j is the weight vector corresponding to c_j.
    5. Find the distance between the objects and the centroids using the weighted Euclidean distance equation d_ij = w_j \, \lVert x_i - c_j \rVert.
    6. Update the centroids. Stop the process when the new centroids are close to the old ones; otherwise, go to step 4.
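    A sketch of the assignment step follows, under the reading used in step 5 above, i.e. that the weighted distance scales the Euclidean distance to centroid c_j by its weight w_j (an assumed reading; with the static variant, every entry of w would simply be 1.5):

```python
import numpy as np

def weighted_assignment(X, centroids, w):
    """Assign each object to the centroid j minimizing w[j] * ||x - c_j||.

    w holds one weight per centroid; np.full(len(centroids), 1.5) gives the
    static-weight variant described in the text (assumed reading)."""
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return (dist * w[None, :]).argmin(axis=1)
```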
C. Proposed K-Means

    In the proposed method, the initial cluster centroids are first determined using the equations given in algorithm 3 below. The proposed K-Means algorithm improves on K-Means by selecting the initial centroids deterministically instead of selecting them at random. It selects K objects, each of which initially represents a cluster mean or centroid. Each of the remaining objects is assigned to the cluster to which it is most similar, based on the distance between the object and the cluster mean. The algorithm then computes the new mean for each cluster. This process iterates until the criterion function converges.



In this paper, the proposed K-Means algorithm is implemented in place of the traditional K-Means, as explained in algorithm 3.
    Algorithm 3: Proposed K-Means
    Steps:
    1. Using the Euclidean distance as a dissimilarity measure, compute the distance d(x_i, x_j) between every pair of objects.
    2. Calculate M_ij to make an initial guess at the centres of the clusters.
    3. Calculate the quantity in equation (3) at each object and sort the objects by it in ascending order.
    4. Select the K objects having the minimum value as the initial cluster centroids determined by the above equation, instead of arbitrarily choosing k data points from D.
    5. Assign each point d_i to the cluster which has the closest centroid.
    6. Calculate the new mean for each cluster.
    7. Repeat steps 5 and 6 until the convergence criterion is met.
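    One plausible concrete reading of steps 1-4 is sketched below: score each object by the sum of its distances to all other objects, sort the scores in ascending order, and take the K lowest-scoring objects as the initial centroids. Treat the scoring rule as an assumption, not necessarily the paper's exact M_ij:

```python
import numpy as np

def initial_centroids(X, k):
    """Deterministic seeding: one plausible reading of algorithm 3,
    scoring each object by its summed distance to all other objects."""
    # Step 1: Euclidean distance between every pair of objects.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Steps 2-3: one score per object, sorted in ascending order.
    scores = D.sum(axis=1)
    order = np.argsort(scores)
    # Step 4: the K objects with the minimum score become the initial centroids.
    return X[order[:k]]
```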
D. Cluster Validity Measure

    Many criteria have been developed for determining cluster validity, all of which share the common goal of finding the clustering that results in compact clusters which are well separated. Clustering validity is a concept used to evaluate the quality of clustering results. A clustering validity index may also be used to find the optimal number of clusters and to measure the compactness and separation of clusters.

    1) Davies-Bouldin Index

    In this paper, the Davies-Bouldin index [1, 6] has been chosen as the cluster validity measure because it has been shown to be able to detect the correct number of clusters in several experiments. Davies-Bouldin validity combines two functions: the first calculates the compactness of data in the same cluster, and the second computes the separateness of data in different clusters.

    This index (Davies and Bouldin, 1979) is a function of the ratio of the sum of within-cluster scatter to between-cluster separation. If dp_i is the dispersion of cluster P_i, and dv_ij denotes the dissimilarity between two clusters P_i and P_j, then the cluster similarity matrix FR = \{ FR_{ij}, (i, j) = 1, 2, \dots, c \} is defined as:

    FR_{ij} = \frac{dp_i + dp_j}{dv_{ij}}

    The dispersion dp_i can be seen as a measure of the radius of P_i:

    dp_i = \frac{1}{n_i} \sum_{x \in P_i} \lVert x - V_i \rVert

    where n_i is the number of objects in the i-th cluster and V_i is the centroid of the i-th cluster. dv_ij describes the dissimilarity between P_i and P_j:

    dv_{ij} = \lVert V_i - V_j \rVert

    The corresponding DB index is defined as:

    DB = \frac{1}{c} \sum_{i=1}^{c} \max_{j \neq i} FR_{ij}

    where c is the number of clusters. The ratio FR_{ij} is small if the clusters are compact and far from each other; consequently, the Davies-Bouldin index has a small value for a good clustering.
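    A direct transcription of these definitions into Python is sketched below (illustrative; it assumes every cluster is non-empty and c >= 2):

```python
import numpy as np

def davies_bouldin(X, labels, centroids):
    """Davies-Bouldin index: mean over clusters of max_j (dp_i + dp_j) / dv_ij."""
    c = len(centroids)
    # dp_i: mean distance of cluster i's objects to its centroid V_i.
    dp = np.array([
        np.linalg.norm(X[labels == i] - centroids[i], axis=1).mean()
        for i in range(c)
    ])
    total = 0.0
    for i in range(c):
        # FR_ij for every j != i; keep the worst (largest) ratio.
        total += max((dp[i] + dp[j]) / np.linalg.norm(centroids[i] - centroids[j])
                     for j in range(c) if j != i)
    return total / c  # small values indicate compact, well-separated clusters
```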
E. Distance Measures

    Many clustering methods use distance measures [7] to determine the similarity or dissimilarity between any pair of objects. It is useful to denote the distance between two instances x_i and x_j as d(x_i, x_j). A valid distance measure should be symmetric and should obtain its minimum value (usually zero) for identical vectors. The distance functions used here are of three types.

    1) Manhattan Distance

    The Minkowski distance of order 1, the L1 distance, is called the Manhattan distance [5]:

    d(x_i, x_j) = \sum_{m=1}^{d} \lvert x_{im} - x_{jm} \rvert

    It is also known as the city block distance. This metric assumes that, in going from one pixel to another, it is only possible to travel directly along the pixel grid lines; diagonal moves are not allowed.

    2) Euclidean Distance [5]

    This is the most familiar distance. The shortest distance between two points (x_1, y_1) and (x_2, y_2) in a two-dimensional space is

    d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}

    3) Chebyshev Distance

    The Chebyshev [10] distance between two vectors or points p and q, with standard coordinates p_i and q_i respectively, is

    d(p, q) = \max_i \lvert p_i - q_i \rvert
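    The three metrics side by side, as a small illustrative snippet:

```python
import numpy as np

def manhattan(p, q):  # L1 / city-block: grid moves only
    return np.abs(p - q).sum()

def euclidean(p, q):  # L2: straight-line distance
    return np.sqrt(((p - q) ** 2).sum())

def chebyshev(p, q):  # L-infinity: largest coordinate difference
    return np.abs(p - q).max()

p, q = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(manhattan(p, q), euclidean(p, q), chebyshev(p, q))  # 7.0 5.0 4.0
```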




               III. EXPERIMENTAL ANALYSIS AND DISCUSSION

    This section describes the data sets used to analyze the methods studied in section II; the datasets are listed below.

A. Datasets

    1) Iris

    The Iris dataset contains information about the iris flower. The data set contains 150 samples with four attributes. The dataset is collected from http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data

    2) Ecoli

    The Ecoli dataset contains protein localization sites, with 350 samples and 8 attributes. The dataset is collected from http://archive.ics.uci.edu/ml/machine-learning-databases/ecoli/ecoli.data
    3) Yeast

    The Yeast dataset contains 1400 samples with 8 attributes. The dataset is collected from http://archive.ics.uci.edu/ml/machine-learning-databases/yeast/yeast.data

    4) Wine

    The Wine dataset contains information used to determine the origin of wines. It contains 200 samples with 13 attributes, and the dataset is collected from http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data

B. Comparative Study and Performance Analysis

    Four clustering algorithms, K-Means, Static Weighted K-Means (SWK-Means), Dynamic Weighted K-Means (DWK-Means) and the proposed K-Means, are used to cluster the data sets. To assess the quality of the clusters, the Davis Bouldin measure has been used.

    1) Performance of Distance Functions

    The K-Means clustering method is executed with three different distance functions, Manhattan, Euclidean and Chebyshev, on the Iris dataset, with the number of centroids (K) varied from 2 to 10. The obtained results are listed in Table I below.

                TABLE I. Distance Functions (Davis Bouldin index)

    S.No | Clusters | Manhattan | Euclidean | Chebyshev
      1  |     2    |   0.510   |   0.524   |   0.658
      2  |     3    |   0.653   |   0.416   |   0.548
      3  |     4    |   0.758   |   0.634   |   0.811
      4  |     5    |   0.912   |   0.589   |   0.987
      5  |     6    |   0.933   |   0.712   |   1.023
      6  |     7    |   0.847   |   0.689   |   0.956
      7  |     8    |   0.935   |   0.881   |   1.095
      8  |     9    |   0.874   |   0.643   |   0.912
      9  |    10    |   0.901   |   0.773   |   0.845

                Figure 4. Distance function analysis chart for K-Means

    Figure 4 compares the distance functions for K-Means on the "Ecoli" dataset. The K-Means clustering method is executed with the number of cluster centroids K varied from 2 to 10. The chart clearly shows that the Euclidean distance function obtained the minimum index values for most of the cluster counts. Hence the Euclidean distance function is better suited to these clustering algorithms than the other distance functions.

    2) Performance of Clustering Algorithms

    The four algorithms studied in section II are implemented in MATLAB 2012(a). All methods are executed and compared on the Ecoli, Iris, Yeast and Wine datasets. The Davis Bouldin index is used to determine the performance of the clustering algorithms. The results obtained from the various clustering algorithms are listed in Table II below.

                TABLE II. Davis Bouldin index analysis

    S.No | Data set | K-Means | SWK-Means | DWK-Means | Proposed K-Means
      1  |  Ecoli   |  0.71   |   0.95    |   0.84    |      0.65
      2  |  Iris    |  0.65   |   0.49    |   0.53    |      0.73
      3  |  Yeast   |  0.86   |   0.79    |   0.84    |      0.66
      4  |  Wine    |  0.55   |   0.74    |   0.68    |      0.45



                Figure 3. Clustering methods chart for the various datasets

    Figure 3 shows the four clustering algorithms executed on the four datasets (Ecoli, Iris, Yeast, Wine) with a constant number of cluster centroids, K = 5. The proposed K-Means algorithm obtained the minimum DB index values for the Ecoli and Wine datasets, and on the other datasets it obtained the next-to-minimum index values. Hence the proposed K-Means clustering algorithm obtains good clustering results.

    3) Execution Time Measure

    The clustering algorithms are compared on the time required to cluster the dataset. The execution time varies with the number of initial cluster centroids and increases as the number of centroids increases. The obtained results are shown in Table III below.

                TABLE III. Execution Time (in sec)

    S.No | Clusters | K-Means | SWK-Means | DWK-Means | Proposed K-Means
      1  |     3    |  1.54   |   1.66    |   1.45    |      1.31
      2  |     6    |  1.95   |   2.87    |   2.28    |      2.05
      3  |     9    |  3.78   |   4.15    |   3.94    |      3.45
      4  |    12    |  4.45   |   5.12    |   4.18    |      3.94
      5  |    15    |  5.70   |   6.45    |   5.48    |      4.71

                Figure 5. Execution time chart for the clustering algorithms

    Figure 5 shows the performance of the four clustering algorithms on the Iris dataset with the number of cluster centroids varied from 3 to 15. The proposed K-Means algorithm obtained the minimum execution time for most of the centroid counts. The SWK-Means clustering algorithm does not achieve the minimum execution time for any of the K values, while the other two algorithms achieve it for some K values. Hence the proposed K-Means executes in the minimum time and performs better than the other three algorithms.
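    The two evaluation protocols used in this section, wall-clock execution time and iteration count, can be measured with a small harness like the sketch below (illustrative Python; it assumes a hypothetical kmeans function, such as the earlier sketch extended to also return its iteration count):

```python
import time

# Hypothetical helper: assumes kmeans(X, k) returns (centroids, labels, n_iter),
# i.e. the earlier K-Means sketch extended to count its update loops.
def benchmark(kmeans, X, ks=(3, 6, 9, 12, 15)):
    for k in ks:
        start = time.perf_counter()
        _, _, n_iter = kmeans(X, k)
        elapsed = time.perf_counter() - start
        print(f"K={k}: {elapsed:.2f} s, {n_iter} iterations")
```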




    4) Iteration Count Analysis

    The four clustering algorithms are compared on their iteration counts using the Wine dataset. The iteration count is defined as the number of iterations a clustering algorithm executes until the convergence criterion is met. The number of cluster centres is varied, and the number of iterations of each clustering algorithm is recorded in Table IV below.

                TABLE IV. Iteration count Analysis

    S.No | Clusters | K-Means | SWK-Means | DWK-Means | Proposed K-Means
      1  |     2    |    8    |     7     |    11     |       6
      2  |     3    |    6    |     8     |     5     |       7
      3  |     4    |   14    |    18     |    15     |       9
      4  |     5    |    9    |    13     |    11     |       8
      5  |     6    |    4    |     8     |    11     |       6
      6  |     7    |   15    |    13     |    17     |      11
      7  |     8    |   13    |    10     |    15     |       6
      8  |     9    |    7    |     6     |    11     |      10
      9  |    10    |   14    |    17     |    14     |       8

                Figure 6. Iteration levels chart for the clustering algorithms


    Figure 6 shows the iteration levels of the four algorithms with the cluster centroids set from 2 to 10 on the Yeast dataset. The K-Means and proposed K-Means clustering methods performed well, converging in a minimum number of iterations, and the proposed K-Means method performed better than K-Means for most values of the cluster centroid count (K).


                      IV. CONCLUSION

    In this paper, four different partitional clustering methods are studied, and the K-Means algorithm is applied with three different distance functions to find the optimal distance function for the clustering process. One of the demerits of the K-Means algorithm is the random selection of the initial centroids of the desired clusters. This is overcome in the proposed K-Means by an initial-centroid selection process that computes the initial centroids instead of selecting them randomly, and it produces distinctly better results. The four clustering algorithms are executed on four different datasets, and the proposed K-Means method performs very well, obtaining the minimum DB index value. The execution time and iteration count are also compared across the four clustering algorithms for different cluster counts. The proposed K-Means achieved less execution time and a lower iteration count than the K-Means, Static Weighted K-Means (SWK-Means) and Dynamic Weighted K-Means (DWK-Means) clustering methods. The proposed K-Means clustering method can therefore be applied in the application areas above, and various cluster validity measures can be used to improve cluster performance in our future work.


                          REFERENCES

    [1] Davies, D. L. and Bouldin, D. W., 1979. "A cluster separation measure." IEEE Trans. Pattern Anal. Machine Intell., 1(4), 224-227.

    [2] Hartigan, J. and Wong, M., 1979. "Algorithm AS 136: A k-means clustering algorithm." Applied Statistics, 28, 100-108.

    [3] Heer, J. and Chi, E., 2001. "Identification of Web user traffic composition using multimodal clustering and information scent." 1st SIAM ICDM Workshop on Web Mining, 51-58, Chicago, IL.

    [4] Jain, A. K., Murty, M. N. and Flynn, P. J., September 1999. "Data clustering: a review." ACM Computing Surveys, 31(3).

    [5] Khan, M. "Fast distance metric based data mining techniques using P-trees: k-nearest-neighbor classification and k-clustering."

    [6] Halkidi, M., Batistakis, Y. and Vazirgiannis, M., 2001. "On clustering validation techniques." Journal of Intelligent Information Systems, 17(2-3), 107-145.

    [7] Rokach, L. and Maimon, O., 2005. Data Mining and Knowledge Discovery Handbook. Springer.

    [8] Sarmah, S. and Bhattacharyya, D. K., May 2010. "An effective technique for clustering incremental gene expression data." IJCSI International Journal of Computer Science Issues, 7(3), No. 3.

    [9] Thangadurai, K. et al., 2010. "A study on rough clustering." Global Journal of Computer Science and Technology, 10(5).

    [10] http://en.wikipedia.org/wiki/Chebyshev_distance





Clustering Based Lifetime Maximizing Aggregation Tree for Wireless Sensor Net...IJASCSE
 
Design Equation for CFRP strengthened Cold Formed Steel Channel Column Sections
Design Equation for CFRP strengthened Cold Formed Steel Channel Column SectionsDesign Equation for CFRP strengthened Cold Formed Steel Channel Column Sections
Design Equation for CFRP strengthened Cold Formed Steel Channel Column SectionsIJASCSE
 
OfdmaClosed-Form Rate Outage Probability for OFDMA Multi-Hop Broadband Wirele...
OfdmaClosed-Form Rate Outage Probability for OFDMA Multi-Hop Broadband Wirele...OfdmaClosed-Form Rate Outage Probability for OFDMA Multi-Hop Broadband Wirele...
OfdmaClosed-Form Rate Outage Probability for OFDMA Multi-Hop Broadband Wirele...IJASCSE
 
A Study on Thermal behavior of Nano film as thermal interface layer
A Study on Thermal behavior of Nano film as thermal interface layerA Study on Thermal behavior of Nano film as thermal interface layer
A Study on Thermal behavior of Nano film as thermal interface layerIJASCSE
 
Performance analysis of a model predictive unified power flow controller (MPU...
Performance analysis of a model predictive unified power flow controller (MPU...Performance analysis of a model predictive unified power flow controller (MPU...
Performance analysis of a model predictive unified power flow controller (MPU...IJASCSE
 
Synthesis and structural properties of Mg (OH)2 on RF sputtered Mg thin films...
Synthesis and structural properties of Mg (OH)2 on RF sputtered Mg thin films...Synthesis and structural properties of Mg (OH)2 on RF sputtered Mg thin films...
Synthesis and structural properties of Mg (OH)2 on RF sputtered Mg thin films...IJASCSE
 
Evaluation of Exception Handling Metrics
Evaluation of Exception Handling MetricsEvaluation of Exception Handling Metrics
Evaluation of Exception Handling MetricsIJASCSE
 
Investigation of Integrated Rectangular SIW Filter and Rectangular Microstrip...
Investigation of Integrated Rectangular SIW Filter and Rectangular Microstrip...Investigation of Integrated Rectangular SIW Filter and Rectangular Microstrip...
Investigation of Integrated Rectangular SIW Filter and Rectangular Microstrip...IJASCSE
 
An effect of synthesis parameters on structural properties of AlN thin films ...
An effect of synthesis parameters on structural properties of AlN thin films ...An effect of synthesis parameters on structural properties of AlN thin films ...
An effect of synthesis parameters on structural properties of AlN thin films ...IJASCSE
 
Cluster-based Target Tracking and Recovery Algorithm in Wireless Sensor Network
Cluster-based Target Tracking and Recovery Algorithm in Wireless Sensor NetworkCluster-based Target Tracking and Recovery Algorithm in Wireless Sensor Network
Cluster-based Target Tracking and Recovery Algorithm in Wireless Sensor NetworkIJASCSE
 
Analysis and Design of Lead Salt PbSe/PbSrSe Single Quantum Well In the Infra...
Analysis and Design of Lead Salt PbSe/PbSrSe Single Quantum Well In the Infra...Analysis and Design of Lead Salt PbSe/PbSrSe Single Quantum Well In the Infra...
Analysis and Design of Lead Salt PbSe/PbSrSe Single Quantum Well In the Infra...IJASCSE
 
Portfolio Analysis in US stock market using Markowitz model
Portfolio Analysis in US stock market using Markowitz modelPortfolio Analysis in US stock market using Markowitz model
Portfolio Analysis in US stock market using Markowitz modelIJASCSE
 
Stable and Reliable Route Identification Scheme for Efficient DSR Route Cache...
Stable and Reliable Route Identification Scheme for Efficient DSR Route Cache...Stable and Reliable Route Identification Scheme for Efficient DSR Route Cache...
Stable and Reliable Route Identification Scheme for Efficient DSR Route Cache...IJASCSE
 

Plus de IJASCSE (20)

Inter Time Series Sales Forecasting
Inter Time Series Sales ForecastingInter Time Series Sales Forecasting
Inter Time Series Sales Forecasting
 
Independent Component Analysis for Filtering Airwaves in Seabed Logging Appli...
Independent Component Analysis for Filtering Airwaves in Seabed Logging Appli...Independent Component Analysis for Filtering Airwaves in Seabed Logging Appli...
Independent Component Analysis for Filtering Airwaves in Seabed Logging Appli...
 
Enhanced Performance of Search Engine with Multitype Feature Co-Selection of ...
Enhanced Performance of Search Engine with Multitype Feature Co-Selection of ...Enhanced Performance of Search Engine with Multitype Feature Co-Selection of ...
Enhanced Performance of Search Engine with Multitype Feature Co-Selection of ...
 
Improving Utilization of Infrastructure Cloud
Improving Utilization of Infrastructure CloudImproving Utilization of Infrastructure Cloud
Improving Utilization of Infrastructure Cloud
 
Four Side Distance: A New Fourier Shape Signature
Four Side Distance: A New Fourier Shape SignatureFour Side Distance: A New Fourier Shape Signature
Four Side Distance: A New Fourier Shape Signature
 
Theoretical study of axially compressed Cold Formed Steel Sections
Theoretical study of axially compressed Cold Formed Steel SectionsTheoretical study of axially compressed Cold Formed Steel Sections
Theoretical study of axially compressed Cold Formed Steel Sections
 
A Study on the Effectiveness of Computer Games in Teaching and Learning
A Study on the Effectiveness of Computer Games in Teaching and LearningA Study on the Effectiveness of Computer Games in Teaching and Learning
A Study on the Effectiveness of Computer Games in Teaching and Learning
 
Clustering Based Lifetime Maximizing Aggregation Tree for Wireless Sensor Net...
Clustering Based Lifetime Maximizing Aggregation Tree for Wireless Sensor Net...Clustering Based Lifetime Maximizing Aggregation Tree for Wireless Sensor Net...
Clustering Based Lifetime Maximizing Aggregation Tree for Wireless Sensor Net...
 
Design Equation for CFRP strengthened Cold Formed Steel Channel Column Sections
Design Equation for CFRP strengthened Cold Formed Steel Channel Column SectionsDesign Equation for CFRP strengthened Cold Formed Steel Channel Column Sections
Design Equation for CFRP strengthened Cold Formed Steel Channel Column Sections
 
OfdmaClosed-Form Rate Outage Probability for OFDMA Multi-Hop Broadband Wirele...
OfdmaClosed-Form Rate Outage Probability for OFDMA Multi-Hop Broadband Wirele...OfdmaClosed-Form Rate Outage Probability for OFDMA Multi-Hop Broadband Wirele...
OfdmaClosed-Form Rate Outage Probability for OFDMA Multi-Hop Broadband Wirele...
 
A Study on Thermal behavior of Nano film as thermal interface layer
A Study on Thermal behavior of Nano film as thermal interface layerA Study on Thermal behavior of Nano film as thermal interface layer
A Study on Thermal behavior of Nano film as thermal interface layer
 
Performance analysis of a model predictive unified power flow controller (MPU...
Performance analysis of a model predictive unified power flow controller (MPU...Performance analysis of a model predictive unified power flow controller (MPU...
Performance analysis of a model predictive unified power flow controller (MPU...
 
Synthesis and structural properties of Mg (OH)2 on RF sputtered Mg thin films...
Synthesis and structural properties of Mg (OH)2 on RF sputtered Mg thin films...Synthesis and structural properties of Mg (OH)2 on RF sputtered Mg thin films...
Synthesis and structural properties of Mg (OH)2 on RF sputtered Mg thin films...
 
Evaluation of Exception Handling Metrics
Evaluation of Exception Handling MetricsEvaluation of Exception Handling Metrics
Evaluation of Exception Handling Metrics
 
Investigation of Integrated Rectangular SIW Filter and Rectangular Microstrip...
Investigation of Integrated Rectangular SIW Filter and Rectangular Microstrip...Investigation of Integrated Rectangular SIW Filter and Rectangular Microstrip...
Investigation of Integrated Rectangular SIW Filter and Rectangular Microstrip...
 
An effect of synthesis parameters on structural properties of AlN thin films ...
An effect of synthesis parameters on structural properties of AlN thin films ...An effect of synthesis parameters on structural properties of AlN thin films ...
An effect of synthesis parameters on structural properties of AlN thin films ...
 
Cluster-based Target Tracking and Recovery Algorithm in Wireless Sensor Network
Cluster-based Target Tracking and Recovery Algorithm in Wireless Sensor NetworkCluster-based Target Tracking and Recovery Algorithm in Wireless Sensor Network
Cluster-based Target Tracking and Recovery Algorithm in Wireless Sensor Network
 
Analysis and Design of Lead Salt PbSe/PbSrSe Single Quantum Well In the Infra...
Analysis and Design of Lead Salt PbSe/PbSrSe Single Quantum Well In the Infra...Analysis and Design of Lead Salt PbSe/PbSrSe Single Quantum Well In the Infra...
Analysis and Design of Lead Salt PbSe/PbSrSe Single Quantum Well In the Infra...
 
Portfolio Analysis in US stock market using Markowitz model
Portfolio Analysis in US stock market using Markowitz modelPortfolio Analysis in US stock market using Markowitz model
Portfolio Analysis in US stock market using Markowitz model
 
Stable and Reliable Route Identification Scheme for Efficient DSR Route Cache...
Stable and Reliable Route Identification Scheme for Efficient DSR Route Cache...Stable and Reliable Route Identification Scheme for Efficient DSR Route Cache...
Stable and Reliable Route Identification Scheme for Efficient DSR Route Cache...
 

Dernier

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 

Dernier (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 

Improved Performance of Unsupervised Method by Renovated K-Means

- Current clustering techniques do not address all of the requirements adequately (and concurrently).
- Dealing with a large number of dimensions and a large number of data items can be problematic because of time complexity.
- The effectiveness of a method depends on the definition of "distance" (for distance-based clustering).
- If an obvious distance measure does not exist, we must "define" one, which is not always easy, especially in multi-dimensional spaces.

A large number of techniques have been proposed for forming clusters from distance matrices. Hierarchical techniques, optimization techniques and mixture models are the most important types. We discuss the first two types here; mixture models are treated in a separate note that includes their use in classification and regression as well as clustering.

Figure 1. Cluster analysis. In this case, we easily identify the four clusters into which the data can be divided. The similarity criterion is distance: two or more objects belong to the same cluster if they are "close" according to a given distance (here, geometrical distance).
Figure 2. Taxonomy of clustering approaches.

At a high level, we can divide clustering algorithms into two broad classes, as described in the section below.

C. Clustering Methods

1) Hierarchical Clustering [3]

We begin by assuming that each point is a cluster by itself. We repeatedly merge nearby clusters, using some measure of how close two clusters are (e.g., the distance between their centroids) or how good a cluster the resulting group would be (e.g., the average distance of points in the cluster from the resulting centroid). A hierarchical algorithm [8] yields a dendrogram representing the nested grouping of patterns and the similarity levels at which groupings change. The dendrogram can be broken at different levels to yield different clusterings of the data. Most hierarchical clustering algorithms are variants of the single-link, complete-link, and minimum-variance algorithms, of which the single-link and complete-link algorithms are the most popular. These two algorithms differ in the way they characterize the similarity between a pair of clusters. In the single-link method, the distance between two clusters is the minimum of the distances between all pairs of patterns drawn from the two clusters (one pattern from the first cluster, the other from the second). In the complete-link algorithm, the distance between two clusters is the maximum of all pairwise distances between patterns in the two clusters. In either case, two clusters are merged to form a larger cluster based on a minimum-distance criterion. The complete-link algorithm produces tightly bound or compact clusters, while the single-link algorithm suffers from a chaining effect: it has a tendency to produce clusters that are straggly or elongated. The clusters obtained by the complete-link algorithm are therefore more compact than those obtained by the single-link algorithm; the two linkage criteria are contrasted in the sketch below.
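Since the two linkage criteria differ only in whether they take the minimum or the maximum of the pairwise distances, they can be contrasted in a few lines. The following sketch is our own illustration in Python/NumPy, not code from the paper:

```python
import numpy as np

def pairwise_distances(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """All Euclidean distances between rows of a and rows of b."""
    # Broadcasting: (n, 1, d) - (1, m, d) -> (n, m, d), then norm over d.
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

def single_link(a: np.ndarray, b: np.ndarray) -> float:
    """Single-link: the minimum over all pairs drawn from the two clusters."""
    return pairwise_distances(a, b).min()

def complete_link(a: np.ndarray, b: np.ndarray) -> float:
    """Complete-link: the maximum of all pairwise distances."""
    return pairwise_distances(a, b).max()
```

At each step of the agglomeration, the two clusters with the smallest linkage value would be merged; using complete_link in that step is what yields the more compact clusters described above.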
2) Partitional Clustering

A partitional clustering [9] algorithm obtains a single partition of the data instead of a clustering structure such as the dendrogram produced by a hierarchical technique. Partitional methods have advantages in applications involving large data sets, for which the construction of a dendrogram is computationally prohibitive. A problem accompanying the use of a partitional algorithm is the choice of the number of desired output clusters. Partitional techniques usually produce clusters by optimizing a criterion function defined either locally (on a subset of the patterns) or globally (over all of the patterns). Combinatorial search of the set of possible labelings for an optimum value of a criterion is clearly computationally prohibitive; in practice, therefore, the algorithm is typically run multiple times with different starting states, and the best configuration obtained from all of the runs is used as the output clustering.

D. Applications of Clustering

Clustering algorithms can be applied in many fields, such as the ones described below.
- Marketing: finding groups of customers with similar behavior, given a large database of customer data containing their properties and past buying records.
- Biology: classification of plants and animals given their features.
- Libraries: book ordering.
- Insurance: identifying groups of motor insurance policy holders with a high average claim cost; identifying frauds.
- City-planning: identifying groups of houses according to their house type, value and geographical location.
- Earthquake studies: clustering observed earthquake epicenters to identify dangerous zones.
- WWW: document classification; clustering weblog data to discover groups of similar access patterns.

The rest of this paper is organized as follows. Section II describes three different distance functions, the execution time method, and the clustering algorithms (K-Means, Weighted K-Means and the Proposed K-Means) that have been studied and implemented. Section III presents the experimental analysis conducted on various data sets from the UCI data repository, and Section IV concludes the paper.

II. PARTITIONAL CLUSTERING ALGORITHMS

A. K-Means Algorithm

The K-Means [2] algorithm is an iterative procedure for clustering which requires an initial classification of the data. It computes the center of each cluster and then computes new partitions by assigning every object to the cluster whose center is closest to that object. This cycle is repeated for a given number of iterations or until the assignment has not changed during one iteration. The algorithm is based on an approach where a random set of cluster bases is selected from the original dataset, and each element updates the nearest element of the base with the average of its attributes. K-Means is possibly the most commonly used clustering algorithm. It is most effective for relatively small data sets, and it is computationally efficient for large data sets with both numeric and categorical [3] attributes. K-Means finds a locally optimal solution by minimizing a distance measure between each data point and its nearest cluster center. The basic K-Means algorithm is commonly measured by an intra-cluster or inter-cluster criterion. A typical intra-cluster criterion is the squared-error criterion; it is the most commonly used, and a good measure of the within-cluster variation across all the partitions. The process iterates through the following steps:
- assignment of data to representative centers upon minimum distance, and
- computation of the new cluster centers.
1) Working Principle

The K-Means [9] algorithm works as follows. First, it iteratively selects k of the objects, each of which initially represents a cluster mean or center. Each of the remaining objects is assigned to the cluster to which it is the most similar, based on the distance between the object and the cluster mean. The algorithm then computes the new mean for each cluster. This process iterates until the criterion function converges. Typically, the Euclidean distance is used.

Algorithm 1: K-Means
1. Initialize the number of clusters k.
2. Randomly select the centroids (c1, c2, ..., ck) from the given dataset.
3. Compute the distance between the centroids and the objects using the Euclidean distance equation.
4. Update the centroids.
5. Stop the process when the new centroids are close to the old ones; otherwise, go to step 3.
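A minimal sketch of Algorithm 1 follows. The paper's experiments were implemented in MATLAB; this Python/NumPy rendering is ours, with the convergence tolerance and iteration cap as assumed parameters:

```python
import numpy as np

def kmeans(X, k, tol=1e-4, max_iter=100, rng=None):
    """Basic K-Means (Algorithm 1): random initial centroids, Euclidean
    assignment, mean update, and a stopping test on centroid movement."""
    rng = np.random.default_rng(rng)
    # Step 2: randomly select k objects from the dataset as centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for it in range(max_iter):
        # Step 3: Euclidean distance from every object to every centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 4: move each centroid to the mean of its assigned objects.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 5: stop when the new centroids are close to the old ones.
        if np.linalg.norm(new_centroids - centroids) < tol:
            centroids = new_centroids
            break
        centroids = new_centroids
    return labels, centroids, it + 1
```

For example, `labels, centroids, n_iter = kmeans(X, k=3)` clusters the 150 x 4 Iris matrix into three clusters; the returned iteration count is the quantity reported in the iteration-count analysis of Section III.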
B. Weighted K-Means Clustering Algorithm

The Weighted K-Means [9] algorithm is a clustering algorithm based on K-Means that calculates with weights; it is the same as the normal K-Means algorithm with weights added. Weighted K-Means attempts to decompose a set of objects into a set of disjoint clusters, taking into consideration the fact that the numerical attributes of objects in the set often do not come from independent identical normal distributions. Weighted K-Means algorithms are iterative and use hill-climbing to find an optimal solution (clustering), and thus usually converge to a local minimum.

First, the weights for the corresponding centroids in the data set are calculated, and then the distance between each object and each centroid is calculated together with the weight of that centroid. This method is called the Weighted K-Means algorithm.

In the Weighted K-Means algorithm, the weights can be classified into two types:
- Dynamic weights: the weights are determined during execution and can be changed at runtime.
- Static weights: the weights are not changed during the runtime.

2) Working Principle

The Weighted K-Means [9] algorithm works as defined below. First, it iteratively selects K of the objects, each of which initially represents a cluster mean or center. In selecting the centroids, the weights are calculated using the Weighted K-Means algorithm. Each of the remaining objects is assigned to the cluster to which it is the most similar, based on the distance between the objects and the centroids together with the weight of the corresponding centroid. The algorithm then computes the new mean for each cluster. This process iterates until the criterion function converges. Typically, the Euclidean distance is used in the clustering process; with the Euclidean distance, the dynamic weights can be calculated based on the particular centroids. The sum is calculated for j = 1, 2, ..., n, where w_j is the weight vector corresponding to centroid c_j, and from this equation the distance between the centroids and the weights is calculated.

In the static weighted K-Means, the weight is fixed at the constant 1.5, while the dynamic weight is calculated by the above equation. The weighted K-Means procedure is given as Algorithm 2.

Algorithm 2: Weighted K-Means
Steps:
1. Initialize the number of clusters k.
2. Randomly select the centroids (c1, c2, ..., ck) from the data set.
3. Calculate the weights (w1, w2, ..., wk) of the corresponding centroids.
4. Calculate the sum for j = 1, 2, ..., n, where w_j is the weight vector corresponding to centroid c_j.
5. Find the distance d_ij between the objects and the centroids using the weighted Euclidean distance equation.
6. Update the centroids. Stop the process when the new centroids are close to the old ones; otherwise, go to step 4.
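The weight equations themselves are not legible in this transcript, so the sketch below only fixes the structure of Algorithm 2: a per-centroid weight w_j scales the Euclidean distance in the assignment step. The static variant uses the constant 1.5 stated in the text; the dynamic_w function is a hypothetical stand-in for the paper's runtime weight equation, not the published formula:

```python
import numpy as np

def weighted_kmeans(X, k, weights_fn, tol=1e-4, max_iter=100, rng=None):
    """Weighted K-Means sketch (Algorithm 2): assignment uses
    w[j] * ||x - c_j|| instead of the plain Euclidean distance."""
    rng = np.random.default_rng(rng)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        w = weights_fn(X, centroids, labels)           # step 3: per-centroid weights
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = (d * w[None, :]).argmin(axis=1)       # step 5: weighted distance
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return labels, centroids

# Static weights: the constant 1.5 stated in the paper.
static_w = lambda X, C, labels: np.full(len(C), 1.5)

# Dynamic weights: a hypothetical runtime-dependent choice (NOT the
# paper's equation); here, weights grow with current cluster size.
def dynamic_w(X, C, labels):
    sizes = np.bincount(labels, minlength=len(C)) + 1
    return sizes / sizes.sum()
```

Note that if every w_j is the same constant, scaling all distances by it leaves the argmin unchanged, so a uniform static weight can only alter the clustering if it is applied non-uniformly across centroids.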
C. Proposed K-Means

In the proposed method, the initial cluster centroids are first determined using the equation given in Algorithm 3 below. The Proposed K-Means algorithm improves on K-Means by selecting the initial centroids deterministically instead of selecting them at random. It selects K objects, each of which initially represents a cluster mean or centroid. Each of the remaining objects is assigned to the cluster to which it is the most similar, based on the distance between the object and the cluster mean. The algorithm then computes the new mean for each cluster, and this process iterates until the criterion function converges. In this paper, the Proposed K-Means algorithm is implemented in place of the traditional K-Means, as explained in Algorithm 3.

Algorithm 3: Proposed K-Means
Steps:
1. Using the Euclidean distance as a dissimilarity measure, compute the distance between every pair of objects.
2. Calculate M_ij to make an initial guess at the centres of the clusters.
3. Calculate the value of equation (3) at each object and sort the results in ascending order.
4. Select the K objects having the minimum values as the initial cluster centroids determined by the above equation.
5. Assign each point d_i to the cluster which has the closest centroid.
6. Calculate the new mean for each cluster.
7. Repeat steps 5 and 6 until the convergence criterion is met.

D. Cluster Validity Measure

Many criteria have been developed for determining cluster validity, all of which share a common goal: to find the clustering that results in compact clusters which are well separated. Clustering validity is a concept used to evaluate the quality of clustering results. A clustering validity index may also be used to find the optimal number of clusters and to measure the compactness and separation of clusters.

1) Davies-Bouldin Index

In this paper, the Davies-Bouldin index [1, 6] has been chosen as the cluster validity measure because it has been shown to detect the correct number of clusters in several experiments. Davies-Bouldin validity combines two functions: the first calculates the compactness of data in the same cluster, and the second computes the separateness of data in different clusters. This index (Davies and Bouldin, 1979) is a function of the ratio of the sum of within-cluster scatter to between-cluster separation. If dp_i is the dispersion of cluster P_i, and dv_ij denotes the dissimilarity between two clusters P_i and P_j, then the cluster similarity matrix FR = {FR_ij, (i, j) = 1, 2, ..., c} is defined as:

$FR_{ij} = \frac{dp_i + dp_j}{dv_{ij}}$

The dispersion dp_i can be seen as a measure of the radius of P_i:

$dp_i = \left( \frac{1}{n_i} \sum_{x \in P_i} \lVert x - V_i \rVert^2 \right)^{1/2}$

where n_i is the number of objects in the i-th cluster and V_i is the centroid of the i-th cluster. dv_ij describes the dissimilarity between P_i and P_j:

$dv_{ij} = \lVert V_i - V_j \rVert$

The corresponding DB index is defined as:

$DB = \frac{1}{c} \sum_{i=1}^{c} \max_{j \neq i} FR_{ij}$

where c is the number of clusters. The ratio is small if the clusters are compact and far from each other; consequently, the Davies-Bouldin index has a small value for a good clustering.

E. Distance Measures

Many clustering methods use distance measures [7] to determine the similarity or dissimilarity between any pair of objects. It is useful to denote the distance between two instances x_i and x_j as d(x_i, x_j). A valid distance measure should be symmetric and should obtain its minimum value (usually zero) for identical vectors. The distance functions used here fall into three types.

1) Manhattan Distance

The Minkowski distance with p = 1 (the L1 distance) is called the Manhattan distance [5]:

$d(x, y) = \sum_{i=1}^{n} \lvert x_i - y_i \rvert$

It is also known as the city-block distance. This metric assumes that in going from one pixel to the other, it is only possible to travel directly along pixel grid lines; diagonal moves are not allowed.

2) Euclidean Distance [5]

This is the most familiar distance. The shortest distance between two points (x1, y1) and (x2, y2) in a two-dimensional space is

$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$

3) Chebyshev Distance

The Chebyshev distance [10] between two vectors or points p and q, with standard coordinates p_i and q_i respectively, is

$d(p, q) = \max_i \lvert p_i - q_i \rvert$
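The three metrics differ only in the norm applied to the coordinate differences. A minimal sketch, assuming NumPy vectors:

```python
import numpy as np

def manhattan(x, y):
    # L1 norm: sum of absolute coordinate differences (city-block).
    return np.sum(np.abs(x - y))

def euclidean(x, y):
    # L2 norm: the familiar straight-line distance.
    return np.sqrt(np.sum((x - y) ** 2))

def chebyshev(x, y):
    # L-infinity norm: the largest single coordinate difference.
    return np.max(np.abs(x - y))
```

Swapping the distance computation in the assignment step of the earlier K-Means sketch is sufficient to reproduce the three variants compared in Section III.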
III. EXPERIMENTAL ANALYSIS AND DISCUSSION

This section describes the data sets used to analyze the methods studied in Section II, listed below, and the results of the comparison.

A. Datasets

1) Iris: contains information about the iris flower. The data set has 150 samples with four attributes, available at http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data

2) Ecoli: contains protein localization sites, with 350 samples and 8 attributes, available at http://archive.ics.uci.edu/ml/machine-learning-databases/ecoli/ecoli.data

3) Yeast: contains 1400 samples with 8 attributes, available at http://archive.ics.uci.edu/ml/machine-learning-databases/yeast/yeast.data

4) Wine: contains information used to determine the origin of wines, with 200 samples and 13 attributes, available at http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data

B. Comparative Study and Performance Analysis

The four algorithms studied in Section II were implemented in MATLAB 2012(a). All methods were executed and compared on the Ecoli, Iris, Yeast and Wine datasets. The Davies-Bouldin index is used to determine the performance of the clustering algorithms.

1) Performance of Distance Functions

The K-Means clustering method is executed with three different distance functions (Manhattan, Euclidean and Chebyshev) on the Iris dataset, with the number of centroids (K) selected from 2 to 10. The obtained results are listed in Table I.

TABLE I. Distance function analysis (Davies-Bouldin index)

Clusters   Manhattan   Euclidean   Chebyshev
   2         0.510       0.524       0.658
   3         0.653       0.416       0.548
   4         0.758       0.634       0.811
   5         0.912       0.589       0.987
   6         0.933       0.712       1.023
   7         0.847       0.689       0.956
   8         0.935       0.881       1.095
   9         0.874       0.643       0.912
  10         0.901       0.773       0.845

Figure 4. Distance function analysis chart for K-Means.

Figure 4 compares the various distance functions for K-Means on the "Ecoli" dataset, with the cluster centroids K varying from 2 to 10. It clearly shows that the Euclidean distance function obtains the minimum index values for most cluster counts. Hence the Euclidean distance function suits these clustering algorithms better than the other distance functions.

2) Performance of Clustering Algorithms

The four clustering algorithms, K-Means, Static Weighted K-Means (SWK-Means), Dynamic Weighted K-Means (DWK-Means) and Proposed K-Means, are used to cluster the data sets. To assess the quality of the clusters, the Davies-Bouldin measure is used. The results obtained from the various clustering algorithms are listed in Table II.

TABLE II. Davies-Bouldin index analysis

Data set   K-Means   SWK-Means   DWK-Means   Proposed K-Means
Ecoli       0.71       0.95        0.84           0.65
Iris        0.65       0.49        0.53           0.73
Yeast       0.86       0.79        0.84           0.66
Wine        0.55       0.74        0.68           0.45
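For reference, the Davies-Bouldin values used throughout this section can be computed directly from the Section II definitions. The routine below is our rendering of the standard index, not the paper's MATLAB implementation:

```python
import numpy as np

def davies_bouldin(X, labels, centroids):
    """Davies-Bouldin index: mean over clusters of the worst
    (dp_i + dp_j) / dv_ij ratio; smaller means better clustering."""
    c = len(centroids)
    # dp_i: root-mean-square distance of cluster members to their centroid.
    dp = np.array([
        np.sqrt(np.mean(np.sum((X[labels == i] - centroids[i]) ** 2, axis=1)))
        for i in range(c)
    ])
    db = 0.0
    for i in range(c):
        ratios = [
            (dp[i] + dp[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(c) if j != i     # dv_ij: centroid separation
        ]
        db += max(ratios)                   # FR_ij maximised over j != i
    return db / c
```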
Figure 3. Clustering methods chart for the various datasets.

Figure 3 shows the four clustering algorithms executed on the four data sets (Ecoli, Iris, Yeast and Wine) with the cluster centroid count held constant at K = 5. The Proposed K-Means algorithm obtains the minimum DB index values for the Ecoli and Wine data sets, and the next-lowest index values for the others. Hence the Proposed K-Means clustering algorithm produces good clustering results.

3) Execution Time Measure

The clustering algorithms are also compared using the time required to cluster the dataset. The execution time varies with the number of initial cluster centroids, and it increases as the number of cluster centroids increases. The obtained results are given in Table III.

TABLE III. Execution time (in seconds)

Clusters   K-Means   SWK-Means   DWK-Means   Proposed K-Means
   3         1.54       1.66        1.45           1.31
   6         1.95       2.87        2.28           2.05
   9         3.78       4.15        3.94           3.45
  12         4.45       5.12        4.18           3.94
  15         5.70       6.45        5.48           4.71

Figure 5. Execution time chart for the clustering algorithms.

Figure 5 shows the performance of the four clustering algorithms on the Iris dataset with the cluster centroids varied from 3 to 15. The Proposed K-Means algorithm obtains the minimum execution time for most of the cluster counts. The SWK-Means clustering algorithm does not produce the minimum execution time for any of the K values, while the other two algorithms produce the minimum execution time for some K values. Hence the Proposed K-Means executes in the minimum time and performs better than the other three algorithms.

4) Iteration Count Analysis

The four clustering algorithms are compared for their performance using the iteration count method on the Wine dataset. The iteration count is defined as the number of times the clustering algorithm iterates until the convergence criterion is met. The cluster centres are varied from 2 to 10, and the number of iterations for each clustering algorithm is obtained and listed in Table IV.

TABLE IV. Iteration count analysis

Clusters   K-Means   SWK-Means   DWK-Means   Proposed K-Means
   2          8          7          11              6
   3          6          8           5              7
   4         14         18          15              9
   5          9         13          11              8
   6          4          8          11              6
   7         15         13          17             11
   8         13         10          15              6
   9          7          6          11             10
  10         14         17          14              8

Figure 6. Iteration levels chart for the clustering algorithms.
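A small harness of the kind behind Tables III and IV, timing one run per cluster count and recording the iterations until convergence, might look as follows; it assumes a kmeans_fn with the signature of the earlier sketch (returning labels, centroids and an iteration count):

```python
import time

def profile_clustering(X, cluster_counts, kmeans_fn):
    """Time one clustering run per cluster count and record the number
    of iterations, as in the execution-time and iteration-count tables."""
    rows = []
    for k in cluster_counts:
        start = time.perf_counter()
        _, _, n_iter = kmeans_fn(X, k)
        elapsed = time.perf_counter() - start
        rows.append((k, elapsed, n_iter))
        print(f"k={k:2d}  time={elapsed:6.3f}s  iterations={n_iter}")
    return rows
```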
From Figure 6, the iteration levels of the four algorithms are obtained by setting the cluster centroids from 2 to 10 on the Yeast dataset. The K-Means and Proposed K-Means clustering methods converge in the fewest iterations, and the Proposed K-Means method performs better than K-Means for most values of the cluster centroid count K.

IV. CONCLUSION

In this paper, four different clustering methods in partitional clustering are studied, and the K-Means algorithm is applied with three different distance functions to find the optimal distance function for the clustering process. One of the demerits of the K-Means algorithm is the random selection of the initial centroids of the desired clusters. This is overcome in the Proposed K-Means by an initial cluster centroid selection process that avoids selecting the centroids randomly, and it produces distinctly better results. The four clustering algorithms are executed on four different datasets, and the Proposed K-Means method performs very well, obtaining the minimum DB index value. Execution time and iteration count are also compared across the four clustering algorithms and different cluster counts: the Proposed K-Means achieves a lower execution time and a lower iteration count than the K-Means, Static Weighted K-Means (SWK-Means) and Dynamic Weighted K-Means (DWK-Means) clustering methods. The Proposed K-Means clustering method can therefore be applied in the application areas discussed, and various cluster validity measures can be used to improve the clustering performance in future work.

REFERENCES

[1] Davies, D.L. and Bouldin, D.W., 1979. "A cluster separation measure." IEEE Trans. Pattern Anal. Machine Intell., 1(4), 224-227.
[2] Hartigan, J. and Wong, M., 1979. "Algorithm AS136: A k-means clustering algorithm." Applied Statistics, 28, 100-108.
[3] Heer, J. and Chi, E., 2001. "Identification of Web user traffic composition using multimodal clustering and information scent." 1st SIAM ICDM, Workshop on Web Mining, 51-58, Chicago, IL.
[4] Jain, A.K., Murty, M.N. and Flynn, P.J., 1999. "Data clustering: A review." ACM Computing Surveys, Vol. 31, No. 3.
[5] Khan, M. "Fast distance metric based data mining techniques using P-trees: k-nearest-neighbor classification and k-clustering."
[6] Halkidi, M., Batistakis, Y. and Vazirgiannis, M., 2001. "On clustering validation techniques." Journal of Intelligent Information Systems, 17(2-3), 107-145.
[7] Rokach, L. and Maimon, O., 2005. Data Mining and Knowledge Discovery Handbook. Springer.
[8] Sarmah, S. and Bhattacharyya, D.K., May 2010. "An effective technique for clustering incremental gene expression data." IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 3, No. 3.
[9] Thangadurai, K. et al., 2010. "A study on rough clustering." Global Journal of Computer Science and Technology, Vol. 10, Issue 5.
[10] http://en.wikipedia.org/wiki/Chebyshev_distance