International
OPEN ACCESS Journal
Of Modern Engineering Research (IJMER)
| IJMER | ISSN: 2249–6645 | www.ijmer.com | Vol. 4 | Iss. 4 | Apr. 2014 | 55 |
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data
N. Aparna (1), M. Kalaiarasu (2)
(1, 2) Department of Computer Science, Department of Information Technology, Sri Ramakrishna Engineering College, Coimbatore
Abstract: Clustering mixed type data is one of the major research topics in data mining. In this paper, a new algorithm for clustering mixed type data is proposed in which the distribution centroid represents the prototype of the categorical variables in a cluster and is combined with the mean to represent the prototype of clusters with mixed type variables. In the method, data is observed from different views and the variables are grouped into those views. Instances that can be seen differently from different viewpoints are defined as multiview data. Clustering usually ignores the differences among views. Here, both view weights and variable weights are computed simultaneously. The view weight is used to determine the closeness or density of a view, and the variable weight is used to identify the significance of each variable. Both weights are used in the distance function to determine the cluster of each object. The proposed method enhances k-prototypes so that it automatically computes both view and variable weights. The proposed MK-Prototypes algorithm is compared with two other clustering algorithms.
Keywords: clustering, mixed data, multiview, variable weighting, view weighting, k-prototypes.
I. Introduction
Clustering is a fundamental technique of unsupervised learning in machine learning and statistics. It is generally used to find groups of similar items in a set of unlabeled data. The aim of clustering is to divide a set of data objects into clusters so that data objects that belong to the same cluster are more similar to each other than to those in other clusters [1-4]. In the real world, datasets usually contain both numeric and categorical variables [5,6]. However, most existing clustering algorithms assume that all variables are either numeric or categorical; examples include the k-means [7], k-modes [8] and fuzzy k-modes [9] algorithms. In addition, real data is often observed from multiple outlooks and in multiple types of dimensions. For example, in a student data set, variables can be divided into a personal information view describing the student's personal details, an academic view describing the student's academic performance, and an extra-curricular view describing the student's extra-curricular activities and achievements.
Traditional methods take multiple views as a set of flat variables and do not take into account the differences among the views [10], [11], [12]. Multiview clustering takes the information from multiple views and also considers the variations among different views, which produces a more precise and efficient partitioning of the data.
In this paper, a new algorithm, Multi-viewpoint K-prototypes (MK-Prototypes), for clustering mixed type data is proposed. It is an enhancement of the usual k-prototypes algorithm. In order to differentiate the effects of different views and different variables in clustering, view weights and individual variable weights are applied to the distance function. While computing the view weights, the complete set of variables is considered; while calculating the weights of the variables in a view, only the part of the data that includes the variables in that view is considered. Thus, the view weights show the significance of the views in the complete data, and the variable weights in a view show the significance of the variables within that view alone.
II. Related Works
To date, a number of algorithms and methods have been proposed to deal directly with mixed type data. In [13], Cen Li and Gautam Biswas proposed Similarity-Based Agglomerative Clustering (SBAC), an algorithm that works well for data with mixed attributes. It adopts a similarity measure proposed by Goodall [14] for biological taxonomy. In this method, while computing the similarity, a higher weight is assigned to matches on infrequent attribute values. It does not make any assumptions about the underlying features of the attribute values. An agglomerative algorithm is used to generate a dendrogram, and a simple distinctness heuristic is used to extract a partition of the data. Hsu and Chen proposed CAVE [15], a clustering algorithm based on variance and entropy for clustering mixed data. It builds a distance hierarchy for every categorical attribute, which needs domain expertise. Hsu et al. [16] proposed an extension to the self-organizing map to analyze mixed data, where the distance hierarchy is automatically constructed by using the values of class attributes.
In [17], Chatzis proposed the KL-FCM-GM algorithm, which assumes that the data derived from the clusters are in Gaussian form and is designed for Gauss-multinomial distributed data.
Huang presented a k-prototypes algorithm [18] in which k-means is integrated with k-modes to partition mixed data. Bezdek et al. considered the fuzzy nature of the objects in their work on fuzzy k-prototypes [19], and Zheng et al. [20] proposed an evolutionary k-prototypes algorithm by introducing an evolutionary algorithm framework.
In [17] Chatziz propsed KL-FCM-GM algorithm in which data derived from the clusters are in the
Guassian form and is designed for the Guass-Multinomial distributed data.
Huang presented a k-prototypes algorithm [18] where k-means is integrated with k-modes to partition
mixed data. Bezdek et al. considered the fuzzy nature of the objects in his work the fuzzy k-prototypes[19] and
Zheng et al. proposed [20] an evolutionary type k-prototypes algorithm by introducing an evolutionary
algorithm framework.
III. Proposed System
The motivation for the proposed system is twofold. On one hand, it provides a better representation for the categorical part of mixed data, since the numerical variables are already well represented by the mean. On the other hand, it considers the importance of view weights and variable weights in the process of clustering. The distribution centroid represents the cluster centroid for the categorical variable part. Huang's weighting strategy is used for the computation of both view weights and variable weights.
A. The distribution centroid
The idea of the distribution centroid for a better representation of categorical variables is inspired by the fuzzy centroid proposed by Kim et al. [21]. It makes use of a fuzzy scenario to represent the cluster centers for the categorical variable part.
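For $\mathrm{Dom}(V_j) = \{b_j^1, b_j^2, b_j^3, \ldots, b_j^t\}$, the distribution centroid of a cluster $o$, denoted $C'_o$, is represented as follows:
$$C'_o = \{c'_{o1}, c'_{o2}, \ldots, c'_{oj}, \ldots, c'_{om}\} \tag{1}$$
where
$$c'_{oj} = \{(b_j^1, w_{oj}^1), (b_j^2, w_{oj}^2), \ldots, (b_j^k, w_{oj}^k), \ldots, (b_j^t, w_{oj}^t)\} \tag{2}$$
In the above equation,
$$w_{oj}^k = \sum_{i=1}^{n} \mu(x_{ij}) \tag{3}$$
where
$$\mu(x_{ij}) = \begin{cases} \dfrac{u_{io}}{\sum_{i=1}^{n} u_{io}} & \text{if } x_{ij} = b_j^k \\ 0 & \text{if } x_{ij} \neq b_j^k \end{cases} \tag{4}$$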
Here, $u_{io}$ is assigned the value 1 if the data object $x_i$ belongs to cluster $o$, and 0 if it does not.
From the above equations it is clear that the computation of the distribution centroid considers the number of times each categorical value occurs in a cluster. Thus, to denote the center of a cluster, it takes into account the distribution of the categorical values within that cluster.
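As a minimal sketch of equations (1) to (4), the following Python function computes the distribution centroid of one categorical variable for one cluster. The function name, the argument layout and the toy data are illustrative assumptions, not part of the original paper.

```python
from collections import Counter


def distribution_centroid(values, memberships):
    """Return {category: weight} for one categorical variable in one cluster.

    values      -- category value x_ij of every object for this variable
    memberships -- u_io of every object (1 if it is in the cluster, else 0)
    """
    cluster_size = sum(memberships)
    if cluster_size == 0:
        raise ValueError("the cluster has no members")
    # Count how often each category occurs among the cluster members.
    counts = Counter(v for v, u in zip(values, memberships) if u == 1)
    # Every value of the variable's domain gets a weight, possibly zero.
    domain = sorted(set(values))
    return {b: counts[b] / cluster_size for b in domain}


# Hypothetical toy data: five objects, the first three belong to the cluster.
colours = ["red", "red", "blue", "green", "red"]
u = [1, 1, 1, 0, 0]
centroid = distribution_centroid(colours, u)
print(centroid)
```

Each weight is the relative frequency of a category inside the cluster, so the weights of one variable sum to 1.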
B. Weight calculation using Huang’s approach
The weight of a variable identifies the effect of that variable in the clustering process. In 2005, Huang et al. proposed an approach to calculate the weights of variables [22]. According to their method, the weights are computed by minimizing the value of the objective function.
The principle for assigning the weight of a variable is to allocate a larger value to a variable that has a smaller sum of within cluster distances (WCD), and vice versa. This principle is given by
$$w_j \propto \frac{1}{D_j} \tag{5}$$
where $w_j$ is the weight (significance) of variable $j$, $\propto$ is the mathematical symbol denoting direct proportionality, and $D_j$ is the sum of the within cluster distances for this variable.
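The principle in (5) can be illustrated with a short sketch. Equation (5) only states a proportionality; normalizing the weights so that they sum to 1 is an assumption made here for illustration, as are the function name and the sample distances.

```python
def inverse_wcd_weights(wcd):
    """Turn within cluster distance sums D_j into weights proportional to 1 / D_j.

    Assumes every D_j is strictly positive; the normalization to a sum of 1
    is an illustrative choice, not something equation (5) prescribes.
    """
    inverse = [1.0 / d for d in wcd]
    total = sum(inverse)
    return [x / total for x in inverse]


# Hypothetical WCD sums for three variables.
weights = inverse_wcd_weights([2.0, 4.0, 4.0])
print(weights)
```

The variable with the smallest within cluster distance (the most compact one) receives the largest weight.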
C. Multiview concept
FIGURE 1: Multiview concept
In 2013, Chen et al. [23] proposed TW-k-means, in which the concept of multiview data was introduced. Figure 1 illustrates the multiview concept. In conventional clustering, the differences among different views are not considered. In multiview clustering, in addition to assigning variable weights, the variables are grouped according to their characteristic properties. Each group is termed a view, and a weight is assigned to each view. The view weight is assigned according to Huang's approach.
D. The proposed algorithm
The proposed algorithm, MK-Prototypes, puts together the concepts of Sections III-A, III-B and III-C. Figure 2 describes the steps involved in the algorithm.
Steps in the proposed algorithm:
1. Compute the distribution centroid to represent the centroid of the categorical variables.
2. Compute the mean for the numerical variables.
3. Integrate the distribution centroid and the mean to represent the prototype for the mixed data.
4. Compute the view weights and variable weights.
5. Measure the similarity between the data objects and the prototypes.
6. Assign each data object to the prototype to which it is closest.
7. Repeat steps 1-6 until an effective clustering result is obtained.
E. The optimization model
The clustering process that partitions the dataset X into k clusters while considering both view weights and variable weights is represented, according to the framework of [23], as the minimization of the following objective function:
$$P(U, Z, R, V) = \sum_{o=1}^{k} \sum_{i=1}^{n} \sum_{t=1}^{Q} \sum_{s \in G_t} u_{i,o}\, v_t\, r_s\, d(x_{i,s}, z_{o,s}) \tag{6}$$
subject to
$$\sum_{o=1}^{k} u_{i,o} = 1, \quad u_{i,o} \in \{0, 1\}, \quad 1 \le i \le n$$
$$\sum_{t=1}^{Q} v_t = 1, \quad 0 \le v_t \le 1, \qquad \sum_{j \in G_t} r_j = 1, \quad 0 \le r_j \le 1, \quad 1 \le t \le Q$$
where $U$ is an $n \times k$ partition matrix whose elements $u_{i,o}$ are binary, with $u_{i,o} = 1$ indicating that object $i$ is allocated to cluster $o$. $Z = \{Z_1, Z_2, \ldots, Z_k\}$ is a set of $k$ vectors representing the centers of the $k$ clusters. $V = \{v_1, v_2, \ldots, v_Q\}$ are the $Q$ weights of the $Q$ views. $R = \{r_1, r_2, \ldots, r_m\}$ are the $m$ weights of the $m$ variables. $d(x_{i,s}, z_{o,s})$ is a distance or dissimilarity measure on the $s$th variable between the $i$th object and the center of the $o$th cluster.
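To make the structure of the objective function (6) concrete, the sketch below evaluates it for given assignments, prototypes and weights. The data layout (lists of lists, views given as lists of variable indices), the function name and the pluggable `dist` argument are assumptions chosen for illustration.

```python
def objective(X, Z, U, views, v, r, dist):
    """Evaluate P(U, Z, R, V): the weighted sum of within cluster distances.

    X[i][s]  -- value of variable s for object i
    Z[o][s]  -- prototype of cluster o on variable s
    U[i][o]  -- 1 if object i is assigned to cluster o, else 0
    views[t] -- list of the variable indices that form view G_t
    v[t]     -- weight of view t;  r[s] -- weight of variable s
    dist     -- function (x_value, z_value) -> dissimilarity
    """
    total = 0.0
    for i, row in enumerate(X):
        for o, proto in enumerate(Z):
            if U[i][o] == 0:
                continue  # object i does not belong to cluster o
            for t, group in enumerate(views):
                for s in group:
                    total += v[t] * r[s] * dist(row[s], proto[s])
    return total


# Hypothetical numeric example: two objects, one cluster, two single-variable views.
X = [[1.0, 10.0], [3.0, 14.0]]
Z = [[2.0, 12.0]]
U = [[1], [1]]
P = objective(X, Z, U, views=[[0], [1]], v=[0.5, 0.5], r=[1.0, 1.0],
              dist=lambda x, z: abs(x - z))
print(P)
```

Each object contributes 0.5 * 1 + 0.5 * 2 = 1.5 here, so the objective for the two objects is 3.0.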
FIGURE 2. Flowchart for the proposed algorithm
In order to minimize the objective function (6), the problem is divided into four sub-problems:
1. Sub-problem 1: Fix Z=Z^,R=R^ and V=V^ and solve the reduced problem P(U,Z^,R^,V^).
2. Sub-problem 2: Fix U=U^, R=R^ and V=V^ and solve the reduced problem P(U^,Z,R^,V^).
3. Sub-problem 3: Fix Z=Z^, U=U^ and V=V^ and solve the reduced problem P(U^,Z^,R,V^).
4. Sub-problem 4: Fix Z=Z^, R=R^ and U=U^ and solve the reduced problem P(U^,Z^,R^,V).
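Sub-problem 1 is solved by
$$u_{i,o} = 1 \tag{7}$$
if
$$\sum_{s=1}^{m} v_t\, r_s\, d(x_{i,s}, z_{o,s}) \le \sum_{s=1}^{m} v_t\, r_s\, d(x_{i,s}, z_{e,s}) \tag{8}$$
for all $1 \le e \le k$, and $u_{i,e} = 0$ for $e \ne o$. Here $v_t$ is the weight of the view $G_t$ that contains variable $s$.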
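The assignment rule in (7) and (8) amounts to giving each object to the cluster with the smallest weighted distance. The sketch below shows this for numeric data; the function name and data layout are illustrative assumptions, and breaking ties in favour of the lowest cluster index is also an assumption.

```python
def assign(X, Z, views, v, r, dist):
    """Return the partition matrix U that minimizes P for fixed Z, R and V."""
    # Map every variable index s to the index t of the view that contains it.
    view_of = {s: t for t, group in enumerate(views) for s in group}
    U = []
    for row in X:
        costs = [
            sum(v[view_of[s]] * r[s] * dist(row[s], proto[s]) for s in view_of)
            for proto in Z
        ]
        best = costs.index(min(costs))  # ties go to the lowest cluster index
        U.append([1 if o == best else 0 for o in range(len(Z))])
    return U


# Hypothetical example: three objects, two prototypes, two single-variable views.
X = [[1.0, 1.0], [8.0, 9.0], [2.0, 0.0]]
Z = [[1.0, 1.0], [9.0, 9.0]]
U = assign(X, Z, views=[[0], [1]], v=[0.5, 0.5], r=[1.0, 1.0],
           dist=lambda x, z: abs(x - z))
print(U)
```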
Sub-problem 2 is solved for the numeric variables by
$$z_{o,s} = \frac{\sum_{i=1}^{n} u_{i,o}\, x_{i,s}}{\sum_{i=1}^{n} u_{i,o}} \tag{9}$$
and for the categorical variables by $z_{o,s} = c'_{o,s}$, the distribution centroid defined in (2). The distance is
$$d(x_{i,s}, z_{o,s}) = \begin{cases} |x_{i,s} - z_{o,s}| & \text{if the } s\text{th variable is a numeric variable} \\ \varphi(x_{i,s}, z_{o,s}) & \text{if the } s\text{th variable is a categorical variable} \end{cases}$$
where $\varphi(x_{i,s}, z_{o,s}) = \sum_{k=1}^{t} \delta(x_{i,s}, b_s^k)$, and $\delta(x_{i,s}, b_s^k)$ is 0 if $x_{i,s} = b_s^k$ and $w_{o,s}^k$ if $x_{i,s} \ne b_s^k$. The categorical distance is thus the total weight of the category values in the centroid that differ from $x_{i,s}$, so a value that is frequent in cluster $o$ lies close to the prototype of that cluster.
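The prototype update and the two distance cases can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes the numeric distance is the absolute difference and the categorical distance is the total centroid weight of the category values that differ from the object's value, so that larger values mean less similar. All names and sample values are hypothetical.

```python
def numeric_distance(x, z):
    """Distance between a numeric value and a numeric prototype (the mean)."""
    return abs(x - z)


def categorical_distance(x, centroid):
    """Distance between a category x and a distribution centroid.

    centroid is a dict {category: weight}; the distance adds up the weights
    of all categories that differ from x (assumed form of the measure).
    """
    return sum(w for b, w in centroid.items() if b != x)


def numeric_prototype(column, memberships):
    """Mean of the cluster members for one numeric variable, as in (9)."""
    size = sum(memberships)
    return sum(u * x for u, x in zip(memberships, column)) / size


# Hypothetical values: the first three objects belong to the cluster.
ages = [30.0, 40.0, 50.0, 70.0]
u = [1, 1, 1, 0]
mean_age = numeric_prototype(ages, u)

centroid = {"red": 2 / 3, "blue": 1 / 3, "green": 0.0}
d_red = categorical_distance("red", centroid)
d_green = categorical_distance("green", centroid)
print(mean_age, numeric_distance(35.0, mean_age), d_red, d_green)
```

A value that dominates the cluster ("red") ends up close to the prototype, while a value that never occurs in it ("green") is at the maximum distance of 1.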
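The solution to sub-problem 3 is as follows. Let $Z = \hat{Z}$, $U = \hat{U}$ and $V = \hat{V}$ be fixed. Then the reduced problem $P(\hat{U}, \hat{Z}, R, \hat{V})$ is minimized if
$$r_s = \frac{1}{\sum_{h \in G_t} \left[ \dfrac{D_s}{D_h} \right]^{1/\gamma}} \tag{10}$$
where $G_t$ is the view that contains variable $s$ and
$$D_s = \sum_{o=1}^{k} \sum_{i=1}^{n} \hat{u}_{i,o}\, \hat{v}_t\, d(x_{i,s}, \hat{z}_{o,s}) \tag{11}$$
Sub-problem 4 is solved as follows:
$$v_t = \frac{1}{\sum_{h=1}^{Q} \left[ \dfrac{F_t}{F_h} \right]^{1/\mu}} \tag{12}$$
where
$$F_t = \sum_{o=1}^{k} \sum_{i=1}^{n} \sum_{s \in G_t} \hat{u}_{i,o}\, \hat{r}_s\, d(x_{i,s}, \hat{z}_{o,s}) \tag{13}$$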
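The two weight updates can be sketched in a few lines, assuming that variable weights are normalized within their own view with exponent 1/γ and view weights across all views with exponent 1/μ, and that every distance sum is strictly positive. The function names and sample numbers are illustrative assumptions.

```python
def variable_weights(D, views, gamma):
    """Weight r_s of every variable from its within cluster distance sum D_s.

    Each variable is compared only with the variables of its own view.
    """
    r = [0.0] * len(D)
    for group in views:
        for s in group:
            r[s] = 1.0 / sum((D[s] / D[h]) ** (1.0 / gamma) for h in group)
    return r


def view_weights(F, mu):
    """Weight v_t of every view from its within cluster distance sum F_t."""
    return [1.0 / sum((F_t / F_h) ** (1.0 / mu) for F_h in F) for F_t in F]


# Hypothetical distance sums: variables 0 and 1 form view 0, variable 2 forms view 1.
D = [1.0, 4.0, 2.0]
views = [[0, 1], [2]]
r_sharp = variable_weights(D, views, gamma=1.0)
r_flat = variable_weights(D, views, gamma=2.0)
v = view_weights([1.0, 3.0], mu=1.0)
print(r_sharp, r_flat, v)
```

With γ = 1 the two variables of the first view get weights 0.8 and 0.2; with γ = 2 they move to about 0.67 and 0.33, which shows how a larger exponent parameter flattens the weight distribution.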
Having presented the computations required for each sub-problem, the proposed MK-Prototypes algorithm can be described as given below, where the superscript $t$ counts the iterations:
1. Choose the maximum number of iterations, the number of clusters $k$ and the values of $\mu$ and $\gamma$. Randomly choose $k$ distinct data objects and convert them into the initial prototypes $Z^0$, initialize the view weights $V^0$ and the variable weights $R^0$, and set $t = 0$.
2. Fix $Z'$, $R'$, $V'$ as $Z^t$, $R^t$, $V^t$ respectively and minimize the problem $P(U, Z', R', V')$ to obtain $U^{t+1}$.
3. Fix $U'$, $R'$, $V'$ as $U^{t+1}$, $R^t$, $V^t$ respectively and minimize the problem $P(U', Z, R', V')$ to obtain $Z^{t+1}$.
4. Fix $U'$, $Z'$, $V'$ as $U^{t+1}$, $Z^{t+1}$, $V^t$ respectively and minimize the problem $P(U', Z', R, V')$ to obtain $R^{t+1}$.
5. Fix $U'$, $Z'$, $R'$ as $U^{t+1}$, $Z^{t+1}$, $R^{t+1}$ respectively and minimize the problem $P(U', Z', R', V)$ to obtain $V^{t+1}$.
6. If there is no improvement in $P$ or if the maximum number of iterations is reached, then stop. Otherwise increment $t$ by 1, decrement the number of remaining iterations by 1 and go to Step 2.
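The six steps above can be put together in a compact sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: categorical values are Python strings, numeric values are floats, every variable index appears in exactly one view, the categorical distance is the total centroid weight of the differing category values, a small `eps` guards against zero distance sums, and the `init` argument (indices of the starting objects) is a hypothetical convenience for reproducible runs.

```python
import random


def dist(x, z):
    """|x - z| for a numeric mean z; for a distribution centroid (a dict),
    the total weight of the category values that differ from x (assumption)."""
    if isinstance(z, dict):
        return sum(w for b, w in z.items() if b != x)
    return abs(x - z)


def mk_prototypes(X, views, k, gamma, mu, max_iter=20, init=None, seed=0, eps=1e-12):
    n, m, Q = len(X), len(X[0]), len(views)
    view_of = {s: t for t, group in enumerate(views) for s in group}
    # Domain of each categorical variable (None marks a numeric variable).
    domains = [sorted({row[s] for row in X}) if isinstance(X[0][s], str) else None
               for s in range(m)]

    def prototype(rows):
        # Mean for numeric variables, distribution centroid for categorical ones.
        proto = []
        for s in range(m):
            col = [row[s] for row in rows]
            if domains[s] is None:
                proto.append(sum(col) / len(col))
            else:
                proto.append({b: col.count(b) / len(col) for b in domains[s]})
        return proto

    # Step 1: k data objects become the initial prototypes; weights start uniform.
    if init is None:
        init = random.Random(seed).sample(range(n), k)
    Z = [prototype([X[i]]) for i in init]
    v = [1.0 / Q] * Q
    r = [1.0 / len(views[view_of[s]]) for s in range(m)]

    def cost(row, proto):
        return sum(v[view_of[s]] * r[s] * dist(row[s], proto[s]) for s in range(m))

    best = float("inf")
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every object to its closest prototype.
        labels = [min(range(k), key=lambda o: cost(row, Z[o])) for row in X]
        # Step 3: recompute prototypes (an empty cluster keeps its old one).
        for o in range(k):
            members = [row for row, lab in zip(X, labels) if lab == o]
            if members:
                Z[o] = prototype(members)
        # Step 4: variable weights from the per-variable distance sums D_s.
        D = [eps + sum(v[view_of[s]] * dist(row[s], Z[lab][s])
                       for row, lab in zip(X, labels)) for s in range(m)]
        r = [1.0 / sum((D[s] / D[h]) ** (1.0 / gamma) for h in views[view_of[s]])
             for s in range(m)]
        # Step 5: view weights from the per-view distance sums F_t.
        F = [eps + sum(r[s] * dist(row[s], Z[lab][s])
                       for row, lab in zip(X, labels) for s in group)
             for group in views]
        v = [1.0 / sum((F[t] / F[h]) ** (1.0 / mu) for h in range(Q)) for t in range(Q)]
        # Step 6: stop when the objective no longer improves.
        P = sum(cost(row, Z[lab]) for row, lab in zip(X, labels))
        if P >= best:
            break
        best = P
    return labels, Z, r, v


# Hypothetical toy data: view 0 holds variables 0 and 1, view 1 holds variable 2.
X = [[1.0, "a", 10.0], [1.2, "a", 11.0], [0.9, "a", 9.5],
     [8.0, "b", 30.0], [8.3, "b", 31.0], [7.9, "b", 29.0]]
labels, Z, r, v = mk_prototypes(X, views=[[0, 1], [2]], k=2,
                                gamma=2.0, mu=2.0, init=[0, 3])
print(labels, r, v)
```

In this toy run the categorical variable separates the two groups perfectly, so it receives almost all of the weight inside its view.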
IV. Experiments on Performance of MK-Prototypes Algorithm
In order to measure the performance of the proposed algorithm, it is used to cluster a real-world dataset, Heart (disease). The dataset is taken from the UCI Machine Learning Repository.
The proposed algorithm is compared with the k-prototypes and SBAC algorithms, which are well known for clustering mixed type data. In this paper, the clustering accuracy is measured using one of the most commonly used criteria. The clustering accuracy r is given by
$$r = \frac{\sum_{i=1}^{k} a_i}{n} \tag{14}$$
where $a_i$ is the number of data objects that occur in both the $i$th cluster and its corresponding true class, and $n$ is the number of data objects in the data set.
The higher the value of r, the higher the clustering accuracy. A perfect clustering gives a value of r = 1.0.
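The accuracy measure in (14) can be sketched as follows. The text does not say how a cluster is matched to its corresponding true class; taking the majority class of each cluster is an assumption made here, and the labels are hypothetical.

```python
from collections import Counter


def clustering_accuracy(labels, classes):
    """Fraction of objects that fall in the majority true class of their cluster."""
    matched = 0
    for cluster in set(labels):
        members = [c for lab, c in zip(labels, classes) if lab == cluster]
        # a_i: size of the most frequent true class inside cluster i (assumption).
        matched += Counter(members).most_common(1)[0][1]
    return matched / len(labels)


# Hypothetical cluster labels and true classes for six patients.
labels = [0, 0, 0, 1, 1, 1]
classes = ["sick", "sick", "healthy", "healthy", "healthy", "healthy"]
accuracy = clustering_accuracy(labels, classes)
print(accuracy)
```

Five of the six objects agree with the majority class of their cluster, so r = 5/6 in this example.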
A. Dataset description
The Heart disease data set is a mixed dataset. It contains 303 patient instances. The actual data set contains 76 variables, of which 14 are usually considered. For the proposed algorithm, 19 of the 76 variables are considered in order to define three views. They consist of seven numeric variables and twelve categorical variables.
These 19 variables can be naturally divided into 3 views.
1. Personal data view: It includes those variables which describe a patient's personal data.
2. Historical data view: It includes those variables which describe a patient's historical data, such as habits.
3. Test output view: It includes all those variables which describe the results of the various tests conducted for the patient.
Here, $G_1$, $G_2$, $G_3$ represent the three views personal, historical and test output respectively.
B. Results and analysis
Below are the tabular and graphical representations of the clustering results. Fig 3 shows the variation in variable weights for varying γ values and fixed μ values. Fig 4 shows the variation in view weights for varying μ values and fixed γ values.
From Table 1, it is observed that as μ increased, the variance of the variable weights decreased rapidly. This result can be explained from equation (10): as μ increases, the variable weights become flatter. The graphical representation of Table 1 is shown in Fig 3.
Table 1: Variable weights vs γ value for fixed μ value
γ     μ=1     μ=4     μ=12
10    0.01    0       0
15    0       0.01    0
20    0.7     0       0.05
25    0.03    0.4     0.1
30    0       0.1     0.02
35    0.02    0       0
Fig 3: Variable weights vs γ value for fixed μ value (x-axis: γ; y-axis: variable weights; curves for μ = 1, 4 and 12)
Table 2 shows that as γ increased, the variance of the view weights decreased rapidly. This result can be explained from equation (11): as γ increases, the view weights become flatter. The graphical representation is shown in Fig 4.
Table 2: View weights vs μ value for fixed γ value
μ     γ=1     γ=4     γ=12
10    0.05    0.075   0.01
20    0.075   0.14    0.015
30    0.095   0.05    0.04
40    0.16    0.06    0.01
50    0.04    0.07    0.005
60    0.05    0.04    0.02
70    0.07    0.035   0.01
Figure 4: View weights vs μ value for fixed γ value (x-axis: μ; y-axis: view weights; curves for γ = 1, 4 and 12)
The experiments have been conducted for three different values of μ with varying values of γ, and for three different values of γ with varying values of μ. From the above analysis, it can be summarized that the two types of weight distributions in the MK-Prototypes algorithm can be controlled by setting different values of γ and μ:
1. Large μ makes more variables contribute to the clustering while small μ makes only important variables contribute to the clustering.
2. Large γ makes more views contribute to the clustering while small γ makes only important views contribute to the clustering.
Table 3: Comparison of accuracy rates of dataset considering all views
Algorithm        Clustering accuracy (r)
k-prototypes     0.521
SBAC             0.747
MK-Prototypes    0.846
From the above table, it is clear that the proposed algorithm has a better clustering accuracy than the existing k-prototypes and SBAC algorithms.
V. Conclusion
Mixed type data are encountered everywhere in the real world. In this paper, a new algorithm, MK-Prototypes, a multi-viewpoint based clustering algorithm for mixed type data, has been proposed. Compared with the existing algorithms, the proposed algorithm makes several significant contributions. It captures the characteristics of clusters with mixed type variables more efficiently, since it includes the distribution information of both numeric and categorical variables.
It also takes into account the importance of the various variables and views during the process of clustering by using Huang's approach and a new dissimilarity measure.
It can compute weights for views and individual variables simultaneously in the clustering process. With the two types of weights, dense views and significant variables can be identified, and the effect of low-quality views and noise variables can be reduced.
Because of these contributions the proposed algorithm obtains higher clustering accuracy, which has been validated by the experimental results.
REFERENCES
[1] Z.X. Huang, Extensions to the k-means algorithm for clustering large datasets with categorical values, Data Min. Knowl. Discovery 2 (3) (1998) 283–304.
[2] A.K. Jain, R.C. Dubes, Algorithms for Clustering Data, Prentice-Hall, New Jersey, 1988.
[3] A.K. Jain, M.N. Murty, P.J. Flynn, Data clustering: a review, ACM Comput. Surv. 31 (3) (1999) 264–323.
[4] J.W. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, 2001.
[5] C. Hsu, Y.P. Huang, Incremental clustering of mixed data based on distance hierarchy, Expert Syst. Appl. 35 (3) (2008) 1177–1185.
[6] C. Hsu, S. Lin, W. Tai, Apply extended self-organizing map to cluster and classify mixed-type data, Neurocomputing 74 (18) (2011) 3832–3842.
[7] S. Lloyd, Least squares quantization in PCM, IEEE Trans. Inf. Theory 28 (2) (1982) 129–137.
[8] Z.X. Huang, Extensions to the k-means algorithm for clustering large datasets with categorical values, Data Min. Knowl. Discovery 2 (3) (1998) 283–304.
[9] Z.X. Huang, M.K. Ng, A fuzzy k-modes algorithm for clustering categorical data, IEEE Trans. Fuzzy Syst. 7 (4) (1999) 446–452.
[10] J. Mui, K. Fu, Automated classification of nucleated blood cells using a binary tree classifier, IEEE Trans. Pattern Anal. Mach. Intell. 2 (5) (1980) 429–443.
[11] J. Wang, H. Zeng, Z. Chen, H. Lu, L. Tao, W. Ma, ReCoM: reinforcement clustering of multitype interrelated data objects, in: Proc. 26th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, 2003, pp. 274–281.
[12] S. Bickel, T. Scheffer, Multi-view clustering, in: Proc. IEEE Fourth Int'l Conf. Data Mining, 2004, pp. 19–26.
[13] C. Li, G. Biswas, Unsupervised learning with mixed numeric and nominal data, IEEE Trans. Knowl. Data Eng. 14 (4) (2002) 673–690.
[14] D.W. Goodall, A new similarity index based on probability, Biometrics 22 (4) (1966) 882–907.
[15] C.C. Hsu, Y.C. Chen, Mining of mixed data with application to catalog marketing, Expert Syst. Appl. 32 (1) (2007) 12–27.
[16] C. Hsu, S. Lin, W. Tai, Apply extended self-organizing map to cluster and classify mixed-type data, Neurocomputing 74 (18) (2011) 3832–3842.
[17] S.P. Chatzis, A fuzzy c-means-type algorithm for clustering of data with mixed numeric and categorical attributes employing a probabilistic dissimilarity functional, Expert Syst. Appl. 38 (7) (2011) 8684–8689.
[18] Z.X. Huang, Clustering large datasets with mixed numeric and categorical values, in: Proceedings of the First Pacific-Asia Knowledge Discovery and Data Mining Conference, 1997, pp. 21–34.
[19] J.C. Bezdek, J. Keller, R. Krishnapuram, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, Kluwer Academic Publishers, Boston, 1999.
[20] Z. Zheng, M.G. Gong, J.J. Ma, L.C. Jiao, Unsupervised evolutionary clustering algorithm for mixed type data, in: Proceedings of the IEEE Congress on Evolutionary Computation (CEC), 2010, pp. 1–8.
[21] W. Kim, K.H. Lee, D. Lee, Fuzzy clustering of categorical data using fuzzy centroid, Pattern Recognition Lett. 25 (11) (2004) 1263–1271.
[22] Z.X. Huang, M.K. Ng, H.Q. Rong, Z.C. Li, Automated variable weighting in k-means type clustering, IEEE Trans. Pattern Anal. Mach. Intell. 27 (5) (2005) 657–668.
[23] X. Chen, X. Xu, J.Z. Huang, Y. Ye, TW-k-means: automated two-level variable weighting clustering algorithm for multiview data, IEEE Trans. Knowl. Data Eng. 25 (4) (2013) 932–945.

Query Answering Approach Based on Document SummarizationIJMER
 
FPGA Based Wireless Jamming Networks
FPGA Based Wireless Jamming NetworksFPGA Based Wireless Jamming Networks
FPGA Based Wireless Jamming NetworksIJMER
 

En vedette (20)

An Empirical Study on Identification of Strokes and their Significance in Scr...
An Empirical Study on Identification of Strokes and their Significance in Scr...An Empirical Study on Identification of Strokes and their Significance in Scr...
An Empirical Study on Identification of Strokes and their Significance in Scr...
 
South africa
South africaSouth africa
South africa
 
Cost Analysis of Small Scale Solar and Wind Energy Systems
Cost Analysis of Small Scale Solar and Wind Energy SystemsCost Analysis of Small Scale Solar and Wind Energy Systems
Cost Analysis of Small Scale Solar and Wind Energy Systems
 
Intrusion Detection and Forensics based on decision tree and Association rule...
Intrusion Detection and Forensics based on decision tree and Association rule...Intrusion Detection and Forensics based on decision tree and Association rule...
Intrusion Detection and Forensics based on decision tree and Association rule...
 
The Effect of Design Parameters of an Integrated Linear Electromagnetic Moto...
The Effect of Design Parameters of an Integrated Linear  Electromagnetic Moto...The Effect of Design Parameters of an Integrated Linear  Electromagnetic Moto...
The Effect of Design Parameters of an Integrated Linear Electromagnetic Moto...
 
Static Analysis of a Pyro Turbine by using CFD
Static Analysis of a Pyro Turbine by using CFDStatic Analysis of a Pyro Turbine by using CFD
Static Analysis of a Pyro Turbine by using CFD
 
Personalization of the Web Search
Personalization of the Web SearchPersonalization of the Web Search
Personalization of the Web Search
 
Analysis of Conditions in Boundary Lubrications Using Bearing Materials
Analysis of Conditions in Boundary Lubrications Using Bearing  MaterialsAnalysis of Conditions in Boundary Lubrications Using Bearing  Materials
Analysis of Conditions in Boundary Lubrications Using Bearing Materials
 
On Some Partial Orderings for Bimatrices
On Some Partial Orderings for BimatricesOn Some Partial Orderings for Bimatrices
On Some Partial Orderings for Bimatrices
 
ΦΥΛΛΟ ΕΡΓΑΣΙΑΣ (ΕΙΣΑΓΩΓΗ ΞΕΝΟΦΩΝΤΑ )
ΦΥΛΛΟ ΕΡΓΑΣΙΑΣ (ΕΙΣΑΓΩΓΗ ΞΕΝΟΦΩΝΤΑ )ΦΥΛΛΟ ΕΡΓΑΣΙΑΣ (ΕΙΣΑΓΩΓΗ ΞΕΝΟΦΩΝΤΑ )
ΦΥΛΛΟ ΕΡΓΑΣΙΑΣ (ΕΙΣΑΓΩΓΗ ΞΕΝΟΦΩΝΤΑ )
 
Bv31291295
Bv31291295Bv31291295
Bv31291295
 
Bs31267274
Bs31267274Bs31267274
Bs31267274
 
Secure and Efficient Hierarchical Data Aggregation in Wireless Sensor Networks
Secure and Efficient Hierarchical Data Aggregation in Wireless Sensor NetworksSecure and Efficient Hierarchical Data Aggregation in Wireless Sensor Networks
Secure and Efficient Hierarchical Data Aggregation in Wireless Sensor Networks
 
Influence of chemical admixtures on density and slump loss of concrete
Influence of chemical admixtures on density and slump loss of concreteInfluence of chemical admixtures on density and slump loss of concrete
Influence of chemical admixtures on density and slump loss of concrete
 
Radiation and Mass Transfer Effects on MHD Natural Convection Flow over an In...
Radiation and Mass Transfer Effects on MHD Natural Convection Flow over an In...Radiation and Mass Transfer Effects on MHD Natural Convection Flow over an In...
Radiation and Mass Transfer Effects on MHD Natural Convection Flow over an In...
 
Develop with love bb10
Develop with love bb10Develop with love bb10
Develop with love bb10
 
Analysis of Machining Characteristics of Cryogenically Treated Die Steels Usi...
Analysis of Machining Characteristics of Cryogenically Treated Die Steels Usi...Analysis of Machining Characteristics of Cryogenically Treated Die Steels Usi...
Analysis of Machining Characteristics of Cryogenically Treated Die Steels Usi...
 
Query Answering Approach Based on Document Summarization
Query Answering Approach Based on Document SummarizationQuery Answering Approach Based on Document Summarization
Query Answering Approach Based on Document Summarization
 
Bill gates
Bill gatesBill gates
Bill gates
 
FPGA Based Wireless Jamming Networks
FPGA Based Wireless Jamming NetworksFPGA Based Wireless Jamming Networks
FPGA Based Wireless Jamming Networks
 

Similaire à MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data

Ensemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringEnsemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringIJERD Editor
 
A03202001005
A03202001005A03202001005
A03202001005theijes
 
Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringIJCSIS Research Publications
 
Max stable set problem to found the initial centroids in clustering problem
Max stable set problem to found the initial centroids in clustering problemMax stable set problem to found the initial centroids in clustering problem
Max stable set problem to found the initial centroids in clustering problemnooriasukmaningtyas
 
Comparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data AnalysisComparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data AnalysisIOSR Journals
 
Unsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and AssumptionsUnsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and Assumptionsrefedey275
 
CLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfCLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfSowmyaJyothi3
 
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model
Clustering Using Shared Reference Points Algorithm Based On a Sound Data ModelClustering Using Shared Reference Points Algorithm Based On a Sound Data Model
Clustering Using Shared Reference Points Algorithm Based On a Sound Data ModelWaqas Tariq
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsIJDKP
 
Clustering techniques final
Clustering techniques finalClustering techniques final
Clustering techniques finalBenard Maina
 
8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithmLaura Petrosanu
 
Image Segmentation Using Two Weighted Variable Fuzzy K Means
Image Segmentation Using Two Weighted Variable Fuzzy K MeansImage Segmentation Using Two Weighted Variable Fuzzy K Means
Image Segmentation Using Two Weighted Variable Fuzzy K MeansEditor IJCATR
 
A New Framework for Kmeans Algorithm by Combining the Dispersions of Clusters
A New Framework for Kmeans Algorithm by Combining the Dispersions of ClustersA New Framework for Kmeans Algorithm by Combining the Dispersions of Clusters
A New Framework for Kmeans Algorithm by Combining the Dispersions of ClustersIJMTST Journal
 

Similaire à MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data (20)

Ensemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringEnsemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes Clustering
 
K044055762
K044055762K044055762
K044055762
 
47 292-298
47 292-29847 292-298
47 292-298
 
A03202001005
A03202001005A03202001005
A03202001005
 
I017235662
I017235662I017235662
I017235662
 
Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means Clustering
 
Max stable set problem to found the initial centroids in clustering problem
Max stable set problem to found the initial centroids in clustering problemMax stable set problem to found the initial centroids in clustering problem
Max stable set problem to found the initial centroids in clustering problem
 
Comparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data AnalysisComparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data Analysis
 
Unsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and AssumptionsUnsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and Assumptions
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
CLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfCLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdf
 
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model
Clustering Using Shared Reference Points Algorithm Based On a Sound Data ModelClustering Using Shared Reference Points Algorithm Based On a Sound Data Model
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithms
 
One Graduate Paper
One Graduate PaperOne Graduate Paper
One Graduate Paper
 
Clustering techniques final
Clustering techniques finalClustering techniques final
Clustering techniques final
 
8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm
 
Image Segmentation Using Two Weighted Variable Fuzzy K Means
Image Segmentation Using Two Weighted Variable Fuzzy K MeansImage Segmentation Using Two Weighted Variable Fuzzy K Means
Image Segmentation Using Two Weighted Variable Fuzzy K Means
 
Bj24390398
Bj24390398Bj24390398
Bj24390398
 
A New Framework for Kmeans Algorithm by Combining the Dispersions of Clusters
A New Framework for Kmeans Algorithm by Combining the Dispersions of ClustersA New Framework for Kmeans Algorithm by Combining the Dispersions of Clusters
A New Framework for Kmeans Algorithm by Combining the Dispersions of Clusters
 
A0310112
A0310112A0310112
A0310112
 

Plus de IJMER

A Study on Translucent Concrete Product and Its Properties by Using Optical F...
A Study on Translucent Concrete Product and Its Properties by Using Optical F...A Study on Translucent Concrete Product and Its Properties by Using Optical F...
A Study on Translucent Concrete Product and Its Properties by Using Optical F...IJMER
 
Developing Cost Effective Automation for Cotton Seed Delinting
Developing Cost Effective Automation for Cotton Seed DelintingDeveloping Cost Effective Automation for Cotton Seed Delinting
Developing Cost Effective Automation for Cotton Seed DelintingIJMER
 
Study & Testing Of Bio-Composite Material Based On Munja Fibre
Study & Testing Of Bio-Composite Material Based On Munja FibreStudy & Testing Of Bio-Composite Material Based On Munja Fibre
Study & Testing Of Bio-Composite Material Based On Munja FibreIJMER
 
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)IJMER
 
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...IJMER
 
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...IJMER
 
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...IJMER
 
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...IJMER
 
Static Analysis of Go-Kart Chassis by Analytical and Solid Works Simulation
Static Analysis of Go-Kart Chassis by Analytical and Solid Works SimulationStatic Analysis of Go-Kart Chassis by Analytical and Solid Works Simulation
Static Analysis of Go-Kart Chassis by Analytical and Solid Works SimulationIJMER
 
High Speed Effortless Bicycle
High Speed Effortless BicycleHigh Speed Effortless Bicycle
High Speed Effortless BicycleIJMER
 
Integration of Struts & Spring & Hibernate for Enterprise Applications
Integration of Struts & Spring & Hibernate for Enterprise ApplicationsIntegration of Struts & Spring & Hibernate for Enterprise Applications
Integration of Struts & Spring & Hibernate for Enterprise ApplicationsIJMER
 
Microcontroller Based Automatic Sprinkler Irrigation System
Microcontroller Based Automatic Sprinkler Irrigation SystemMicrocontroller Based Automatic Sprinkler Irrigation System
Microcontroller Based Automatic Sprinkler Irrigation SystemIJMER
 
On some locally closed sets and spaces in Ideal Topological Spaces
On some locally closed sets and spaces in Ideal Topological SpacesOn some locally closed sets and spaces in Ideal Topological Spaces
On some locally closed sets and spaces in Ideal Topological SpacesIJMER
 
Natural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine LearningNatural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine LearningIJMER
 
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcess
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcessEvolvea Frameworkfor SelectingPrime Software DevelopmentProcess
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcessIJMER
 
Material Parameter and Effect of Thermal Load on Functionally Graded Cylinders
Material Parameter and Effect of Thermal Load on Functionally Graded CylindersMaterial Parameter and Effect of Thermal Load on Functionally Graded Cylinders
Material Parameter and Effect of Thermal Load on Functionally Graded CylindersIJMER
 
Studies On Energy Conservation And Audit
Studies On Energy Conservation And AuditStudies On Energy Conservation And Audit
Studies On Energy Conservation And AuditIJMER
 
An Implementation of I2C Slave Interface using Verilog HDL
An Implementation of I2C Slave Interface using Verilog HDLAn Implementation of I2C Slave Interface using Verilog HDL
An Implementation of I2C Slave Interface using Verilog HDLIJMER
 
Discrete Model of Two Predators competing for One Prey
Discrete Model of Two Predators competing for One PreyDiscrete Model of Two Predators competing for One Prey
Discrete Model of Two Predators competing for One PreyIJMER
 
Application of Parabolic Trough Collectorfor Reduction of Pressure Drop in Oi...
Application of Parabolic Trough Collectorfor Reduction of Pressure Drop in Oi...Application of Parabolic Trough Collectorfor Reduction of Pressure Drop in Oi...
Application of Parabolic Trough Collectorfor Reduction of Pressure Drop in Oi...IJMER
 

Plus de IJMER (20)

A Study on Translucent Concrete Product and Its Properties by Using Optical F...
A Study on Translucent Concrete Product and Its Properties by Using Optical F...A Study on Translucent Concrete Product and Its Properties by Using Optical F...
A Study on Translucent Concrete Product and Its Properties by Using Optical F...
 
Developing Cost Effective Automation for Cotton Seed Delinting
Developing Cost Effective Automation for Cotton Seed DelintingDeveloping Cost Effective Automation for Cotton Seed Delinting
Developing Cost Effective Automation for Cotton Seed Delinting
 
Study & Testing Of Bio-Composite Material Based On Munja Fibre
Study & Testing Of Bio-Composite Material Based On Munja FibreStudy & Testing Of Bio-Composite Material Based On Munja Fibre
Study & Testing Of Bio-Composite Material Based On Munja Fibre
 
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)
 
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...
 
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...
 
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...
 
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...
 
Static Analysis of Go-Kart Chassis by Analytical and Solid Works Simulation
Static Analysis of Go-Kart Chassis by Analytical and Solid Works SimulationStatic Analysis of Go-Kart Chassis by Analytical and Solid Works Simulation
Static Analysis of Go-Kart Chassis by Analytical and Solid Works Simulation
 
High Speed Effortless Bicycle
High Speed Effortless BicycleHigh Speed Effortless Bicycle
High Speed Effortless Bicycle
 
Integration of Struts & Spring & Hibernate for Enterprise Applications
Integration of Struts & Spring & Hibernate for Enterprise ApplicationsIntegration of Struts & Spring & Hibernate for Enterprise Applications
Integration of Struts & Spring & Hibernate for Enterprise Applications
 
Microcontroller Based Automatic Sprinkler Irrigation System
Microcontroller Based Automatic Sprinkler Irrigation SystemMicrocontroller Based Automatic Sprinkler Irrigation System
Microcontroller Based Automatic Sprinkler Irrigation System
 
On some locally closed sets and spaces in Ideal Topological Spaces
On some locally closed sets and spaces in Ideal Topological SpacesOn some locally closed sets and spaces in Ideal Topological Spaces
On some locally closed sets and spaces in Ideal Topological Spaces
 
Natural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine LearningNatural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine Learning
 
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcess
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcessEvolvea Frameworkfor SelectingPrime Software DevelopmentProcess
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcess
 
Material Parameter and Effect of Thermal Load on Functionally Graded Cylinders
Material Parameter and Effect of Thermal Load on Functionally Graded CylindersMaterial Parameter and Effect of Thermal Load on Functionally Graded Cylinders
Material Parameter and Effect of Thermal Load on Functionally Graded Cylinders
 
Studies On Energy Conservation And Audit
Studies On Energy Conservation And AuditStudies On Energy Conservation And Audit
Studies On Energy Conservation And Audit
 
An Implementation of I2C Slave Interface using Verilog HDL
An Implementation of I2C Slave Interface using Verilog HDLAn Implementation of I2C Slave Interface using Verilog HDL
An Implementation of I2C Slave Interface using Verilog HDL
 
Discrete Model of Two Predators competing for One Prey
Discrete Model of Two Predators competing for One PreyDiscrete Model of Two Predators competing for One Prey
Discrete Model of Two Predators competing for One Prey
 
Application of Parabolic Trough Collectorfor Reduction of Pressure Drop in Oi...
Application of Parabolic Trough Collectorfor Reduction of Pressure Drop in Oi...Application of Parabolic Trough Collectorfor Reduction of Pressure Drop in Oi...
Application of Parabolic Trough Collectorfor Reduction of Pressure Drop in Oi...
 

Dernier

Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitterShivangiSharma879191
 
computer application and construction management
computer application and construction managementcomputer application and construction management
computer application and construction managementMariconPadriquez1
 

Dernier (20)

Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...

MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data

International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) | ISSN: 2249–6645 | www.ijmer.com | Vol. 4 | Iss. 4 | Apr. 2014 | 55
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data
N. Aparna¹, M. Kalaiarasu²
¹,² (Department of Computer Science, Department of Information Technology, Sri Ramakrishna Engineering College, Coimbatore)

Abstract: Clustering mixed type data is one of the major research topics in the area of data mining. In this paper, a new algorithm for clustering mixed type data is proposed, in which the concept of distribution centroid is used to represent the prototype of the categorical variables in a cluster. This is then combined with the mean to represent the prototype of clusters with mixed type variables. In the method, data are observed from different views and the variables are grouped into views. Instances that can be viewed differently from different viewpoints are defined as multiview data. Clustering usually ignores the differences among views. Here, both view weights and variable weights are computed simultaneously. The view weight is used to determine the closeness or density of a view, and the variable weight is used to identify the significance of each variable. Both weights are used in the distance function that determines the cluster of each object. The proposed method enhances k-prototypes so that it automatically computes both view and variable weights. The proposed MK-Prototypes algorithm is compared with two other clustering algorithms.

Keywords: clustering, mixed data, multiview, variable weighting, view weighting, k-prototypes.

I. Introduction

Clustering is a fundamental technique of unsupervised learning in machine learning and statistics. It is generally used to find groups of similar items in a set of unlabeled data. The aim of clustering is to divide a set of data objects into clusters so that objects that belong to the same cluster are more similar to each other than to those in other clusters [1-4]. In the real world, datasets usually contain both numeric and categorical variables [5,6]. However, most existing clustering algorithms assume that all variables are either numeric or categorical; examples include the k-means [7], k-modes [8] and fuzzy k-modes [9] algorithms. Here, the data are observed from multiple viewpoints and in multiple types of dimensions. For example, in a student data set, the variables can be divided into a personal information view describing the student's personal details, an academic view describing the student's academic performance, and an extra-curricular view describing the student's extra-curricular activities and achievements.

Traditional methods take multiple views as a set of flat variables and do not take into account the differences among the views [10], [11], [12]. Multiview clustering takes the information from multiple views and also considers the variations among them, which produces a more precise and efficient partitioning of the data.

In this paper, a new algorithm, Multi-viewpoint K-prototypes (MK-Prototypes), is proposed for clustering mixed type data. It is an enhancement of the usual k-prototypes algorithm. In order to differentiate the effects of different views and different variables in clustering, view weights and individual variable weights are applied to the distance function. While computing the view weights, the complete set of variables is considered; while calculating the weights of the variables in a view, only the part of the data that includes the variables of that view is considered. Thus, the view weights show the significance of the views in the complete data, and the variable weights in a view show the significance of the variables within that view alone.

II. Related Works

To date, there exist a number of algorithms and methods that deal directly with mixed type data. In [13], Cen Li and Gautam Biswas proposed Similarity-Based Agglomerative Clustering (SBAC), which works well for data with mixed attributes. It adopts a similarity measure proposed by Goodall [14] for biological taxonomy. In this method, a higher weight is assigned to matches on infrequent attribute values when computing the similarity. It makes no assumptions about the underlying features of the attribute values. An agglomerative algorithm is used to generate a dendrogram, and a simple distinctness heuristic is used to extract a partition of the data. Hsu and Chen proposed CAVE [15], a clustering algorithm based on variance and entropy for clustering mixed data. It builds a distance hierarchy for every categorical attribute, which requires domain expertise. Hsu et al. [16] proposed an extension to the self-organizing map to analyze mixed data, where the distance hierarchy is constructed automatically from the values of class attributes. In [17], Chatzis proposed the KL-FCM-GM algorithm, which assumes that the data derived from the clusters are in Gaussian form and is designed for Gauss-multinomial distributed data. Huang presented the k-prototypes algorithm [18], in which k-means is integrated with k-modes to partition mixed data. Bezdek et al. considered the fuzzy nature of the objects in their work on fuzzy k-prototypes [19], and Zheng et al. [20] proposed an evolutionary k-prototypes algorithm by introducing an evolutionary algorithm framework.

III. Proposed System

The motivation for the proposed system is twofold. On one hand, it provides a better representation for the categorical part of mixed data, since the numerical variables are already well represented by the mean. On the other hand, it considers the importance of view weights and variable weights in the clustering process. The distribution centroid represents the cluster centroid for the categorical part. Huang's strategy of evaluation is used to compute both the view weights and the variable weights.

A. The distribution centroid

The idea of the distribution centroid as a better representation of categorical variables is inspired by the fuzzy centroid proposed by Kim et al. [21]. It makes use of a fuzzy scenario to represent the cluster centers for the categorical part.
For $\mathrm{Dom}(V_j) = \{b_j^1, b_j^2, b_j^3, \dots, b_j^t\}$, the distribution centroid of a cluster $o$, denoted $C'_o$, is represented as follows:

$$C'_o = \{c'_{o1}, c'_{o2}, \dots, c'_{oj}, \dots, c'_{om}\} \qquad (1)$$

where

$$c'_{oj} = \{\{b_j^1, w_{oj}^1\}, \{b_j^2, w_{oj}^2\}, \dots, \{b_j^k, w_{oj}^k\}, \dots, \{b_j^t, w_{oj}^t\}\} \qquad (2)$$

In the above equation,

$$w_{oj}^k = \sum_{i=1}^{n} \mu(x_{ij}) \qquad (3)$$

where

$$\mu(x_{ij}) = \begin{cases} \dfrac{u_{io}}{\sum_{l=1}^{n} u_{lo}} & \text{if } x_{ij} = b_j^k \\ 0 & \text{if } x_{ij} \neq b_j^k \end{cases} \qquad (4)$$

Here, $u_{io}$ is assigned the value 1 if the data object $x_i$ belongs to cluster $o$, and 0 if it does not. From the above equations it is clear that the computation of the distribution centroid considers the number of times each categorical value occurs in a cluster. Thus, to denote the center of a cluster, it takes into account the distribution of the categorical variables.

B. Weight calculation using Huang's approach

The weight of a variable identifies the effect of that variable on the clustering process. In 2005, Huang et al. proposed an approach to calculate the weight of a variable [22]. According to their method, the weight is computed by minimizing the value of the objective function. The principle for assigning variable weights is to allocate a larger value to a variable that has a smaller sum of the within-cluster distances (WCD), and vice versa. This principle is given by

$$w_j \propto \frac{1}{D_j} \qquad (5)$$

where $w_j$ is the significance of variable $j$, $\propto$ is the mathematical symbol denoting direct proportionality, and $D_j$ is the sum of the within-cluster distances for this variable.
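As an illustration of equations (1) to (4), the following minimal Python sketch computes the distribution centroid of a single categorical variable in a single cluster. The function name `distribution_centroid`, the toy data, and the choice to take Dom(V_j) from the observed values are assumptions made for this example; they are not part of the paper.

```python
from collections import Counter


def distribution_centroid(column, labels, cluster):
    """Return {category: weight} for one categorical variable in one cluster.

    column  -- categorical values x_ij of variable j, one per data object
    labels  -- cluster label of each object (u_io = 1 iff labels[i] == cluster)
    cluster -- the cluster o whose centroid component c'_oj is wanted
    """
    members = [x for x, lab in zip(column, labels) if lab == cluster]
    if not members:
        raise ValueError("cluster has no members")
    counts = Counter(members)
    # Assumption: Dom(V_j) is taken to be the set of observed values.
    domain = sorted(set(column))
    # w_oj^k = (members equal to b_j^k) / (cluster size), equations (3) and (4).
    return {b: counts.get(b, 0) / len(members) for b in domain}


# Hypothetical toy data: six objects, two clusters.
colour = ["red", "red", "blue", "green", "blue", "blue"]
labels = [0, 0, 0, 1, 1, 1]
centroid0 = distribution_centroid(colour, labels, 0)
print(centroid0)
```

Each weight is the relative frequency of a category inside the cluster, so the weights of one variable sum to 1 and unseen categories receive weight 0.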
C. Multiview concept

FIGURE 1: Multiview concept

In 2013, Chen et al. [23] proposed TW-k-means, in which the concept of multiview data was introduced. Figure 1 illustrates the multiview concept. In conventional clustering, the differences among the views are not considered. In multiview clustering, in addition to assigning variable weights, the variables are grouped according to their characteristic properties. Each group is termed a view, and a weight is assigned to each view. The view weight is assigned according to Huang's approach.

D. The proposed algorithm

The proposed algorithm, MK-Prototypes, puts together the concepts of Sections A, B and C. Figure 2 describes the steps involved in the algorithm.

Steps in the proposed algorithm:
1. Compute the distribution centroid to represent the categorical variable centroid.
2. Compute the mean for the numerical variables.
3. Integrate the distribution centroid and the mean to represent the prototype for the mixed data.
4. Compute the view weights and variable weights.
5. Measure the similarity between the data objects and the prototypes.
6. Assign each data object to the prototype to which it is closest.
7. Repeat steps 1-6 until an effective clustering result is obtained.

E. The optimization model

The clustering process that partitions the dataset X into k clusters, considering both view weights and variable weights, is represented according to the framework of [23] as a minimization of the following objective function.
$$P(U, Z, R, V) = \sum_{o=1}^{k} \sum_{i=1}^{n} \sum_{t=1}^{Q} \sum_{s \in G_t} u_{i,o}\, v_t\, r_s\, d(x_{i,s}, z_{o,s}) \qquad (6)$$

subject to

$$\sum_{o=1}^{k} u_{i,o} = 1, \quad u_{i,o} \in \{0, 1\}, \quad 1 \le i \le n$$

$$\sum_{t=1}^{Q} v_t = 1, \quad 0 \le v_t \le 1$$

$$\sum_{j \in G_t} r_j = 1, \quad 0 \le r_j \le 1, \quad 1 \le t \le Q$$

where $U$ is an $n \times k$ partition matrix whose elements $u_{i,o}$ are binary, with $u_{i,o} = 1$ indicating that object $i$ is allocated to cluster $o$. $Z = \{Z_1, Z_2, \dots, Z_k\}$ is a set of $k$ vectors representing the centers of the $k$ clusters. $V = \{v_1, v_2, \dots, v_Q\}$ are the $Q$ weights for the $Q$ views. $R = \{r_1, r_2, \dots, r_m\}$ are the $m$ weights for the $m$ variables. $d(x_{i,s}, z_{o,s})$ is a distance or dissimilarity measure on the $s$-th variable between the $i$-th object and the center of the $o$-th cluster.
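To make the structure of objective (6) concrete, the sketch below evaluates P for a hard partition. It is only an illustration: the function name `objective`, the argument layout and the toy numbers are assumptions, and the per-variable dissimilarity is passed in as a callable (here an absolute difference on numeric variables) rather than fixed to the paper's mixed measure.

```python
def objective(X, labels, prototypes, views, view_w, var_w, dist):
    """Evaluate P(U, Z, R, V) of equation (6) for hard assignments.

    X          -- list of objects, each a list of m variable values
    labels     -- labels[i] = o means u_io = 1
    prototypes -- prototypes[o][s] is z_os
    views      -- views[t] is the list of variable indices in G_t
    view_w     -- view_w[t] is the view weight v_t
    var_w      -- var_w[s] is the variable weight r_s
    dist       -- dist(s, x, z) is the dissimilarity d(x_is, z_os)
    """
    total = 0.0
    for x, o in zip(X, labels):
        for t, group in enumerate(views):
            for s in group:
                total += view_w[t] * var_w[s] * dist(s, x[s], prototypes[o][s])
    return total


# Hypothetical numeric-only example: two objects, two clusters, two views.
X = [[1.0, 2.0, 0.0], [3.0, 2.0, 4.0]]
labels = [0, 1]
prototypes = [[1.0, 1.0, 0.0], [2.0, 2.0, 2.0]]
views = [[0, 1], [2]]      # G_1 = {0, 1}, G_2 = {2}
view_w = [0.5, 0.5]        # view weights sum to 1
var_w = [0.5, 0.5, 1.0]    # variable weights sum to 1 within each view
P = objective(X, labels, prototypes, views, view_w, var_w,
              lambda s, x, z: abs(x - z))
print(P)
```

Because only the term with u_io = 1 survives for each object, the loop visits each object once with its assigned prototype instead of summing over all k clusters.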
FIGURE 2. Flowchart for the proposed algorithm

In order to minimize equation (6), the problem is divided into four sub-problems:
1. Sub-problem 1: Fix $Z = \hat{Z}$, $R = \hat{R}$ and $V = \hat{V}$ and solve the reduced problem $P(U, \hat{Z}, \hat{R}, \hat{V})$.
2. Sub-problem 2: Fix $U = \hat{U}$, $R = \hat{R}$ and $V = \hat{V}$ and solve the reduced problem $P(\hat{U}, Z, \hat{R}, \hat{V})$.
3. Sub-problem 3: Fix $Z = \hat{Z}$, $U = \hat{U}$ and $V = \hat{V}$ and solve the reduced problem $P(\hat{U}, \hat{Z}, R, \hat{V})$.
4. Sub-problem 4: Fix $Z = \hat{Z}$, $R = \hat{R}$ and $U = \hat{U}$ and solve the reduced problem $P(\hat{U}, \hat{Z}, \hat{R}, V)$.

Sub-problem 1 is solved by

$$u_{i,o} = 1 \qquad (7)$$

if

$$\sum_{s=1}^{m} v_t\, r_s\, d(x_{i,s}, z_{o,s}) \le \sum_{s=1}^{m} v_t\, r_s\, d(x_{i,s}, z_{e,s}) \qquad (8)$$

for all $1 \le e \le k$, where $v_t$ is the weight of the view that contains variable $s$, and $u_{i,e} = 0$ for $e \neq o$.

Sub-problem 2 is solved for the numeric variables by
$$z_{o,s} = \frac{\sum_{i=1}^{n} u_{i,o}\, x_{i,s}}{\sum_{i=1}^{n} u_{i,o}} \qquad (9)$$

and for the categorical variables by $z_{o,s} = c'_{o,s}$, which is already defined. The distance is

$$d(x_{i,s}, z_{o,s}) = |x_{i,s} - z_{o,s}|$$

if the $s$-th variable is a numeric variable, and

$$d(x_{i,s}, z_{o,s}) = \varphi(x_{i,s}, z_{o,s})$$

if the $s$-th variable is a categorical variable, where

$$\varphi(x_{i,s}, z_{o,s}) = \sum_{k=1}^{t} \delta(x_{i,s}, b_j^k)$$

and $\delta(x_{i,s}, b_j^k)$ is 0 if $x_{i,s} \neq b_j^k$ and $w_{o,j}^k$ if $x_{i,s} = b_j^k$.

The solution to sub-problem 3 is as follows. Let $Z = \hat{Z}$, $U = \hat{U}$ and $V = \hat{V}$ be fixed. Then the reduced problem $P(\hat{U}, \hat{Z}, R, \hat{V})$ is minimized if

$$r_s = \frac{1}{\sum_{h \in G_t} \left( \dfrac{D_s}{D_h} \right)^{1/\gamma}} \qquad (10)$$

where

$$D_s = \sum_{o=1}^{k} \sum_{i=1}^{n} u'_{i,o}\, w'_t\, d(x_{i,s}, z'_{o,s}) \qquad (11)$$

Sub-problem 4 is solved as follows:

$$w_t = \frac{1}{\sum_{h=1}^{Q} \left( \dfrac{F_t}{F_h} \right)^{1/\mu}} \qquad (12)$$

where

$$F_t = \sum_{o=1}^{k} \sum_{i=1}^{n} \sum_{s \in G_t} u'_{i,o}\, r'_s\, d(x_{i,s}, z'_{o,s}) \qquad (13)$$

In equations (11) to (13), $w_t$ denotes the weight of view $t$, which is written $v_t$ in equation (6).

Having presented the detailed computations required for calculating the important variables, the proposed MK-Prototypes algorithm can be described as follows:
1. Choose the number of iterations, the number of clusters k and the values of μ and γ; randomly choose k distinct data objects and convert them into initial prototypes; initialize the view weights and variable weights.
2. Fix Z', R', V' as $Z^t$, $R^t$, $V^t$ respectively and minimize the problem P(U, Z', R', V') to obtain $U^{t+1}$.
3. Fix U', R', V' as $U^t$, $R^t$, $V^t$ respectively and minimize the problem P(U', Z, R', V') to obtain $Z^{t+1}$.
4. Fix U', Z', V' as $U^t$, $Z^t$, $V^t$ respectively and minimize the problem P(U', Z', R, V') to obtain $R^{t+1}$.
5. Fix U', Z', R' as $U^t$, $Z^t$, $R^t$ respectively and minimize the problem P(U', Z', R', V) to obtain $V^{t+1}$.
6. If there is no improvement in P or if the maximum number of iterations is reached, then stop. Otherwise increment t by 1, decrement the number of iterations by 1 and go to Step 2.

IV. Experiments on Performance of MK-Prototypes Algorithm

In order to measure the performance of the proposed algorithm, it is used to cluster the real-world Heart (disease) dataset, taken from the UCI Machine Learning Repository. The proposed algorithm is compared with the k-prototypes and SBAC algorithms, which are well known for clustering mixed type data. In this paper, the clustering accuracy is measured using one of the most commonly used criteria. The clustering accuracy r is given by

$$r = \frac{\sum_{i=1}^{k} a_i}{n} \qquad (14)$$
where $a_i$ is the number of data objects that occur in both the $i$-th cluster and its corresponding true class, and $n$ is the number of data objects in the data set. The higher the value of r, the higher the clustering accuracy. A perfect clustering gives a value of r = 1.0.

A. Dataset description

The Heart disease data set is a mixed dataset. It contains 303 patient instances. The full data set contains 76 variables, of which 14 are usually considered. For the proposed algorithm, 19 of the 76 variables are considered in order to define three views. They consist of seven numeric variables and twelve categorical variables. These 19 variables can be naturally divided into 3 views:
1. Personal data view: includes the variables that describe a patient's personal data.
2. Historical data view: includes the variables that describe a patient's historical data, such as habits.
3. Test output view: includes the variables that describe the results of the various tests conducted for the patient.
Here, $G_1$, $G_2$, $G_3$ represent the personal, historical and test output views respectively.

B. Results and analysis

Below are the graphical representations of the clustering results. Fig 3 shows the variation in variable weights for varying μ values and fixed γ values. Fig 4 shows the variation in view weights for varying μ values and fixed γ values. From Table 1, it is observed that as μ increased, the variance of V decreased rapidly. This result can be explained from equation (10): as μ increases, V becomes flatter. The graphical representation of Table 1 is shown below.

Table 1: Variable weights vs γ value for fixed μ value

γ      μ=1     μ=4     μ=12
10     0.01    0       0
15     0       0.01    0
20     0.7     0       0.05
25     0.03    0.4     0.1
30     0       0.1     0.02
35     0.02    0       0

Table 2 shows that as γ increased, the variance of the view weights decreased rapidly. This result can be explained from equation (11): as γ increases, W becomes flatter. The graphical representation is shown below.

Fig 3: Variable weights vs γ value for fixed μ value (x-axis: γ; y-axis: variable weights; series: μ=1, μ=4, μ=12)
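The flattening effect described above can be checked numerically. The sketch below assumes that the weight updates of equations (10) and (12) share the form w_j = 1 / Σ_h (D_j / D_h)^(1/p), where p stands for γ or μ and D for the within-cluster dispersions; that reading of the equations, the function name `huang_weights` and the dispersion values are assumptions made for illustration only.

```python
from statistics import pvariance


def huang_weights(dispersions, exponent):
    """Weights w_j = 1 / sum_h (D_j / D_h) ** (1 / exponent).

    dispersions -- positive within-cluster dispersions D_j of one group
    exponent    -- the parameter (gamma or mu), assumed to be positive
    """
    if exponent <= 0:
        raise ValueError("exponent must be positive")
    if any(d <= 0 for d in dispersions):
        raise ValueError("dispersions must be positive")
    power = 1.0 / exponent
    return [1.0 / sum((dj / dh) ** power for dh in dispersions)
            for dj in dispersions]


# Hypothetical dispersions for three variables of one view.
D = [1.0, 2.0, 4.0]
for p in (1, 4, 12):
    w = huang_weights(D, p)
    print(p, [round(x, 3) for x in w], round(pvariance(w), 5))
```

Under this assumed form the weights always sum to 1, a smaller dispersion gives a larger weight as in equation (5), and the variance of the weights shrinks as the parameter grows, because the exponent 1/p approaches 0 and every ratio approaches 1.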
Table 2: View weights vs μ value for fixed γ value

μ      γ=1     γ=4     γ=12
10     0.05    0.075   0.01
20     0.075   0.14    0.015
30     0.095   0.05    0.04
40     0.16    0.06    0.01
50     0.04    0.07    0.005
60     0.05    0.04    0.02
70     0.07    0.035   0.01

From the above analysis, it can be summarized that the two types of weight distributions in the MK-Prototypes algorithm can be controlled by setting different values of γ and μ.

Figure 4: View weights vs μ value for fixed γ value (x-axis: μ; y-axis: view weights; series: γ=1, γ=4, γ=12)

The experiments have been conducted for three different values of μ and γ for varying values of γ and μ respectively.
1. Large μ makes more variables contribute to the clustering, while small μ makes only important variables contribute to the clustering.
2. Large γ makes more views contribute to the clustering, while small γ makes only important views contribute to the clustering.

Table 3: Comparison of accuracy rates of dataset considering all views

Algorithm        Clustering accuracy r
k-prototypes     0.521
SBAC             0.747
MK-Prototypes    0.846

From the above table, it is clear that the proposed algorithm has a better clustering accuracy than the existing k-prototypes and SBAC algorithms.

V. Conclusion

Mixed type data are encountered everywhere in the real world. In this paper, a new multi-viewpoint based clustering algorithm for mixed type data has been proposed. Compared with the existing algorithms, the proposed algorithm makes several significant contributions. It encapsulates the characteristics of clusters with mixed type variables more efficiently, since it includes the distribution information of both numeric and categorical variables. It also takes into account the importance of the various variables and views during clustering by using Huang's approach and a new dissimilarity measure.
It can compute weights for views and individual variables simultaneously in the clustering process. With the two types of weights, dense views and significant variables can be identified, and the effect of low-quality views and noise variables can be reduced. Because of these contributions the proposed algorithm obtains higher clustering accuracy, which has been validated by the experimental results.

REFERENCES
[1] Z.X. Huang, Extensions to the k-means algorithm for clustering large datasets with categorical values, Data Min. Knowl. Discovery 2 (3) (1998) 283–304.
[2] A.K. Jain, R.C. Dubes, Algorithms for Clustering Data, Prentice-Hall, New Jersey, 1988.
[3] A.K. Jain, M.N. Murty, P.J. Flynn, Data clustering: a survey, ACM Comput. Surv. 31 (3) (1999) 264–323.
[4] J.W. Han, M. Kamber, Data Mining Concepts and Techniques, Morgan Kaufmann, San Francisco, 2001.
[5] C. Hsu, Y.P. Huang, Incremental clustering of mixed data based on distance hierarchy, Expert Syst. Appl. 35 (3) (2008) 1177–1185.
[6] C. Hsu, S. Lin, W. Tai, Apply extended self-organizing map to cluster and classify mixed-type data, Neurocomputing 74 (18) (2011) 3832–3842.
[7] S. Lloyd, Least squares quantization in PCM, IEEE Trans. Inf. Theory 28 (2) (1982) 129–137.
[8] Z.X. Huang, Extensions to the k-means algorithm for clustering large datasets with categorical values, Data Min. Knowl. Discovery 2 (3) (1998) 283–304.
[9] Z.X. Huang, M.K. Ng, A fuzzy k-modes algorithm for clustering categorical data, IEEE Trans. Fuzzy Syst. 7 (4) (1999) 446–452.
[10] J. Mui and K. Fu, "Automated classification of nucleated blood cells using a binary tree classifier," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 2, no. 5, pp. 429-443, May 1980.
[11] J. Wang, H. Zeng, Z. Chen, H. Lu, L. Tao, and W. Ma, "ReCoM: reinforcement clustering of multitype interrelated data objects," Proc. 26th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 274-281, 2003.
[12] S. Bickel and T. Scheffer, "Multi-view clustering," Proc. IEEE Fourth Int'l Conf. Data Mining, pp. 19-26, 2004.
[13] C. Li, G. Biswas, Unsupervised learning with mixed numeric and nominal data, IEEE Trans. Knowl. Data Eng. 14 (4) (2002) 673–690.
[14] D.W. Goodall, A new similarity index based on probability, Biometrics 22 (4) (1966) 882–907.
[15] C.C. Hsu, Y.C. Chen, Mining of mixed data with application to catalog marketing, Expert Syst. Appl. 32 (1) (2007) 12–27.
[16] C. Hsu, S. Lin, W. Tai, Apply extended self-organizing map to cluster and classify mixed-type data, Neurocomputing 74 (18) (2011) 3832–3842.
[17] S.P. Chatzis, A fuzzy c-means-type algorithm for clustering of data with mixed numeric and categorical attributes employing a probabilistic dissimilarity functional, Expert Syst. Appl. 38 (7) (2011) 8684–8689.
[18] Z.X. Huang, Clustering large datasets with mixed numeric and categorical values, in: Proceedings of the First Pacific-Asia Knowledge Discovery and Data Mining Conference, 1997, pp. 21–34.
[19] J.C. Bezdek, J. Keller, R. Krishnapuram, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, Kluwer Academic Publishers, Boston, 1999.
[20] Z. Zheng, M.G. Gong, J.J. Ma, L.C. Jiao, Unsupervised evolutionary clustering algorithm for mixed type data, in: Proceedings of the IEEE Congress on Evolutionary Computation (CEC), 2010, pp. 1–8.
[21] W. Kim, K.H. Lee, D. Lee, Fuzzy clustering of categorical data using fuzzy centroid, Pattern Recognition Lett. 25 (11) (2004) 1263–1271.
[22] Z.X. Huang, M.K. Ng, H.Q. Rong, Z.C. Li, Automated variable weighting in k-means type clustering, IEEE Trans. Pattern Anal. Mach. Intell. 27 (5) (2005) 657–668.
[23] Xiaojun Chen, Xiaofei Xu, Joshua Zhexue Huang, and Yunming Ye, TW-k-means: automated two-level variable weighting clustering algorithm for multiview data, IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 4, April 2013, pp. 932-945.