SlideShare une entreprise Scribd logo
1  sur  21
K-Mean clustering
Index:
• What is Clustering
• Types of Clustering
• Common Distance Measure
• k means Clustering
• How k-mean clustering algorithm works
• Simple example showing the implementation of k means algorithm
• Real life numerical example for k means clustering
• Weakness of k means clustering
• Application of k means clustering algorithm
Clustering
Clustering is the task of grouping a set of objects in such a way that objects in
the same group (called a cluster) are more similar (in some sense or another) to
each other than to those in other groups (clusters).
Clustering is the classification of objects into different groups, or more precisely,
the partitions of a data set into subsets (clusters), so that the data in each subset
(ideally)share some common trait – often according to some defined distance
measure.
Examples of Clustering
Types of
clustering
1. Hierarchical algorithms: Hierarchy algorithm is a method of cluster analysis which
seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally
fall into two types:
1.1 Agglomerative ("bottom-up"): Agglomerative algorithms begin with each
element as a separate cluster and merge them into successively larger clusters.
1.2 Divisive ("top-down"): Divisive algorithms begin with the whole set and
proceed to divide it into successively smaller clusters.
2. Partitional clustering: Partitional algorithms determine all clusters at once. They
include:
• K-means and derivatives
• Fuzzy c-means clustering
• QT clustering algorithm
Common Distance
Measures
Distance measure will determine how the similarity of two elements is calculated and it will
influence the shape of the clusters. They include:
1. The Euclidean distance (also called 2-norm distance) is given by:
2. The Manhattan distance (also called taxicab norm or 1-norm) is given by:
3.The maximum norm is given by:
4. The Mahalanobis distance corrects data for different scales and correlations in the
variables.
5. Inner product space: The angle between two vectors can be used as a distance measure
when clustering high dimensional data
6. Hamming distance (sometimes edit distance) measures the minimum number of
substitutions required to change one member into another.
K means
Clustering
The k-means algorithm is an algorithm to cluster n objects based on attributes into k
partitions, where k < n.
• It is similar to the expectation-maximization algorithm for mixtures of Gaussians in
that they both attempt to find the centers of natural clusters in the data.
• It assumes that the object attributes form a vector space.
• An algorithm for partitioning (or clustering) N data points into K disjoint subsets Sj
containing data points so as to minimize the sum-of-squares criterion
where xn is a vector representing the the nth data point and uj is the geometric centroid
of the data points in Sj.
• Simply speaking k-means clustering is an algorithm to classify or to group the objects
based on attributes/features into K number of group. K is positive integer number.
• The grouping is done by minimizing the sum of squares of distances between data and
the corresponding cluster centroid.
K- means Clustering algorithm
 Step 1: Begin with a decision on the value of k =number of clusters .
 Step 2: Put any initial partition that classifies the data into k clusters. You may
assign the training samples randomly or systematically as the following:
 1. Take the first k training sample as single element clusters
 2. Assign each of the remaining (N-k) training sample to the cluster with the nearest
centroid. After each assignment, recomputed the centroid of the gaining cluster.
 Step 3: Take each sample in sequence and compute its distance from the centroid of
each of the clusters. If a sample is not currently in the cluster with the closest
centroid, switch this sample to that cluster and update the centroid of the cluster
gaining the new sample and the cluster losing the sample.
 Step 4 . Repeat step 3 until convergence is achieved, that is until a pass through the
training sample causes no new assignments.
Simple example showing the implementation of k means
algorithm Step 1:
Initialization: Randomly we choose following
two centroids (k=2) for two clusters.
In this case the 2 centroid are: m1=(1.0,1.0)
and m2=(5.0,7.0).
Step 2:
Thus, we obtain two clusters containing:
{1,2,3} and {4,5,6,7}.
Their new centroids are:
Step 3:
• Now using these centroids of step2, we
compute the Euclidean distance of each object,
as shown in table.
• Therefore, the new clusters are:
{1,2} and {3,4,5,6,7}
• Next centroids are: m1=(1.25,1.5) and m2 =
(3.9,5.1)
Step 4 :
• The clusters obtained are:
{1,2} and {3,4,5,6,7}
• Therefore, there is no change in the cluster.
• Thus, the algorithm comes to a halt here and
final result consist of 2 clusters {1,2} and
{3,4,5,6,7}.
Plot For k = 2
(with K=3)
Step 1 Step 2
Plot for k = 3
Real life numerical example for k means
clustering
We have 4 medicines as our training data points object and each medicine has 2
attributes. Each attribute represents coordinate of the object. We have to
determine which medicines belong to cluster 1 and which medicines belong to the
other cluster.
Object
Attribute1 (X):
weight index
Attribute 2 (Y): pH
Medicine A 1 1
Medicine B 2 1
Medicine C 4 3
Medicine D 5 4
Final Answer:
Object Feature1(X):
weight index
Feature2
(Y): pH
Group
(result)
Medicine A 1 1 1
Medicine B 2 1 1
Medicine C 4 3 2
Medicine D 5 4 2
Weaknesses of K-Mean
Clustering
• When the numbers of data are not so many, initial grouping will determine
the cluster significantly.
• The number of cluster, K, must be determined before hand. Its disadvantage
is that it does not yield the same result with each run, since the resulting
clusters depend on the initial random assignments.
• We never know the real cluster, using the same data, because if it is inputted
in a different order it may produce different cluster if the number of data is
few.
• It is sensitive to initial condition. Different initial condition may produce
different result of cluster. The algorithm may be trapped in the local
optimum.
Applications of K-Mean
Clustering
• It is relatively efficient and fast. It computes result at O(tkn), where n is
number of objects or points, k is number of clusters and t is number of
iterations.
• k-means clustering can be applied to machine learning or data mining
• Used on acoustic data in speech understanding to convert waveforms into one
of k categories (known as Vector Quantization or Image Segmentation).
• Also used for choosing color palettes on old fashioned graphical display
devices and Image Quantization.
Thank You

Contenu connexe

Tendances

K means clustering | K Means ++
K means clustering | K Means ++K means clustering | K Means ++
K means clustering | K Means ++sabbirantor
 
K means Clustering
K means ClusteringK means Clustering
K means ClusteringEdureka!
 
CSA 3702 machine learning module 3
CSA 3702 machine learning module 3CSA 3702 machine learning module 3
CSA 3702 machine learning module 3Nandhini S
 
Large Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewLarge Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewVahid Mirjalili
 
Clustering
ClusteringClustering
ClusteringMeme Hei
 
K means clustering
K means clusteringK means clustering
K means clusteringkeshav goyal
 
K means clustering algorithm
K means clustering algorithmK means clustering algorithm
K means clustering algorithmDarshak Mehta
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmEditor IJCATR
 
K-means clustering algorithm
K-means clustering algorithmK-means clustering algorithm
K-means clustering algorithmVinit Dantkale
 
Clustering in artificial intelligence
Clustering in artificial intelligence Clustering in artificial intelligence
Clustering in artificial intelligence Karam Munir Butt
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methodsKrish_ver2
 
K-Means clustring @jax
K-Means clustring @jaxK-Means clustring @jax
K-Means clustring @jaxAjay Iet
 
Cure, Clustering Algorithm
Cure, Clustering AlgorithmCure, Clustering Algorithm
Cure, Clustering AlgorithmLino Possamai
 
Advanced database and data mining & clustering concepts
Advanced database and data mining & clustering conceptsAdvanced database and data mining & clustering concepts
Advanced database and data mining & clustering conceptsNithyananthSengottai
 

Tendances (20)

K means clustering | K Means ++
K means clustering | K Means ++K means clustering | K Means ++
K means clustering | K Means ++
 
K means Clustering
K means ClusteringK means Clustering
K means Clustering
 
Dataa miining
Dataa miiningDataa miining
Dataa miining
 
CSA 3702 machine learning module 3
CSA 3702 machine learning module 3CSA 3702 machine learning module 3
CSA 3702 machine learning module 3
 
Large Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewLarge Scale Data Clustering: an overview
Large Scale Data Clustering: an overview
 
K means Clustering Algorithm
K means Clustering AlgorithmK means Clustering Algorithm
K means Clustering Algorithm
 
05 k-means clustering
05 k-means clustering05 k-means clustering
05 k-means clustering
 
Lect4
Lect4Lect4
Lect4
 
Clustering
ClusteringClustering
Clustering
 
K means clustering
K means clusteringK means clustering
K means clustering
 
K means clustering algorithm
K means clustering algorithmK means clustering algorithm
K means clustering algorithm
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids Algorithm
 
K mean-clustering
K mean-clusteringK mean-clustering
K mean-clustering
 
K-means clustering algorithm
K-means clustering algorithmK-means clustering algorithm
K-means clustering algorithm
 
Clustering in artificial intelligence
Clustering in artificial intelligence Clustering in artificial intelligence
Clustering in artificial intelligence
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
 
Unsupervised Learning
Unsupervised LearningUnsupervised Learning
Unsupervised Learning
 
K-Means clustring @jax
K-Means clustring @jaxK-Means clustring @jax
K-Means clustring @jax
 
Cure, Clustering Algorithm
Cure, Clustering AlgorithmCure, Clustering Algorithm
Cure, Clustering Algorithm
 
Advanced database and data mining & clustering concepts
Advanced database and data mining & clustering conceptsAdvanced database and data mining & clustering concepts
Advanced database and data mining & clustering concepts
 

Similaire à Clustering

K means Clustering - algorithm to cluster n objects
K means Clustering - algorithm to cluster n objectsK means Clustering - algorithm to cluster n objects
K means Clustering - algorithm to cluster n objectsVoidVampire
 
AI-Lec20 Clustering I - Kmean.pptx
AI-Lec20 Clustering I - Kmean.pptxAI-Lec20 Clustering I - Kmean.pptx
AI-Lec20 Clustering I - Kmean.pptxSyed Ejaz
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.pptvikassingh569137
 
Unsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and AssumptionsUnsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and Assumptionsrefedey275
 
Lecture_3_k-mean-clustering.ppt
Lecture_3_k-mean-clustering.pptLecture_3_k-mean-clustering.ppt
Lecture_3_k-mean-clustering.pptSyedNahin1
 
8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithmLaura Petrosanu
 
Slide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSlide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptImXaib
 
Slide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSlide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSandinoBerutu1
 
Unsupervised Learning in Machine Learning
Unsupervised Learning in Machine LearningUnsupervised Learning in Machine Learning
Unsupervised Learning in Machine LearningPyingkodi Maran
 
MODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptxMODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptxnikshaikh786
 
Optimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmOptimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmIJERA Editor
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithmijsrd.com
 

Similaire à Clustering (20)

K means Clustering - algorithm to cluster n objects
K means Clustering - algorithm to cluster n objectsK means Clustering - algorithm to cluster n objects
K means Clustering - algorithm to cluster n objects
 
AI-Lec20 Clustering I - Kmean.pptx
AI-Lec20 Clustering I - Kmean.pptxAI-Lec20 Clustering I - Kmean.pptx
AI-Lec20 Clustering I - Kmean.pptx
 
k-mean-clustering.pdf
k-mean-clustering.pdfk-mean-clustering.pdf
k-mean-clustering.pdf
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt
 
Unsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and AssumptionsUnsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and Assumptions
 
Lecture_3_k-mean-clustering.ppt
Lecture_3_k-mean-clustering.pptLecture_3_k-mean-clustering.ppt
Lecture_3_k-mean-clustering.ppt
 
47 292-298
47 292-29847 292-298
47 292-298
 
8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm
 
Slide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSlide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.ppt
 
Slide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSlide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.ppt
 
Master's Thesis Presentation
Master's Thesis PresentationMaster's Thesis Presentation
Master's Thesis Presentation
 
Unsupervised Learning in Machine Learning
Unsupervised Learning in Machine LearningUnsupervised Learning in Machine Learning
Unsupervised Learning in Machine Learning
 
MODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptxMODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptx
 
Clustering.pptx
Clustering.pptxClustering.pptx
Clustering.pptx
 
Optimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmOptimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering Algorithm
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithm
 
ClusetrigBasic.ppt
ClusetrigBasic.pptClusetrigBasic.ppt
ClusetrigBasic.ppt
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Hierachical clustering
Hierachical clusteringHierachical clustering
Hierachical clustering
 

Plus de Md. Hasnat Shoheb

Story classification using linguistic feature and topic modeling algorithm
Story classification using linguistic feature and topic modeling algorithm Story classification using linguistic feature and topic modeling algorithm
Story classification using linguistic feature and topic modeling algorithm Md. Hasnat Shoheb
 
INTERNAL STRUCTURE OF 8086 MICROPROCESSOR
INTERNAL STRUCTURE OF  8086 MICROPROCESSORINTERNAL STRUCTURE OF  8086 MICROPROCESSOR
INTERNAL STRUCTURE OF 8086 MICROPROCESSORMd. Hasnat Shoheb
 

Plus de Md. Hasnat Shoheb (10)

Slop Intercept Method
Slop Intercept MethodSlop Intercept Method
Slop Intercept Method
 
Story classification using linguistic feature and topic modeling algorithm
Story classification using linguistic feature and topic modeling algorithm Story classification using linguistic feature and topic modeling algorithm
Story classification using linguistic feature and topic modeling algorithm
 
COST CHAPTER OF ACCOUNTING
COST CHAPTER OF  ACCOUNTING COST CHAPTER OF  ACCOUNTING
COST CHAPTER OF ACCOUNTING
 
INTERNAL STRUCTURE OF 8086 MICROPROCESSOR
INTERNAL STRUCTURE OF  8086 MICROPROCESSORINTERNAL STRUCTURE OF  8086 MICROPROCESSOR
INTERNAL STRUCTURE OF 8086 MICROPROCESSOR
 
House rent system
House rent systemHouse rent system
House rent system
 
Zimbra mail server
Zimbra mail serverZimbra mail server
Zimbra mail server
 
result processing system
result processing system result processing system
result processing system
 
3rd generation computer
3rd generation computer3rd generation computer
3rd generation computer
 
Electrick circuit
Electrick circuitElectrick circuit
Electrick circuit
 
Mathmatics presentation1
Mathmatics presentation1Mathmatics presentation1
Mathmatics presentation1
 

Dernier

科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 

Dernier (20)

科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 

Clustering

  • 2. Index: • What is Clustering • Types of Clustering • Common Distance Measure • k means Clustering • How k-mean clustering algorithm works • Simple example showing the implementation of k means algorithm • Real life numerical example for k means clustering • Weakness of k means clustering • Application of k means clustering algorithm
  • 3. Clustering Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). Clustering is the classification of objects into different groups, or more precisely, the partitions of a data set into subsets (clusters), so that the data in each subset (ideally)share some common trait – often according to some defined distance measure.
  • 5. Types of clustering 1. Hierarchical algorithms: Hierarchy algorithm is a method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types: 1.1 Agglomerative ("bottom-up"): Agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters. 1.2 Divisive ("top-down"): Divisive algorithms begin with the whole set and proceed to divide it into successively smaller clusters. 2. Partitional clustering: Partitional algorithms determine all clusters at once. They include: • K-means and derivatives • Fuzzy c-means clustering • QT clustering algorithm
  • 6. Common Distance Measures Distance measure will determine how the similarity of two elements is calculated and it will influence the shape of the clusters. They include: 1. The Euclidean distance (also called 2-norm distance) is given by: 2. The Manhattan distance (also called taxicab norm or 1-norm) is given by: 3.The maximum norm is given by: 4. The Mahalanobis distance corrects data for different scales and correlations in the variables. 5. Inner product space: The angle between two vectors can be used as a distance measure when clustering high dimensional data 6. Hamming distance (sometimes edit distance) measures the minimum number of substitutions required to change one member into another.
  • 7. K means Clustering The k-means algorithm is an algorithm to cluster n objects based on attributes into k partitions, where k < n. • It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that they both attempt to find the centers of natural clusters in the data. • It assumes that the object attributes form a vector space. • An algorithm for partitioning (or clustering) N data points into K disjoint subsets Sj containing data points so as to minimize the sum-of-squares criterion where xn is a vector representing the the nth data point and uj is the geometric centroid of the data points in Sj. • Simply speaking k-means clustering is an algorithm to classify or to group the objects based on attributes/features into K number of group. K is positive integer number. • The grouping is done by minimizing the sum of squares of distances between data and the corresponding cluster centroid.
  • 9.  Step 1: Begin with a decision on the value of k =number of clusters .  Step 2: Put any initial partition that classifies the data into k clusters. You may assign the training samples randomly or systematically as the following:  1. Take the first k training sample as single element clusters  2. Assign each of the remaining (N-k) training sample to the cluster with the nearest centroid. After each assignment, recomputed the centroid of the gaining cluster.  Step 3: Take each sample in sequence and compute its distance from the centroid of each of the clusters. If a sample is not currently in the cluster with the closest centroid, switch this sample to that cluster and update the centroid of the cluster gaining the new sample and the cluster losing the sample.  Step 4 . Repeat step 3 until convergence is achieved, that is until a pass through the training sample causes no new assignments.
  • 10. Simple example showing the implementation of k means algorithm Step 1: Initialization: Randomly we choose following two centroids (k=2) for two clusters. In this case the 2 centroid are: m1=(1.0,1.0) and m2=(5.0,7.0).
  • 11. Step 2: Thus, we obtain two clusters containing: {1,2,3} and {4,5,6,7}. Their new centroids are:
  • 12. Step 3: • Now using these centroids of step2, we compute the Euclidean distance of each object, as shown in table. • Therefore, the new clusters are: {1,2} and {3,4,5,6,7} • Next centroids are: m1=(1.25,1.5) and m2 = (3.9,5.1)
  • 13. Step 4 : • The clusters obtained are: {1,2} and {3,4,5,6,7} • Therefore, there is no change in the cluster. • Thus, the algorithm comes to a halt here and final result consist of 2 clusters {1,2} and {3,4,5,6,7}.
  • 14. Plot For k = 2
  • 16. Plot for k = 3
  • 17. Real life numerical example for k means clustering We have 4 medicines as our training data points object and each medicine has 2 attributes. Each attribute represents coordinate of the object. We have to determine which medicines belong to cluster 1 and which medicines belong to the other cluster. Object Attribute1 (X): weight index Attribute 2 (Y): pH Medicine A 1 1 Medicine B 2 1 Medicine C 4 3 Medicine D 5 4
  • 18. Final Answer: Object Feature1(X): weight index Feature2 (Y): pH Group (result) Medicine A 1 1 1 Medicine B 2 1 1 Medicine C 4 3 2 Medicine D 5 4 2
  • 19. Weaknesses of K-Mean Clustering • When the numbers of data are not so many, initial grouping will determine the cluster significantly. • The number of cluster, K, must be determined before hand. Its disadvantage is that it does not yield the same result with each run, since the resulting clusters depend on the initial random assignments. • We never know the real cluster, using the same data, because if it is inputted in a different order it may produce different cluster if the number of data is few. • It is sensitive to initial condition. Different initial condition may produce different result of cluster. The algorithm may be trapped in the local optimum.
  • 20. Applications of K-Mean Clustering • It is relatively efficient and fast. It computes result at O(tkn), where n is number of objects or points, k is number of clusters and t is number of iterations. • k-means clustering can be applied to machine learning or data mining • Used on acoustic data in speech understanding to convert waveforms into one of k categories (known as Vector Quantization or Image Segmentation). • Also used for choosing color palettes on old fashioned graphical display devices and Image Quantization.