www.edureka.in/data-science
Slide 1
Clustering
Slide 2
Clustering: Scenarios
Clustering applies in scenarios such as the following:
• A telephone company needs to set up its network by placing towers in a region it has acquired. A clustering algorithm can find tower locations so that all its users receive maximum signal strength.
• Cisco wants to open a new office in California. The management wants to be considerate to its employees and locate the office so that their commute is kept to a minimum.
• The Miami DEA wants to make its law enforcement more stringent and has decided to station its patrol vans across the area so that high-crime areas are close to a patrol van.
• A hospital care chain wants to open a series of emergency-care wards, taking into account the most accident-prone areas in a region.
Slide 3
What is Clustering?
Organizing data into clusters such that there is:
• High intra-cluster similarity
• Low inter-cluster similarity
• Informally, finding natural groupings among objects
Slide 4
Why Clustering?
• Organizing data into clusters shows the internal structure of the data
Ex. Clusty and clustering genes
• Sometimes the partitioning is the goal
Ex. Market segmentation
• Prepare for other AI techniques
Ex. Summarize news (cluster and then find centroid)
• Discovery in data
Ex. Underlying rules, recurring patterns, topics, etc.
Slide 5
Clustering algorithms may be classified as:
Exclusive Clustering:
Data is grouped in an exclusive way, so that if a certain
datum belongs to a definite cluster then it could not be
included in another cluster.
E.g. K-means
Overlapping Clustering:
Overlapping clustering uses fuzzy sets to cluster data,
so that each point may belong to two or more clusters with
different degrees of membership.
E.g. Fuzzy C-means
Slide 6
Hierarchical Clustering:
It is based on repeatedly merging the two
nearest clusters. The starting condition is
realized by setting every datum as its own cluster.
A merged cluster inherits certain properties
from the clusters below it in the hierarchy.
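The merge rule for hierarchical clustering can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the slide does not say how the distance between two clusters is measured, so single linkage (distance between the closest pair of members) is assumed here, and the function name `single_linkage` and the 1-D sample values are hypothetical.

```python
def single_linkage(points, n_clusters):
    """Merge the two nearest clusters until n_clusters remain (1-D data)."""
    clusters = [[p] for p in points]  # every datum starts as its own cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Assumed linkage: distance between the closest pair of members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # union of the two nearest clusters
        del clusters[j]
    return clusters

groups = single_linkage([1.0, 1.5, 5.0, 5.2, 9.0], n_clusters=3)
```

Stopping at a chosen number of clusters is one possible cut of the hierarchy; recording every merge instead would give the full tree.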
Slide 7
“Clustering is in the eye of the beholder.”
The most appropriate clustering algorithm for a particular problem often needs to be chosen
experimentally, unless there is a mathematical reason to prefer one cluster model over
another.
An algorithm designed for one kind of cluster model generally performs poorly on a
data set that contains a radically different kind of model.
For example, k-means cannot find non-convex clusters.
Similarity/Dissimilarity Measurement
Slide 8
To achieve clustering, a similarity/dissimilarity
measure must be chosen so that the data points
are clustered based on either:
1. Similarity in the data, or
2. Dissimilarity in the data
The measure reflects the degree of closeness or
separation of the target objects and should
correspond to the characteristics that are
believed to distinguish the clusters embedded in
the data.
[Diagram: Measurement branches into Similarity and Dissimilarity.]
Slide 9
Similarity Measurement
Similarity measures the degree to which two
objects are alike.
For structural patterns represented as strings
or sequences of symbols, pattern resemblance
has typically been viewed from three
main perspectives:
• Similarity as matching, according to which
patterns are seen as different viewpoints,
possible instantiations, or noisy versions of the
same object;
• Structural resemblance, based on the similarity of
their composition rules and primitives;
• Content-based similarity.
Dissimilarity Measurement: Distance Measures
Slide 10
Similarity can also be measured in
terms of the position of the data points.
By finding the distance between
data points, the distance (difference) of
a point from a cluster can be found.
Distance measures:
• Euclidean Distance Measure
• Manhattan Distance Measure
• Cosine Distance Measure
• Tanimoto Distance Measure
• Squared Euclidean Distance Measure
Slide 11
Difference between Euclidean and Manhattan
From this image we can say that the Euclidean distance measure gives about 5.66 as the distance
between (2, 2) and (6, 6), whereas the Manhattan distance is 8.0.
Mathematically, the Euclidean distance between two
n-dimensional vectors
(a1, a2, ... , an) and (b1, b2, ... , bn) is:
d = sqrt((a1 – b1)^2 + (a2 – b2)^2 + ... + (an – bn)^2)
The Manhattan distance between two n-dimensional vectors is:
d = |a1 – b1| + |a2 – b2| + ... + |an – bn|
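The two distances quoted for (2, 2) and (6, 6) can be checked with a short Python sketch. The function names `euclidean` and `manhattan` are illustrative, not taken from the slides.

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance: sum of the absolute differences per coordinate."""
    return sum(abs(x - y) for x, y in zip(a, b))

p, q = (2, 2), (6, 6)
d_euclid = euclidean(p, q)      # sqrt(32), roughly 5.657
d_manhattan = manhattan(p, q)   # 4 + 4 = 8
```

Squaring `d_euclid` gives the squared Euclidean distance from the list of measures, which is 32 for this pair.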
Cosine Distance Measure
Slide 12
The formula for the cosine distance between n-dimensional vectors
(a1, a2, ... , an) and (b1, b2, ... , bn) is:
d = 1 – (a1b1 + a2b2 + ... + anbn) / (sqrt(a1^2 + a2^2 + ... + an^2) * sqrt(b1^2 + b2^2 + ... + bn^2))
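As a rough illustration, cosine distance can be computed as one minus the cosine of the angle between the two vectors. The sketch below assumes that standard definition and non-zero vectors; the name `cosine_distance` and the sample vectors are hypothetical.

```python
import math

def cosine_distance(a, b):
    """1 - cos(angle between a and b); undefined if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

same_direction = cosine_distance((1, 2), (2, 4))   # parallel vectors: close to 0
orthogonal = cosine_distance((1, 0), (0, 1))       # perpendicular vectors: 1
```

Unlike the Euclidean and Manhattan measures, this one ignores vector length and depends only on direction.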
Slide 13
K-Means Clustering
Slide 14
K-Means Clustering
Clustering is the process by which objects are classified into
a number of groups so that they are as
dissimilar as possible from one group to another,
but as similar as possible within
each group.
The objects in group 1 should be as similar as
possible.
But there should be a large difference between an
object in group 1 and an object in group 2.
The attributes of the objects are allowed to
determine which objects should be grouped
together.
[Figure: the total population divided into Group 1, Group 2, Group 3, and Group 4.]
Slide 15
K-Means Clustering
Basic concepts of Cluster Analysis using two variables
[Figure: two-variable plot with axes Current Balance (Low, Medium, High) and Gross Monthly Income (Low, Medium, High). Example Cluster 1: High Balance, Low Income. Example Cluster 2: High Income, Low Balance.]
• Cluster 1 and Cluster 2 are differentiated by Income and Current Balance.
• The objects in Cluster 1 share similar characteristics (High Balance and Low Income), while
the objects in Cluster 2 share similar characteristics (High Income and Low Balance).
• But there are large differences between an object in Cluster 1 and an object in Cluster 2.
Slide 16
Process Flow of K-means
Iterate until stable (cluster centers converge):
1. Determine the centroid coordinate.
2. Determine the distance of each object to the
centroids.
3. Group the object based on minimum
distance (find the closest centroid)
[Flowchart: Start → Number of clusters K → Centroid → Distance of objects to centroids → Grouping based on minimum distance → No object moves group? → End]
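The process flow can be sketched as a short, standard-library-only Python function. This is an illustrative sketch under stated assumptions, not the course's implementation: the starting centroids are sampled from the data points, Euclidean distance is used, and the names `kmeans`, `seed`, and `max_iter` are hypothetical.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: start from k of the data points
    labels = None
    for _ in range(max_iter):
        # Distance of each object to the centroids, grouped by minimum distance
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # no object moved group: stable, so stop
            break
        labels = new_labels
        # Recompute each centroid as the mean of the objects in its group
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty group keeps its previous centroid
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels

data = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, labels = kmeans(data, k=2)
```

On this small, well-separated sample the first three points end up in one group and the last three in the other; which group gets label 0 depends on the random start.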
Slide 17
K-Means Clustering Use-Case:
Problem Statement:
The newly appointed Governor has finally decided to do something for society and wants to open
a chain of schools across a particular region, keeping the distance travelled by children to a
minimum so that the turnout is higher.
The poor fellow cannot decide by himself and has asked his Data Science team to come up with a solution.
You bet these guys have a solution to almost everything!
Slide 18
K-Means Clustering Steps
1. If k=4, we select 4 random points in
the 2d space and assume them to be
cluster centers for the clusters to be
created.
Slide 19
2. We take a random data point from
the space and find its distance from
all 4 cluster centers.
If the data point is closest to the pink
cluster center, it is colored pink.
This is repeated for every data point.
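The assignment in step 2 amounts to picking the center with the smallest distance. A minimal sketch, assuming Euclidean distance; apart from "pink", the color names and all coordinates below are made up for illustration.

```python
import math

# Hypothetical cluster centers keyed by color
centers = {
    "pink": (1.0, 1.0),
    "blue": (5.0, 1.0),
    "green": (1.0, 5.0),
    "yellow": (5.0, 5.0),
}
point = (1.5, 2.0)

# Color the point after the closest cluster center
nearest = min(centers, key=lambda name: math.dist(point, centers[name]))
```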
K-Means Clustering Steps
Slide 20
3. Now we calculate the centroid of all
the pink points and assign that point
as the cluster center for that cluster.
Similarly, we calculate the centroids of
the points in each of the 4 colored clusters and
assign the new centroids as the
cluster centers.
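The centroid in step 3 is the coordinate-wise mean of the points in a cluster. A small sketch with made-up pink points:

```python
# Hypothetical points currently colored pink
pink_points = [(1.0, 1.0), (2.0, 1.0), (1.5, 2.5)]

# Average each coordinate separately to get the new cluster center
centroid = tuple(sum(coord) / len(pink_points) for coord in zip(*pink_points))
```

Here each coordinate sums to 4.5 over three points, so the new center is (1.5, 1.5).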
K-Means Clustering Steps
Slide 21
4. Step 2 and step 3 are run iteratively until the cluster centers converge and no
longer move.
[Figure: cluster assignments at Iteration 1 and Iteration 2.]
K-Means Clustering Steps
Slide 22
[Figure: cluster assignments at Iteration 3 and Iteration 4.]
5. We can see that the cluster centers have still not converged, so we go ahead and iterate further.
K-Means Clustering Steps
Slide 23
Finally, after multiple iterations, we reach a
stage where the cluster centers converge and
the clusters look like this:
[Figure: converged clusters.]
Here we have performed:
Iterations: 5
K-Means Clustering Steps
Slide 24
Annie’s Question
Q1. In cluster analysis, objects are classified into a number of groups so that:
1. They are as dissimilar as possible from one group to another,
but as similar as possible within each group.
2. They are as similar as possible from one group to another, but
as dissimilar as possible within each group.
Slide 25
Annie’s Answer
Correct Answer.
Option 1: They are as dissimilar as possible from one group to another,
but as similar as possible within each group.
Slide 26
K-Means Mathematical Formulation
Distortion (within-cluster sum of squares):
$$\text{Distortion} = \sum_{i=1}^{m} (x_i - c_i)^2 = \sum_{j=1}^{k} \sum_{i \in \text{OwnedBy}(\mu_j)} (x_i - \mu_j)^2$$
OwnedBy(μ_j): set of records that belong to the specified cluster center μ_j
D = {x_1, x_2, …, x_i, …, x_m}: data set of m records
x_i = (x_{i1}, x_{i2}, …, x_{in}): each record is an n-dimensional vector
c_i = cluster(x_i): the cluster center to which record x_i is assigned
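The distortion can be computed directly from its definition. A minimal sketch, assuming each record carries the index of its owning cluster center; the function name `distortion` and the sample data are illustrative.

```python
def distortion(points, centers, labels):
    """Within-cluster sum of squares: sum over records of ||x_i - c_i||^2."""
    total = 0.0
    for p, label in zip(points, labels):
        c = centers[label]  # the center that owns this record
        total += sum((x - y) ** 2 for x, y in zip(p, c))
    return total

points = [(0, 0), (0, 2), (4, 0), (4, 2)]
centers = [(0, 1), (4, 1)]
labels = [0, 0, 1, 1]
wcss = distortion(points, centers, labels)  # each record is 1 unit from its center
```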
Slide 27
Goal: Find cluster centers that minimize Distortion
The solution can be found by setting the partial derivative of Distortion w.r.t. each cluster center μ_j to zero:
$$\frac{\partial\,\text{Distortion}}{\partial \mu_j} = \frac{\partial}{\partial \mu_j} \sum_{i \in \text{OwnedBy}(\mu_j)} (x_i - \mu_j)^2 = -2 \sum_{i \in \text{OwnedBy}(\mu_j)} (x_i - \mu_j) = 0 \quad \text{(for minimum)}$$
$$\Rightarrow \quad \mu_j = \frac{1}{|\text{OwnedBy}(\mu_j)|} \sum_{i \in \text{OwnedBy}(\mu_j)} x_i$$
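The result of the derivation, that the best center for a cluster is the mean of its records, can be checked numerically. The sketch below uses a hypothetical 1-D cluster for brevity and compares the sum of squares at the mean with the sum of squares at a few shifted centers.

```python
def sum_sq(values, center):
    """Sum of squared differences between each value and the center."""
    return sum((x - center) ** 2 for x in values)

cluster = [2.0, 4.0, 9.0]
mean = sum(cluster) / len(cluster)   # 5.0
at_mean = sum_sq(cluster, mean)      # 9 + 1 + 16 = 26

# Moving the center away from the mean in either direction increases the sum
nearby = [sum_sq(cluster, mean + delta) for delta in (-1.0, -0.1, 0.1, 1.0)]
```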
Slide 28
Will we find the Optimal Solution?
Not necessarily!
Try to come up with a converged solution that does not have minimum distortion:
We might get stuck in a local minimum, and not reach the global minimum.
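One such converged but non-optimal solution can be constructed by hand. In the sketch below, four hypothetical points sit at the corners of a wide rectangle; starting the two centers on the vertical midline makes K-means settle on a top/bottom split, while a left/right split has far lower distortion. The function name `lloyd` and all coordinates are illustrative assumptions.

```python
import math

def lloyd(points, centers, max_iter=100):
    """Run k-means from the given starting centers until they stop moving."""
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda idx: math.dist(p, centers[idx]))
            groups[j].append(p)
        # Mean of each group; an empty group keeps its previous center
        new_centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else old
            for g, old in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    cost = sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)
    return centers, cost

corners = [(0, 0), (0, 1), (4, 0), (4, 1)]
_, bad_cost = lloyd(corners, [(2, 0), (2, 1)])    # converges to a top/bottom split
_, good_cost = lloyd(corners, [(0, 0), (4, 0)])   # converges to a left/right split
```

Both runs converge, but the first stops with distortion 16 and the second with distortion 1, so the first is only a local minimum.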
Slide 29
How to find Optimal Solution?
Idea 1: Be careful about where we start
• Choose the first center at random
• Choose the second center far away from the first center
• … Choose the jth center as far away as possible from the closest of centers 1 through
(j-1)
Idea 2: Do many runs of K-means, each with a different random starting point
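Idea 1 can be sketched as a farthest-first selection of starting centers. This is an illustrative sketch, assuming Euclidean distance and candidate centers drawn from the data points themselves; the names `farthest_first_centers` and `seed` are hypothetical.

```python
import math
import random

def farthest_first_centers(points, k, seed=0):
    """Pick k starting centers that are spread out over the data."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center at random
    while len(centers) < k:
        # Next center: the point farthest from its closest already-chosen center
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers

data = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 8)]
starts = farthest_first_centers(data, k=3)
```

For Idea 2, the same K-means routine would be run from several different starts and the run with the lowest distortion kept.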
Slide 30
Choosing the Number of Clusters
Elbow method
[Figure: elbow plot of the objective function value (i.e., distortion).]
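The elbow method can be illustrated by computing the distortion for several values of K and looking for the point where the curve stops dropping sharply. The sketch below is a simplified 1-D version with a deterministic farthest-first start so the result is reproducible; the function name `kmeans_1d` and the sample values are assumptions made for illustration.

```python
def kmeans_1d(values, k, max_iter=100):
    """1-D k-means with a deterministic farthest-first start; returns the distortion."""
    centers = [values[0]]
    while len(centers) < k:
        centers.append(max(values, key=lambda v: min(abs(v - c) for c in centers)))
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Mean of each group; an empty group keeps its previous center
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min((v - c) ** 2 for c in centers) for v in values)

values = [1, 2, 3, 11, 12, 13, 21, 22, 23]
curve = {k: kmeans_1d(values, k) for k in range(1, 6)}  # distortion for K = 1..5
```

For this sample, which has three tight groups, the distortion falls steeply up to K = 3 and only slightly afterwards, so the elbow sits at K = 3.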

Contenu connexe

Tendances

K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...Simplilearn
 
K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...
K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...
K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...Edureka!
 
05 Clustering in Data Mining
05 Clustering in Data Mining05 Clustering in Data Mining
05 Clustering in Data MiningValerii Klymchuk
 
K means clustering
K means clusteringK means clustering
K means clusteringkeshav goyal
 
K MEANS CLUSTERING
K MEANS CLUSTERINGK MEANS CLUSTERING
K MEANS CLUSTERINGsingh7599
 
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...Simplilearn
 
Performance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsPerformance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsKush Kulshrestha
 
Introduction to Clustering algorithm
Introduction to Clustering algorithmIntroduction to Clustering algorithm
Introduction to Clustering algorithmhadifar
 
Capter10 cluster basic
Capter10 cluster basicCapter10 cluster basic
Capter10 cluster basicHouw Liong The
 
Chap8 basic cluster_analysis
Chap8 basic cluster_analysisChap8 basic cluster_analysis
Chap8 basic cluster_analysisguru_prasadg
 
Customer Segmentation using Clustering
Customer Segmentation using ClusteringCustomer Segmentation using Clustering
Customer Segmentation using ClusteringDessy Amirudin
 
Data mining: Concepts and Techniques, Chapter12 outlier Analysis
Data mining: Concepts and Techniques, Chapter12 outlier Analysis Data mining: Concepts and Techniques, Chapter12 outlier Analysis
Data mining: Concepts and Techniques, Chapter12 outlier Analysis Salah Amean
 
Introduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisIntroduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisJaclyn Kokx
 

Tendances (20)

K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
 
K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...
K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...
K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...
 
05 Clustering in Data Mining
05 Clustering in Data Mining05 Clustering in Data Mining
05 Clustering in Data Mining
 
K means clustering
K means clusteringK means clustering
K means clustering
 
KNN
KNN KNN
KNN
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Data clustering
Data clustering Data clustering
Data clustering
 
Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
 
K MEANS CLUSTERING
K MEANS CLUSTERINGK MEANS CLUSTERING
K MEANS CLUSTERING
 
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
 
Performance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsPerformance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning Algorithms
 
Kmeans
KmeansKmeans
Kmeans
 
Introduction to Clustering algorithm
Introduction to Clustering algorithmIntroduction to Clustering algorithm
Introduction to Clustering algorithm
 
Capter10 cluster basic
Capter10 cluster basicCapter10 cluster basic
Capter10 cluster basic
 
Chap8 basic cluster_analysis
Chap8 basic cluster_analysisChap8 basic cluster_analysis
Chap8 basic cluster_analysis
 
Customer Segmentation using Clustering
Customer Segmentation using ClusteringCustomer Segmentation using Clustering
Customer Segmentation using Clustering
 
K mean-clustering
K mean-clusteringK mean-clustering
K mean-clustering
 
Data mining: Concepts and Techniques, Chapter12 outlier Analysis
Data mining: Concepts and Techniques, Chapter12 outlier Analysis Data mining: Concepts and Techniques, Chapter12 outlier Analysis
Data mining: Concepts and Techniques, Chapter12 outlier Analysis
 
Euclidean Distance And Manhattan Distance
Euclidean Distance And Manhattan DistanceEuclidean Distance And Manhattan Distance
Euclidean Distance And Manhattan Distance
 
Introduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisIntroduction to Linear Discriminant Analysis
Introduction to Linear Discriminant Analysis
 

En vedette

Association Analysis
Association AnalysisAssociation Analysis
Association Analysisguest0edcaf
 
Belief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationBelief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationAdnan Masood
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysisDataminingTools Inc
 
phase rule & phase diagram
phase rule & phase diagramphase rule & phase diagram
phase rule & phase diagramYog's Malani
 
Bayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionBayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionAdnan Masood
 
The phase rule
The phase ruleThe phase rule
The phase ruleJatin Garg
 
Bayesian Belief Networks for dummies
Bayesian Belief Networks for dummiesBayesian Belief Networks for dummies
Bayesian Belief Networks for dummiesGilad Barkan
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsPrashanth Guntal
 
Coacervation Phase Separation Techniques
Coacervation Phase Separation TechniquesCoacervation Phase Separation Techniques
Coacervation Phase Separation TechniquesGargi Nanda
 
Clustering training
Clustering trainingClustering training
Clustering trainingGabor Veress
 
Phase Diagrams and Phase Rule
Phase Diagrams and Phase RulePhase Diagrams and Phase Rule
Phase Diagrams and Phase RuleRuchi Pandey
 
K means clustering algorithm
K means clustering algorithmK means clustering algorithm
K means clustering algorithmDarshak Mehta
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithmparry prabhu
 

En vedette (18)

Association Analysis
Association AnalysisAssociation Analysis
Association Analysis
 
Belief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationBelief Networks & Bayesian Classification
Belief Networks & Bayesian Classification
 
Clustering: A Survey
Clustering: A SurveyClustering: A Survey
Clustering: A Survey
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysis
 
05 k-means clustering
05 k-means clustering05 k-means clustering
05 k-means clustering
 
phase rule & phase diagram
phase rule & phase diagramphase rule & phase diagram
phase rule & phase diagram
 
Phase rule
Phase rulePhase rule
Phase rule
 
MOLECULAR DOCKING
MOLECULAR DOCKINGMOLECULAR DOCKING
MOLECULAR DOCKING
 
Bayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionBayesian Networks - A Brief Introduction
Bayesian Networks - A Brief Introduction
 
The phase rule
The phase ruleThe phase rule
The phase rule
 
Bayesian Belief Networks for dummies
Bayesian Belief Networks for dummiesBayesian Belief Networks for dummies
Bayesian Belief Networks for dummies
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithms
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
 
Coacervation Phase Separation Techniques
Coacervation Phase Separation TechniquesCoacervation Phase Separation Techniques
Coacervation Phase Separation Techniques
 
Clustering training
Clustering trainingClustering training
Clustering training
 
Phase Diagrams and Phase Rule
Phase Diagrams and Phase RulePhase Diagrams and Phase Rule
Phase Diagrams and Phase Rule
 
K means clustering algorithm
K means clustering algorithmK means clustering algorithm
K means clustering algorithm
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithm
 

Similaire à K means Clustering

International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 
Survey on Unsupervised Learning in Datamining
Survey on Unsupervised Learning in DataminingSurvey on Unsupervised Learning in Datamining
Survey on Unsupervised Learning in DataminingIOSR Journals
 
Unsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and AssumptionsUnsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and Assumptionsrefedey275
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithmijsrd.com
 
8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithmLaura Petrosanu
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmEditor IJCATR
 
Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringIJCSIS Research Publications
 
Predicting Students Performance using K-Median Clustering
Predicting Students Performance using  K-Median ClusteringPredicting Students Performance using  K-Median Clustering
Predicting Students Performance using K-Median ClusteringIIRindia
 
Mat189: Cluster Analysis with NBA Sports Data
Mat189: Cluster Analysis with NBA Sports DataMat189: Cluster Analysis with NBA Sports Data
Mat189: Cluster Analysis with NBA Sports DataKathleneNgo
 
Dynamic clustering algorithm using fuzzy c means
Dynamic clustering algorithm using fuzzy c meansDynamic clustering algorithm using fuzzy c means
Dynamic clustering algorithm using fuzzy c meansWrishin Bhattacharya
 
Optimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmOptimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmIJERA Editor
 
A Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningA Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningNatasha Grant
 
Comparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data AnalysisComparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data AnalysisIOSR Journals
 
For iiii year students of cse ML-UNIT-V.pptx
For iiii year students of cse ML-UNIT-V.pptxFor iiii year students of cse ML-UNIT-V.pptx
For iiii year students of cse ML-UNIT-V.pptxSureshPolisetty2
 
Exploration of Imputation Methods for Missingness in Image Segmentation
Exploration of Imputation Methods for Missingness in Image SegmentationExploration of Imputation Methods for Missingness in Image Segmentation
Exploration of Imputation Methods for Missingness in Image SegmentationChristopher Peter Makris
 
Ensemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringEnsemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringIJERD Editor
 

Similaire à K means Clustering (20)

International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
Survey on Unsupervised Learning in Datamining
Survey on Unsupervised Learning in DataminingSurvey on Unsupervised Learning in Datamining
Survey on Unsupervised Learning in Datamining
 
Unsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and AssumptionsUnsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and Assumptions
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithm
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids Algorithm
 
Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means Clustering
 
Predicting Students Performance using K-Median Clustering
Predicting Students Performance using  K-Median ClusteringPredicting Students Performance using  K-Median Clustering
Predicting Students Performance using K-Median Clustering
 
Visualization of Crisp and Rough Clustering using MATLAB
Visualization of Crisp and Rough Clustering using MATLABVisualization of Crisp and Rough Clustering using MATLAB
Visualization of Crisp and Rough Clustering using MATLAB
 
Mat189: Cluster Analysis with NBA Sports Data
Mat189: Cluster Analysis with NBA Sports DataMat189: Cluster Analysis with NBA Sports Data
Mat189: Cluster Analysis with NBA Sports Data
 
Chapter 5.pdf
Chapter 5.pdfChapter 5.pdf
Chapter 5.pdf
 
47 292-298
47 292-29847 292-298
47 292-298
 
Dynamic clustering algorithm using fuzzy c means
Dynamic clustering algorithm using fuzzy c meansDynamic clustering algorithm using fuzzy c means
Dynamic clustering algorithm using fuzzy c means
 
Optimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmOptimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering Algorithm
 
A Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningA Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data Mining
 
Comparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data AnalysisComparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data Analysis
 
For iiii year students of cse ML-UNIT-V.pptx
For iiii year students of cse ML-UNIT-V.pptxFor iiii year students of cse ML-UNIT-V.pptx
For iiii year students of cse ML-UNIT-V.pptx
 
Exploration of Imputation Methods for Missingness in Image Segmentation
Exploration of Imputation Methods for Missingness in Image SegmentationExploration of Imputation Methods for Missingness in Image Segmentation
Exploration of Imputation Methods for Missingness in Image Segmentation
 
Ensemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringEnsemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes Clustering
 

Plus de Edureka!

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaEdureka!
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaEdureka!
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaEdureka!
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaEdureka!
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaEdureka!
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaEdureka!
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaEdureka!
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaEdureka!
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaEdureka!
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaEdureka!
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | EdurekaEdureka!
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEdureka!
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEdureka!
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaEdureka!
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaEdureka!
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaEdureka!
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Edureka!
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaEdureka!
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaEdureka!
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | EdurekaEdureka!
 

K means Clustering

  • 2. Clustering: Scenarios
    The following scenarios call for clustering:
    - A telephone company needs to set up its network by placing towers in a region it has acquired. A clustering algorithm can find tower locations such that all users receive maximum signal strength.
    - Cisco wants to open a new office in California. The management wants to be considerate of its employees and locate the office so that their commute is kept to a minimum.
    - The Miami DEA wants to tighten law enforcement and has decided to station its patrol vans across the area so that high-crime areas are close to a patrol van.
    - A hospital care chain wants to open a series of emergency-care wards, taking into account the most accident-prone areas in a region.
  • 3. What is Clustering?
    Organizing data into clusters such that there is:
    - High intra-cluster similarity
    - Low inter-cluster similarity
    Informally, finding natural groupings among objects.
  • 4. Why Clustering?
    - Organizing data into clusters shows the internal structure of the data. Ex. Clusty and clustering genes
    - Sometimes the partitioning is the goal. Ex. Market segmentation
    - Prepare for other AI techniques. Ex. Summarize news (cluster and then find centroid)
    - Discovery in data. Ex. Underlying rules, recurring patterns, topics, etc.
  • 5. Clustering algorithms may be classified as:
    - Exclusive Clustering: Data is grouped in an exclusive way, so that a datum that belongs to one cluster cannot be included in another cluster. E.g. K-means
    - Overlapping Clustering: Overlapping clustering uses fuzzy sets to cluster data, so that each point may belong to two or more clusters with different degrees of membership. E.g. Fuzzy C-means
  • 6. Clustering algorithms may be classified as:
    - Hierarchical Clustering: It is based on the union of the two nearest clusters. The starting condition is set by treating every datum as its own cluster. There are certain properties which one cluster receives in the hierarchy from another cluster.
  • 7. "Clustering is in the eye of the beholder."
    The most appropriate clustering algorithm for a particular problem often needs to be chosen experimentally, unless there is a mathematical reason to prefer one cluster model over another. An algorithm that is designed for one kind of cluster model will generally fail on a data set that contains a radically different kind of model. For example, k-means cannot find non-convex clusters.
  • 8. Similarity/Dissimilarity Measurement
    To perform clustering, a similarity/dissimilarity measure must be chosen so that the data points can be clustered based on either:
    1. Similarity in the data, or
    2. Dissimilarity in the data
    The measure reflects the degree of closeness or separation of the target objects and should correspond to the characteristics that are believed to distinguish the clusters embedded in the data.
    (Diagram: Measurement splits into Similarity and Dissimilarity.)
  • 9. Similarity Measurement
    Similarity measures the degree to which a pair of objects are alike. For structural patterns represented as strings or sequences of symbols, the concept of pattern resemblance has typically been viewed from three main perspectives:
    - Similarity as matching, according to which patterns are seen as different viewpoints, possible instantiations or noisy versions of the same object;
    - Structural resemblance, based on the similarity of their composition rules and primitives;
    - Content-based similarity.
  • 10. Dissimilarity Measurement: Distance Measures
    Similarity can also be measured in terms of the placement of data points. By finding the distance between data points, the distance of a point from a cluster can be found.
    Distance measures:
    - Euclidean Distance Measure
    - Manhattan Distance Measure
    - Cosine Distance Measure
    - Tanimoto Distance Measure
    - Squared Euclidean Distance Measure
  • 11. Difference between Euclidean and Manhattan
    From this image we can say that the Euclidean distance measure gives about 5.66 as the distance between (2, 2) and (6, 6), whereas the Manhattan distance is 8.0.
    Mathematically, the Euclidean distance between two n-dimensional vectors $(a_1, a_2, \dots, a_n)$ and $(b_1, b_2, \dots, b_n)$ is:
    $$d = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \dots + (a_n - b_n)^2}$$
    The Manhattan distance between two n-dimensional vectors is:
    $$d = |a_1 - b_1| + |a_2 - b_2| + \dots + |a_n - b_n|$$
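The two measures on slide 11 can be checked numerically. The following is a minimal sketch in plain Python; the function names `euclidean` and `manhattan` are illustrative and do not come from the deck.

```python
import math


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block distance: sum of the absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


p, q = (2, 2), (6, 6)
print(round(euclidean(p, q), 2))  # 5.66
print(manhattan(p, q))            # 8
```

The Euclidean value is the square root of 32, which the slide quotes truncated to 5.65.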
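  • 12. Cosine Distance Measure
    The formula for the cosine distance between n-dimensional vectors $(a_1, a_2, \dots, a_n)$ and $(b_1, b_2, \dots, b_n)$ is:
    $$d = 1 - \frac{a_1 b_1 + a_2 b_2 + \dots + a_n b_n}{\sqrt{a_1^2 + a_2^2 + \dots + a_n^2}\,\sqrt{b_1^2 + b_2^2 + \dots + b_n^2}}$$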
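A small sketch of the cosine distance measure, assuming the common definition of one minus the cosine of the angle between the two vectors. The slide's own formula was an image, so this definition and the function name `cosine_distance` are assumptions rather than the deck's wording.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b); 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance((1, 0), (0, 1)))  # perpendicular vectors
print(cosine_distance((1, 2), (2, 4)))  # parallel vectors, close to 0
```

Unlike the Euclidean and Manhattan measures, this one ignores vector length and depends only on direction.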
  • 14. K-Means Clustering
    The process by which objects are classified into a number of groups so that they are as dissimilar as possible from one group to another, but as similar as possible within each group. The objects in group 1 should be as similar as possible, but there should be a large difference between an object in group 1 and an object in group 2. The attributes of the objects determine which objects should be grouped together.
    (Diagram: the total population is split into Group 1, Group 2, Group 3 and Group 4.)
  • 15. K-Means Clustering: Basic concepts of Cluster Analysis using two variables
    (Plot: Current Balance and Gross Monthly Income, each on a Low / Medium / High scale. Example Cluster 1: High Balance, Low Income. Example Cluster 2: High Income, Low Balance.)
    - Cluster 1 and Cluster 2 are differentiated by Income and Current Balance.
    - The objects in Cluster 1 share similar characteristics (High Balance and Low Income), while the objects in Cluster 2 share characteristics of their own (High Income and Low Balance).
    - But there are large differences between an object in Cluster 1 and an object in Cluster 2.
  • 16. Process Flow of K-means
    Iterate until stable (cluster centers converge):
    1. Determine the centroid coordinates.
    2. Determine the distance of each object to the centroids.
    3. Group the objects based on minimum distance (find the closest centroid).
    (Flowchart: Start, Number of clusters K, Centroid, Distance of objects to centroids, Grouping based on minimum distance, then the check "No object moved group?" which leads to End when no object has moved.)
  • 17. K-Means Clustering Use-Case
    Problem Statement: The newly appointed Governor has finally decided to do something for society and wants to open a chain of schools across a particular region, keeping the distance travelled by children to a minimum so that the turnout is higher. The poor fellow cannot decide by himself and has asked his Data Science team to come up with a solution. Bet these guys have a solution to almost everything!
  • 18. K-Means Clustering Steps
    1. If k = 4, we select 4 random points in the 2D space and assume them to be the cluster centers for the clusters to be created.
  • 19. K-Means Clustering Steps
    2. We take a data point from the space and find its distance from each of the 4 cluster centers. If the data point is closest to the pink cluster center, it is colored pink. This is repeated for every data point.
  • 20. K-Means Clustering Steps
    3. Now we calculate the centroid of all the pink points and make that point the cluster center for that cluster. Similarly, we calculate the centroids of the points of each of the 4 colors (clusters) and assign the new centroids as the cluster centers.
  • 21. K-Means Clustering Steps
    4. Step 2 and step 3 are run iteratively until the cluster centers converge and no longer move.
    (Figures: Iteration-1, Iteration-2)
  • 22. K-Means Clustering Steps
    5. We can see that the cluster centers have still not converged, so we go ahead and iterate further.
    (Figures: Iteration-3, Iteration-4)
  • 23. K-Means Clustering Steps
    Finally, after multiple iterations, we reach a stage where the cluster centers converge and the clusters are stable. Here we have performed: Iterations: 5
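The steps on slides 18 to 23 can be sketched in plain Python as follows. This is a minimal illustration, not the deck's own implementation: the function name `kmeans`, the `max_iter` and `seed` parameters, and the choice to draw the starting centers from the data points (the slides pick random points in the 2D space) are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centers, assignment) after iterating until no point changes cluster."""
    rng = random.Random(seed)
    # Step 1: pick k starting centers (assumption: sampled from the data).
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign every point to its closest center.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Step 4: stop once no object moves to another group.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: move each center to the centroid of its points.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, labels = kmeans(points, k=2)
print(centers)
print(labels)
```

Which cluster receives index 0 depends on the random start, so only the grouping is meaningful, not the label values.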
  • 24. Annie's Question
    Q1. In cluster analysis, objects are classified into a number of groups so that:
    1. They are as dissimilar as possible from one group to another, but as similar as possible within each group.
    2. They are as similar as possible from one group to another, but as dissimilar as possible within each group.
  • 25. Annie's Answer
    Correct answer: Option 1. They are as dissimilar as possible from one group to another, but as similar as possible within each group.
  • 26. K-Means Mathematical Formulation
    $D = \{x_1, x_2, \dots, x_i, \dots, x_m\}$: data set of $m$ records
    $x_i = (x_{i1}, x_{i2}, \dots, x_{in})$: each record is an $n$-dimensional vector
    $c_i = \text{cluster}(x_i)$: the cluster center assigned to record $x_i$
    $\text{OwnedBy}(\cdot)$: set of records that belong to the specified cluster center
    $$\text{Distortion} = \sum_{i=1}^{m} (x_i - c_i)^2 = \sum_{j=1}^{k} \sum_{i \in \text{OwnedBy}(\mu_j)} (x_i - \mu_j)^2 \quad \text{(within-cluster sum of squares)}$$
  • 27. K-Means Mathematical Formulation
    Goal: Find cluster centers that minimize Distortion.
    The solution can be found by setting the partial derivative of Distortion w.r.t. each cluster center to zero:
    $$\frac{\partial\,\text{Distortion}}{\partial \mu_j} = \frac{\partial}{\partial \mu_j} \sum_{i \in \text{OwnedBy}(\mu_j)} (x_i - \mu_j)^2 = -2 \sum_{i \in \text{OwnedBy}(\mu_j)} (x_i - \mu_j) = 0 \quad \text{(for minimum)}$$
    $$\Rightarrow \quad \mu_j = \frac{1}{|\text{OwnedBy}(\mu_j)|} \sum_{i \in \text{OwnedBy}(\mu_j)} x_i$$
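The result on slide 27, that each cluster center should be the mean of the records it owns, can be checked on a tiny example. This is a sketch under assumed names (`distortion`, `centroid`) and made-up data, not code from the deck.

```python
def distortion(points, centers, assignment):
    """Within-cluster sum of squares for a given assignment of points to centers."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(p, centers[a]))
        for p, a in zip(points, assignment)
    )


def centroid(members):
    """Coordinate-wise mean of the records owned by one cluster center."""
    return tuple(sum(coord) / len(members) for coord in zip(*members))


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
assignment = [0, 0, 1, 1]  # first two records in cluster 0, last two in cluster 1

means = [centroid(points[:2]), centroid(points[2:])]  # (1, 0) and (11, 0)
shifted = [(1.5, 0.0), (11.0, 0.0)]                   # cluster 0 center moved off its mean

print(distortion(points, means, assignment))    # 4.0
print(distortion(points, shifted, assignment))  # 4.5
```

Moving a center away from the mean of its records raises the distortion, which is what the zero-derivative condition predicts.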
  • 28. Will we find the Optimal Solution?
    Not necessarily! Try to come up with a converged solution that does not have minimum distortion: we might get stuck in a local minimum rather than the global minimum.
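One way to answer the exercise on slide 28 is a wide rectangle with k = 2. The sketch below uses made-up points and an illustrative helper named `run_kmeans_from`, which runs the assign-and-update loop from explicitly given starting centers; neither comes from the deck.

```python
import math


def run_kmeans_from(points, start_centers, max_iter=100):
    """Iterate assignment and centroid update from the given centers; return (centers, distortion)."""
    centers = list(start_centers)
    k = len(centers)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_assignment == assignment:  # converged: no point changed cluster
            break
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(math.dist(p, centers[a]) ** 2 for p, a in zip(points, assignment))
    return centers, sse


# Corners of a rectangle that is 4 wide and 1 tall.
corners = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start between the long sides: converges to a bottom/top split.
_, bad = run_kmeans_from(corners, [(2.0, 0.0), (2.0, 1.0)])
# Start between the short sides: converges to a left/right split.
_, good = run_kmeans_from(corners, [(0.0, 0.5), (4.0, 0.5)])

print(bad)   # 16.0
print(good)  # 1.0
```

Both runs are converged in the sense that no point changes cluster, yet the first has sixteen times the distortion of the second.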
  • 29. How to find Optimal Solution?
    Idea 1: Be careful about where we start
    - Choose the first center at random
    - Choose the second center far away from the first center
    - …
    - Choose the jth center as far away as possible from the closest of centers 1 through (j-1)
    Idea 2: Do many runs of K-means, each with a different random starting point
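Idea 1 on slide 29 can be sketched as a farthest-first selection. The function name `farthest_first_centers`, the `seed` parameter, and the restriction of candidate centers to the data points themselves are assumptions made for this example.

```python
import math
import random


def farthest_first_centers(points, k, seed=0):
    """Pick k starting centers, each as far as possible from the closest earlier one."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center chosen at random
    while len(centers) < k:
        # For each point, find the distance to its closest chosen center,
        # then take the point for which that distance is largest.
        next_center = max(
            points, key=lambda p: min(math.dist(p, c) for c in centers)
        )
        centers.append(next_center)
    return centers


points = [(0.0, 0.0), (0.0, 1.0),    # group A
          (10.0, 0.0), (10.0, 1.0),  # group B
          (5.0, 8.0), (5.0, 9.0)]    # group C
print(farthest_first_centers(points, k=3))
```

On data like this, with three tight and well-separated groups, the selection lands one starting center in each group regardless of which point is drawn first. Idea 2 would then wrap a full K-means run in a loop over several seeds and keep the run with the lowest distortion.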
  • 30. Choosing the Number of Clusters
    Elbow method
    (Plot: objective function value, i.e., Distortion.)
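A rough sketch of how the distortion values for an elbow plot might be produced: run K-means for each candidate number of clusters and record the best distortion found. The helper names `kmeans_distortion` and `elbow_curve`, the number of restarts, and the sample data are assumptions for illustration only.

```python
import math
import random


def kmeans_distortion(points, k, seed):
    """Run K-means once from a random start and return the final distortion."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(100):
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(math.dist(p, centers[a]) ** 2 for p, a in zip(points, assignment))


def elbow_curve(points, max_k, restarts=10):
    """Map each k in 1..max_k to the lowest distortion over several random starts."""
    return {
        k: min(kmeans_distortion(points, k, seed) for seed in range(restarts))
        for k in range(1, max_k + 1)
    }


points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.1),
          (9.0, 1.0), (9.1, 1.2), (8.8, 0.9)]
for k, value in elbow_curve(points, max_k=5).items():
    print(k, round(value, 3))
```

Plotting these values against k and looking for the point where the curve stops dropping sharply gives the "elbow"; for data with three well-separated groups like the sample above, one would expect it near k = 3.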