K-Means Clustering Problem
Ahmad Sabiq
Febri Maspiyanti
Indah Kuntum Khairina
Wiwin Farhania
Yonatan
What is k-means?
• To partition n objects into k clusters, based on
  attributes.
  – Objects in the same cluster are close together: their
    attributes are related to each other.
  – Objects in different clusters are far apart: their
    attributes are very dissimilar.
Algorithm
• Input: n objects, k (integer k ≤ n)
• Output: k clusters
• Steps:
   1. Select k initial centroids.
   2. Calculate the distance between each object and
      each centroid.
   3. Assign each object to the cluster with the nearest
      centroid.
   4. Recalculate each centroid.
   5. If the centroids don’t change, stop (convergence).
      Otherwise, go back to step 2.
• Complexity: O(k · n · d · total_iterations), where d is the
  number of attributes per object.
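The five steps can be sketched in plain Python as follows. This is a minimal illustration and not the authors' code: the function name `kmeans`, the `max_iter` safety cap, and the rule of keeping an empty cluster's old centroid are assumptions made for the example. Step 1 (choosing the initial centroids) is left to the caller.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd-style k-means on a list of d-dimensional tuples.

    `centroids` holds the k initial centroids (step 1 is done by the caller).
    Returns (labels, centroids, iterations).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for iteration in range(1, max_iter + 1):
        # Steps 2 and 3: measure the distance to every centroid and
        # assign each object to the nearest one (n * k distances, O(d) each).
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 4: each centroid becomes the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(old)
        # Step 5: stop when no centroid moved, otherwise repeat from step 2.
        if new_centroids == centroids:
            return labels, centroids, iteration
        centroids = new_centroids
    return labels, centroids, max_iter
```

Each pass through the loop computes n · k distances of cost O(d), which is where the O(k · n · d) cost per iteration comes from.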
Initialization
• Why is it important? What does it affect?
  – The clustering result: k-means converges to a local optimum,
    which depends on the initial centroids.
  – The total number of iterations, and hence the running time.
Good Initialization
3 clusters with 2 iterations…
Bad Initialization
3 clusters with 4 iterations…
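The effect of the starting centroids on the final result can be reproduced with a toy example. The four points, the helper names `lloyd` and `sse`, and the iteration cap are assumptions chosen for illustration; this is not the data set shown on the slides. It demonstrates the local-optimum effect only, not the difference in iteration counts.

```python
import math

def lloyd(points, centroids, max_iter=100):
    """Minimal k-means loop; returns (labels, centroids, iterations)."""
    labels = []
    for iterations in range(1, max_iter + 1):
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(
                tuple(sum(x) / len(members) for x in zip(*members))
                if members else old
            )
        if updated == centroids:
            return labels, centroids, iterations
        centroids = updated
    return labels, centroids, max_iter

def sse(points, labels, centroids):
    """Sum of squared distances from each object to its centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Four corners of a wide rectangle; the natural split is left vs. right.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

good_labels, good_centroids, _ = lloyd(points, [(0.0, 0.0), (4.0, 0.0)])
bad_labels, bad_centroids, _ = lloyd(points, [(2.0, 0.0), (2.0, 1.0)])

good_sse = sse(points, good_labels, good_centroids)  # left/right split
bad_sse = sse(points, bad_labels, bad_centroids)     # bottom/top split
```

Both runs converge, but the second one stops at a bottom/top split whose error is much larger than that of the left/right split. k-means cannot escape it because no single reassignment improves the result.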
Initialization Methods
1. Random
2. Forgy
3. MacQueen
4. Kaufman
Random
• Algorithm:
  1. Assign each object to a random cluster.
  2. Compute the initial centroid of each cluster.
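A minimal sketch of the Random method, assuming the objects are given as a list of equal-length tuples. The function name `random_partition_init`, the `seed` parameter, and the handling of an empty cluster are assumptions for this example; the slides do not say what to do when a cluster receives no objects.

```python
import random

def random_partition_init(points, k, seed=None):
    """Random method: random cluster per object, then one centroid per cluster."""
    rng = random.Random(seed)
    # Step 1: assign each object to a random cluster.
    labels = [rng.randrange(k) for _ in points]
    # Step 2: compute the centroid (mean) of each cluster.
    centroids = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            # Assumption: an empty cluster borrows one randomly chosen object.
            members = [rng.choice(points)]
        centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
    return centroids
```

Because every centroid is an average over a random subset, the initial centroids produced this way tend to lie near the middle of the data.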
[Figures: Random initialization shown on a scatter plot of the sample data; x-axis 0 to 35, y-axis 0 to 9.]
Forgy
• Algorithm:
  1. Choose k objects at random and use them as the initial
     centroids.
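A minimal sketch of the Forgy method. The function name `forgy_init` and the `seed` parameter are assumptions for this example; sampling without replacement is also an assumption, made so that the k centroids are k different objects.

```python
import random

def forgy_init(points, k, seed=None):
    """Forgy method: pick k distinct objects and use them as centroids."""
    rng = random.Random(seed)
    # random.Random.sample draws k items without replacement.
    return [tuple(p) for p in rng.sample(points, k)]
```

Unlike the Random method, every initial centroid here coincides with an actual object, so the centroids are spread out like the data itself.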
[Figure: Forgy initialization shown on a scatter plot of the sample data; x-axis 0 to 35, y-axis 0 to 9.]
MacQueen
• Algorithm:
  1. Choose k objects at random and use them as the initial
     centroids.
  2. Assign each object to the cluster with the nearest
     centroid.
  3. After each assignment, recalculate that cluster's centroid.
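A minimal sketch of the MacQueen method using a running mean, so that the centroid is updated immediately after each assignment. The function name `macqueen_init`, the `seed` parameter, and the choice to treat each seed object as a one-member cluster (and to skip it during the pass) are assumptions for this example.

```python
import math
import random

def macqueen_init(points, k, seed=None):
    """MacQueen method: random seeds, then one pass with incremental updates."""
    rng = random.Random(seed)
    # Step 1: choose k objects at random as the initial centroids.
    seed_idx = rng.sample(range(len(points)), k)
    centroids = [list(points[i]) for i in seed_idx]
    counts = [1] * k  # assumption: each seed starts as a one-member cluster
    chosen = set(seed_idx)
    for i, p in enumerate(points):
        if i in chosen:
            continue
        # Step 2: assign the object to the cluster with the nearest centroid.
        j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Step 3: recalculate that centroid right away (running mean).
        counts[j] += 1
        centroids[j] = [m + (x - m) / counts[j] for m, x in zip(centroids[j], p)]
    return [tuple(c) for c in centroids]
```

Because the centroids move during the pass, the result depends on the order in which the objects are visited as well as on the random seeds.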
[Figures: step-by-step MacQueen initialization shown on a scatter plot of the sample data; x-axis 0 to 35, y-axis 0 to 9.]
Kaufman
[Figures: step-by-step illustration of the Kaufman method on the sample scatter plot. Labels recoverable from the slides: C = 0, d = 24.33, D = 15.52; per-candidate totals ∑C1 = 2.74, ∑C2 = 12.21, ∑C3 = 12.36, ∑C3 = 8.38 (the label ∑C3 appears twice on the slide), ∑C5 = 52.55, ∑C6 = 55.88, ∑C7 = 53.77, ∑C8 = 51.16, ∑C9 = 42.69.]
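The slides present the Kaufman method only through figures labelled d, D, C and ∑C. The sketch below is one reading of those labels, based on the procedure of Kaufman and Rousseeuw (reference 3) as summarized by Peña et al. (reference 1). The first seed is the most centrally located object. Then, for each unselected candidate i, every other unselected object j contributes C = max(D − d, 0), where d is the distance between i and j and D is the distance from j to its nearest selected seed. The candidate with the largest ∑C becomes the next seed. The function name `kaufman_init` and the definition of "most central" as the smallest total distance to all objects are assumptions for this example.

```python
import math

def kaufman_init(points, k):
    """Kaufman method (sketch): greedy, deterministic seed selection."""
    n = len(points)
    # First seed: the most centrally located object, taken here as the one
    # with the smallest total distance to all objects (assumption).
    first = min(range(n), key=lambda i: sum(math.dist(points[i], q) for q in points))
    seeds = [first]
    while len(seeds) < k:
        # D[j]: distance from object j to its nearest already-selected seed.
        D = [min(math.dist(points[j], points[s]) for s in seeds) for j in range(n)]
        best, best_gain = None, -1.0
        for i in range(n):
            if i in seeds:
                continue
            # Gain of candidate i: sum over the other unselected objects j
            # of C = max(D_j - d_ji, 0).
            gain = sum(
                max(D[j] - math.dist(points[i], points[j]), 0.0)
                for j in range(n)
                if j != i and j not in seeds
            )
            if gain > best_gain:
                best, best_gain = i, gain
        seeds.append(best)
    return [tuple(points[s]) for s in seeds]
```

Each new seed is the locally best choice given the seeds already picked, and no random numbers are involved, so the same data always yields the same initial centroids.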
Reference
1. J.M. Peña, J.A. Lozano, and P. Larrañaga. An Empirical
   Comparison of Four Initialization Methods for the
   K-Means Algorithm. Pattern Recognition Letters, vol. 20,
   pp. 1027–1040. 1999.
2. J.R. Cano, O. Cordón, F. Herrera, and L. Sánchez. A
   Greedy Randomized Adaptive Search Procedure
   Applied to the Clustering Problem as an Initialization
   Process Using K-Means as a Local Search Procedure.
   Journal of Intelligent and Fuzzy Systems, vol. 12,
   pp. 235–242. 2002.
3. L. Kaufman and P.J. Rousseeuw. Finding Groups in
   Data: An Introduction to Cluster Analysis. Wiley. 1990.
Questions
1. Why is initialization important in k-means?
2. Which initialization method has the greedy-choice
   property?
3. Explain the O(nkd) complexity of the Random
   method.
