
11-2-Clustering.pptx


  1. PARTITIONAL & HIERARCHICAL CLUSTERING. KS141321 SISTEM CERDAS, Week 11 material. Department of Information Systems, ITS. By: Irmasari Hafidz
  2. OUTLINE
     1. Partitional clustering: K-Means
        - Pseudocode of K-Means
        - Example
        - Evaluating K-Means performance
     2. Hierarchical clustering
        - Cluster distance measures
        - Agglomerative algorithm
        - Example
  3. PARTITIONAL CLUSTERING: K-MEANS
  4. K-MEANS
     One of the best-known partitional clustering algorithms is K-Means clustering.
     - Advantage: computationally simple
     - Drawback: cluster quality depends on the choice of initial centroids and on the value of k
     The parameter k specifies the number of clusters to form and is fixed in advance (k = number of clusters).
     - k initial centroids are defined
     - The initial centroids are initialized randomly
  5. K-MEANS (PSEUDOCODE)
     Assigning objects to the k clusters is carried out over several iterations. Iteration stops when the centroids no longer change, i.e., every object stays in the same cluster in all subsequent iterations.
  6. K-MEANS
     If the i-th attribute is numeric, the i-th centroid value is the mean of that attribute's values (1 ≤ i ≤ n). If the i-th attribute is categorical, the i-th centroid value is the mode of that attribute's values (1 ≤ i ≤ n).
     Example: k-Means clustering with k = 3 and 3 centroids m1, m2, m3.
     - Each cluster is associated with a centroid
     - Each data point is assigned to the cluster with the nearest centroid
     - A centroid is an n-dimensional vector, where n is the number of attributes of each data point
  7. K-MEANS ALGORITHM
     Given the number of clusters K, the K-means algorithm is carried out in 5 steps:
     1. Choose k
     2. Choose k initial centroids (set seed points)
     3. Assign each object to the cluster with the nearest centroid (minimum distance)
     4. Update the centroid of each cluster (the centroid is the center of the cluster, i.e., its mean point)
     5. Go back to step 3; iteration stops once no assignment changes
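The five steps above can be sketched in Python. This is illustrative only: the deck itself demonstrates clustering in Weka and R, and the function name `kmeans`, its parameters, and the `seeds` argument (which fixes the initial centroids, since results depend on them) are hypothetical, not from the slides.

```python
import math
import random

def kmeans(points, k, seeds=None, max_iter=100):
    """Plain k-means on tuples of numbers; `seeds` fixes the initial
    centroids (step 2), otherwise they are picked at random."""
    centroids = list(seeds) if seeds else random.sample(points, k)
    for _ in range(max_iter):
        # Step 3: assign each point to the cluster with the nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        # Step 4: recompute each centroid as the mean of its cluster
        # (keep the old centroid if a cluster ends up empty).
        new = [tuple(sum(v) / len(c) for v in zip(*c)) if c else centroids[i]
               for i, c in enumerate(clusters)]
        # Step 5: stop once the centroids no longer move.
        if new == centroids:
            break
        centroids = new
    return centroids, clusters
```

On the four-medicine data from the next slide, seeding with A and B converges to centroids (1.5, 1.0) and (4.5, 3.5).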
  8. Problem Example
     Suppose we have 4 types of medicines and each has two attributes (pH and weight index). Our goal is to group these objects into K = 2 groups of medicine.
     Medicine | Weight | pH-Index
     A        | 1      | 1
     B        | 2      | 1
     C        | 4      | 3
     D        | 5      | 4
  9. EXAMPLE
     Step 1: Use initial seed points for partitioning: c1 = A, c2 = B. Assign each object to the cluster with the nearest seed point (Euclidean distance), e.g.:
     d(D, c1) = sqrt((5-1)^2 + (4-1)^2) = 5
     d(D, c2) = sqrt((5-2)^2 + (4-1)^2) = 4.24
  10. EXAMPLE
     Step 2: Compute new centroids of the current partition. Knowing the members of each cluster, we now compute the new centroid of each group based on these new memberships:
     c1 = (1, 1)
     c2 = ((2+4+5)/3, (1+3+4)/3) = (11/3, 8/3) = (3.67, 2.67)
  11. EXAMPLE
     Step 2: Renew memberships based on the new centroids. Compute the distance of all objects to the new centroids and assign each object to its nearest one.
  12. EXAMPLE
     Step 3: Repeat the first two steps until convergence. Knowing the members of each cluster, we now compute the new centroid of each group based on these new memberships:
     c1 = ((1+2)/2, (1+1)/2) = (1.5, 1)
     c2 = ((4+5)/2, (3+4)/2) = (4.5, 3.5)
  13. EXAMPLE
     Step 3: Repeat the first two steps until convergence. Compute the distance of all objects to the new centroids; stop, since there is no new assignment.
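The worked example above can be traced step by step in Python. This is a sketch of the slides' arithmetic only; the variable names are illustrative.

```python
import math

# The four medicines from the problem slide: (weight, pH-index).
A, B, C, D = (1, 1), (2, 1), (4, 3), (5, 4)

# Step 1: initial seeds c1 = A, c2 = B; Euclidean distance decides membership.
d_D_c1 = math.dist(D, A)          # sqrt((5-1)^2 + (4-1)^2) = 5.0
d_D_c2 = math.dist(D, B)          # sqrt((5-2)^2 + (4-1)^2) ~ 4.24
# D (and likewise C) is closer to c2, so the clusters are {A} and {B, C, D}.

# Step 2: recompute the centroids from the new memberships.
c1 = (1.0, 1.0)
c2 = ((2 + 4 + 5) / 3, (1 + 3 + 4) / 3)   # (11/3, 8/3) ~ (3.67, 2.67)
# Reassigning against these centroids moves B: clusters become {A, B} and {C, D}.

# Step 3: recompute again; a further pass changes no assignment, so we stop.
c1 = ((1 + 2) / 2, (1 + 1) / 2)   # (1.5, 1.0)
c2 = ((4 + 5) / 2, (3 + 4) / 2)   # (4.5, 3.5)
```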
  14. EVALUATING K-MEANS PERFORMANCE
     - The performance of K-Means clustering can be evaluated with the Sum of Squared Error (SSE). The main idea behind SSE is to measure how homogeneous the data within each cluster are.
     - Homogeneity is measured via the error/distance between each data point and its centroid: the more homogeneous the data in a cluster, the smaller the distance between each point and the centroid.
     - The errors are then summed over all clusters (Sum of Squared Error, SSE). The smaller the SSE, the better the clustering result.
  15. EVALUATING K-MEANS PERFORMANCE
     SSE = sum over i = 1..K of sum over x in Ci of dist(mi, x)^2
     where:
     K  = number of clusters
     Ci = the i-th cluster
     mi = centroid of the i-th cluster
     x  = a data point in its cluster
  16. CLUSTERING WITH WEKA
  17. HIERARCHICAL CLUSTERING
  18. INTRODUCTION
     Hierarchical clustering approach:
     - A typical clustering analysis approach via partitioning the data set sequentially
     - Constructs nested partitions layer by layer by grouping objects into a tree of clusters (without the need to know the number of clusters in advance)
     - Uses a distance matrix as the clustering criterion; a termination condition is needed
     Agglomerative vs. divisive: two sequential clustering strategies for constructing a tree of clusters
     - Agglomerative: a bottom-up strategy. Initially each data object is in its own (atomic) cluster; these atomic clusters are then merged into larger and larger clusters
     - Divisive: a top-down strategy. Initially all objects are in one single cluster; the cluster is then subdivided into smaller and smaller clusters
  19. INTRODUCTION
     Illustrative example: agglomerative and divisive clustering on the data set {a, b, c, d, e}. Key choices: the cluster distance measure and the termination condition.
     [Figure: agglomerative clustering merges a..e bottom-up over steps 0-4; divisive clustering splits the same set top-down over the same steps.]
  20. CLUSTER DISTANCE MEASURES
     - Single link (min): smallest distance between an element in one cluster and an element in the other, i.e., d(Ci, Cj) = min{d(xip, xjq)}
     - Complete link (max): largest distance between an element in one cluster and an element in the other, i.e., d(Ci, Cj) = max{d(xip, xjq)}
     - Average: average distance between elements in one cluster and elements in the other, i.e., d(Ci, Cj) = avg{d(xip, xjq)}
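The three measures can be written down directly in Python (a sketch; the function names are illustrative, and clusters are taken to be lists of coordinate tuples):

```python
import math

def single_link(ci, cj):
    # d(Ci, Cj) = min over all cross-cluster pairs
    return min(math.dist(p, q) for p in ci for q in cj)

def complete_link(ci, cj):
    # d(Ci, Cj) = max over all cross-cluster pairs
    return max(math.dist(p, q) for p in ci for q in cj)

def average_link(ci, cj):
    # d(Ci, Cj) = mean over all |Ci| * |Cj| cross-cluster pairs
    return sum(math.dist(p, q) for p in ci for q in cj) / (len(ci) * len(cj))
```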
  21. AGGLOMERATIVE ALGORITHM
     The agglomerative algorithm is carried out in three steps:
     1) Convert object attributes to a distance matrix
     2) Set each object as a cluster (thus if we have N objects, we will have N clusters at the beginning)
     3) Repeat until the number of clusters is one (or a known # of clusters):
        - Merge the two closest clusters
        - Update the distance matrix
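The three steps above can be sketched as follows (illustrative only; the function name `agglomerative` and its `target`/`linkage` parameters are hypothetical, and distances are recomputed from the points rather than kept in an explicit matrix):

```python
import math

def agglomerative(points, target=1, linkage=min):
    """Bottom-up clustering: start with one cluster per object and
    repeatedly merge the two closest clusters.  `linkage` aggregates
    the cross-cluster pair distances (min = single link, max = complete link)."""
    clusters = [[p] for p in points]        # step 2: N singleton clusters
    merges = []
    while len(clusters) > target:           # step 3
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: linkage(math.dist(p, q)
                                          for p in clusters[ab[0]]
                                          for q in clusters[ab[1]]))
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]   # merge the two closest
        del clusters[j]                           # "update the distance matrix"
    return clusters, merges
```

Stopping at a known number of clusters (`target=2` on two well-separated pairs of points, say) recovers the obvious grouping.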
  22. Example and Demo
     Problem: clustering analysis with the agglomerative algorithm.
     [Figure: data matrix and the distance matrix derived from it (Euclidean distance).]
  23. Merge two closest clusters (iteration 1)
  24. Update distance matrix (iteration 1)
  25. Merge two closest clusters (iteration 2)
  26. Update distance matrix (iteration 2)
  27. Merge two closest clusters / update distance matrix (iteration 3)
  28. Merge two closest clusters / update distance matrix (iteration 4)
  29. Final result (meeting the termination condition)
  30. Dendrogram tree representation
     1. In the beginning we have 6 clusters: A, B, C, D, E and F
     2. We merge clusters D and F into cluster (D, F) at distance 0.50
     3. We merge cluster A and cluster B into (A, B) at distance 0.71
     4. We merge cluster E and (D, F) into ((D, F), E) at distance 1.00
     5. We merge cluster ((D, F), E) and C into (((D, F), E), C) at distance 1.41
     6. We merge cluster (((D, F), E), C) and (A, B) into ((((D, F), E), C), (A, B)) at distance 2.50
     7. The last cluster contains all the objects, thus concluding the computation
  31. CLUSTERING IN R
     library mva:
     - Hierarchical clustering: hclust, heatmap
     - k-means: kmeans
     library class:
     - Self-organizing maps: SOM
     library cluster:
     - pam and other functions
  32. ASSIGNMENT T2: K-MEANS & HIERARCHICAL CLUSTERING (SECTION 4)
     Individual. To be done on folio/A4 paper (handwritten). Due next week (Week 12), 21 April 2015.
  33. NEXT WEEK (WEEK 12): UNSUPERVISED LEARNING: ASSOCIATION RULES, BAYES' THEOREM
     Final project: any topic (from Weeks 1-14) using R; report & demo in Week 15. FP is done in groups of 3-4 people. Topics: Neural Networks, Clustering, Bayesian methods, Association Rules.
  34. REFERENCES
     Flach, Peter. 2012. Machine Learning: The Art and Science of Algorithms that Make Sense of Data. Cambridge University Press.
     Tan et al. 2006. Introduction to Data Mining. Addison Wesley.
     Ke Chen, University of Manchester, COMP24111 Machine Learning, http://www.cs.man.ac.uk/~kechen/teaching.php
     Wikibooks, K-Means Example, http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Clustering/K-Means
