SlideShare une entreprise Scribd logo
1  sur  19
PPrreesseenntteedd BByy,, 
AAsshhwwiinn SShheennooyy MM 
44CCBB1133SSCCSS0022 
1
 Partitioning methods 
 HHiieerraarrcchhiiccaall mmeetthhooddss 
 DDeennssiittyy--bbaasseedd mmeetthhooddss 
 GGrriidd--bbaasseedd mmeetthhooddss 
 MMooddeell--bbaasseedd mmeetthhooddss 
(conceptual clustering, 
neural networks) 
2
 AA ppaarrttiittiioonniinngg mmeetthhoodd:: construct a partition of 
a database D of nn objects into a set of k clusters 
such that 
o each cluster contains at least one object 
o each object belongs to exactly one cluster 
3
mmeetthhooddss: 
o kk--mmeeaannss : Each cluster is 
represented by the center of the 
cluster (cceennttrrooiidd). 
o kk--mmeeddooiiddss : Each cluster is 
represented by one of the objects in 
the cluster (mmeeddooiidd). 
4
 IInnppuutt ttoo tthhee aallggoorriitthhmm: the number of clusters k, 
and a database of n objects 
 AAllggoorriitthhmm ccoonnssiissttss ooff ffoouurr sstteeppss: 
1. partition object into k nonempty 
subsets/clusters 
2. compute a seed points as the cceennttrrooiidd (the 
mean of the objects in the cluster) for each 
cluster in the current partition 
3. assign each object to the cluster with the 
nearest centroid 
4. go back to Step 2, stop when there are no 
more new assignments 
5
Alternative algorithm aallssoo ccoonnssiissttss ooff ffoouurr sstteeppss: 
1. arbitrarily choose k objects as the initial 
cluster centers (centroids) 
2. (re)assign each object to the cluster with the 
nearest centroid 
3. update the centroids 
4. go back to Step 2, stop when there are no 
more new assignments 
6
7 
10 
9 
8 
7 
6 
5 
4 
3 
2 
1 
0 
10 
9 
8 
7 
6 
5 
4 
3 
2 
1 
0 1 2 3 4 5 6 7 8 9 10 0 
0 1 2 3 4 5 6 7 8 9 10 
10 
9 
8 
7 
6 
5 
4 
3 
2 
1 
0 
0 1 2 3 4 5 6 7 8 9 10 
10 
9 
8 
7 
6 
5 
4 
3 
2 
1 
0 
0 1 2 3 4 5 6 7 8 9 10
 IInnppuutt ttoo tthhee aallggoorriitthhmm: the number of clusters k, 
and a database of n objects 
 AAllggoorriitthhmm ccoonnssiissttss ooff ffoouurr sstteeppss: 
1. arbitrarily choose k objects as the initial 
mmeeddooiiddss (representative objects) 
2. assign each remaining object to the cluster 
with the nearest medoid 
3. select a nonmedoid and replace one of the 
medoids with it if this improves the clustering 
4. go back to Step 2, stop when there are no 
more new assignments 
8
 AA hhiieerraarrcchhiiccaall mmeetthhoodd:: construct a hierarchy of 
clustering, not just a single partition of objects. 
 The number of clusters kk is not required as an 
input . 
 Use a ddiissttaannccee mmaattrriixx as clustering criteria 
 A tteerrmmiinnaattiioonn ccoonnddiittiioonn can be used (e.g., a 
number of clusters) 
9
11..aagggglloommeerraattiivvee (bottom-up): 
oplace each object in its own cluster. 
omerge in each step the two most similar clusters until 
there is only one cluster left or the termination 
condition is satisfied. 
22..ddiivviissiivvee (top-down): 
ostart with one big cluster containing all the objects 
odivide the most distinctive cluster into smaller 
clusters and proceed until there are n clusters or the 
termination condition is satisfied. 
10
11 
Step 0 Step 1 Step 2 Step 3 Step 4 
a a b 
b 
c 
d 
e 
d e 
a b c d e 
c d e 
Step 4 Step 3 Step 2 Step 1 Step 0 
aagggglloommeerraattiivvee 
ddiivviissiivvee
 Birch: Balanced Iterative Reducing and Clustering using 
Hierarchies. 
 Incrementally construct a CF (Clustering Feature) tree, a 
hierarchical data structure for multiphase clustering 
Phase 1: scan DB to build an initial in-memory CF tree 
(a multi-level compression of the data that tries to 
preserve the inherent clustering structure of the data). 
Phase 2: use an arbitrary clustering algorithm to 
cluster the leaf nodes of the CF-tree. 
12
 Clustering feature: 
• summary of the statistics for a given subcluster: the 0-th, 1st and 
2nd moments of the subcluster from the statistical point of view. 
• registers crucial measurements for computing cluster and utilizes 
storage efficiently 
A CF tree is a height-balanced tree that stores the clustering features 
for a hierarchical clustering 
• A nonleaf node in a tree has descendants or “children” 
• The nonleaf nodes store sums of the CFs of their children 
 A CF tree has two parameters 
• Branching factor: specify the maximum number of children. 
• threshold: max diameter of sub-clusters stored at the leaf nodes 
13
14 
CF1 
child1 
Root 
CF3 
child3 
CF2 
child2 
CF6 
child6 
CF1 
child1 
Non-leaf node 
CF3 
child3 
CF2 
child2 
CF5 
child5 
B = 7 
L = 6 
Leaf node Leaf node 
CF1 CF2 CF6 prev next CF1 CF2 CF4 prev next
 ROCK: RObust Clustering using linKs 
 Major ideas 
• Use links to measure similarity/proximity. 
15
 Links: # of common neighbors 
• C1 <a, b, c, d, e>: {a, b, c}, {a, b, d}, {a, b, e}, {a, c, d}, {a, c, e}, {a, 
d, e}, {b, c, d}, {b, c, e}, {b, d, e}, {c, d, e} 
• C2 <a, b, f, g>: {a, b, f}, {a, b, g}, {a, f, g}, {b, f, g} 
 Let T1 = {a, b, c}, T2 = {c, d, e}, T3 = {a, b, f} 
• link(T1, T2) = 4, since they have 4 common neighbors 
 {a, c, d}, {a, c, e}, {b, c, d}, {b, c, e} 
• link(T1, T3) = 3, since they have 3 common neighbors 
 {a, b, d}, {a, b, e}, {a, b, g} 
 Thus link is a better measure than Jaccard coefficient 
16
 Measures the similarity based on a dynamic model 
• Two clusters are merged only if the interconnectivity and 
closeness (proximity) between two clusters are high relative to the 
internal interconnectivity of the clusters and closeness of items 
within the clusters 
 A two-phase algorithm 
1. Use a graph partitioning algorithm: cluster objects into a large 
number of relatively small sub-clusters 
2. Use an agglomerative hierarchical clustering algorithm: find the 
genuine clusters by repeatedly combining these sub-clusters 
17
18 
Construct 
Sparse Graph Partition the Graph 
Merge Partition 
Final Clusters 
Data Set
THANK YOU 
19

Contenu connexe

Tendances

K means Clustering
K means ClusteringK means Clustering
K means Clustering
Edureka!
 

Tendances (20)

3.3 hierarchical methods
3.3 hierarchical methods3.3 hierarchical methods
3.3 hierarchical methods
 
Unsupervised learning (clustering)
Unsupervised learning (clustering)Unsupervised learning (clustering)
Unsupervised learning (clustering)
 
Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
 
Unsupervised Learning
Unsupervised LearningUnsupervised Learning
Unsupervised Learning
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids Algorithm
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Large Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewLarge Scale Data Clustering: an overview
Large Scale Data Clustering: an overview
 
3.5 model based clustering
3.5 model based clustering3.5 model based clustering
3.5 model based clustering
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysis
 
Clustering
ClusteringClustering
Clustering
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
 
Introduction to Clustering algorithm
Introduction to Clustering algorithmIntroduction to Clustering algorithm
Introduction to Clustering algorithm
 
3.6 constraint based cluster analysis
3.6 constraint based cluster analysis3.6 constraint based cluster analysis
3.6 constraint based cluster analysis
 
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
K-MEDOIDS CLUSTERING  USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...K-MEDOIDS CLUSTERING  USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
 
05 Clustering in Data Mining
05 Clustering in Data Mining05 Clustering in Data Mining
05 Clustering in Data Mining
 
K means Clustering
K means ClusteringK means Clustering
K means Clustering
 
Document clustering and classification
Document clustering and classification Document clustering and classification
Document clustering and classification
 
Clustering
ClusteringClustering
Clustering
 

En vedette

Marketing wartosci dodanej (added value marketing)
Marketing wartosci dodanej (added value marketing)Marketing wartosci dodanej (added value marketing)
Marketing wartosci dodanej (added value marketing)
Shake Your Brand
 
фотоальбом дороги
фотоальбом дорогифотоальбом дороги
фотоальбом дороги
NASTIA_090583
 

En vedette (12)

Scalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsScalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data Streams
 
Clustering: Large Databases in data mining
Clustering: Large Databases in data miningClustering: Large Databases in data mining
Clustering: Large Databases in data mining
 
Machine Learning and Data Mining: 08 Clustering: Hierarchical
Machine Learning and Data Mining: 08 Clustering: Hierarchical Machine Learning and Data Mining: 08 Clustering: Hierarchical
Machine Learning and Data Mining: 08 Clustering: Hierarchical
 
Linear regression on 1 terabytes of data? Some crazy observations and actions
Linear regression on 1 terabytes of data? Some crazy observations and actionsLinear regression on 1 terabytes of data? Some crazy observations and actions
Linear regression on 1 terabytes of data? Some crazy observations and actions
 
Marketing wartosci dodanej (added value marketing)
Marketing wartosci dodanej (added value marketing)Marketing wartosci dodanej (added value marketing)
Marketing wartosci dodanej (added value marketing)
 
фотоальбом дороги
фотоальбом дорогифотоальбом дороги
фотоальбом дороги
 
Citizen Journalism (Myanmar Version) 2014
Citizen Journalism (Myanmar Version) 2014Citizen Journalism (Myanmar Version) 2014
Citizen Journalism (Myanmar Version) 2014
 
The Fail Lecture
The Fail LectureThe Fail Lecture
The Fail Lecture
 
Newforma by numbers
Newforma by numbersNewforma by numbers
Newforma by numbers
 
Concentrated Stress & Stress Tensor
Concentrated Stress & Stress TensorConcentrated Stress & Stress Tensor
Concentrated Stress & Stress Tensor
 
Job app
Job appJob app
Job app
 
Pwpn
PwpnPwpn
Pwpn
 

Similaire à DATA MINING:Clustering Types

Similaire à DATA MINING:Clustering Types (20)

My8clst
My8clstMy8clst
My8clst
 
Clustering techniques
Clustering techniquesClustering techniques
Clustering techniques
 
ML basic &amp; clustering
ML basic &amp; clusteringML basic &amp; clustering
ML basic &amp; clustering
 
Clustering.pptx
Clustering.pptxClustering.pptx
Clustering.pptx
 
Advanced database and data mining & clustering concepts
Advanced database and data mining & clustering conceptsAdvanced database and data mining & clustering concepts
Advanced database and data mining & clustering concepts
 
Mat189: Cluster Analysis with NBA Sports Data
Mat189: Cluster Analysis with NBA Sports DataMat189: Cluster Analysis with NBA Sports Data
Mat189: Cluster Analysis with NBA Sports Data
 
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 5
Data Mining:  Concepts and Techniques (3rd ed.)— Chapter 5 Data Mining:  Concepts and Techniques (3rd ed.)— Chapter 5
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 5
 
Pattern recognition binoy k means clustering
Pattern recognition binoy  k means clusteringPattern recognition binoy  k means clustering
Pattern recognition binoy k means clustering
 
dm_clustering2.ppt
dm_clustering2.pptdm_clustering2.ppt
dm_clustering2.ppt
 
hierarchical clustering.pptx
hierarchical clustering.pptxhierarchical clustering.pptx
hierarchical clustering.pptx
 
Project PPT
Project PPTProject PPT
Project PPT
 
multiarmed bandit.ppt
multiarmed bandit.pptmultiarmed bandit.ppt
multiarmed bandit.ppt
 
machine learning - Clustering in R
machine learning - Clustering in Rmachine learning - Clustering in R
machine learning - Clustering in R
 
ClusetrigBasic.ppt
ClusetrigBasic.pptClusetrigBasic.ppt
ClusetrigBasic.ppt
 
Chapter 5.pdf
Chapter 5.pdfChapter 5.pdf
Chapter 5.pdf
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt
 
UNIT_V_Cluster Analysis.pptx
UNIT_V_Cluster Analysis.pptxUNIT_V_Cluster Analysis.pptx
UNIT_V_Cluster Analysis.pptx
 
MODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptxMODULE 4_ CLUSTERING.pptx
MODULE 4_ CLUSTERING.pptx
 
K means clustering
K means clusteringK means clustering
K means clustering
 
Birch1
Birch1Birch1
Birch1
 

Dernier

FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
dollysharma2066
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Kandungan 087776558899
 
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
dharasingh5698
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
Epec Engineered Technologies
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Dernier (20)

FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 

DATA MINING:Clustering Types

  • 1. PPrreesseenntteedd BByy,, AAsshhwwiinn SShheennooyy MM 44CCBB1133SSCCSS0022 1
  • 2.  Partitioning methods  HHiieerraarrcchhiiccaall mmeetthhooddss  DDeennssiittyy--bbaasseedd mmeetthhooddss  GGrriidd--bbaasseedd mmeetthhooddss  MMooddeell--bbaasseedd mmeetthhooddss (conceptual clustering, neural networks) 2
  • 3.  AA ppaarrttiittiioonniinngg mmeetthhoodd:: construct a partition of a database D of nn objects into a set of k clusters such that o each cluster contains at least one object o each object belongs to exactly one cluster 3
  • 4. mmeetthhooddss: o kk--mmeeaannss : Each cluster is represented by the center of the cluster (cceennttrrooiidd). o kk--mmeeddooiiddss : Each cluster is represented by one of the objects in the cluster (mmeeddooiidd). 4
  • 5.  IInnppuutt ttoo tthhee aallggoorriitthhmm: the number of clusters k, and a database of n objects  AAllggoorriitthhmm ccoonnssiissttss ooff ffoouurr sstteeppss: 1. partition object into k nonempty subsets/clusters 2. compute a seed points as the cceennttrrooiidd (the mean of the objects in the cluster) for each cluster in the current partition 3. assign each object to the cluster with the nearest centroid 4. go back to Step 2, stop when there are no more new assignments 5
  • 6. Alternative algorithm aallssoo ccoonnssiissttss ooff ffoouurr sstteeppss: 1. arbitrarily choose k objects as the initial cluster centers (centroids) 2. (re)assign each object to the cluster with the nearest centroid 3. update the centroids 4. go back to Step 2, stop when there are no more new assignments 6
  • 7. 7 10 9 8 7 6 5 4 3 2 1 0 10 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 10 0 0 1 2 3 4 5 6 7 8 9 10 10 9 8 7 6 5 4 3 2 1 0 0 1 2 3 4 5 6 7 8 9 10 10 9 8 7 6 5 4 3 2 1 0 0 1 2 3 4 5 6 7 8 9 10
  • 8.  IInnppuutt ttoo tthhee aallggoorriitthhmm: the number of clusters k, and a database of n objects  AAllggoorriitthhmm ccoonnssiissttss ooff ffoouurr sstteeppss: 1. arbitrarily choose k objects as the initial mmeeddooiiddss (representative objects) 2. assign each remaining object to the cluster with the nearest medoid 3. select a nonmedoid and replace one of the medoids with it if this improves the clustering 4. go back to Step 2, stop when there are no more new assignments 8
  • 9.  AA hhiieerraarrcchhiiccaall mmeetthhoodd:: construct a hierarchy of clustering, not just a single partition of objects.  The number of clusters kk is not required as an input .  Use a ddiissttaannccee mmaattrriixx as clustering criteria  A tteerrmmiinnaattiioonn ccoonnddiittiioonn can be used (e.g., a number of clusters) 9
  • 10. 11..aagggglloommeerraattiivvee (bottom-up): oplace each object in its own cluster. omerge in each step the two most similar clusters until there is only one cluster left or the termination condition is satisfied. 22..ddiivviissiivvee (top-down): ostart with one big cluster containing all the objects odivide the most distinctive cluster into smaller clusters and proceed until there are n clusters or the termination condition is satisfied. 10
  • 11. 11 Step 0 Step 1 Step 2 Step 3 Step 4 a a b b c d e d e a b c d e c d e Step 4 Step 3 Step 2 Step 1 Step 0 aagggglloommeerraattiivvee ddiivviissiivvee
  • 12.  Birch: Balanced Iterative Reducing and Clustering using Hierarchies.  Incrementally construct a CF (Clustering Feature) tree, a hierarchical data structure for multiphase clustering Phase 1: scan DB to build an initial in-memory CF tree (a multi-level compression of the data that tries to preserve the inherent clustering structure of the data). Phase 2: use an arbitrary clustering algorithm to cluster the leaf nodes of the CF-tree. 12
  • 13.  Clustering feature: • summary of the statistics for a given subcluster: the 0-th, 1st and 2nd moments of the subcluster from the statistical point of view. • registers crucial measurements for computing cluster and utilizes storage efficiently A CF tree is a height-balanced tree that stores the clustering features for a hierarchical clustering • A nonleaf node in a tree has descendants or “children” • The nonleaf nodes store sums of the CFs of their children  A CF tree has two parameters • Branching factor: specify the maximum number of children. • threshold: max diameter of sub-clusters stored at the leaf nodes 13
  • 14. 14 CF1 child1 Root CF3 child3 CF2 child2 CF6 child6 CF1 child1 Non-leaf node CF3 child3 CF2 child2 CF5 child5 B = 7 L = 6 Leaf node Leaf node CF1 CF2 CF6 prev next CF1 CF2 CF4 prev next
  • 15.  ROCK: RObust Clustering using linKs  Major ideas • Use links to measure similarity/proximity. 15
  • 16.  Links: # of common neighbors • C1 <a, b, c, d, e>: {a, b, c}, {a, b, d}, {a, b, e}, {a, c, d}, {a, c, e}, {a, d, e}, {b, c, d}, {b, c, e}, {b, d, e}, {c, d, e} • C2 <a, b, f, g>: {a, b, f}, {a, b, g}, {a, f, g}, {b, f, g}  Let T1 = {a, b, c}, T2 = {c, d, e}, T3 = {a, b, f} • link(T1, T2) = 4, since they have 4 common neighbors  {a, c, d}, {a, c, e}, {b, c, d}, {b, c, e} • link(T1, T3) = 3, since they have 3 common neighbors  {a, b, d}, {a, b, e}, {a, b, g}  Thus link is a better measure than Jaccard coefficient 16
  • 17.  Measures the similarity based on a dynamic model • Two clusters are merged only if the interconnectivity and closeness (proximity) between two clusters are high relative to the internal interconnectivity of the clusters and closeness of items within the clusters  A two-phase algorithm 1. Use a graph partitioning algorithm: cluster objects into a large number of relatively small sub-clusters 2. Use an agglomerative hierarchical clustering algorithm: find the genuine clusters by repeatedly combining these sub-clusters 17
  • 18. 18 Construct Sparse Graph Partition the Graph Merge Partition Final Clusters Data Set