Machine Learning: Unsupervised Classification
Dr. Muhammad Shaheen
© M. Shahbaz – 2006 Dr.Muhammad.Shahbaz@gmail.com
Clustering
Lecture Outline
• What is Clustering
• Supervised and Unsupervised Classification
• Types of Clustering Algorithms
• Most Common Techniques
• Areas of Applications
• Discussion
• Result
Clustering - Definition
─ Process of grouping similar items together
─ Objects within a cluster should be very similar to each other
but…
─ Should be very different from the objects of other clusters
─ We can say that intra-cluster similarity between objects is high and inter-cluster similarity is low
─ Important human activity, used from early childhood in distinguishing between different items such as cars and cats, animals and plants, etc.
Supervised and Unsupervised Classification
─ What is Classification?
─ What is Supervised Classification/Learning?
─ What is Unsupervised Classification/Learning?
─ SOM – Self Organizing Maps
Types of Clustering Algorithms
─ Clustering has been a popular area of research
─ Several methods and techniques have been
developed to determine natural grouping among
the objects
Jain, A. K., Murty, M. N., and Flynn, P. J. Data Clustering: A Review. ACM Computing Surveys, 31(3), 1999, pp. 264-323.
Jain, A. K. and Dubes, R. C. Algorithms for Clustering Data. Englewood Cliffs, NJ: Prentice Hall, 1988. ISBN 013022278X.
Types of Clustering Algorithms
[Diagram: taxonomy of clustering algorithms]
• Hierarchical Methods
– Agglomerative Algorithms
– Divisive Algorithms
• Partitioning Methods
– Relocation Algorithms
– Probabilistic Clustering
– K-medoids Methods
– K-means Methods
– Density-Based Algorithms (Density-Based Connectivity Clustering, Density Functions Clustering)
• Grid-Based Methods
• Clustering Algorithms Used in Machine Learning
– Gradient Descent and Artificial Neural Networks
– Evolutionary Methods
• Algorithms For High Dimensional Data
– Subspace Clustering
– Projection Techniques
– Co-Clustering Techniques
Classification vs. Clustering
Classification:
Supervised learning:
Learns a method for predicting the
instance class from pre-labeled
(classified) instances
Clustering
Unsupervised learning:
Finds “natural” grouping of
instances given un-labeled data
Clustering Evaluation
• Manual inspection
• Benchmarking on existing labels
• Cluster quality measures
–distance measures
–high similarity within a cluster, low across
clusters
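A rough way to check "high similarity within a cluster, low across clusters" is to compare the average distance between points that share a cluster with the average distance between points that do not. The sketch below is only an illustration under the assumption of Euclidean distance; the function name intra_inter_distance and the toy data are hypothetical and do not come from the lecture.

```python
import math
from itertools import combinations

def intra_inter_distance(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance)."""
    intra, inter = [], []
    # Look at every unordered pair of points exactly once.
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
        (intra if lp == lq else inter).append(d)

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(intra), mean(inter)

# Toy data (assumed): two tight pairs that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
intra, inter = intra_inter_distance(points, labels)
```

A good clustering should give a small first value and a large second one; how large a gap counts as "good" depends on the data.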
The Distance Function
• Simplest case: one numeric attribute A
– Distance(X,Y) = |A(X) – A(Y)|
• Several numeric attributes:
– Distance(X,Y) = Euclidean distance between X and Y
• Are all attributes equally important?
– Weighting the attributes might be necessary
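The three bullets above can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the function name weighted_euclidean and the choice to apply each weight to the squared difference are assumptions made for the example.

```python
import math

def weighted_euclidean(x, y, weights=None):
    """Euclidean distance between x and y, with optional per-attribute weights."""
    if weights is None:
        weights = [1.0] * len(x)  # all attributes equally important
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

# One numeric attribute: the distance is the absolute difference.
d_one = abs(3.0 - 7.0)

# Several numeric attributes: plain Euclidean distance.
d_plain = weighted_euclidean((0, 0), (3, 4))

# Weight of 0 on the second attribute makes it irrelevant to the distance.
d_weighted = weighted_euclidean((0, 0), (3, 4), weights=(1, 0))
```

In practice attributes are often rescaled to a common range first, so that an attribute measured in large units does not dominate the distance.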
Simple Clustering: K-means
Works with numeric data only
1) Pick a number (K) of cluster centers (at
random)
2) Assign every item to its nearest cluster
center (e.g. using Euclidean distance)
3) Move each cluster center to the mean of
its assigned items
4) Repeat steps 2,3 until convergence
(change in cluster assignments less than
a threshold)
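The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the initial centers are sampled from the data points, distance is Euclidean, and the loop stops when no assignment changes (a threshold of zero). The function name kmeans, its parameters, and the toy data are hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Return (centers, assignment) for a list of numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 1: K random initial centers
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign every item to its nearest center.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: move each center to the mean of its assigned items.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment

# Toy data (assumed): two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignment = kmeans(points, k=2)
```

Because the starting centers are random, different seeds can give different results on less clearly separated data; running several times and keeping the best result is a common workaround.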
K-means example, step 1
[Figure: points on X and Y axes with centers k1, k2, k3. Pick 3 initial cluster centers (randomly).]
K-means example, step 2
[Figure: same plot. Assign each point to the closest cluster center.]
K-means example, step 3
[Figure: same plot. Move each cluster center to the mean of each cluster.]
K-means example, step 4
[Figure: same plot. Reassign points closest to a different new cluster center. Q: Which points are reassigned?]
K-means example, step 4 …
[Figure: same plot. A: three points (shown with animation).]
K-means example, step 4b
[Figure: same plot. Re-compute cluster means.]
K-means example, step 5
[Figure: same plot. Move cluster centers to cluster means.]
Squared Error Criterion
Pros and cons of K-Means
K-means variations
• K-medoids – instead of the mean, use a medoid (a central member) of each cluster; the median below shows the same robustness on a single attribute
–Mean of 1, 3, 5, 7, 9 is 5
–Mean of 1, 3, 5, 7, 1009 is 205
–Median of 1, 3, 5, 7, 1009 is 5
–Median advantage: not affected by extreme values
• For large databases, use sampling
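The arithmetic on this slide can be checked directly. The sketch below uses Python's statistics module for the mean and median; the helper medoid is a hypothetical illustration that picks the member with the smallest total distance to all other members, which is one common way to define a medoid.

```python
from statistics import mean, median

def medoid(values):
    """Return the member of `values` with the smallest total distance to the rest."""
    return min(values, key=lambda v: sum(abs(v - u) for u in values))

clean = [1, 3, 5, 7, 9]
skewed = [1, 3, 5, 7, 1009]  # same data with one extreme value

mean_clean = mean(clean)        # equals 5
mean_skewed = mean(skewed)      # equals 205: pulled far off by the outlier
median_skewed = median(skewed)  # equals 5: unaffected by the outlier
medoid_skewed = medoid(skewed)  # equals 5: always an actual data point
```

Unlike the mean, the medoid is always one of the original items, which is why K-medoids can also be used when only pairwise distances are available.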
k-Medoids
The k-Medoids Algorithm
Evaluating Cost of Swapping Medoids
Evaluating Cost of Swapping Medoids
Four Cases
Total Cost of Swap
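As a rough sketch of what "total cost of swap" usually means in k-medoids, the code below assumes the common PAM-style definition: the cost of a configuration is the sum of each object's distance to its nearest medoid, and the cost of a swap is the change in that sum. It computes the total directly rather than case by case, and the names total_cost and swap_cost, the distance function, and the data are all hypothetical.

```python
def total_cost(points, medoids, dist):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)

def swap_cost(points, medoids, old, new, dist):
    """Change in total cost if medoid `old` is replaced by non-medoid `new`."""
    swapped = [new if m == old else m for m in medoids]
    return total_cost(points, swapped, dist) - total_cost(points, medoids, dist)

def dist(a, b):
    # One numeric attribute (assumed): absolute difference.
    return abs(a - b)

points = [1, 2, 3, 10, 11, 12]
medoids = [1, 10]

good_swap = swap_cost(points, medoids, old=1, new=2, dist=dist)  # negative: improves
bad_swap = swap_cost(points, medoids, old=10, new=3, dist=dist)  # positive: worsens
```

A swap is worth making only when its cost is negative, that is, when it lowers the total distance.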
K-means clustering summary
Advantages
• Simple, understandable
• Items automatically assigned to clusters
Disadvantages
• Must pick number of clusters beforehand
• All items forced into a cluster
• Too sensitive to outliers, since an object with an extremely large value may substantially distort the distribution of data
Hierarchical clustering
• Agglomerative Clustering
– Start with single-instance clusters
– At each step, join the two closest clusters
– Design decision: distance between clusters
• Divisive Clustering
– Start with one universal cluster
– Split it into two clusters
– Proceed recursively on each subset
– Can be very fast
• Both methods produce a dendrogram
[Figure: dendrogram with leaves g, a, c, i, e, d, k, b, j, f, h]
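The agglomerative procedure can be sketched as below. The "design decision" about distance between clusters is exposed as the linkage parameter: min gives single linkage and max gives complete linkage. This is a naive illustration assuming Euclidean distance; the function name agglomerative and the toy data are hypothetical.

```python
import math

def agglomerative(points, n_clusters, linkage=min):
    """Merge the two closest clusters until n_clusters remain.

    Returns (clusters, merges), where merges records each join in order.
    """
    clusters = [[p] for p in points]  # start with single-instance clusters
    merges = []
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest linkage distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(
                    math.dist(a, b) for a in clusters[i] for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges

# Toy data (assumed): two close pairs and one distant point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
clusters, merges = agglomerative(points, n_clusters=2)
```

The recorded merges, read in order with their distances, contain the information a dendrogram displays.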
Partial Supervision of Clustering
A two dimensional image of supervised clusters
A two dimensional image of supervised clusters (real case)
[Figure: two-dimensional plot, both axes scaled 1 to 5, with a disputed data point marked]
A two dimensional image of the different zones of overlapping clusters that both claim a data point (more than two clusters claiming a point is also common)
Research Problems
─ Effective and Efficient methods of Clustering
─ Scalability
─ Handling different types of data
─ Handling complex multidimensional data
─ Complex shapes of clusters
─ Subspace Clustering
─ Cluster overlapping etc.
Examples of Clustering Applications
• Marketing: discover customer groups and use
them for targeted marketing and re-organization
• Astronomy: find groups of similar stars and
galaxies
• Earthquake studies: observed earthquake epicenters should be clustered along continental faults
• Genomics: finding groups of genes with similar expression
• …
Clustering Summary
• Unsupervised
• Many approaches
– K-means – simple, sometimes useful
  • K-medoids is less sensitive to outliers
– Hierarchical clustering – works for symbolic attributes
– Can be used to fill in missing values
Questions?

Contenu connexe

Tendances

K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithmparry prabhu
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysisDataminingTools Inc
 
05 Clustering in Data Mining
05 Clustering in Data Mining05 Clustering in Data Mining
05 Clustering in Data MiningValerii Klymchuk
 
Classification in data mining
Classification in data mining Classification in data mining
Classification in data mining Sulman Ahmed
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clusteringArshad Farhad
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reductionmrizwan969
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methodsKrish_ver2
 
Hierarchical clustering
Hierarchical clustering Hierarchical clustering
Hierarchical clustering Ashek Farabi
 
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Simplilearn
 
Introduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisIntroduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisJaclyn Kokx
 
Unsupervised learning (clustering)
Unsupervised learning (clustering)Unsupervised learning (clustering)
Unsupervised learning (clustering)Pravinkumar Landge
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data miningKamal Acharya
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streamsKrish_ver2
 
Introduction to Clustering algorithm
Introduction to Clustering algorithmIntroduction to Clustering algorithm
Introduction to Clustering algorithmhadifar
 

Tendances (20)

Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithm
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysis
 
Hierachical clustering
Hierachical clusteringHierachical clustering
Hierachical clustering
 
05 Clustering in Data Mining
05 Clustering in Data Mining05 Clustering in Data Mining
05 Clustering in Data Mining
 
Classification in data mining
Classification in data mining Classification in data mining
Classification in data mining
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysis
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
 
Hierarchical clustering
Hierarchical clustering Hierarchical clustering
Hierarchical clustering
 
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
 
Introduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisIntroduction to Linear Discriminant Analysis
Introduction to Linear Discriminant Analysis
 
Unsupervised learning (clustering)
Unsupervised learning (clustering)Unsupervised learning (clustering)
Unsupervised learning (clustering)
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Introduction to Clustering algorithm
Introduction to Clustering algorithmIntroduction to Clustering algorithm
Introduction to Clustering algorithm
 

En vedette

Band Combination of Landsat 8 Earth-observing Satellite Images
Band Combination of Landsat 8 Earth-observing Satellite ImagesBand Combination of Landsat 8 Earth-observing Satellite Images
Band Combination of Landsat 8 Earth-observing Satellite ImagesKabir Uddin
 
Few Indicies(NDVI... etc) performed on ERDAS software using Model Maker
Few Indicies(NDVI... etc) performed on ERDAS software using Model MakerFew Indicies(NDVI... etc) performed on ERDAS software using Model Maker
Few Indicies(NDVI... etc) performed on ERDAS software using Model MakerSwetha A
 
Normalized Difference Vegetation Index (NDVI)
Normalized Difference Vegetation Index (NDVI)Normalized Difference Vegetation Index (NDVI)
Normalized Difference Vegetation Index (NDVI)Susan Aragon
 
Land cover supervised classification at Toro Park, California
Land cover supervised classification at Toro Park, CaliforniaLand cover supervised classification at Toro Park, California
Land cover supervised classification at Toro Park, CaliforniaLisa Jensen
 
Supervised Classification
Supervised ClassificationSupervised Classification
Supervised ClassificationChad Yowler
 
Presentacion elafis Brasil 2009
Presentacion elafis Brasil 2009Presentacion elafis Brasil 2009
Presentacion elafis Brasil 2009Fredy Neira
 
Introduce variable/ Indices using landsat image
Introduce variable/ Indices using landsat imageIntroduce variable/ Indices using landsat image
Introduce variable/ Indices using landsat imageKabir Uddin
 
Original SOINN
Original SOINNOriginal SOINN
Original SOINNSOINN Inc.
 
Program Evaluation and Review Technique (PERT)
Program Evaluation and Review Technique (PERT)Program Evaluation and Review Technique (PERT)
Program Evaluation and Review Technique (PERT)Abhishek Pachisia
 
Pert & cpm project management
Pert & cpm   project managementPert & cpm   project management
Pert & cpm project managementRahul Dubey
 

En vedette (12)

Band Combination of Landsat 8 Earth-observing Satellite Images
Band Combination of Landsat 8 Earth-observing Satellite ImagesBand Combination of Landsat 8 Earth-observing Satellite Images
Band Combination of Landsat 8 Earth-observing Satellite Images
 
Few Indicies(NDVI... etc) performed on ERDAS software using Model Maker
Few Indicies(NDVI... etc) performed on ERDAS software using Model MakerFew Indicies(NDVI... etc) performed on ERDAS software using Model Maker
Few Indicies(NDVI... etc) performed on ERDAS software using Model Maker
 
Normalized Difference Vegetation Index (NDVI)
Normalized Difference Vegetation Index (NDVI)Normalized Difference Vegetation Index (NDVI)
Normalized Difference Vegetation Index (NDVI)
 
NDVI
NDVINDVI
NDVI
 
Land cover supervised classification at Toro Park, California
Land cover supervised classification at Toro Park, CaliforniaLand cover supervised classification at Toro Park, California
Land cover supervised classification at Toro Park, California
 
Supervised Classification
Supervised ClassificationSupervised Classification
Supervised Classification
 
Presentacion elafis Brasil 2009
Presentacion elafis Brasil 2009Presentacion elafis Brasil 2009
Presentacion elafis Brasil 2009
 
Introduce variable/ Indices using landsat image
Introduce variable/ Indices using landsat imageIntroduce variable/ Indices using landsat image
Introduce variable/ Indices using landsat image
 
Original SOINN
Original SOINNOriginal SOINN
Original SOINN
 
Pert & Cpm
Pert & CpmPert & Cpm
Pert & Cpm
 
Program Evaluation and Review Technique (PERT)
Program Evaluation and Review Technique (PERT)Program Evaluation and Review Technique (PERT)
Program Evaluation and Review Technique (PERT)
 
Pert & cpm project management
Pert & cpm   project managementPert & cpm   project management
Pert & cpm project management
 

Similaire à Clustering

Data mining Techniques
Data mining TechniquesData mining Techniques
Data mining TechniquesSulman Ahmed
 
Data mining techniques unit v
Data mining techniques unit vData mining techniques unit v
Data mining techniques unit vmalathieswaran29
 
Unsupervised learning Modi.pptx
Unsupervised learning Modi.pptxUnsupervised learning Modi.pptx
Unsupervised learning Modi.pptxssusere1fd42
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.pptvikassingh569137
 
01 Statistika Lanjut - Cluster Analysis part 1 with sound (1).pptx
01 Statistika Lanjut - Cluster Analysis  part 1 with sound (1).pptx01 Statistika Lanjut - Cluster Analysis  part 1 with sound (1).pptx
01 Statistika Lanjut - Cluster Analysis part 1 with sound (1).pptxniawiya
 
algoritma klastering.pdf
algoritma klastering.pdfalgoritma klastering.pdf
algoritma klastering.pdfbintis1
 
Cluster_saumitra.ppt
Cluster_saumitra.pptCluster_saumitra.ppt
Cluster_saumitra.pptssuser6b3336
 
Clustering[306] [Read-Only].pdf
Clustering[306] [Read-Only].pdfClustering[306] [Read-Only].pdf
Clustering[306] [Read-Only].pdfigeabroad
 
Advanced database and data mining & clustering concepts
Advanced database and data mining & clustering conceptsAdvanced database and data mining & clustering concepts
Advanced database and data mining & clustering conceptsNithyananthSengottai
 
Hierarchical clustering.pptx
Hierarchical clustering.pptxHierarchical clustering.pptx
Hierarchical clustering.pptxNTUConcepts1
 
machine learning - Clustering in R
machine learning - Clustering in Rmachine learning - Clustering in R
machine learning - Clustering in RSudhakar Chavan
 
ML SFCSE.pptx
ML SFCSE.pptxML SFCSE.pptx
ML SFCSE.pptxNIKHILGR3
 

Similaire à Clustering (20)

Clustering.pdf
Clustering.pdfClustering.pdf
Clustering.pdf
 
Clusteryanam
ClusteryanamClusteryanam
Clusteryanam
 
Data mining Techniques
Data mining TechniquesData mining Techniques
Data mining Techniques
 
DM_clustering.ppt
DM_clustering.pptDM_clustering.ppt
DM_clustering.ppt
 
Data mining techniques unit v
Data mining techniques unit vData mining techniques unit v
Data mining techniques unit v
 
Unsupervised learning Modi.pptx
Unsupervised learning Modi.pptxUnsupervised learning Modi.pptx
Unsupervised learning Modi.pptx
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt
 
UNIT_V_Cluster Analysis.pptx
UNIT_V_Cluster Analysis.pptxUNIT_V_Cluster Analysis.pptx
UNIT_V_Cluster Analysis.pptx
 
01 Statistika Lanjut - Cluster Analysis part 1 with sound (1).pptx
01 Statistika Lanjut - Cluster Analysis  part 1 with sound (1).pptx01 Statistika Lanjut - Cluster Analysis  part 1 with sound (1).pptx
01 Statistika Lanjut - Cluster Analysis part 1 with sound (1).pptx
 
Clustering.pdf
Clustering.pdfClustering.pdf
Clustering.pdf
 
algoritma klastering.pdf
algoritma klastering.pdfalgoritma klastering.pdf
algoritma klastering.pdf
 
Cluster_saumitra.ppt
Cluster_saumitra.pptCluster_saumitra.ppt
Cluster_saumitra.ppt
 
Data mining
Data miningData mining
Data mining
 
Clustering on DSS
Clustering on DSSClustering on DSS
Clustering on DSS
 
Clustering[306] [Read-Only].pdf
Clustering[306] [Read-Only].pdfClustering[306] [Read-Only].pdf
Clustering[306] [Read-Only].pdf
 
Advanced database and data mining & clustering concepts
Advanced database and data mining & clustering conceptsAdvanced database and data mining & clustering concepts
Advanced database and data mining & clustering concepts
 
Hierarchical clustering.pptx
Hierarchical clustering.pptxHierarchical clustering.pptx
Hierarchical clustering.pptx
 
Cluster Analysis.pptx
Cluster Analysis.pptxCluster Analysis.pptx
Cluster Analysis.pptx
 
machine learning - Clustering in R
machine learning - Clustering in Rmachine learning - Clustering in R
machine learning - Clustering in R
 
ML SFCSE.pptx
ML SFCSE.pptxML SFCSE.pptx
ML SFCSE.pptx
 

Dernier

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Dernier (20)

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Clustering

  • 1. Machine Learning: Unsupervised Classification Dr. Muhammad Shaheen
  • 2. © M. Shahbaz – 2006Dr.Muhammad.Shahbaz@gmail.com Clustering
  • 3. © M. Shahbaz – 2006Dr.Muhammad.Shahbaz@gmail.com Lecture Outline • What is Clustering • Supervised and Unsupervised Classification • Types of Clustering Algorithms • Most Common Techniques • Areas of Applications • Discussion • Result
  • 4. Clustering - Definition ─ Process of grouping similar items together ─ Clusters should be very similar to each other but… ─ Should be very different from the objects of other clusters/ other clusters ─ We can say that intra-cluster similarity between objects is high and inter-cluster similarity is low ─ Important human activity --- used from early childhood in distinguishing between different items such as cars and cats, animals and plants etc.
  • 5. Supervised and Unsupervised Classification ─ What is Classification? ─ What is Supervised Classification/Learning? ─ What is Unsupervised Classification/Learning? ─ SOM – Self Organizing Maps
  • 6. Types of Clustering Algorithms ─ Clustering has been a popular area of research ─ Several methods and techniques have been developed to determine natural grouping among the objects Jain, A. K., Murty, M. N., and Flynn, P. J., Data Clustering: A Survey. ACM Computing Surveys, 1999. 31: pp. 264-323. Jain, A. K. and Dubes, R. C., Algorithms for Clustering Data. 1988, Englewood Cliffs, NJ: Prentice Hall. 013022278X
  • 7. Types of Clustering Algorithms Hierarchical Methods Partitioning Methods Grid-Based Methods Clustering Algorithms Used in Machine Learning Algorithms For High Dimensional Data Agglomerative Algorithms Divisive Algorithms Relocation Algorithms Probabilistic Clustering K-medoids Methods K-means Methods Density-Based Algorithms Density-Based Connectivity Clustering Density Functions Clustering Gradient Descent and Artificial Neural Networks Evolutionary Methods Subspace Clustering Co-Clustering Techniques Projection Techniques Clustering Hierarchical Methods Partitioning Methods Grid-Based Methods Clustering Algorithms Used in Machine Learning Algorithms For High Dimensional Data Hierarchical Methods Partitioning Methods Grid-Based Methods Clustering Algorithms Used in Machine Learning Algorithms For High Dimensional Data Agglomerative Algorithms Divisive Algorithms Agglomerative Algorithms Divisive Algorithms Relocation Algorithms Probabilistic Clustering K-medoids Methods K-means Methods Density-Based Algorithms Relocation Algorithms Probabilistic Clustering K-medoids Methods K-means Methods Density-Based Algorithms Density-Based Connectivity Clustering Density Functions Clustering Density-Based Connectivity Clustering Density Functions Clustering Gradient Descent and Artificial Neural Networks Evolutionary Methods Gradient Descent and Artificial Neural Networks Evolutionary Methods Subspace Clustering Co-Clustering Techniques Projection Techniques Clustering
  • 8. Classification vs. Clustering Classification: Supervised learning: Learns a method for predicting the instance class from pre-labeled (classified) instances
  • 9. Clustering Unsupervised learning: Finds “natural” grouping of instances given un-labeled data
  • 10. Clustering Evaluation • Manual inspection • Benchmarking on existing labels • Cluster quality measures –distance measures –high similarity within a cluster, low across clusters
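The quality criterion on this slide (high similarity within a cluster, low across clusters) can be checked numerically. The following is a minimal sketch, not a standard measure from any library: the function name `mean_intra_inter` is hypothetical, the data are one-dimensional, and distance is taken as the absolute difference.

```python
from itertools import combinations


def mean_intra_inter(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance).

    Assumes 1-D numeric points and at least one pair of each kind;
    both assumptions are for illustration only.
    """
    intra, inter = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = abs(p - q)
        # Pairs sharing a label count as "within", all others as "between".
        (intra if lp == lq else inter).append(d)
    return sum(intra) / len(intra), sum(inter) / len(inter)


within, between = mean_intra_inter([1, 2, 10, 11], [0, 0, 1, 1])
print(within, between)  # a good clustering has within much smaller than between
```

A small within-cluster value together with a large between-cluster value suggests the grouping matches the slide's criterion.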
  • 11. The Distance Function • Simplest case: one numeric attribute A – Distance(X,Y) = |A(X) – A(Y)| • Several numeric attributes: – Distance(X,Y) = Euclidean distance between X and Y • Are all attributes equally important? – Weighting the attributes might be necessary
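The distance function above can be sketched as follows. This is an illustration under assumptions: the name `weighted_euclidean` and the optional `weights` argument are hypothetical, and weighting each squared difference is only one possible way to make some attributes count more than others.

```python
import math


def weighted_euclidean(x, y, weights=None):
    """Euclidean distance between two equal-length numeric vectors.

    `weights` (hypothetical parameter) scales each attribute's squared
    difference; with no weights all attributes count equally.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of attributes")
    if weights is None:
        weights = [1.0] * len(x)
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


print(weighted_euclidean([2], [5]))                 # one attribute: |2 - 5|
print(weighted_euclidean([0, 0], [3, 4]))           # two attributes
print(weighted_euclidean([0, 0], [3, 4], [1, 0]))   # second attribute ignored
```

With a single attribute the formula reduces to the absolute difference, which is why the one-attribute case needs the absolute value.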
  • 12. Simple Clustering: K-means Works with numeric data only 1) Pick a number (K) of cluster centers (at random) 2) Assign every item to its nearest cluster center (e.g. using Euclidean distance) 3) Move each cluster center to the mean of its assigned items 4) Repeat steps 2,3 until convergence (change in cluster assignments less than a threshold)
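The four steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the names `kmeans`, `sq_dist`, `max_iter` and `seed` are assumptions, initial centers are drawn from the data points themselves, and the loop stops when no assignment changes (a threshold of zero) rather than at a configurable threshold.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick K cluster centers at random (here: K distinct data points).
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign every item to its nearest cluster center.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: move each center to the mean of its assigned items.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assignment


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(data, k=2)
print(centers, labels)
```

Because the starting centers are random, different seeds can produce different label numbering and, on less well-separated data, different clusterings.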
  • 13. K-means example, step 1: Pick 3 initial cluster centers k1, k2, k3 (randomly) [scatter plot, axes X and Y]
  • 14. K-means example, step 2: Assign each point to the closest cluster center [scatter plot, axes X and Y, centers k1, k2, k3]
  • 15. K-means example, step 3: Move each cluster center to the mean of each cluster [scatter plot, axes X and Y, old and new positions of k1, k2, k3]
  • 16. K-means example, step 4: Reassign points closest to a different new cluster center. Q: Which points are reassigned? [scatter plot, axes X and Y, centers k1, k2, k3]
  • 17. K-means example, step 4 …: A: three points (shown with animation) [scatter plot, axes X and Y, centers k1, k2, k3]
  • 18. K-means example, step 4b: Re-compute cluster means [scatter plot, axes X and Y, centers k1, k2, k3]
  • 19. K-means example, step 5: Move cluster centers to cluster means [scatter plot, axes X and Y, centers k1, k2, k3]
  • 21. Pros and cons of K-Means
  • 22. K-means variations • K-medoids – instead of the mean, use the median of each cluster (strictly, a medoid: an actual data point at the center of the cluster) – Mean of 1, 3, 5, 7, 9 is 5 – Mean of 1, 3, 5, 7, 1009 is 205 – Median of 1, 3, 5, 7, 1009 is 5 – Median advantage: not affected by extreme values • For large databases, use sampling
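The arithmetic on this slide can be reproduced with Python's standard `statistics` module. The `medoid` helper is a hypothetical addition for illustration: it returns the data point with the smallest total distance to all other points, which is the kind of representative K-medoids uses in place of the mean.

```python
from statistics import mean, median

clean = [1, 3, 5, 7, 9]
skewed = [1, 3, 5, 7, 1009]  # same data with one extreme value


def medoid(values):
    """Return the value with the smallest summed distance to all others."""
    return min(values, key=lambda c: sum(abs(c - v) for v in values))


print(mean(clean))     # 5
print(mean(skewed))    # 205
print(median(skewed))  # 5
print(medoid(skewed))  # 5
```

One extreme value moves the mean from 5 to 205, while the median and the medoid stay at 5.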
  • 25. Evaluating Cost of Swapping Medoids
  • 26. Evaluating Cost of Swapping Medoids
  • 29. K-means clustering summary Advantages • Simple, understandable • Items automatically assigned to clusters Disadvantages • Must pick the number of clusters beforehand • All items forced into a cluster • Too sensitive to outliers, since an object with an extremely large value may substantially distort the distribution of the data
  • 30. Hierarchical clustering • Agglomerative Clustering – Start with single-instance clusters – At each step, join the two closest clusters – Design decision: distance between clusters • Divisive Clustering – Start with one universal cluster – Find two clusters – Proceed recursively on each subset – Can be very fast • Both methods produce a dendrogram [dendrogram figure with leaves g, a, c, i, e, d, k, b, j, f, h]
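The agglomerative procedure above can be sketched as follows. This is an illustration under stated assumptions: the data are one-dimensional, the "distance between clusters" design decision is resolved with single linkage (the distance between the two closest members), and the names `single_link`, `agglomerative` and `target_clusters` are hypothetical.

```python
def single_link(a, b):
    """Single-linkage distance: smallest gap between any two members."""
    return min(abs(x - y) for x in a for y in b)


def agglomerative(values, target_clusters=1):
    # Start with single-instance clusters.
    clusters = [[v] for v in values]
    merges = []  # merge history, i.e. the information a dendrogram draws
    while len(clusters) > target_clusters:
        # At each step, find the two closest clusters...
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j]))
        # ...and join them into one cluster.
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges


clusters, merges = agglomerative([1, 2, 9, 10, 25], target_clusters=3)
print(clusters)
print(merges)
```

Running to `target_clusters=1` records the full merge history; stopping earlier corresponds to cutting the dendrogram at a chosen level. Other linkage choices (complete, average) would change only `single_link`.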
  • 31. Partial Supervision of Clustering A two dimensional image of supervised clusters
  • 32. A two dimensional image of supervised clusters (real case) Partial Supervision of Clustering
  • 33. Partial Supervision of Clustering: A two-dimensional image of the different zones of overlapping clusters that both claim a data point (the disputed data point). More than two clusters claiming a point is also common. [figure with both axes numbered 1 to 5]
  • 34. Research Problems ─ Effective and Efficient methods of Clustering ─ Scalability ─ Handling different types of data ─ Handling complex multidimensional data ─ Complex shapes of clusters ─ Subspace Clustering ─ Cluster overlapping etc.
  • 35. Examples of Clustering Applications • Marketing: discover customer groups and use them for targeted marketing and re-organization • Astronomy: find groups of similar stars and galaxies • Earthquake studies: observed earthquake epicenters should be clustered along continental faults • Genomics: find groups of genes with similar expression • …
  • 36. Clustering Summary • unsupervised • many approaches –K-means – simple, sometimes useful • K-medoids is less sensitive to outliers –Hierarchical clustering – works for symbolic attributes –Can be used to fill in missing values