SlideShare une entreprise Scribd logo
1  sur  53
Distributed Computing Seminar Lecture 4: Clustering – an Overview and Sample MapReduce Implementation Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet Summer 2007 Except as otherwise noted, the content of this presentation is © 2007 Google Inc. and licensed under the Creative Commons Attribution 2.5 License.
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Clustering ,[object Object]
Google News ,[object Object],[object Object],[object Object]
Other less glamorous things... ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
The Distance Measure ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],Types of Algorithms
Hierarchical Clustering ,[object Object]
Partitional Clustering ,[object Object]
Partitional Clustering  ,[object Object]
K-Means Clustering  ,[object Object],[object Object],[object Object],[object Object]
K-Means Clustering ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
But! ,[object Object],[object Object],[object Object]
Furthermore ,[object Object],[object Object],[object Object],[object Object],[object Object]
Canopy Clustering ,[object Object],[object Object],[object Object],[object Object]
Canopy Clustering ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
After the canopy clustering… ,[object Object],[object Object]
MapReduce Implementation:  ,[object Object]
The Distance Metric ,[object Object],[object Object]
Steps! ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data Massage ,[object Object]
Selecting Canopy Centers
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Assigning Points to Canopies
 
 
 
 
 
K-Means Map
 
 
 
 
 
 
 
Elbow Criterion ,[object Object],[object Object],[object Object],[object Object]
Conclusions ,[object Object],[object Object],[object Object],[object Object]

Contenu connexe

Tendances

Machine Learning Project
Machine Learning ProjectMachine Learning Project
Machine Learning Project
Adeyemi Fowe
 

Tendances (20)

3.5 model based clustering
3.5 model based clustering3.5 model based clustering
3.5 model based clustering
 
05 k-means clustering
05 k-means clustering05 k-means clustering
05 k-means clustering
 
presentation 2019 04_09_rev1
presentation 2019 04_09_rev1presentation 2019 04_09_rev1
presentation 2019 04_09_rev1
 
Clustering
ClusteringClustering
Clustering
 
Lalal
LalalLalal
Lalal
 
Current clustering techniques
Current clustering techniquesCurrent clustering techniques
Current clustering techniques
 
Cure, Clustering Algorithm
Cure, Clustering AlgorithmCure, Clustering Algorithm
Cure, Clustering Algorithm
 
Big data Clustering Algorithms And Strategies
Big data Clustering Algorithms And StrategiesBig data Clustering Algorithms And Strategies
Big data Clustering Algorithms And Strategies
 
Development Infographic
Development InfographicDevelopment Infographic
Development Infographic
 
DATA MINING:Clustering Types
DATA MINING:Clustering TypesDATA MINING:Clustering Types
DATA MINING:Clustering Types
 
Dataa miining
Dataa miiningDataa miining
Dataa miining
 
Improved k-means
Improved k-meansImproved k-means
Improved k-means
 
Pillar k means
Pillar k meansPillar k means
Pillar k means
 
K-means Clustering
K-means ClusteringK-means Clustering
K-means Clustering
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids Algorithm
 
Machine Learning Project
Machine Learning ProjectMachine Learning Project
Machine Learning Project
 
Chapter 11 cluster advanced : web and text mining
Chapter 11 cluster advanced : web and text miningChapter 11 cluster advanced : web and text mining
Chapter 11 cluster advanced : web and text mining
 
Clique
Clique Clique
Clique
 
Unsupervised Learning
Unsupervised LearningUnsupervised Learning
Unsupervised Learning
 
Clustering
ClusteringClustering
Clustering
 

En vedette

Pagerank (from Google)
Pagerank (from Google)Pagerank (from Google)
Pagerank (from Google)
Sri Prasanna
 
Map reduce (from Google)
Map reduce (from Google)Map reduce (from Google)
Map reduce (from Google)
Sri Prasanna
 
Distributed computing seminar lecture 3 - distributed file systems
Distributed computing seminar   lecture 3 - distributed file systemsDistributed computing seminar   lecture 3 - distributed file systems
Distributed computing seminar lecture 3 - distributed file systems
tugrulh
 
Introduction to Cluster Computing and Map Reduce (from Google)
Introduction to Cluster Computing and Map Reduce  (from Google)Introduction to Cluster Computing and Map Reduce  (from Google)
Introduction to Cluster Computing and Map Reduce (from Google)
Sri Prasanna
 
Distributed file systems
Distributed file systemsDistributed file systems
Distributed file systems
Sri Prasanna
 
Distributed file systems (from Google)
Distributed file systems (from Google)Distributed file systems (from Google)
Distributed file systems (from Google)
Sri Prasanna
 
Intro (Distributed computing)
Intro (Distributed computing)Intro (Distributed computing)
Intro (Distributed computing)
Sri Prasanna
 
Qr codes para tech radar
Qr codes para tech radarQr codes para tech radar
Qr codes para tech radar
Sri Prasanna
 
Google: Cluster computing and MapReduce: Introduction to Distributed System D...
Google: Cluster computing and MapReduce: Introduction to Distributed System D...Google: Cluster computing and MapReduce: Introduction to Distributed System D...
Google: Cluster computing and MapReduce: Introduction to Distributed System D...
tugrulh
 
Mutual Exclusion Election (Distributed computing)
Mutual Exclusion Election (Distributed computing)Mutual Exclusion Election (Distributed computing)
Mutual Exclusion Election (Distributed computing)
Sri Prasanna
 

En vedette (10)

Pagerank (from Google)
Pagerank (from Google)Pagerank (from Google)
Pagerank (from Google)
 
Map reduce (from Google)
Map reduce (from Google)Map reduce (from Google)
Map reduce (from Google)
 
Distributed computing seminar lecture 3 - distributed file systems
Distributed computing seminar   lecture 3 - distributed file systemsDistributed computing seminar   lecture 3 - distributed file systems
Distributed computing seminar lecture 3 - distributed file systems
 
Introduction to Cluster Computing and Map Reduce (from Google)
Introduction to Cluster Computing and Map Reduce  (from Google)Introduction to Cluster Computing and Map Reduce  (from Google)
Introduction to Cluster Computing and Map Reduce (from Google)
 
Distributed file systems
Distributed file systemsDistributed file systems
Distributed file systems
 
Distributed file systems (from Google)
Distributed file systems (from Google)Distributed file systems (from Google)
Distributed file systems (from Google)
 
Intro (Distributed computing)
Intro (Distributed computing)Intro (Distributed computing)
Intro (Distributed computing)
 
Qr codes para tech radar
Qr codes para tech radarQr codes para tech radar
Qr codes para tech radar
 
Google: Cluster computing and MapReduce: Introduction to Distributed System D...
Google: Cluster computing and MapReduce: Introduction to Distributed System D...Google: Cluster computing and MapReduce: Introduction to Distributed System D...
Google: Cluster computing and MapReduce: Introduction to Distributed System D...
 
Mutual Exclusion Election (Distributed computing)
Mutual Exclusion Election (Distributed computing)Mutual Exclusion Election (Distributed computing)
Mutual Exclusion Election (Distributed computing)
 

Similaire à Clustering (from Google)

Unsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and AssumptionsUnsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and Assumptions
refedey275
 
CSA 3702 machine learning module 3
CSA 3702 machine learning module 3CSA 3702 machine learning module 3
CSA 3702 machine learning module 3
Nandhini S
 

Similaire à Clustering (from Google) (20)

Lec4 Clustering
Lec4 ClusteringLec4 Clustering
Lec4 Clustering
 
Chapter 5.pdf
Chapter 5.pdfChapter 5.pdf
Chapter 5.pdf
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt
 
CLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptxCLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptx
 
My8clst
My8clstMy8clst
My8clst
 
UNIT_V_Cluster Analysis.pptx
UNIT_V_Cluster Analysis.pptxUNIT_V_Cluster Analysis.pptx
UNIT_V_Cluster Analysis.pptx
 
Unsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and AssumptionsUnsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and Assumptions
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
Cluster Analysis for Dummies
Cluster Analysis for DummiesCluster Analysis for Dummies
Cluster Analysis for Dummies
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
 
DM_clustering.ppt
DM_clustering.pptDM_clustering.ppt
DM_clustering.ppt
 
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
 
multiarmed bandit.ppt
multiarmed bandit.pptmultiarmed bandit.ppt
multiarmed bandit.ppt
 
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.pptChapter 10. Cluster Analysis Basic Concepts and Methods.ppt
Chapter 10. Cluster Analysis Basic Concepts and Methods.ppt
 
CLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfCLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdf
 
Lecture 11 - KNN and Clustering, a lecture in subject module Statistical & Ma...
Lecture 11 - KNN and Clustering, a lecture in subject module Statistical & Ma...Lecture 11 - KNN and Clustering, a lecture in subject module Statistical & Ma...
Lecture 11 - KNN and Clustering, a lecture in subject module Statistical & Ma...
 
Clustering.pptx
Clustering.pptxClustering.pptx
Clustering.pptx
 
CSA 3702 machine learning module 3
CSA 3702 machine learning module 3CSA 3702 machine learning module 3
CSA 3702 machine learning module 3
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
ML basic & clustering
ML basic & clusteringML basic & clustering
ML basic & clustering
 

Plus de Sri Prasanna

Plus de Sri Prasanna (20)

Qr codes para tech radar 2
Qr codes para tech radar 2Qr codes para tech radar 2
Qr codes para tech radar 2
 
Test
TestTest
Test
 
Test
TestTest
Test
 
assds
assdsassds
assds
 
assds
assdsassds
assds
 
asdsa
asdsaasdsa
asdsa
 
dsd
dsddsd
dsd
 
About stacks
About stacksAbout stacks
About stacks
 
About Stacks
About  StacksAbout  Stacks
About Stacks
 
About Stacks
About  StacksAbout  Stacks
About Stacks
 
About Stacks
About  StacksAbout  Stacks
About Stacks
 
About Stacks
About  StacksAbout  Stacks
About Stacks
 
About Stacks
About  StacksAbout  Stacks
About Stacks
 
About Stacks
About StacksAbout Stacks
About Stacks
 
About Stacks
About StacksAbout Stacks
About Stacks
 
Network and distributed systems
Network and distributed systemsNetwork and distributed systems
Network and distributed systems
 
Introduction & Parellelization on large scale clusters
Introduction & Parellelization on large scale clustersIntroduction & Parellelization on large scale clusters
Introduction & Parellelization on large scale clusters
 
Mapreduce: Theory and implementation
Mapreduce: Theory and implementationMapreduce: Theory and implementation
Mapreduce: Theory and implementation
 
Other distributed systems
Other distributed systemsOther distributed systems
Other distributed systems
 
Naming And Binding (Distributed computing)
Naming And Binding (Distributed computing)Naming And Binding (Distributed computing)
Naming And Binding (Distributed computing)
 

Dernier

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Dernier (20)

Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 

Clustering (from Google)