Embarrassingly Parallel Problems
CS5225 Parallel and Concurrent Programming
Dilum Bandara
Dilum.Bandara@uom.lk
Some slides adapted from Dr. Srinath Perera
Embarrassingly Parallel Problems
• A.k.a. delightfully parallel problems
• Can be easily parallelized
• Tasks usually use simple communication patterns
• Tasks usually work without much communication among each other
• The Map-Reduce programming model provides a powerful abstraction for handling embarrassingly parallel problems
Map-Reduce
• Common pattern for solving parallel problems
• Based on 2 constructs from functional programming, map & reduce
• Introduced by Google
  - Dean & Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” OSDI, 2004
• Extensible to different applications
• Scales to a very large number of nodes
• Hides details such as failures from users
Higher-Order Functions
• In many programming languages (e.g., Java before version 8), functions take only data as parameters & return only data as results
• Higher-order functions take functions, as well as data, as parameters or return them as results
  - Supported in, e.g., Python, Ruby, JavaScript
• For example

    def f(x):
        return x + 3

    def g(function, x):
        # g receives the function f as an argument and calls it twice
        return function(x) * function(x)

    print(g(f, 7))  # prints 100
Map-Reduce
• Accepts 2 functions as inputs
  1. Map function
     - Y fn1(X)
     - Accepts an input X & outputs a Y
  2. Reduce function
     - Z fn2(List<Y>)
     - Accepts a list of Y’s & returns a single output Z
Map-Reduce (Contd.)
• Map-reduce support is provided by a function like the following
  - Z map-reduce(mapfn, reducefn, List<X>)
• A map-reduce implementation takes a list of inputs & does the following
  - Applies the map function to each entry in the list, which emits (key, value) pairs
  - Collects the results, groups them by key, & then passes each group to the reduce function as an array
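The two steps above can be sketched in a few lines of single-process Python. This is only an illustration under assumptions: the names `map_reduce`, `parity_map`, and `sum_reduce` are hypothetical, and the reduce function is assumed to receive the key together with its list of values.

```python
from collections import defaultdict

def map_reduce(map_fn, reduce_fn, inputs):
    """Apply map_fn to every input, group the emitted pairs by key, reduce each group."""
    groups = defaultdict(list)
    for item in inputs:
        # map_fn emits zero or more (key, value) pairs per input item
        for key, value in map_fn(item):
            groups[key].append(value)
    # reduce_fn is called once per distinct key with the list of its values
    return {key: reduce_fn(key, values) for key, values in groups.items()}

def parity_map(n):
    # Hypothetical map function: key each number by its parity
    yield ("even" if n % 2 == 0 else "odd", n)

def sum_reduce(key, values):
    # Hypothetical reduce function: add up the values of one group
    return sum(values)

result = map_reduce(parity_map, sum_reduce, [1, 2, 3, 4, 5])
print(result)
```

Because each call to the map function looks at one item only, the first loop is the part that can be spread across many workers.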
Map-Reduce (Contd.)
Figure source: www.datasciencecentral.com/profiles/blogs/practical-illustration-of-map-reduce-hadoop-style-on-real-data
Map-Reduce for Word Counting
Figure source: http://xiaochongzhang.me/blog/?p=338
How to do this for a large dataset using a distributed system?
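As a starting point for that question, here is a minimal single-process sketch of word counting split into map, shuffle, and reduce phases. The sample lines and the function names `word_count_map` and `word_count_reduce` are made up for illustration; in a distributed system, each line (map) and each word (reduce) could be handled by a different node.

```python
from collections import defaultdict

def word_count_map(line):
    """Map: emit (word, 1) for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]

def word_count_reduce(word, counts):
    """Reduce: add up all the 1s emitted for a single word."""
    return word, sum(counts)

lines = ["deer bear river", "car car river", "deer car bear"]

# Map phase: every line is processed independently of the others.
mapped = [pair for line in lines for pair in word_count_map(line)]

# Shuffle phase: group the emitted values by key (the word).
grouped = defaultdict(list)
for word, one in mapped:
    grouped[word].append(one)

# Reduce phase: every word is reduced independently of the others.
counts = dict(word_count_reduce(word, ones) for word, ones in grouped.items())
print(counts)
```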
In Class Activity
1. Card sorting
2. Card sorting with 2 rounds
3. Identify missing cards
Inspired by Marcio Silva's “The MapReduce Card Game” at
http://blog.marciosilva.com/2012/10/the-mapreduce-card-game.html
Why Map-Reduce?
• Implementing the same pattern in a distributed system isn’t that easy
  - Need to worry about communication, failures, initialization, etc.
• MapReduce frameworks take care of all of those
  - You write the map & reduce functions & call the framework
• It forces you to think in parallel at design time
• It gives you a higher level of abstraction to think in
• It’s very generic & covers a lot of use cases
  - See http://wiki.apache.org/hadoop/PoweredBy
Map-Reduce Implementations
• Can be implemented in many ways
  - In-memory implementation
  - Distributed implementation
    - Communication by messages
    - Communication by file system
    - Communication by databases
• Communication requirements
  - Need only broadcast & reduce operations
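An in-memory implementation can be sketched with a worker pool from Python's standard library. This is an illustrative assumption rather than how any particular framework works: `parallel_map_reduce`, `length_map`, and `count_reduce` are hypothetical names, and threads are used only to keep the example short (CPU-bound Python code would normally use processes).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def parallel_map_reduce(map_fn, reduce_fn, inputs, workers=4):
    """In-memory map-reduce: map and reduce calls are spread over a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Hand the inputs out to the workers; each call returns a list of pairs.
        mapped = list(pool.map(map_fn, inputs))
        # Group values by key in the coordinating thread.
        groups = defaultdict(list)
        for pairs in mapped:
            for key, value in pairs:
                groups[key].append(value)
        # Reduce every key independently, again on the pool.
        keys = list(groups)
        reduced = list(pool.map(lambda k: reduce_fn(k, groups[k]), keys))
    return dict(zip(keys, reduced))

def length_map(word):
    # Hypothetical map function: key each word by its length
    return [(len(word), word)]

def count_reduce(length, words):
    # Hypothetical reduce function: count the words of one length
    return len(words)

result = parallel_map_reduce(
    length_map, count_reduce, ["map", "reduce", "key", "value", "node"]
)
print(result)
```

The only coordination points are handing work out to the workers and gathering grouped results back, which mirrors the broadcast and reduce requirement listed above.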
Map-Reduce with Hadoop
• Apache Hadoop is an implementation of Map-Reduce
• Handles all the details of distributed execution
• You just have to provide the Map & Reduce functions
Map-Reduce Data Model
Figure source: http://slides.com/bearrito/pittsburgh-nosql-_-mapreduce#/
Map-Reduce Data Model (Cont.)
• Hadoop (by default) splits the input data into data items at newlines & runs the map function once for each data item
• When executed, the map function outputs (key, value) pairs
• Hadoop collects all (key, value) pairs generated by the map function, sorts them by key, & groups values with the same key together
• For each distinct key, Hadoop runs the reduce function once, passing the key & the list of values for that key as input
• The reduce function outputs (key, value) pairs, & Hadoop writes them to a file as the final result
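The data flow above can be imitated in plain Python to see what each step produces. This is a single-process sketch that does not use Hadoop; the "year,temperature" input format, the function names, and the tab-separated output lines are assumptions made for the example.

```python
from itertools import groupby
from operator import itemgetter

def map_fn(line):
    """Emit (year, temperature) from one 'year,temperature' line (made-up format)."""
    year, temp = line.split(",")
    yield year, int(temp)

def reduce_fn(year, temps):
    """Emit the maximum temperature seen for one year."""
    yield year, max(temps)

input_data = "2001,31\n2002,28\n2001,35\n2002,30\n2001,29"

# 1. Split the input at newlines and run map once per data item.
pairs = [kv for line in input_data.splitlines() for kv in map_fn(line)]

# 2. Sort by key so equal keys become adjacent, then group them.
pairs.sort(key=itemgetter(0))
output_lines = []
for key, group in groupby(pairs, key=itemgetter(0)):
    values = [value for _, value in group]
    # 3. Run reduce once per distinct key and collect its (key, value) output.
    for out_key, out_value in reduce_fn(key, values):
        output_lines.append(f"{out_key}\t{out_value}")

# 4. A framework would write these lines to a result file.
print("\n".join(output_lines))
```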
Execution on a Cluster/Cloud
Figure source: www.cbsolution.net/techniques/ontarget/mapreduce_vs_data_warehouse
MapReduce Execution
Figure source: Dean & Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” OSDI, 2004
Designing Map-Reduce Applications
• You control task granularity by changing the number of map & reduce tasks
  - How many map tasks?
  - How many reduce tasks?
• Fine grain → more parallelism → more communication overhead, and vice versa
• Frameworks usually handle load balancing & failures
• With a large number of map tasks, you also need a Combine function, which pre-aggregates each map task’s output before it is sent to the reducers
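The effect of a Combine function can be illustrated with word counting. In this hedged sketch (all names and the two input splits are made up), each map task's output is summed locally before the "shuffle", so fewer pairs have to travel to the reducer while the final result stays the same.

```python
from collections import Counter

def map_task(lines):
    """One map task: emit (word, 1) for every word in its input split."""
    return [(word, 1) for line in lines for word in line.split()]

def combine(pairs):
    """Combine: locally sum the counts of one map task before they leave the node."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def reduce_all(pair_lists):
    """Reduce: sum the counts per word over the output of all map tasks."""
    total = Counter()
    for pairs in pair_lists:
        for word, count in pairs:
            total[word] += count
    return dict(total)

splits = [["a b a", "a c"], ["b b a"]]          # two input splits, one per map task
raw = [map_task(split) for split in splits]      # map output without a combiner
combined = [combine(pairs) for pairs in raw]     # map output with a combiner

pairs_without_combiner = sum(len(pairs) for pairs in raw)
pairs_with_combiner = sum(len(pairs) for pairs in combined)
print(pairs_without_combiner, pairs_with_combiner)
print(reduce_all(raw) == reduce_all(combined))
```

This works because addition is associative and commutative; a combiner is only safe when the reduce operation has that property.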
Examples
• Sorting
  - How to sort an array of 1 million integers using MapReduce?
• Inverted Index
  - A normal index is a mapping from documents to terms
  - An inverted index is a mapping from terms to documents
  - If we have a million documents, how do we build an inverted index using MapReduce?
• Frequency Distribution of Word Occurrences
  - Count the number of occurrences & build a histogram
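One possible answer to the inverted index question, sketched in single-process Python: the map function emits (term, document id) pairs and the reduce function turns the ids collected for a term into a sorted list. The three tiny documents and the names `index_map` and `index_reduce` are assumptions for illustration only.

```python
from collections import defaultdict

def index_map(doc_id, text):
    """Map: emit (term, doc_id) once for each distinct term in a document."""
    for term in set(text.lower().split()):
        yield term, doc_id

def index_reduce(term, doc_ids):
    """Reduce: produce the sorted list of documents that contain one term."""
    return sorted(set(doc_ids))

documents = {
    "d1": "map reduce on hadoop",
    "d2": "hadoop cluster",
    "d3": "map tasks and reduce tasks",
}

# Map + shuffle: collect the document ids emitted for each term.
postings = defaultdict(list)
for doc_id, text in documents.items():
    for term, emitted_id in index_map(doc_id, text):
        postings[term].append(emitted_id)

# Reduce: one call per distinct term.
inverted_index = {term: index_reduce(term, ids) for term, ids in postings.items()}
print(inverted_index["map"])
```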
Examples (Cont.)
• Stitch Imagery
  - For Google Maps, Google needs to combine many pieces of map data into a single data set
• Business Intelligence
  - A business wants to create a graph of the income generated by each region & the marketing money spent on each region
Examples (Cont.)
• K-Means
  - Assume you are given a list of the coordinates of earthquakes that happened around the world in the last 50 years
  - You are asked to use the K-Means clustering algorithm to find 10 locations around which those earthquakes were clustered
  - K-Means starts with 10 random cluster locations
    - It proceeds iteratively, & at each iteration it assigns each data point (earthquake) to the closest cluster location
    - At the end of each iteration, it recalculates each cluster location as the mean of the coordinates of all data points assigned to that location
    - It stops when the cluster locations don’t change after recalculation
K-Means Algorithm

    List kmeans(datapointsList, initialClustersList) {
        oldLocations = null;
        newLocations = initialClustersList;
        while (oldLocations != newLocations) {
            oldLocations = newLocations;
            for (d in datapointsList) {
                // assign d to the closest location in oldLocations
            }
            // recalculate each location as the mean of the points assigned to it
            newLocations = /* recalculated locations */;
        }
        return newLocations;
    }
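The pseudocode can be turned into a small runnable sketch in which the assignment step plays the role of the map function and the recalculation step plays the role of the reduce function. This is one possible formulation under assumptions, not a distributed implementation: the function names, the `max_iterations` safety limit, and the four sample points are hypothetical, and initial locations are passed in rather than chosen at random.

```python
from collections import defaultdict
from math import dist

def assign_map(point, centers):
    """Map: emit (index of the closest center, point)."""
    closest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return closest, point

def mean_reduce(points):
    """Reduce: the new center is the coordinate-wise mean of the assigned points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, initial_centers, max_iterations=100):
    centers = list(initial_centers)
    for _ in range(max_iterations):
        # Map + shuffle: group the points by their closest center.
        groups = defaultdict(list)
        for p in points:
            idx, point = assign_map(p, centers)
            groups[idx].append(point)
        # Reduce: recalculate every center; an empty cluster keeps its old center.
        new_centers = [
            mean_reduce(groups[i]) if groups[i] else centers[i]
            for i in range(len(centers))
        ]
        if new_centers == centers:  # stop when nothing changed
            break
        centers = new_centers
    return centers

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centers)
```

Each iteration corresponds to one map-reduce job, so an iterative algorithm like this runs as a chain of jobs with the current cluster locations passed to every map task.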

Editor's notes

  1. Shovel example