The document discusses K-Means clustering and how to parallelize it using the Bulk Synchronous Parallel (BSP) model. It explains that K-Means clustering groups data points into K clusters based on their features, and that the standard algorithm iteratively assigns points to the nearest cluster center and then updates the centers. To parallelize it with BSP, the data is partitioned across processes; each process performs local computations and exchanges partial results through message passing and barrier synchronization, and a new set of cluster centers is calculated globally at each superstep. Benchmark results showed that the parallel BSP approach scaled roughly logarithmically, outperforming the linear scaling of a comparable MapReduce implementation.
9. What is K-Means Clustering?
Unsupervised Learning
Huge number of input vectors
k initial centers
Two step iterative algorithm
Assignment
Update
9/33
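The slides show no code; as an illustration only, the two-step iterative algorithm above can be sketched in Python (the function name `kmeans` and the random choice of the k initial centers are my own assumptions, not from the talk):

```python
import math
import random

def kmeans(vectors, k, iterations=10, seed=0):
    """Two-step iterative K-Means: assignment, then update."""
    rng = random.Random(seed)
    # k initial centers, picked from the input vectors.
    centers = rng.sample(vectors, k)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest center.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: math.dist(v, centers[c]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster.
        for c, assigned in enumerate(clusters):
            if assigned:
                centers[c] = tuple(sum(d) / len(assigned) for d in zip(*assigned))
    return centers
```

This is the sequential baseline; the rest of the deck distributes exactly these two steps over BSP processes.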
11. What is BSP?
BSP = Bulk Synchronous Parallel
Paradigm to design parallel algorithms
Two basic operations
Send message
Barrier synchronization
11/33
12. What is BSP?
[Diagram: processes P1, P2, P3 in one superstep: computation phase, then communication, then barrier synchronization]
12/33
13. What is BSP?
During the computation phase, messages are only queued
Between two barrier synchronizations, the queued messages are exchanged in bulk
Messages from the previous superstep are available in the next superstep
13/33
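A minimal Python sketch of these message semantics, using a toy single-machine `BSP` class (my own stand-in, not a real BSP runtime): a message sent during computation is only queued, and becomes visible after the barrier.

```python
from collections import defaultdict

class BSP:
    """Toy, single-machine stand-in for BSP message passing."""

    def __init__(self):
        self.inbox = defaultdict(list)   # messages delivered at the last barrier
        self.outbox = defaultdict(list)  # messages queued this superstep

    def send(self, peer, msg):
        # Computation phase: sending only queues the message.
        self.outbox[peer].append(msg)

    def messages(self, peer):
        # Only messages from the previous superstep are visible.
        return self.inbox[peer]

    def barrier(self):
        # Barrier synchronization: exchange all queued messages in bulk.
        self.inbox, self.outbox = self.outbox, defaultdict(list)

bsp = BSP()
bsp.send(peer=1, msg="sum=25,count=5")
assert bsp.messages(1) == []                     # not visible yet
bsp.barrier()
assert bsp.messages(1) == ["sum=25,count=5"]     # visible next superstep
```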
15. K-Means with BSP
Put the centers into RAM on each process
Iterate sequentially over the vectors on disk
Sum each assigned vector into a new temporary center object
15/33
17. K-Means with BSP
Sums per center:
• Center 1: Sum = 25, summed 5 times
• Center 2: Sum = 50, summed 10 times
• Center 3: Sum = 10, summed 5 times
17/33
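Plugging the numbers from this slide into the update rule, each mean is the sum divided by how often it was summed (the slide shows scalar sums for illustration; with real feature vectors the sum is per dimension):

```python
# Partial results from slide 17: (sum, times summed) per center.
partials = {"Center 1": (25, 5), "Center 2": (50, 10), "Center 3": (10, 5)}
means = {name: s / n for name, (s, n) in partials.items()}
# → {'Center 1': 5.0, 'Center 2': 5.0, 'Center 3': 2.0}
```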
18. K-Means with BSP
[Diagram: each process sends its local sums ("Send the sum") to all other processes, alongside its copy of the centers]
20. K-Means with BSP
The same calculation runs on every process
Divide each total sum by the total number of increments; the resulting means are the new centers
Floating-point error can be corrected by synchronizing when it exceeds a given threshold
20/33
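A Python sketch of this update step, assuming messages of the form (center id, partial sum, count). Folding the messages in sorted order is my own addition to make the floating-point result identical on every process; the talk instead proposes re-synchronizing once the drift exceeds a threshold.

```python
def merge_messages(messages):
    """Total the (center_id, partial_sum, count) messages received from
    all peers, then divide each total sum by its total increment count."""
    totals = {}
    # Sorted fold order (my addition, not from the talk) keeps the
    # floating-point result bit-identical across processes.
    for center_id, partial_sum, count in sorted(messages):
        s, n = totals.get(center_id, (0.0, 0))
        totals[center_id] = (s + partial_sum, n + count)
    return {cid: s / n for cid, (s, n) in totals.items()}

# Messages from two peers, reusing the sums from slide 17.
new_centers = merge_messages([(1, 25.0, 5), (2, 50.0, 10), (1, 5.0, 1)])
# → {1: 5.0, 2: 5.0}
```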
22. K-Means with BSP
Partition the vectors into equally sized blocks
# blocks = # tasks
Put the centers in RAM
Assignment phase
Iterate over the vectors on disk sequentially
Sum up temporary centers with their assigned vectors
Message all tasks with each sum and how often it was summed
Update phase
Calculate the total sum over all received messages and average it
Replace the old centers with the new centers and calculate convergence
22/33
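The two phases of this recipe can be sketched in Python as two functions: `superstep` plays the assignment phase of one task on its block of vectors, and `update` plays the update phase on the received messages (both function names are mine; the message exchange and barrier between them are left implicit):

```python
import math

def superstep(partition, centers):
    """Assignment phase of one task: sum the vectors assigned to each
    center; the return value is the message broadcast to all tasks."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for v in partition:
        nearest = min(range(k), key=lambda c: math.dist(v, centers[c]))
        for d in range(dim):
            sums[nearest][d] += v[d]
        counts[nearest] += 1
    return sums, counts

def update(messages, old_centers):
    """Update phase: total the received messages, average, and measure
    convergence as the largest center movement."""
    k, dim = len(old_centers), len(old_centers[0])
    total = [[0.0] * dim for _ in range(k)]
    n = [0] * k
    for sums, counts in messages:
        for c in range(k):
            n[c] += counts[c]
            for d in range(dim):
                total[c][d] += sums[c][d]
    new = [tuple(total[c][d] / n[c] for d in range(dim)) if n[c] else old_centers[c]
           for c in range(k)]
    shift = max(math.dist(a, b) for a, b in zip(old_centers, new))
    return new, shift
```

Each task would run `superstep` on its own block, send the result to every peer, hit the barrier, and then run `update` on the messages it received; iteration stops once `shift` falls below a convergence threshold.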
23. Benchmark
16 servers, 256 cores, 10G network
80 seconds!
Possible starvation: add more servers