3. Motivation
• This is the age of big data, and distributed data processing
frameworks are key to analyzing it
• Companies such as Google (MapReduce), Microsoft (Naiad)
and open-source communities such as Apache (Hadoop, Spark)
have proposed such frameworks
– These frameworks require developers to follow a functional programming model
Garefalakis, Panagiotis, et al. "ACaZoo: A Distributed Key-Value Store based on Replicated LSM-Trees."
6. Motivating Example
Power, Russell, and Jinyang Li. "Piccolo: Building Fast, Distributed Programs with Partitioned Tables." OSDI 2010.
7. PageRank in Map-Reduce
Dataflow models do not expose global state!
8. PageRank with RPC/MPI
9. Piccolo’s Goal: Distributed Shared State
• Expose this state in a form that is useful to the programmer, without making the programmer deal with communication
• Programs interact with state and graph data, not with machines
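The idea of interacting with state rather than machines can be illustrated with a toy table. This is a minimal single-process sketch, not Piccolo's actual API: the class name `PartitionedTable`, its `get`/`put` methods, and the hash-based placement are all assumptions made for illustration. The caller only names keys; which partition (a stand-in for a machine) holds a key is decided inside the table.

```python
class PartitionedTable:
    """Toy single-process stand-in for a table sharded across workers.

    Hypothetical illustration only; each dict plays the role of one machine.
    """

    def __init__(self, num_partitions):
        self.partitions = [dict() for _ in range(num_partitions)]

    def _owner(self, key):
        # Hash partitioning decides where a key lives; callers never see this.
        return hash(key) % len(self.partitions)

    def put(self, key, value):
        self.partitions[self._owner(key)][key] = value

    def get(self, key):
        return self.partitions[self._owner(key)][key]


# The program addresses page names, not partitions or machines.
ranks = PartitionedTable(num_partitions=4)
ranks.put("a.com", 0.25)
ranks.put("b.com", 0.75)
total = ranks.get("a.com") + ranks.get("b.com")
```

In a real distributed setting the `put` and `get` calls would hide network messages to the owning worker; here they are plain dictionary operations.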
10. Piccolo programming model
• Need an easy way to represent and access the state that does not sacrifice performance
• We need the right level of abstraction
11. PageRank with Piccolo
12. Piccolo - Locality
• Communication between machines is slow!
13. Piccolo - Locality
• We need to exploit locality!
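One way to see why locality matters is to count how many reads would cross machines under different placements. The sketch below is an assumption-laden illustration, not taken from the slides: the partition functions `by_page` and `shifted` and the helper `remote_reads` are hypothetical. It compares the case where a page's rank is stored on the same worker as its outgoing links with the case where the two tables are partitioned differently.

```python
NUM_WORKERS = 2


def by_page(page_id):
    # Partition function shared by both tables in the co-located case.
    return page_id % NUM_WORKERS


def shifted(page_id):
    # A different partition function, used to model a poor placement.
    return (page_id + 1) % NUM_WORKERS


def remote_reads(pages, graph_owner, rank_owner):
    """Count pages whose rank lives on a different worker than their links."""
    return sum(1 for p in pages if graph_owner(p) != rank_owner(p))


pages = range(8)
colocated = remote_reads(pages, by_page, by_page)   # same function for both tables
scattered = remote_reads(pages, by_page, shifted)   # mismatched placement
```

With matching partition functions every rank read is local; with mismatched ones every read in this toy setup has to go to another worker.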
14. PageRank with Piccolo Updated
15. Piccolo - Synchronization
Avoid write conflicts with accumulation functions
• NewValue = Accum(OldValue, Update)
• Examples: sum, product, min, max
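The rule NewValue = Accum(OldValue, Update) can be sketched as a table that merges every write through an accumulation function. This is a minimal illustration under stated assumptions: the class `AccumTable` and its `update`/`get` methods are hypothetical names, and the whole thing runs in one process. Because sum and min are commutative and associative, concurrent updates to the same key can be applied in any order with the same result.

```python
import operator


class AccumTable:
    """Hypothetical table that resolves concurrent writes with an accumulator."""

    def __init__(self, accum):
        self.accum = accum
        self.data = {}

    def update(self, key, value):
        if key in self.data:
            # NewValue = Accum(OldValue, Update)
            self.data[key] = self.accum(self.data[key], value)
        else:
            self.data[key] = value

    def get(self, key):
        return self.data[key]


# Updates as they might arrive from several workers.
updates = [("a.com", 1), ("a.com", 2), ("b.com", 5), ("a.com", 4)]

forward = AccumTable(operator.add)
for key, value in updates:
    forward.update(key, value)

# Same updates in the opposite arrival order.
backward = AccumTable(operator.add)
for key, value in reversed(updates):
    backward.update(key, value)

# A different accumulator: keep the minimum value seen per key.
lowest = AccumTable(min)
for key, value in updates:
    lowest.update(key, value)
```

Both arrival orders leave the same sums in the table, which is why no locking between writers is needed for this class of functions.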
16. PageRank with Piccolo Updated
17. Piccolo - Failure Recovery
18. PageRank with Piccolo Updated
19. Piccolo Evaluation
• 12-node cluster, 64 cores
• 100M-page graph
20. Piccolo Evaluation
• EC2 cluster: the amount of data was scaled linearly with the number of workers