17-11-2014 © Imperial College London
Piccolo: Building Fast, Distributed
Programs with Partitioned Tables
Presenter: Panagiotis Garefalakis
Course 590 - Academic Writing
Russell Power and Jinyang Li - New York University
Outline
• Motivation
• Background
• Piccolo
– Challenges
– Contribution
– Evaluation
• Conclusion
• Discussion
Motivation
• This is the age of big data, and distributed data-processing
frameworks are key to analyzing it
• Companies such as Google (MapReduce) and Microsoft (Naiad),
and open-source communities such as Apache (Hadoop, Spark),
have proposed such frameworks
– These frameworks require developers to follow a functional programming model
Garefalakis, Panagiotis, et al. "ACaZoo: A Distributed Key-Value Store based on Replicated LSM-Trees."
Motivation
• Scaling out: processing data is quick, but I/O is very slow
– 1 HDD = 75 MB/sec
– 1000 HDDs = 75 GB/sec
• For data-intensive workloads, a large number of
commodity servers is preferred over a small number
of high-end servers
– The cost of supercomputers does not scale linearly with performance
– But datacenter efficiency is a difficult problem to solve
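The disk figures above can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch: the 1 TB dataset size is an assumption chosen for illustration, and the calculation assumes perfectly parallel sequential reads and decimal units (1 TB = 1,000,000 MB).

```python
# Back-of-the-envelope scan times. The dataset size is an illustrative assumption.
HDD_MB_PER_SEC = 75        # per-disk sequential throughput, from the slide
DATASET_MB = 1_000_000     # 1 TB in decimal megabytes (assumed)


def scan_seconds(num_disks: int) -> float:
    """Time to read the whole dataset if it is striped evenly over num_disks."""
    aggregate_mb_per_sec = HDD_MB_PER_SEC * num_disks
    return DATASET_MB / aggregate_mb_per_sec


one_disk = scan_seconds(1)            # a few hours on a single disk
thousand_disks = scan_seconds(1000)   # a few seconds on 1000 disks
print(f"1 disk: {one_disk / 3600:.1f} h, 1000 disks: {thousand_disks:.1f} s")
```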
MapReduce
• Partition a large problem into smaller sub-problems
• Independent sub-problems are executed in parallel
• Combine intermediate results from each individual node (worker)
Parallel problems which are independent (shared nothing)
Computations depend on fragments of the dataset
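The partition, parallel-execute, combine pattern described on this slide can be sketched in a single process. This is a toy word count, not Hadoop's or Google's API: the function names `map_phase`, `reduce_phase` and `map_reduce` are hypothetical, and a thread pool stands in for the cluster workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Emit (word, 1) pairs for one fragment of the dataset."""
    return [(word, 1) for line in chunk for word in line.split()]


def reduce_phase(pairs):
    """Combine the intermediate pairs by summing the counts per word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def map_reduce(lines, num_workers=2):
    # Partition the input into independent sub-problems (round-robin).
    chunks = [lines[i::num_workers] for i in range(num_workers)]
    # Each sub-problem is mapped independently; workers share nothing.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    # Combine the intermediate results from every worker.
    return reduce_phase(pair for part in mapped for pair in part)


print(map_reduce(["a b a", "b c"]))
```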
Motivating Example
Power, Russell, and Jinyang Li. "Piccolo: Building Fast, Distributed Programs with Partitioned Tables." OSDI 2010.
PageRank in MapReduce
Dataflow models do not expose global state!
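To illustrate the point about global state, here is a minimal sketch of one PageRank iteration written in a dataflow style. It is not the code from the original slide: `pagerank_iteration` is a hypothetical name, the damping factor 0.85 is the conventional choice, and the sketch assumes every page has at least one out-link and every link target appears in `graph`. Note that the rank state has to be passed in and returned as a whole new dataset on each iteration, because the map and reduce steps cannot read or write shared state.

```python
def pagerank_iteration(graph, ranks, damping=0.85):
    """One iteration: ranks go in as input, a fresh rank dataset comes out."""
    # Map step: every page emits a contribution to each page it links to.
    contributions = []
    for page, out_links in graph.items():
        share = ranks[page] / len(out_links)
        for target in out_links:
            contributions.append((target, share))

    # Reduce step: sum the contributions that arrived for each target.
    sums = {page: 0.0 for page in graph}
    for target, share in contributions:
        sums[target] += share

    n = len(graph)
    return {page: (1 - damping) / n + damping * sums[page] for page in graph}


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {page: 1 / len(graph) for page in graph}
for _ in range(10):
    ranks = pagerank_iteration(graph, ranks)  # state is threaded through by hand
print(ranks)
```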
PageRank with RPC/MPI
Piccolo’s Goal: Distributed Shared State
• Expose the shared state to the programmer in a useful form, without making the programmer deal with communication
• Let programs interact with state and graph data rather than with machines
Piccolo programming model
• We need an easy way to access and represent the state that is also effective in terms of performance
• We need the right level of abstraction
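As a rough illustration of the abstraction level in question, the sketch below models a key-value table whose entries are spread over partitions while the caller only sees `get` and `put`. It is a toy in-process stand-in, not Piccolo's actual API: the class name `PartitionedTable`, integer keys, and the modulo partition function are all assumptions made for the example.

```python
class PartitionedTable:
    """Toy key-value table; each partition stands in for one machine's memory."""

    def __init__(self, num_partitions):
        self.partitions = [{} for _ in range(num_partitions)]

    def _partition_of(self, key: int) -> int:
        # Assumed partition function: integer keys, modulo the partition count.
        return key % len(self.partitions)

    def put(self, key, value):
        self.partitions[self._partition_of(key)][key] = value

    def get(self, key):
        return self.partitions[self._partition_of(key)][key]


table = PartitionedTable(num_partitions=4)
table.put(5, "rank of page 5")
print(table.get(5))         # the caller never names a machine
print(table.partitions[1])  # key 5 lives in partition 5 % 4 == 1
```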
PageRank with Piccolo
Piccolo - Locality
• Communication between machines is slow!
• We need to exploit locality!
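One way to see why locality matters is to count how many links cross partition boundaries under different partition functions. The sketch below is illustrative only: pages are modelled as hypothetical `(site, path)` tuples, `zlib.crc32` is used as a deterministic hash, and grouping pages by site is an assumed policy for the example rather than a statement of what the original slide showed.

```python
import zlib


def stable_hash(text: str) -> int:
    """Deterministic hash (Python's built-in str hash varies between runs)."""
    return zlib.crc32(text.encode("utf-8"))


def partition_by_page(page, num_partitions):
    site, path = page
    return stable_hash(site + path) % num_partitions


def partition_by_site(page, num_partitions):
    site, _path = page
    return stable_hash(site) % num_partitions


def remote_links(links, partitioner, num_partitions):
    """Count links whose source and target land in different partitions."""
    return sum(
        1
        for src, dst in links
        if partitioner(src, num_partitions) != partitioner(dst, num_partitions)
    )


links = [
    (("a.com", "/1"), ("a.com", "/2")),
    (("a.com", "/2"), ("a.com", "/3")),
    (("b.org", "/x"), ("b.org", "/y")),
]
print("by page:", remote_links(links, partition_by_page, 4))
print("by site:", remote_links(links, partition_by_site, 4))
```

Because every link in this sample stays within one site, the site-based partitioner needs no cross-partition communication at all, whereas a page-level hash may scatter linked pages across partitions.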
PageRank with Piccolo Updated
Piccolo - Synchronization
Avoid write conflicts with accumulation functions
• NewValue = Accum(OldValue, Update)
• Examples: sum, product, min, max
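The rule NewValue = Accum(OldValue, Update) can be sketched as a small table that merges every update through a user-supplied function. This is a minimal single-process illustration, not Piccolo's implementation; the class name `AccumTable` and its methods are hypothetical. When the accumulator is commutative and associative (as sum and min are), the final value does not depend on the order in which updates arrive, which is what removes the write conflict.

```python
import operator


class AccumTable:
    """Toy table that resolves concurrent writes with an accumulation function."""

    def __init__(self, accum):
        self.accum = accum   # e.g. operator.add, operator.mul, min, max
        self.data = {}

    def update(self, key, value):
        if key in self.data:
            # NewValue = Accum(OldValue, Update)
            self.data[key] = self.accum(self.data[key], value)
        else:
            self.data[key] = value

    def get(self, key):
        return self.data[key]


# Updates from different workers arriving in two different orders.
updates = [3, 1, 2]
for order in (updates, list(reversed(updates))):
    sums, mins = AccumTable(operator.add), AccumTable(min)
    for value in order:
        sums.update("page", value)
        mins.update("page", value)
    print(sums.get("page"), mins.get("page"))
```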
PageRank with Piccolo Updated
Piccolo - Failure Recovery
PageRank with Piccolo Updated
Piccolo Evaluation
• 12-node cluster, 64 cores
• 100M-page graph
Piccolo Evaluation
• EC2 cluster: the amount of data was scaled linearly, in proportion to the
number of workers
Conclusion
• Parallel in-memory applications may need to access
and share intermediate state that resides on
different machines
• Piccolo provides a programming model built on
distributed shared tables
• It supports user-specified policies for
– Effective use of locality
– Efficient synchronization
– Robust failure recovery
Limitations??
• Accumulation functions are not always an
option
• The shared state must fit in memory
• If a node fails, all nodes must be restored
to the last checkpoint
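The last limitation can be made concrete with a toy model of global checkpointing. This sketch is an assumption-laden illustration, not Piccolo's recovery code: `CheckpointedTable` and its methods are hypothetical names, and `copy.deepcopy` stands in for writing a snapshot to stable storage. It shows that when one partition fails, work done on healthy partitions since the last checkpoint is discarded too.

```python
import copy


class CheckpointedTable:
    """Toy partitioned table with a single global checkpoint."""

    def __init__(self, num_partitions):
        self.partitions = [{} for _ in range(num_partitions)]
        self._snapshot = copy.deepcopy(self.partitions)

    def put(self, partition, key, value):
        self.partitions[partition][key] = value

    def checkpoint(self):
        # Stand-in for saving every partition's state to stable storage.
        self._snapshot = copy.deepcopy(self.partitions)

    def fail_and_restore(self, failed_partition):
        # The failed node loses its in-memory state...
        self.partitions[failed_partition] = {}
        # ...and every partition rolls back to the last checkpoint.
        self.partitions = copy.deepcopy(self._snapshot)


table = CheckpointedTable(num_partitions=2)
table.put(0, "a", 1)
table.put(1, "b", 2)
table.checkpoint()
table.put(0, "a", 10)      # progress on the healthy partition
table.fail_and_restore(1)  # partition 1 fails
print(table.partitions)    # partition 0 has also lost its post-checkpoint write
```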
Paper Comments
• The Piccolo paper is clear and concise, with an extensive evaluation
• It was published in 2010 and presented at OSDI, a top-tier
systems conference organized by USENIX
• It has been cited 100 times according to Google Scholar
• The reason: it introduces a new programming model for sharing
mutable state in parallel applications
• MapReduce, which can be considered the de facto standard for
parallel execution, does not support sharing state
• It continues to attract attention because this remains an open research area
Panagiotis Garefalakis 17/11/2014
Review Presentation
Course 590 - Academic Writing
Backup - LB

