Pregel: A System for Large-Scale Graph Processing
Paper Review

                                   Maria Stylianou

                                 November 2, 2012


1 Motivation
Large-scale graphs, such as the Web graph and social networks, are among the
main sources of new computing problems, and processing them efficiently is a
challenge. MapReduce can be used, but it is inefficient for graph algorithms
because the entire state of the graph must be passed from one stage to the next.
The authors therefore propose Pregel, a distributed programming model designed
specifically for processing large-scale graphs while providing efficiency,
scalability and fault tolerance [1].


2 Contributions
Before Pregel, there was a gap in the area of frameworks for large-scale graph
processing that offer scalability while being distributed and fault-tolerant. Pregel is
designed with exactly these characteristics. The authors designed it for the Google
cluster architecture, in which clusters are interconnected and geographically
distributed, and each cluster contains thousands of commodity machines. Their main
contributions include:
1. The design of a fault-tolerant distributed programming framework that executes
   graph algorithms in parallel over thousands of machines.
2. An API with direct message passing among vertices, combiners for reducing
   message overhead, aggregators for global communication and monitoring, and
   topology mutations with resolution of conflicting requests.


3 Solution
Pregel operates as a sequence of synchronized computations on vertices. When a graph
is loaded as input, it is divided into partitions, each containing a set of vertices
and their outgoing edges. The partitions are assigned to worker machines, and one
machine acts as the master that coordinates the workers. The workers then execute
a series of iterations, called supersteps. In every superstep, all active vertices on
each worker execute the same user-defined function, which can (a) receive messages
sent during the previous superstep, (b) modify the state of the vertex and its outgoing
edges and (c) send messages to be delivered during the next superstep. At the end of
each superstep a global synchronization point occurs. Vertices can become inactive by
voting to halt, and the sequence of iterations terminates when all vertices are
inactive and there are no messages in transit. During computation, the master also
sends ping messages to detect worker failures. Because vertices and edges stay on the
machine that processes them, the network is used only for sending messages, which
significantly reduces the communication overhead.
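
The superstep loop described above can be illustrated with a minimal single-process sketch. It is not Pregel's actual API: the names `run_pregel` and `propagate_max`, the tuple returned by the compute function, and the `max_supersteps` limit are all assumptions made for illustration. The example propagates the largest vertex value through the graph, with every vertex voting to halt after each step and being reactivated only by incoming messages.

```python
from collections import defaultdict


def run_pregel(graph, values, compute, max_supersteps=50):
    """Simulate supersteps on one machine (hypothetical helper, not Pregel's API).

    graph:  dict mapping each vertex to a list of its out-neighbours
    values: dict mapping each vertex to its current value
    """
    active = set(graph)          # every vertex starts active
    inbox = defaultdict(list)    # messages to deliver in the current superstep
    superstep = 0
    while (active or inbox) and superstep < max_supersteps:
        outbox = defaultdict(list)
        # A vertex runs if it is active or has been woken up by a message.
        for v in set(active) | set(inbox):
            new_value, outgoing, halt = compute(
                superstep, values[v], graph[v], inbox.get(v, [])
            )
            values[v] = new_value
            for target, message in outgoing:
                outbox[target].append(message)
            if halt:
                active.discard(v)
            else:
                active.add(v)
        # Synchronization point: messages sent now are visible next superstep.
        inbox = outbox
        superstep += 1
    return values, superstep


def propagate_max(superstep, value, out_edges, messages):
    """User-defined vertex function: keep the largest value seen so far."""
    new_value = max([value] + messages)
    if superstep == 0 or new_value > value:
        outgoing = [(neighbour, new_value) for neighbour in out_edges]
    else:
        outgoing = []
    return new_value, outgoing, True  # always vote to halt


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
values = {"a": 3, "b": 6, "c": 2, "d": 1}
final, steps = run_pregel(graph, values, propagate_max)
print(final, steps)
```

The loop stops once no vertex is active and no message is pending, which mirrors the termination condition stated in the text.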


4 Strong Points
S1 Fault tolerance is achieved with checkpoints, in which the state of each worker's
    partitions is saved to persistent storage. When a machine fails during
    computation, the remaining machines reload their partition state from the most
    recent checkpoint.
S2 Combiners are an optimization that reduces network traffic; they are disabled by
    default and must be enabled by the user. With a combiner, several messages
    addressed to the same vertex can be combined into a single message, reducing
    the overhead.
S3 Aggregators are a mechanism for global communication and monitoring. They have
    different uses, such as statistics, global coordination, or even more advanced
    implementations. . . .
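
As a rough illustration of S2 and S3, the sketch below shows what a combiner and an aggregator do to the data. The function names `combine_outbox` and `aggregate` are hypothetical and do not correspond to Pregel's API. The sketch assumes the combining function is commutative and associative, so the order in which messages are merged does not matter.

```python
from functools import reduce
import operator


def combine_outbox(outbox, combiner):
    """Collapse all messages bound for the same vertex into a single message."""
    return {
        target: [reduce(combiner, messages)]
        for target, messages in outbox.items()
        if messages
    }


def aggregate(contributions, reducer):
    """Reduce one value per vertex into a single global value."""
    return reduce(reducer, contributions)


# Combiner: only the largest pending value per target needs to be sent.
outbox = {"b": [3, 2, 6], "c": [1]}
combined = combine_outbox(outbox, max)
print(combined)

# Aggregator: each vertex contributes its out-degree; the sum is the edge count.
out_degrees = [1, 2, 2, 1]
total_edges = aggregate(out_degrees, operator.add)
print(total_edges)
```

In this example three messages to vertex `b` shrink to one, which is the traffic reduction the combiner is meant to provide.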


5 Weak Points
W1 The user has to write a considerable amount of code to adapt Pregel to his/her
    needs. More precisely, the user has to write code to enable combiners and to
    customize aggregators. Additionally, the user is responsible for resolving
    conflicting mutation requests by defining handlers, which increases the
    complexity of the system.
W2 No failure detection is mentioned for the master, making it a single point of failure.
W3 The evaluation presented in the paper is very limited and comes with very little
    explanation. There is no clear comparison with other systems; an experimental
    comparison with MapReduce would be interesting. Also, there is no experiment
    evaluating the fault tolerance of the system. . . .


References
[1] G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and
    G. Czajkowski, “Pregel: a system for large-scale graph processing,” in Proceedings
    of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD
    ’10, (New York, NY, USA), pp. 135–146, ACM, 2010.




WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
