GRAPHALYTICS
A Big Data Benchmark for Graph-Processing Platforms
Graphalytics: Benchmarking Graph-Processing Platforms
Mihai Capotã, Arnau Prat, Peter Boncz, Hassan Chafi, Yong Guo, Ana Lucia Varbanescu, Tim Hegeman, Wing Lung Ngai, Alexandru Iosup, Stijn Heldens
LDBC TUC Meeting, UPC Barcelona, March 2016
https://github.com/tudelft-atlarge/graphalytics/
GRAPHALYTICS was made possible by a generous contribution from Oracle.
http://bl.ocks.org/mbostock/4062045
Graphs at the Core of Our Society: The LinkedIn Example
Data Deluge
Sources: Vincenzo Cosenza, The State of LinkedIn, http://vincos.it/the-state-of-linkedin/ via Christopher Penn, http://www.shiftcomm.com/2014/02/state-linkedin-social-media-dark-horse/
[Figure: LinkedIn statistics; surviving labels: Apr 2014, 400; Nov 2015, 400]
LinkedIn Is Not Unique: Data Deluge
• 270M MAU, 200+ avg followers, >54B edges
• 1.2B MAU, 0.8B DAU, 200+ avg followers, >240B edges
• company/day: 100+ posts, 1,000+ comments
• IBM: 280k employee-users, 2.6M followers
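The edge counts on this slide are consistent with multiplying monthly active users by the average follower count. A minimal sketch of that arithmetic, assuming each follower relation counts as one directed edge; the function name is illustrative and does not come from the slides:

```python
def estimated_edges(monthly_active_users: int, avg_followers: int) -> int:
    """Lower-bound edge estimate: one directed edge per follower relation."""
    return monthly_active_users * avg_followers


# 270M MAU with 200+ followers on average
print(estimated_edges(270_000_000, 200))    # 54000000000
# 1.2B MAU with 200+ followers on average
print(estimated_edges(1_200_000_000, 200))  # 240000000000
```

Because the averages are quoted as "200+", both products are lower bounds, which matches the ">" in the slide's figures.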
Graph Processing @large
A Graph Processing Platform
• Algorithm
• ETL
• Active Storage (filtering, compression, replication, caching)
• Distribution to processing platform
Streaming not considered in this presentation.
Interactive processing not considered in this presentation.
Ideally, N cores/disks → Nx faster
Compute-intensive workload: different/more complex analysis → ?x slower
Dataset-dependent workload: unfriendly graphs → ??x slower
Data-intensive workload: 10x graph size → 100x to 1,000x slower
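To put the data-intensive case in perspective: if runtime grew as a power law of graph size, a 10x larger graph running 100x to 1,000x slower would correspond to an exponent between 2 and 3, far from the linear ideal. The power-law model is an assumption made here for illustration; the slides do not state one.

```python
import math


def implied_exponent(size_factor: float, slowdown_factor: float) -> float:
    """Exponent k such that slowdown_factor == size_factor ** k (power-law assumption)."""
    return math.log(slowdown_factor) / math.log(size_factor)


# Linear scaling (10x slower) versus the range quoted on the slide
for slowdown in (10, 100, 1000):
    print(slowdown, round(implied_exponent(10, slowdown), 2))
```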
Graph-Processing Platforms
• Platform: the combined hardware, software, and programming system that is being used to complete a graph processing task
[Figure: platform logos; surviving label: Trinity]
Which to choose?
What to tune?
Graphalytics, in a nutshell
• An LDBC benchmark*
• Advanced benchmarking harness
• Diverse real and synthetic datasets
• Many classes of algorithms
• Granula for manual choke-point analysis
• Modern software engineering practices
• Supports many platforms
• Enables comparison of community-driven and industrial systems
http://graphalytics.ewi.tudelft.nl
https://github.com/tudelft-atlarge/graphalytics/
Benchmarking Harness
Iosup et al. LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on Parallel and Distributed Platforms (submitted).
Graphalytics = Representative Classes of Algorithms and Datasets
• 2-stage selection process of algorithms and datasets

Class                Examples                                                               %
Graph Statistics     Diameter, Local Clust. Coeff., PageRank                                20
Graph Traversal      BFS, SSSP, DFS                                                         50
Connected Comp.      Reachability, BiCC, Weakly CC                                          10
Community Detection  Clustering, Nearest Neighbor, Community Detection w Label Propagation  5
Other                Sampling, Partitioning                                                 <15

Guo et al. How Well do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis, IPDPS’14.
+ weighted graphs: Single-Source Shortest Paths (~35%)
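Graph traversal is the largest class in the survey above, with BFS as its first example. As a point of reference, here is a minimal single-machine BFS sketch. The adjacency-list input format and the function name are assumptions for illustration; this is not the Graphalytics reference implementation.

```python
from collections import deque


def bfs_depths(adjacency: dict, source):
    """Return the hop distance from source to every reachable vertex."""
    depths = {source: 0}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in depths:  # first visit gives the shortest hop count
                depths[neighbor] = depths[vertex] + 1
                queue.append(neighbor)
    return depths


# Small directed example graph; vertex 5 cannot be reached from vertex 1
graph = {1: [2, 3], 2: [4], 3: [4], 4: [], 5: [1]}
print(bfs_depths(graph, 1))  # {1: 0, 2: 1, 3: 1, 4: 2}
```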
Graphalytics = Distributed Graph Generation w DATAGEN
[Figure: DATAGEN pipeline; surviving labels: Person Generation, Edge Generation, Activity Generation, “Knows” graph serialization, Activity serialization, Graphalytics, Level of Detail]
• Rich set of configurations
• More diverse degree distribution than Graph500
• Realistic clustering coefficient and assortativity
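The clustering coefficient named on this slide can be checked on any generated graph. A minimal sketch of the local clustering coefficient for an undirected graph stored as adjacency sets follows. The storage format, the function name, and the toy graph are assumptions for illustration, not DATAGEN output.

```python
from itertools import combinations


def local_clustering(adjacency: dict, vertex) -> float:
    """Fraction of pairs of neighbors of `vertex` that are themselves connected."""
    neighbors = adjacency[vertex]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to close
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
    return 2.0 * links / (k * (k - 1))


# Toy undirected graph: triangle a-b-c plus a pendant vertex d attached to a
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
for v in sorted(graph):
    print(v, round(local_clustering(graph, v), 3))
```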
Graphalytics = Portable Perf. Analysis w Granula
[Figure: Granula workflow; surviving labels: Graph Processing System, Logging Patch, Performance Analyzer, Granula Performance Archive, Granula Performance Model, Granula Archiver, Modeling, Archiving, Sharing, Monitoring, logs, rules]
Minimal code invasion + automated data collection at runtime + portable archive (+ web UI) → portable bottleneck analysis
Graphalytics = Diverse Set of Automated Experiments

Category     Experiment           Algo.    Data                Nodes/Threads  Metrics
Baseline     Dataset variety      BFS, PR  All                 1              Run, norm.
             Algorithm variety    All      R4(S), D300(L)      1              Runtime
Scalability  Vertical vs. horiz.  BFS, PR  D300(L), D1000(XL)  1-16/1-32      Runtime, S
             Weak vs. strong      BFS, PR  G22(S)-G26(XL)      1-16           Runtime, S
Robustness   Stress test          BFS      All                 1              SLA met
             Variability          BFS      D300(L), D1000(L)   1/16           CV
Self-Test    Time to run/part     --       Datagen             1-16           Runtime
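Two of the metrics in the table can be computed directly from measured runtimes: S, read here as speedup relative to the baseline run, and CV, the coefficient of variation across repeated runs. This reading of the abbreviations is an assumption, and the runtimes below are made-up illustrative numbers, not benchmark results.

```python
import statistics


def speedup(baseline_runtime: float, scaled_runtime: float) -> float:
    """Speedup of a scaled-up run relative to the baseline run."""
    return baseline_runtime / scaled_runtime


def coefficient_of_variation(runtimes) -> float:
    """Sample standard deviation divided by the mean; lower means more stable."""
    return statistics.stdev(runtimes) / statistics.mean(runtimes)


# Hypothetical runtimes in seconds
print(speedup(120.0, 15.0))  # 8.0
print(round(coefficient_of_variation([10.0, 12.0, 11.0, 9.0, 13.0]), 3))
```

On 16 nodes, a speedup of 8 would mean 50% parallel efficiency, which is one way to compare the measured scaling against the "N cores/disks, Nx faster" ideal.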
Implementation status

Algo.     MapReduce 2  Giraph  GraphX  PowerGraph  GraphLab  Neo4j  PGX.D  GraphMat  OpenG  TOTEM  MapGraph  Medusa
LCC       G            G       G       G           G         G      --     G         G      --     --        --
BFS       G            G       G       G           G         G      G      G         G      V      V         V
WCC       G            G       G       G           G         G      G      G         G      V      V         V
CDLP      G            G       G       G           G         G      G      G         G      --     --        --
PageRank  --           G       G       G           V         --     G      G         G      V      V         V
SSSP      --           G       G       G           --        --     G      G         G      --     --        --

https://github.com/tudelft-atlarge/graphalytics/
G=validated, on GitHub
V=validation stage
Benchmarking and tuning performed by vendors
Graphalytics Capabilities: An Example
Graphalytics enables deep comparison of many systems at once, through diverse experiments and metrics
Your system here!
Diverse algorithms, diverse metrics, diverse datasets
Processing time (s) + Edges[+Vertices]/s
Which system is the best?
It depends…
Algorithm + Dataset + Metric
OK, but … why is this system better for this workload, for this metric?
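The throughput metric named on this slide, Edges[+Vertices]/s, normalizes processing time by graph size so that runs on different datasets can be compared. A minimal sketch; the function name and the example numbers are illustrative assumptions, not measured results:

```python
def elements_per_second(num_edges: int, num_vertices: int,
                        processing_time_s: float,
                        include_vertices: bool = True) -> float:
    """Edges (optionally plus vertices) processed per second."""
    if processing_time_s <= 0:
        raise ValueError("processing time must be positive")
    elements = num_edges + (num_vertices if include_vertices else 0)
    return elements / processing_time_s


# Hypothetical run: 1M edges, 100k vertices, 2 seconds of processing time
print(elements_per_second(1_000_000, 100_000, 2.0))                          # 550000.0
print(elements_per_second(1_000_000, 100_000, 2.0, include_vertices=False))  # 500000.0
```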
Granula Visualizer
Portable choke-point analysis for everyone!
Graphalytics = Modern Software Engineering Process
• Graphalytics code reviews
• Internal release to LDBC partners (first, Feb 2015; last, Feb 2016)
• Public release, announced first through LDBC (Apr 2015)
• First full benchmark specification, LDBC criteria (Q1 2016)
• Jenkins continuous integration server
• SonarQube software quality analyzer
https://github.com/tudelft-atlarge/graphalytics/
Graphalytics, in the future
• An LDBC benchmark*
• Advanced benchmarking harness
• Diverse real and synthetic datasets
• Many classes of algorithms
• Granula for manual choke-point analysis
• Modern software engineering practices
• Supports many platforms
• Enables comparison of community-driven and industrial systems
github.com/tudelft-atlarge/graphalytics/
+ more data generation
+ deeper performance metrics
+ choke-point analysis
PELGA – Performance Engineering for Large-scale Graph Analytics, workshop with EuroPar 2016
