SlideShare une entreprise Scribd logo
1  sur  34
Dynamic graph / iterative
computation on Apache Giraph
6/3/2014
Avery Ching
Hadoop Summit
Motivation
Apache Giraph
• Inspired by Google’s Pregel but runs on Hadoop
• “Think like a vertex”
• Maximum value vertex example
Processor 1
Processor 2
Time
5
5
5
5
2
5
5
5
2
1
5
5
2
1
Giraph on Hadoop / Yarn
MapReduce YARN
Giraph
Hadoop
0.20.x
Hadoop
0.20.203
Hadoop
2.0.x
Hadoop
1.x
Send page rank
value to
neighbors for
30 iterations
Calculate
updated
page rank
value from
neighbors
Page rank in Giraph
!
!
public class PageRankComputation extends BasicComputation<LongWritable,
DoubleWritable, FloatWritable, DoubleWritable> {
public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
Iterable<DoubleWritable> messages) {
if (getSuperstep() >= 1) {
double sum = 0;
for (DoubleWritable message : messages) {
sum += message.get();
}
vertex.getValue().set(DoubleWritable((0.15d / getTotalNumVertices()) + 0.85d * sum);
}
if (getSuperstep() < 30) {
sendMsgToAllEdges(new DoubleWritable(getVertexValue().get() / getNumOutEdges()));
} else {
voteToHalt();
}
}
}
Apache Giraph data flow
Loading the graph
Input

format
Split0
Split1
Split2
Split3
Master
Load/
Send
Graph
Worker0
Load/
Send
Graph
Worker1
Storing the graph
Worker0Worker1
Output

format
Part0
Part1
Part2
Part3
Part0
Part1
Part2
Part3
Compute / Iterate
Compute/
Send
Messages
Compute/
Send
Messages
In-memory
graph
Part0
Part1
Part2
Part3Master
Worker0Worker1
Sendstats/iterate!
Pipelined computation
Master “computes”
• Sets computation, in/out message, combiner for next super step
•Can set/modify aggregator values
Time
Worker 0
Worker 1
Master
phase 1a
phase 1a
phase 1b
phase 1b
phase 2
phase 2
phase 3
phase 3
Use case
Affinity propagation
Frey and Dueck “Clustering by passing messages between data points”
Science 2007
Organically discover exemplars based on similarity
Initialization Intermediate Convergence
Responsibility r(i,k)
• How well suited is k to be an exemplar for i?
Availability a(i,k)
• How appropriate for point i to choose point k as an exemplar given all
of i’s responsibilities?
Update exemplars
• Based on known responsibilities/availabilities, which vertex should be
my exemplar?
!
* Dampen responsibility, availability
3 stages
Responsibility
Every vertex i with an edge to k maintains responsibility of k for i
Sends responsibility to k in ResponsibilityMessage (senderid,
responsibility(i,k))
C
A
D
B
r(c,a)
r(d,a)
r(b,d)
r(b,a)
Availability
Vertex sums positive messages
Sends availability to i in AvailabilityMessage (senderid, availability(i,k))
C
A
D
B
a(c,a)
a(d,a)
a(b,d)
a(b,a)
Update exemplars
Dampens availabilities and scans edges to find exemplar k
Updates self-exemplar
C
A
D
Bupdate update
update update
exemplar=a exemplar=d
exemplar=a exemplar=a
Master logic
calculate
responsibility
calculate
availability
update
exemplars
initial
state
halt
if (exemplars agree they are exemplars
&& changed exemplars < ∆) then halt,
otherwise continue
Performance
Faster than Hive?
Application Graph Size CPU Time Speedup Elapsed Time Speedup
Page rank

(single iteration)
400B+ edges 26x 120x
Friends of
friends score
 71B+ edges 12.5x 48x
Apache Giraph scalability
Scalability of workers
(200B edges)
Seconds
0
125
250
375
500
# of Workers
50 100 150 200 250 300
Giraph Ideal
Scalability of edges (50
workers)
Seconds
0
125
250
375
500
# of Edges
1E+09 7E+10 1E+11 2E+11
Giraph Ideal
Trillion social edges page rank
Minutesperiteration
0
1
2
3
4
6/30/2013 6/2/2014
Improvements
• GIRAPH-840 - Netty 4 upgrade
• G1 Collector / tuning
Graph partitioning
Why balanced partitioning
Random partitioning == good balance
BUT ignores entity affinity
0 1
2
3
4 5
6
7
8 9
10
11
Balanced partitioning application
Results from one service:
Cache hit rate grew from 70% to 85%, bandwidth cut in 1/2
!
!
0
2
3
5
6 9
11
1 4 7
8
10
Balanced label propagation results
* Loosely based on Ugander and Backstrom. Balanced label
propagation for partitioning massive graphs, WSDM '13
Partitioning experimentsSecondsperiteration
0
40
80
120
160
Random 47% Local Edges
345B edge page rank
Improvements
• 56% faster!
• Native vertex remapping
Power-law graphs
Avoiding out-of-core
Example: Mutual friends calculation between
neighbors
1. Send your friends a list of your friends
2. Intersect with your friend list
!
1.23B (as of 1/2014)
200+ average friends (2011 S1)
8-byte ids (longs)
= 394 TB / 100 GB machines
3,940 machines (not including the graph)
A B
C
D
E
A:{D}
D:{A,E}
E:{D}
B:{}
C:{D}
D:{C}
A:{C}
C:{A,E}
E:{C}
!
C:{D}
D:{C}
!
!
E:{}
Superstep splitting
Subsets of sources/destinations edges per superstep
* Currently manual - future work automatic!
A
Sources: A (on), B (off)
Destinations: A (on), B (off)
B
B
B
A
A
A
Sources: A (on), B (off)
Destinations: A (off), B (on)
B
B
B
A
A
A
Sources: A (off), B (on)
Destinations: A (on), B (off)
B
B
B
A
A
A
Sources: A (off), B (on)
Destinations: A (off), B (on)
B
B
B
A
A
Giraph in production
Over 1.5 years in production
Over 100 jobs processed a week
30+ applications in our internal application repository
Sample production job - 700B+ edge
GiraphicJam demo
Giraph related projects
Graft: The distributed
Giraph debugger
Giraph roadmap
2/12 - 0.1 6/14 - 1.15/13 - 1.0
Future work
Automatic checkpointing make scheduling more fair
Investigate alternative computing models
•Giraph++ (IBM research)
•Giraphx (University at Buffalo, SUNY)
Performance
Lower the barrier to entry
Applications
Our team
!
Maja
Kabiljo
Sergey
Edunov
Pavan
Athivarapu
Avery
Ching
Sambavi
Muthukrishnan
Dynamic graph computation and iterative processing with Apache Giraph

Contenu connexe

Tendances

Introduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUGIntroduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUGAdam Kawa
 
Yarn Resource Management Using Machine Learning
Yarn Resource Management Using Machine LearningYarn Resource Management Using Machine Learning
Yarn Resource Management Using Machine Learningojavajava
 
Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGAdam Kawa
 
PyCascading for Intuitive Flow Processing with Hadoop (gabor szabo)
PyCascading for Intuitive Flow Processing with Hadoop (gabor szabo)PyCascading for Intuitive Flow Processing with Hadoop (gabor szabo)
PyCascading for Intuitive Flow Processing with Hadoop (gabor szabo)PyData
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceMahantesh Angadi
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce ParadigmDilip Reddy
 
SparkR - Scalable machine learning - Utah R Users Group - U of U - June 17th
SparkR - Scalable machine learning - Utah R Users Group - U of U - June 17thSparkR - Scalable machine learning - Utah R Users Group - U of U - June 17th
SparkR - Scalable machine learning - Utah R Users Group - U of U - June 17thAlton Alexander
 
High-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinHigh-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinPietro Michiardi
 
Apache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce TutorialApache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce TutorialFarzad Nozarian
 
Hadoop performance optimization tips
Hadoop performance optimization tipsHadoop performance optimization tips
Hadoop performance optimization tipsSubhas Kumar Ghosh
 
Introduction to Map-Reduce
Introduction to Map-ReduceIntroduction to Map-Reduce
Introduction to Map-ReduceBrendan Tierney
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reducerantav
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to YarnApache Apex
 
Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Lu Wei
 
How LinkedIn Uses Scalding for Data Driven Product Development
How LinkedIn Uses Scalding for Data Driven Product DevelopmentHow LinkedIn Uses Scalding for Data Driven Product Development
How LinkedIn Uses Scalding for Data Driven Product DevelopmentSasha Ovsankin
 

Tendances (20)

Introduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUGIntroduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUG
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Yarn Resource Management Using Machine Learning
Yarn Resource Management Using Machine LearningYarn Resource Management Using Machine Learning
Yarn Resource Management Using Machine Learning
 
Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUG
 
PyCascading for Intuitive Flow Processing with Hadoop (gabor szabo)
PyCascading for Intuitive Flow Processing with Hadoop (gabor szabo)PyCascading for Intuitive Flow Processing with Hadoop (gabor szabo)
PyCascading for Intuitive Flow Processing with Hadoop (gabor szabo)
 
Map reduce prashant
Map reduce prashantMap reduce prashant
Map reduce prashant
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
 
Apache pig
Apache pigApache pig
Apache pig
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
 
SparkR - Scalable machine learning - Utah R Users Group - U of U - June 17th
SparkR - Scalable machine learning - Utah R Users Group - U of U - June 17thSparkR - Scalable machine learning - Utah R Users Group - U of U - June 17th
SparkR - Scalable machine learning - Utah R Users Group - U of U - June 17th
 
High-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinHigh-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig Latin
 
Apache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce TutorialApache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce Tutorial
 
Overview of Spark for HPC
Overview of Spark for HPCOverview of Spark for HPC
Overview of Spark for HPC
 
Hadoop performance optimization tips
Hadoop performance optimization tipsHadoop performance optimization tips
Hadoop performance optimization tips
 
MapReduce basic
MapReduce basicMapReduce basic
MapReduce basic
 
Introduction to Map-Reduce
Introduction to Map-ReduceIntroduction to Map-Reduce
Introduction to Map-Reduce
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to Yarn
 
Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]
 
How LinkedIn Uses Scalding for Data Driven Product Development
How LinkedIn Uses Scalding for Data Driven Product DevelopmentHow LinkedIn Uses Scalding for Data Driven Product Development
How LinkedIn Uses Scalding for Data Driven Product Development
 

En vedette

2013.09.10 Giraph at London Hadoop Users Group
2013.09.10 Giraph at London Hadoop Users Group2013.09.10 Giraph at London Hadoop Users Group
2013.09.10 Giraph at London Hadoop Users GroupNitay Joffe
 
Link prediction 방법의 개념 및 활용
Link prediction 방법의 개념 및 활용Link prediction 방법의 개념 및 활용
Link prediction 방법의 개념 및 활용Kyunghoon Kim
 
Graph Sample and Hold: A Framework for Big Graph Analytics
Graph Sample and Hold: A Framework for Big Graph AnalyticsGraph Sample and Hold: A Framework for Big Graph Analytics
Graph Sample and Hold: A Framework for Big Graph AnalyticsNesreen K. Ahmed
 
Improving personalized recommendations through temporal overlapping community...
Improving personalized recommendations through temporal overlapping community...Improving personalized recommendations through temporal overlapping community...
Improving personalized recommendations through temporal overlapping community...Mani kandan
 
Fast, Scalable Graph Processing: Apache Giraph on YARN
Fast, Scalable Graph Processing: Apache Giraph on YARNFast, Scalable Graph Processing: Apache Giraph on YARN
Fast, Scalable Graph Processing: Apache Giraph on YARNDataWorks Summit
 
Hadoop Graph Processing with Apache Giraph
Hadoop Graph Processing with Apache GiraphHadoop Graph Processing with Apache Giraph
Hadoop Graph Processing with Apache GiraphDataWorks Summit
 
Giraph at Hadoop Summit 2014
Giraph at Hadoop Summit 2014Giraph at Hadoop Summit 2014
Giraph at Hadoop Summit 2014Claudio Martella
 
Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big dataSigmoid
 
2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - Hortonworks2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - HortonworksAvery Ching
 
Graphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphXGraphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphXAndrea Iacono
 
Spark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingSpark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingPetr Zapletal
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXKrishna Sankar
 
GraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesGraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesPaco Nathan
 
Bio heidi reformer training
Bio heidi reformer trainingBio heidi reformer training
Bio heidi reformer trainingHeidi Tritt
 
MALDEF y la Prevención Primaria de la Violencia Domestica Abril 2012
MALDEF y la Prevención Primaria de la Violencia Domestica Abril 2012MALDEF y la Prevención Primaria de la Violencia Domestica Abril 2012
MALDEF y la Prevención Primaria de la Violencia Domestica Abril 2012Ana Paula Noguez Mercado
 
Informe sobre el Encuentro Pedagógico del 31 de julio del 2014, Comunicado 013
Informe sobre el Encuentro Pedagógico del 31 de julio del 2014, Comunicado 013Informe sobre el Encuentro Pedagógico del 31 de julio del 2014, Comunicado 013
Informe sobre el Encuentro Pedagógico del 31 de julio del 2014, Comunicado 013Ernesto Fernández
 
Brand Storytelling Awards 2012
Brand Storytelling Awards 2012Brand Storytelling Awards 2012
Brand Storytelling Awards 2012Eugenio Amendola
 

En vedette (20)

2013.09.10 Giraph at London Hadoop Users Group
2013.09.10 Giraph at London Hadoop Users Group2013.09.10 Giraph at London Hadoop Users Group
2013.09.10 Giraph at London Hadoop Users Group
 
Link prediction 방법의 개념 및 활용
Link prediction 방법의 개념 및 활용Link prediction 방법의 개념 및 활용
Link prediction 방법의 개념 및 활용
 
Graph Sample and Hold: A Framework for Big Graph Analytics
Graph Sample and Hold: A Framework for Big Graph AnalyticsGraph Sample and Hold: A Framework for Big Graph Analytics
Graph Sample and Hold: A Framework for Big Graph Analytics
 
Improving personalized recommendations through temporal overlapping community...
Improving personalized recommendations through temporal overlapping community...Improving personalized recommendations through temporal overlapping community...
Improving personalized recommendations through temporal overlapping community...
 
Apache giraph
Apache giraphApache giraph
Apache giraph
 
Fast, Scalable Graph Processing: Apache Giraph on YARN
Fast, Scalable Graph Processing: Apache Giraph on YARNFast, Scalable Graph Processing: Apache Giraph on YARN
Fast, Scalable Graph Processing: Apache Giraph on YARN
 
Hadoop Graph Processing with Apache Giraph
Hadoop Graph Processing with Apache GiraphHadoop Graph Processing with Apache Giraph
Hadoop Graph Processing with Apache Giraph
 
Giraph at Hadoop Summit 2014
Giraph at Hadoop Summit 2014Giraph at Hadoop Summit 2014
Giraph at Hadoop Summit 2014
 
Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big data
 
2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - Hortonworks2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - Hortonworks
 
Graphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphXGraphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphX
 
Spark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingSpark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, Streaming
 
Graph Analytics
Graph AnalyticsGraph Analytics
Graph Analytics
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphX
 
GraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesGraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communities
 
Bio heidi reformer training
Bio heidi reformer trainingBio heidi reformer training
Bio heidi reformer training
 
MALDEF y la Prevención Primaria de la Violencia Domestica Abril 2012
MALDEF y la Prevención Primaria de la Violencia Domestica Abril 2012MALDEF y la Prevención Primaria de la Violencia Domestica Abril 2012
MALDEF y la Prevención Primaria de la Violencia Domestica Abril 2012
 
Portugal
PortugalPortugal
Portugal
 
Informe sobre el Encuentro Pedagógico del 31 de julio del 2014, Comunicado 013
Informe sobre el Encuentro Pedagógico del 31 de julio del 2014, Comunicado 013Informe sobre el Encuentro Pedagógico del 31 de julio del 2014, Comunicado 013
Informe sobre el Encuentro Pedagógico del 31 de julio del 2014, Comunicado 013
 
Brand Storytelling Awards 2012
Brand Storytelling Awards 2012Brand Storytelling Awards 2012
Brand Storytelling Awards 2012
 

Similaire à Dynamic graph computation and iterative processing with Apache Giraph

2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache GiraphAvery Ching
 
Apache Flink & Graph Processing
Apache Flink & Graph ProcessingApache Flink & Graph Processing
Apache Flink & Graph ProcessingVasia Kalavri
 
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsGreg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsFlink Forward
 
Tajo_Meetup_20141120
Tajo_Meetup_20141120Tajo_Meetup_20141120
Tajo_Meetup_20141120Hyoungjun Kim
 
Strata Stinger Talk October 2013
Strata Stinger Talk October 2013Strata Stinger Talk October 2013
Strata Stinger Talk October 2013alanfgates
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
 
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Sumeet Singh
 
Comparing pregel related systems
Comparing pregel related systemsComparing pregel related systems
Comparing pregel related systemsPrashant Raaghav
 
Hadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansHadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansattilacsordas
 
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Ontico
 
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Alexey Zinoviev
 
Scaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa ClaraScaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa ClaraJim Dowling
 
Creating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at ScaleCreating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at ScaleSean Chittenden
 
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdfimpalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdfssusere05ec21
 
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache FlinkMartin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache FlinkFlink Forward
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015Martin Junghanns
 

Similaire à Dynamic graph computation and iterative processing with Apache Giraph (20)

2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
 
Apache Flink & Graph Processing
Apache Flink & Graph ProcessingApache Flink & Graph Processing
Apache Flink & Graph Processing
 
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsGreg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
 
Data Pipeline at Tapad
Data Pipeline at TapadData Pipeline at Tapad
Data Pipeline at Tapad
 
Tajo_Meetup_20141120
Tajo_Meetup_20141120Tajo_Meetup_20141120
Tajo_Meetup_20141120
 
Strata Stinger Talk October 2013
Strata Stinger Talk October 2013Strata Stinger Talk October 2013
Strata Stinger Talk October 2013
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Osd ctw spark
Osd ctw sparkOsd ctw spark
Osd ctw spark
 
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
 
Comparing pregel related systems
Comparing pregel related systemsComparing pregel related systems
Comparing pregel related systems
 
Hadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansHadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticians
 
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
 
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
 
Scaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa ClaraScaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa Clara
 
Creating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at ScaleCreating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at Scale
 
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdfimpalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
 
Apache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real TimeApache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real Time
 
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache FlinkMartin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
 
Yahoo's Experience Running Pig on Tez at Scale
Yahoo's Experience Running Pig on Tez at ScaleYahoo's Experience Running Pig on Tez at Scale
Yahoo's Experience Running Pig on Tez at Scale
 

Plus de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Plus de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Dernier

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 

Dernier (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

Dynamic graph computation and iterative processing with Apache Giraph

  • 1. Dynamic graph / iterative computation on Apache Giraph 6/3/2014 Avery Ching Hadoop Summit
  • 3. Apache Giraph • Inspired by Google’s Pregel but runs on Hadoop • “Think like a vertex” • Maximum value vertex example Processor 1 Processor 2 Time 5 5 5 5 2 5 5 5 2 1 5 5 2 1
  • 4. Giraph on Hadoop / Yarn MapReduce YARN Giraph Hadoop 0.20.x Hadoop 0.20.203 Hadoop 2.0.x Hadoop 1.x
  • 5. Send page rank value to neighbors for 30 iterations Calculate updated page rank value from neighbors Page rank in Giraph ! ! public class PageRankComputation extends BasicComputation<LongWritable, DoubleWritable, FloatWritable, DoubleWritable> { public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex, Iterable<DoubleWritable> messages) { if (getSuperstep() >= 1) { double sum = 0; for (DoubleWritable message : messages) { sum += message.get(); } vertex.getValue().set(DoubleWritable((0.15d / getTotalNumVertices()) + 0.85d * sum); } if (getSuperstep() < 30) { sendMsgToAllEdges(new DoubleWritable(getVertexValue().get() / getNumOutEdges())); } else { voteToHalt(); } } }
  • 6. Apache Giraph data flow Loading the graph Input
 format Split0 Split1 Split2 Split3 Master Load/ Send Graph Worker0 Load/ Send Graph Worker1 Storing the graph Worker0Worker1 Output
 format Part0 Part1 Part2 Part3 Part0 Part1 Part2 Part3 Compute / Iterate Compute/ Send Messages Compute/ Send Messages In-memory graph Part0 Part1 Part2 Part3Master Worker0Worker1 Sendstats/iterate!
  • 7. Pipelined computation Master “computes” • Sets computation, in/out message, combiner for next super step •Can set/modify aggregator values Time Worker 0 Worker 1 Master phase 1a phase 1a phase 1b phase 1b phase 2 phase 2 phase 3 phase 3
  • 9. Affinity propagation Frey and Dueck “Clustering by passing messages between data points” Science 2007 Organically discover exemplars based on similarity Initialization Intermediate Convergence
  • 10. Responsibility r(i,k) • How well suited is k to be an exemplar for i? Availability a(i,k) • How appropriate for point i to choose point k as an exemplar given all of i’s responsibilities? Update exemplars • Based on known responsibilities/availabilities, which vertex should be my exemplar? ! * Dampen responsibility, availability 3 stages
  • 11. Responsibility Every vertex i with an edge to k maintains responsibility of k for i Sends responsibility to k in ResponsibilityMessage (senderid, responsibility(i,k)) C A D B r(c,a) r(d,a) r(b,d) r(b,a)
  • 12. Availability Vertex sums positive messages Sends availability to i in AvailabilityMessage (senderid, availability(i,k)) C A D B a(c,a) a(d,a) a(b,d) a(b,a)
  • 13. Update exemplars Dampens availabilities and scans edges to find exemplar k Updates self-exemplar C A D Bupdate update update update exemplar=a exemplar=d exemplar=a exemplar=a
  • 14. Master logic calculate responsibility calculate availability update exemplars initial state halt if (exemplars agree they are exemplars && changed exemplars < ∆) then halt, otherwise continue
  • 16. Faster than Hive? Application Graph Size CPU Time Speedup Elapsed Time Speedup Page rank
 (single iteration) 400B+ edges 26x 120x Friends of friends score
 71B+ edges 12.5x 48x
  • 17. Apache Giraph scalability Scalability of workers (200B edges) Seconds 0 125 250 375 500 # of Workers 50 100 150 200 250 300 Giraph Ideal Scalability of edges (50 workers) Seconds 0 125 250 375 500 # of Edges 1E+09 7E+10 1E+11 2E+11 Giraph Ideal
  • 18. Trillion social edges page rank Minutesperiteration 0 1 2 3 4 6/30/2013 6/2/2014 Improvements • GIRAPH-840 - Netty 4 upgrade • G1 Collector / tuning
  • 20. Why balanced partitioning Random partitioning == good balance BUT ignores entity affinity 0 1 2 3 4 5 6 7 8 9 10 11
  • 21. Balanced partitioning application Results from one service: Cache hit rate grew from 70% to 85%, bandwidth cut in 1/2 ! ! 0 2 3 5 6 9 11 1 4 7 8 10
  • 22. Balanced label propagation results * Loosely based on Ugander and Backstrom. Balanced label propagation for partitioning massive graphs, WSDM '13
  • 23. Partitioning experimentsSecondsperiteration 0 40 80 120 160 Random 47% Local Edges 345B edge page rank Improvements • 56% faster! • Native vertex remapping
  • 25. Avoiding out-of-core Example: Mutual friends calculation between neighbors 1. Send your friends a list of your friends 2. Intersect with your friend list ! 1.23B (as of 1/2014) 200+ average friends (2011 S1) 8-byte ids (longs) = 394 TB / 100 GB machines 3,940 machines (not including the graph) A B C D E A:{D} D:{A,E} E:{D} B:{} C:{D} D:{C} A:{C} C:{A,E} E:{C} ! C:{D} D:{C} ! ! E:{}
  • 26.
  • 27. Superstep splitting Subsets of sources/destinations edges per superstep * Currently manual - future work automatic! A Sources: A (on), B (off) Destinations: A (on), B (off) B B B A A A Sources: A (on), B (off) Destinations: A (off), B (on) B B B A A A Sources: A (off), B (on) Destinations: A (on), B (off) B B B A A A Sources: A (off), B (on) Destinations: A (off), B (on) B B B A A
  • 28. Giraph in production Over 1.5 years in production Over 100 jobs processed a week 30+ applications in our internal application repository Sample production job - 700B+ edge
  • 30. Giraph related projects Graft: The distributed Giraph debugger
  • 31. Giraph roadmap 2/12 - 0.1 6/14 - 1.15/13 - 1.0
  • 32. Future work Automatic checkpointing make scheduling more fair Investigate alternative computing models •Giraph++ (IBM research) •Giraphx (University at Buffalo, SUNY) Performance Lower the barrier to entry Applications