Fast, Scalable Graph Processing:
Apache Giraph on YARN
Hello, I'm Eli Reisman!
Eli is...
•  Apache Giraph Committer and PMC Member
•  Apache Tajo Committer
•  Wrote initial port of Giraph to YARN
•  Collaborating with fellow Giraph committers on the
"Giraph in Action" book for Manning Publications
Eli is...
•  Only able to do all this with the support of:
Eli is a software engineer at Etsy
Etsy enables non-technical folks to sell
handmade and vintage stuff:
We have a great blog called Code As Craft:
...but enough about me, let's talk Giraph!
Key Topics
What is Apache Giraph?
Why do I need it?
Giraph + MapReduce
Giraph + YARN
Giraph Roadmap
What is Apache Giraph?
Giraph is a framework for performing offline
batch processing of semi-structured graph
data on a massive scale.
Giraph is loosely based upon Google's Pregel
graph processing framework.
What is Apache Giraph?
Giraph performs iterative calculations on top of an
existing Hadoop cluster.
What is Apache Giraph?
Giraph uses Apache ZooKeeper to enforce atomic
barrier waits and perform leader election.
(Figure: two workers report "Done!" while a third is "...Still working...")
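To make the barrier idea concrete, here is a minimal sketch of a barrier wait. It is an in-process analogy only: it uses Python's `threading.Barrier` rather than ZooKeeper, and the worker count and the `run_superstep` helper are assumptions made for illustration, not Giraph code.

```python
import threading

NUM_WORKERS = 3  # assumed worker count, for illustration only


def run_superstep(num_workers=NUM_WORKERS):
    """Run one step; return how many finished workers each worker saw after the barrier."""
    barrier = threading.Barrier(num_workers)
    finished = []   # ids of workers that completed their local work
    observed = {}   # worker id -> number of finished workers seen after the barrier
    lock = threading.Lock()

    def worker(worker_id):
        with lock:
            finished.append(worker_id)   # this worker says "Done!"
        barrier.wait()                   # block until every worker is done
        with lock:
            observed[worker_id] = len(finished)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed


if __name__ == "__main__":
    print(run_superstep())
```

No worker gets past `barrier.wait()` until all of them have arrived, so every worker observes the full set of finished workers before it continues.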
What is Apache Giraph?
Giraph benefits from a vibrant Apache community, and is
under active development:
Why do I need it?
Giraph makes graph algorithms easy to reason about
and implement by following the Bulk Synchronous
Parallel (BSP) programming model.
In BSP, all algorithms are implemented from the point
of view of a single vertex in the input graph
performing a single iteration of the computation.
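As a rough illustration of the "single vertex, single iteration" point of view, the sketch below propagates the largest value through a small graph. It is plain Python, not the Giraph API: `compute`, `run_bsp`, and the sample graph are hypothetical names chosen for this example.

```python
def compute(vertex_value, messages):
    """One vertex, one superstep: adopt the largest value seen so far."""
    new_value = max([vertex_value] + messages)
    changed = new_value != vertex_value
    return new_value, changed


def run_bsp(values, edges, max_supersteps=50):
    """values: {vertex: value}; edges: {vertex: [out-neighbors]}."""
    values = dict(values)
    # Superstep 0: every vertex sends its value to its out-neighbors.
    inbox = {v: [] for v in values}
    for v, val in values.items():
        for n in edges.get(v, []):
            inbox[n].append(val)
    supersteps = 1
    while any(inbox.values()) and supersteps < max_supersteps:
        outbox = {v: [] for v in values}
        for v in values:
            if not inbox[v]:
                continue  # no messages: this vertex stays idle
            values[v], changed = compute(values[v], inbox[v])
            if changed:
                for n in edges.get(v, []):
                    outbox[n].append(values[v])
        inbox = outbox  # barrier: messages become visible in the next superstep
        supersteps += 1
    return values, supersteps


if __name__ == "__main__":
    chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    print(run_bsp({"a": 3, "b": 6, "c": 2, "d": 1}, chain))
```

The algorithm itself lives entirely in `compute`, which sees only one vertex's value and its incoming messages; the loop around it stands in for the framework.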
Why do I need it?
•  Giraph makes iterative data processing more
practical for Hadoop users.
•  Giraph can avoid costly disk and network
operations that are mandatory in MapReduce (MR).
•  MR has no concept of message passing.
Why do I need it?
Each cycle of an iterative calculation on
Hadoop means running a full MapReduce
job.
Let's use simple PageRank as a quick
example:
http://en.wikipedia.org/wiki/PageRank
1. All vertices start with the same PageRank
(1.0 for each of the three vertices in this example)
2. Each vertex distributes an equal portion of
its PageRank to all neighbors:
(Figure: edges labeled with the shares sent: 0.5, 0.5, 1, 1)
3. Each vertex sums its incoming values, multiplies the sum by a
weight factor (0.85 here), and adds a small adjustment:
0.15 * 1/(# vertices in graph)
(0.5 * 0.85) + (0.15 / 3)
(1.5 * 0.85) + (0.15 / 3)
(1 * 0.85) + (0.15 / 3)
4. This value becomes the vertex's PageRank
for the next iteration
(Figure: vertices labeled .43, .21, .64)
5. Repeat until convergence:
(change in PR per-iteration < epsilon)
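Steps 1 through 5 can be sketched in a few lines of Python. The edge layout below (A to B and C, B to C, C to A) is an assumption: the slides do not list the edges, but this layout matches the shares and sums they show. The `epsilon` and `max_iterations` defaults are also illustrative choices.

```python
DAMPING = 0.85  # the weight factor used in the slide's expressions


def pagerank_step(ranks, out_edges):
    """One iteration: distribute shares, then apply weight factor and adjustment."""
    n = len(ranks)
    incoming = {v: 0.0 for v in ranks}
    for v, neighbors in out_edges.items():
        share = ranks[v] / len(neighbors)   # equal portion per out-edge
        for target in neighbors:
            incoming[target] += share
    return {v: incoming[v] * DAMPING + (1 - DAMPING) / n for v in ranks}


def pagerank(out_edges, epsilon=1e-6, max_iterations=200):
    """Iterate until the largest per-vertex change drops below epsilon."""
    ranks = {v: 1.0 for v in out_edges}   # step 1: same starting PageRank
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        new_ranks = pagerank_step(ranks, out_edges)
        delta = max(abs(new_ranks[v] - ranks[v]) for v in ranks)
        ranks = new_ranks
        if delta < epsilon:
            break
    return ranks, iteration


# Assumed three-vertex graph, chosen to be consistent with the slide's numbers.
GRAPH = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

if __name__ == "__main__":
    print(pagerank_step({v: 1.0 for v in GRAPH}, GRAPH))
    print(pagerank(GRAPH))
```

With these assumed edges, the first call to `pagerank_step` reproduces the three expressions from step 3: 0.475, 1.325, and 0.9.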
Vertices with a higher in-degree converge to a higher
PageRank
Put another way:
PageRank on MapReduce
1. Load complete input graph from disk as
[K= Vertex ID, V = out-edges and PR]
(Diagram: the Map, Sort/Shuffle, and Reduce stages of a MapReduce job)
2. Emit all input records (full graph state), then
emit [K = edgeTarget, V = share of PR] for each out-edge
3. Sort and Shuffle this entire mess!
4. Sum incoming PR shares for each vertex,
update PR values in graph state records
5. Emit full graph state to disk...
6. ...and start over!
PageRank on MapReduce
•  Awkward to reason about
•  I/O bound despite simple core business logic
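The six MapReduce steps above can be sketched as a single in-process iteration. This is a plain-Python imitation of the map, sort/shuffle, and reduce phases, not Hadoop code; the `"STATE"` and `"SHARE"` tags and all function names are hypothetical, and disk I/O is left out.

```python
from collections import defaultdict

DAMPING = 0.85


def map_phase(records):
    """records: iterable of (vertex_id, (out_edges, rank))."""
    for vertex_id, (out_edges, rank) in records:
        # Re-emit the full graph state so the reducer can rebuild the record.
        yield vertex_id, ("STATE", out_edges)
        # Emit one share of this vertex's PageRank per out-edge.
        for target in out_edges:
            yield target, ("SHARE", rank / len(out_edges))


def shuffle(pairs):
    """Group every emitted value by key and sort by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(grouped, num_vertices):
    """Sum incoming shares per vertex and rebuild the graph state record."""
    for vertex_id, values in grouped:
        out_edges, total = [], 0.0
        for tag, payload in values:
            if tag == "STATE":
                out_edges = payload
            else:
                total += payload
        rank = total * DAMPING + (1 - DAMPING) / num_vertices
        yield vertex_id, (out_edges, rank)


def run_iteration(records):
    """One full job: every further iteration repeats all three phases."""
    records = list(records)
    return list(reduce_phase(shuffle(map_phase(records)), len(records)))


if __name__ == "__main__":
    graph = [("A", (["B", "C"], 1.0)), ("B", (["C"], 1.0)), ("C", (["A"], 1.0))]
    print(run_iteration(graph))
```

Note how the graph structure itself travels through the shuffle alongside the PageRank shares, which is the overhead the slides point at.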
PageRank on Giraph
1. Hadoop Mappers are "hijacked" to host
Giraph master and worker tasks.
2. Input graph is loaded once, maintaining
code-data locality when possible.
3. All iterations are performed on data in memory,
optionally spilled to disk. Disk access is linear/
scan-based.
4. Output is written from the Mappers hosting
the calculation, and the job run ends.
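To compare the two approaches, the sketch below counts full-graph disk passes for each. It is a toy model under stated assumptions: `CountingDisk` is a hypothetical stand-in for HDFS, shuffle traffic and optional spilling are ignored, and the three-vertex graph is the assumed example graph, not data from the talk.

```python
import copy

DAMPING = 0.85


class CountingDisk:
    """Hypothetical stand-in for HDFS that counts full-graph reads and writes."""

    def __init__(self, graph):
        self._graph = copy.deepcopy(graph)
        self.reads = 0
        self.writes = 0

    def read_graph(self):
        self.reads += 1
        return copy.deepcopy(self._graph)

    def write_graph(self, graph):
        self.writes += 1
        self._graph = copy.deepcopy(graph)


def pagerank_step(graph):
    """graph: {vertex: {"edges": [...], "rank": float}}; returns a new graph."""
    n = len(graph)
    incoming = {v: 0.0 for v in graph}
    for state in graph.values():
        for target in state["edges"]:
            incoming[target] += state["rank"] / len(state["edges"])
    return {
        v: {"edges": state["edges"], "rank": incoming[v] * DAMPING + (1 - DAMPING) / n}
        for v, state in graph.items()
    }


def run_mapreduce_style(disk, iterations):
    """Each iteration is a separate job: load from disk, compute, write back."""
    graph = None
    for _ in range(iterations):
        graph = pagerank_step(disk.read_graph())
        disk.write_graph(graph)
    return graph


def run_giraph_style(disk, iterations):
    """Load once, iterate on in-memory data, write the output once."""
    graph = disk.read_graph()
    for _ in range(iterations):
        graph = pagerank_step(graph)
    disk.write_graph(graph)
    return graph


GRAPH = {
    "A": {"edges": ["B", "C"], "rank": 1.0},
    "B": {"edges": ["C"], "rank": 1.0},
    "C": {"edges": ["A"], "rank": 1.0},
}

if __name__ == "__main__":
    mr_disk, giraph_disk = CountingDisk(GRAPH), CountingDisk(GRAPH)
    run_mapreduce_style(mr_disk, 10)
    run_giraph_style(giraph_disk, 10)
    print("MapReduce style:", mr_disk.reads, "reads,", mr_disk.writes, "writes")
    print("Giraph style:", giraph_disk.reads, "reads,", giraph_disk.writes, "writes")
```

Both functions compute the same ranks; only the number of trips to disk differs, and in the MapReduce style it grows with the iteration count.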
This is all well and good, but must we
manipulate Hadoop this way?
Giraph + MapReduce
•  Heap and other resources are set once, globally, for all
Mappers in the computation.
•  No control of which cluster nodes host which tasks.
•  No control over how Mappers are scheduled.
•  The Mapper and Reducer slot abstraction is meaningless
for Giraph at best, an artificial limit at worst.
YARN
•  YARN (Yet Another Resource Negotiator) is Hadoop's
next-gen job management platform.
•  Powers MapReduce v2, but is a general purpose
framework that is not tied to the MapReduce paradigm.
•  Offers fine-grained control over each task's resource
allocations and host placement for clients that need it.
YARN Architecture
Giraph + YARN
It's a natural fit!
Giraph + YARN
•  Giraph has maintained compatibility with Hadoop since
the 0.1 release by executing via the MapReduce interface.
•  Giraph has featured a "pure YARN" build profile since
the 1.0 release. It supports Hadoop-2.0.3 and trunk.
*Patches to add 2.0.4 and 2.0.5 support are in review :)
•  Giraph's YARN component is easy to extend or use as
a template to port other projects!
Giraph + YARN: Roadmap
•  YARN Application Master allows for more natural and
stable bootstrapping of Giraph jobs.
•  ZooKeeper management can find a natural home in the
Application Master.
•  Giraph on YARN can stop borrowing from Hadoop and
have its own web interface.
Giraph + YARN: Roadmap
•  Variable per-task resource allocation opens up the
possibility of Supertasks to manage graph supernodes.
•  Ability to spawn or retire tasks per-iteration enables
in-flight reassignment of data partitions.
•  AppMaster-managed utility tasks such as dedicated
sub-aggregators for tree-like aggregation, or data
pre-samplers.
Giraph New Developments
•  Decoupling of logic and graph data means tasks host
computations that are pluggable per-iteration.
•  Support for Giraph job scripting, starting with Jython.
More to follow...
•  A new website, fresh docs, the upcoming Manning book, and a
large, active community mean Giraph has never been
easier to use or contribute to!
Great! Where can I learn more?
http://giraph.apache.org
Mailing List:
user@giraph.apache.org

 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 


Fast, Scalable Graph Processing with Apache Giraph on YARN

  • 1. Fast, Scalable Graph Processing: Apache Giraph on YARN
  • 2. Hello, I'm Eli Reisman!
  • 3. Eli is...
      • Apache Giraph Committer and PMC Member
      • Apache Tajo Committer
      • Wrote initial port of Giraph to YARN
      • Collaborating with fellow Giraph committers on the Giraph in Action book for Manning Publishing
  • 4. Eli is...
      • Only able to do all this with the support of:
  • 5. Eli is a software engineer at
  • 6. Etsy enables non-technical folks to sell handmade and vintage stuff: We have a great blog called Code As Craft:
  • 7. ...but, enough about me, let's talk Giraph!
  • 8. Key Topics
      • What is Apache Giraph?
      • Why do I need it?
      • Giraph + MapReduce
      • Giraph + YARN
      • Giraph Roadmap
  • 9. What is Apache Giraph? Giraph is a framework for performing offline batch processing of semi-structured graph data on a massive scale. Giraph is loosely based upon Google's Pregel graph processing framework.
  • 10. What is Apache Giraph? Giraph performs iterative calculations on top of an existing Hadoop cluster.
  • 11. What is Apache Giraph? Giraph uses Apache ZooKeeper to enforce atomic barrier waits and perform leader election. [Illustration: "Done!" "Done!" "...Still working..."]
  • 12. What is Apache Giraph? Giraph benefits from a vibrant Apache community, and is under active development:
  • 13. Why do I need it? Giraph makes graph algorithms easy to reason about and implement by following the Bulk Synchronous Parallel (BSP) programming model. In BSP, all algorithms are implemented from the point of view of a single vertex in the input graph performing a single iteration of the computation.
  • 14. Why do I need it?
      • Giraph makes iterative data processing more practical for Hadoop users.
      • Giraph can avoid costly disk and network operations that are mandatory in MapReduce (MR).
      • MR has no concept of message passing.
  • 15. Why do I need it? Each cycle of an iterative calculation on Hadoop means running a full MapReduce job.
  • 16. Let's use simple PageRank as a quick example: http://en.wikipedia.org/wiki/PageRank [Figure: vertices labeled 1.0, 1.0, 1.0]
  • 17. 1. All vertices start with the same PageRank. [Figure: vertices labeled 1.0, 1.0, 1.0]
  • 18. 2. Each vertex distributes an equal portion of its PageRank to all neighbors. [Figure: edges labeled 0.5, 0.5, 1, 1]
  • 19. 3. Each vertex sums incoming values times a weight factor and adds in a small adjustment scaled by 1/(# vertices in graph): (.5*.85) + (.15/3); (1.5*.85) + (.15/3); (1*.85) + (.15/3)
  • 20. 4. This value becomes each vertex's PageRank for the next iteration. [Figure: vertices labeled .43, .21, .64]
  • 21. 5. Repeat until convergence (change in PR per iteration < epsilon).
  • 22. Vertices with higher in-degree converge to a higher PageRank.
  • 23. Put another way:
  • 24. PageRank on MapReduce: 1. Load complete input graph from disk as [K = Vertex ID, V = out-edges and PR]. [Diagram: Map, Sort/Shuffle, Reduce]
  • 25. PageRank on MapReduce: 2. Emit all input records (full graph state); emit [K = edgeTarget, V = share of PR].
  • 26. PageRank on MapReduce: 3. Sort and Shuffle this entire mess!
  • 27. PageRank on MapReduce: 4. Sum incoming PR shares for each vertex, update PR values in graph state records.
  • 28. PageRank on MapReduce: 5. Emit full graph state to disk...
  • 29. PageRank on MapReduce: 6. ...and start over!
  • 30. PageRank on MapReduce
      • Awkward to reason about
      • I/O bound despite simple core business logic
  • 31. PageRank on Giraph: 1. Hadoop Mappers are "hijacked" to host Giraph master and worker tasks. [Diagram: Map, Sort/Shuffle, Reduce]
  • 32. PageRank on Giraph: 2. Input graph is loaded once, maintaining code-data locality when possible.
  • 33. PageRank on Giraph: 3. All iterations are performed on data in memory, optionally spilled to disk. Disk access is linear/scan-based.
  • 34. PageRank on Giraph: 4. Output is written from the Mappers hosting the calculation, and the job run ends.
  • 35. This is all well and good, but must we manipulate Hadoop this way?
  • 36. Giraph + MapReduce
      • Heap and other resources are set once, globally, for all Mappers in the computation.
      • No control of which cluster nodes host which tasks.
      • No control over how Mappers are scheduled.
      • The Mapper and Reducer slots abstraction is meaningless for Giraph at best, an artificial limit at worst.
  • 37. YARN
      • YARN (Yet Another Resource Negotiator) is Hadoop's next-gen job management platform.
      • Powers MapReduce v2, but is a general-purpose framework that is not tied to the MapReduce paradigm.
      • Offers fine-grained control over each task's resource allocations and host placement for clients that need it.
  • 38. YARN Architecture
  • 39. Giraph + YARN: It's a natural fit!
  • 40. Giraph + YARN
      • Giraph has maintained compatibility with Hadoop since the 0.1 release by executing via the MapReduce interface.
      • Giraph has featured a "pure YARN" build profile since the 1.0 release. It supports Hadoop-2.0.3 and trunk. *Patches to add 2.0.4 and 2.0.5 support are in review :)
      • Giraph's YARN component is easy to extend or use as a template to port other projects!
  • 41. Giraph + YARN: Roadmap
      • YARN Application Master allows for more natural and stable bootstrapping of Giraph jobs.
      • ZooKeeper management can find a natural home in the Application Master.
      • Giraph on YARN can stop borrowing from Hadoop and have its own web interface.
  • 42. Giraph + YARN: Roadmap
      • Variable per-task resource allocation opens up the possibility of Supertasks to manage graph supernodes.
      • Ability to spawn or retire tasks per-iteration enables in-flight reassignment of data partitions.
      • AppMaster-managed utility tasks such as dedicated sub-aggregators for tree-like aggregation, or data pre-samplers.
  • 43. Giraph New Developments
      • Decoupling of logic and graph data means tasks host computations that are pluggable per-iteration.
      • Support for Giraph job scripting, starting with Jython. More to follow...
      • New website, fresh docs, upcoming Manning book, and large, active community means Giraph has never been easier to use or contribute to!
  • 44. Great! Where can I learn more? http://giraph.apache.org Mailing List: user@giraph.apache.org
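
The "PageRank on MapReduce" steps (slides 24 to 29) can be sketched in plain Python as a single job. This is only an illustration, not Hadoop or Giraph code: the function names, the "GRAPH"/"SHARE" tags, and the three-vertex graph are all assumptions. The edges are not recoverable from the transcript, so the graph was chosen so that the incoming sums match the 0.5, 1.5 and 1 used on slide 19.

```python
from collections import defaultdict

DAMPING = 0.85  # the weight factor used on slide 19


def map_phase(records):
    """Steps 1-2: re-emit the full graph state plus one PR share per out-edge."""
    for vertex_id, (out_edges, rank) in records.items():
        # The graph structure must travel through the shuffle on every iteration.
        yield vertex_id, ("GRAPH", out_edges)
        for target in out_edges:
            yield target, ("SHARE", rank / len(out_edges))


def shuffle(pairs):
    """Step 3: group every emitted value by key (stand-in for sort/shuffle)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped, num_vertices):
    """Steps 4-5: sum incoming shares and rebuild the graph state records."""
    result = {}
    for vertex_id, values in grouped.items():
        out_edges, incoming = [], 0.0
        for tag, payload in values:
            if tag == "GRAPH":
                out_edges = payload
            else:
                incoming += payload
        rank = DAMPING * incoming + (1 - DAMPING) / num_vertices
        result[vertex_id] = (out_edges, rank)
    return result


def run_job(records):
    """One full job equals one PageRank iteration (step 6: start over)."""
    return reduce_phase(shuffle(map_phase(records)), len(records))


if __name__ == "__main__":
    # Hypothetical graph: A -> B, A -> C, B -> C, C -> A, every PR starts at 1.0.
    state = {"A": (["B", "C"], 1.0), "B": (["C"], 1.0), "C": (["A"], 1.0)}
    for vertex_id, (_, rank) in sorted(run_job(state).items()):
        print(vertex_id, round(rank, 3))
```

With this assumed graph, one job reproduces the three expressions on slide 19: 0.475 for B, 1.325 for C and 0.9 for A. Note that the map phase emits seven records for a graph with only four edges, because the three graph-state records go through the shuffle as well; that overhead is paid again on every iteration.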
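
The vertex-centric BSP model (slide 13) and the PageRank steps on slides 17 to 21 can be sketched as a small in-memory superstep loop. This is a minimal sketch in plain Python and does not use the Giraph API: `compute`, `run_bsp`, the parameter names, and the three-vertex graph are hypothetical. The single-process loop stands in for the workers, and the variable swap at the end of each superstep stands in for the barrier.

```python
from collections import defaultdict

DAMPING = 0.85  # the weight factor used on slide 19


def compute(rank, out_edges, messages, superstep, num_vertices):
    """One superstep, written from the point of view of a single vertex."""
    if superstep > 0:
        # Steps 3-4: weighted sum of incoming shares plus the small adjustment.
        rank = DAMPING * sum(messages) + (1 - DAMPING) / num_vertices
    # Step 2: send an equal portion of the rank along every out-edge.
    share = rank / len(out_edges) if out_edges else 0.0
    return rank, [(target, share) for target in out_edges]


def run_bsp(graph, max_supersteps=200, epsilon=1e-6):
    """Run supersteps over an in-memory graph until ranks stop changing."""
    ranks = {v: 1.0 for v in graph}  # step 1: same starting PageRank
    inbox = {}
    for superstep in range(max_supersteps):
        next_inbox = defaultdict(list)
        new_ranks = {}
        for v, out_edges in graph.items():
            new_ranks[v], outgoing = compute(
                ranks[v], out_edges, inbox.get(v, []), superstep, len(graph)
            )
            for target, share in outgoing:
                next_inbox[target].append(share)
        delta = max(abs(new_ranks[v] - ranks[v]) for v in graph)
        # Barrier: messages only become visible in the next superstep.
        ranks, inbox = new_ranks, next_inbox
        if superstep > 0 and delta < epsilon:  # step 5: convergence test
            break
    return ranks


if __name__ == "__main__":
    # Hypothetical graph: A -> B, A -> C, B -> C, C -> A.
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for v, rank in sorted(run_bsp(graph).items()):
        print(v, round(rank, 3))
```

The graph is loaded once and only messages move between supersteps, which is the contrast with the MapReduce formulation that slides 31 to 34 draw. In this assumed graph, C has two in-edges and ends with the highest rank, in line with slide 22.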