SlideShare a Scribd company logo
1 of 93
Download to read offline
Large-Scale Machine Learning and Graphs
Carlos Guestrin
November 15, 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
PHASE 1: POSSIBILITY
PHASE 2: SCALABILITY
PHASE 3: USABILITY
Three Phases in Technological
Development

1.
Possibility

2.
Scalability

3.
Usability

Wide
Adoption
Beyond
Experts &
Enthusiast
Machine Learning PHASE 1:
POSSIBILITY
Rosenblatt 1957
Machine Learning PHASE 2:
SCALABILITY
Needless to Say, We Need Machine
Learning for Big Data
6 Billion
Flickr Photos

28 Million
Wikipedia Pages

1 Billion
Facebook Users

72 Hours a Minute
YouTube

“… data a new class of economic
asset, like currency or gold.”
Big Learning
How will we design and implement
parallel learning systems?
MapReduce for Data-Parallel ML
Excellent for large data-parallel tasks!
Data-Parallel

MapReduce
Feature
Extraction

Cross
Validation

Computing Sufficient Statistics

Graph-Parallel

Is there more to
Machine Learning

?
The Power of
Dependencies

where the value is!
Flashback to 1998
Why?

First Google advantage:
a Graph Algorithm & a System to Support it!
It’s all about the
graphs…
Social Media

•

Science

Advertising

Web

Graphs encode the relationships between:

People

Facts

Products

Interests

• Big: 100 billions of vertices and edges and rich metadata
–
–

Facebook (10/2012): 1B users, 144B friendships
Twitter (2011): 15B follower edges

Ideas
Examples of
Graphs in
Machine Learning
Label a Face
and Propagate
Pairwise similarity not enough…

Not similar enough
to be sure
Propagate Similarities & Co-occurrences for
Accurate Predictions

similarity
edges

Probabilistic Graphical Models
co-occurring
faces
further evidence
Collaborative Filtering: Exploiting Dependencies
Women on the Verge of a
Nervous Breakdown

The Celebration

Latent Factor Models
City of God
Non-negative Matrix Factorization
What do I
recommend???

Wild Strawberries

La Dolce Vita
Estimate Political Bias
?

Liberal
?

?
Post

Post

Semi-Supervised & ?
Post
Post
Transductive Learning ?
Post
Post

?
?

Conservative

?

?
Post

Post

?
?

?

?

Post

?

?

?
?

?

?

Post

?

Post

Post

Post
?

?

?

?

Post
?

Post

?

Post

?
?

?

?
Topic Modeling
Cat
Apple
Growth
Hat
Plant

LDA and co.
Machine Learning Pipeline

Data

Extract
Features

Graph
Formation

Structured
Machine
Learning
Algorithm

Value
from
Data
Face labels

images
docs
Movie ratings
Social activity

Doc topics
movie
recommend

sentiment
analysis
ML Tasks Beyond Data-Parallelism
Data-Parallel

Graph-Parallel

Map Reduce
Feature
Extraction

Cross
Validation

Computing Sufficient
Statistics

Graphical Models Semi-Supervised
Gibbs Sampling
Learning
Belief Propagation
Variational Opt.

Collaborative
Filtering

Tensor Factorization

Label Propagation
CoEM
Graph Analysis
PageRank
Triangle Counting
Example of a
Graph-Parallel
Algorithm
PageRank
Depends on rank
of who follows her

Depends on rank
of who follows them…

What’s the rank
of this user?

Rank?

Loops in graph  Must iterate!
PageRank Iteration

Iterate until convergence:
R[j]

“My rank is weighted
average of my friends’ ranks”

wji

R[i]
– α is the random reset probability
– wji is the prob. transitioning (similarity) from j to i
Properties of Graph Parallel Algorithms
Dependency
Graph

Local
Updates

Iterative
Computation
My Rank

Friends Rank
The Need for a New Abstraction
• Need: Asynchronous, Dynamic Parallel Computations

Data-Parallel

Graph-Parallel

Map Reduce
Feature
Extraction

Cross
Validation

Computing Sufficient
Statistics

Graphical Models
Gibbs Sampling
Belief Propagation
Variational Opt.

Collaborative
Filtering
Tensor Factorization

Semi-Supervised
Learning
Label Propagation
CoEM

Data-Mining
PageRank
Triangle Counting
The GraphLab Goals
Know how to
solve ML problem
on 1 machine

Efficient
parallel
predictions
POSSIBILITY
Data Graph
Data associated with vertices and edges
Graph:
• Social Network

Vertex Data:
• User profile text
• Current interests estimates
Edge Data:
• Similarity weights
How do we program
graph computation?

“Think like a Vertex.”
-Malewicz et al. [SIGMOD’10]
Update Functions
User-defined program: applied to
vertex transforms data in scope of vertex
pagerank(i, scope){
// Get Neighborhood data
(R[i], wij, R[j]) scope;

Update function applied (asynchronously)
// R[i] ¬the + (1- a ) å w ´ R[ j];
Update a vertex data
in parallel until convergence
ji

jÎN [i ]

// Reschedule Neighbors if needed

Many schedulers available tochanges thencomputation
if R[i] prioritize
reschedule_neighbors_of(i);
}

Dynamic
computation
The GraphLab Framework
Graph Based
Data Representation

Scheduler

Update Functions
User Computation

Consistency
Model
Alternating Least
Squares

SVD
CoEM

Lasso
LDA

Splash Sampler
Bayesian Tensor
Factorization

Belief Propagation
PageRank

Gibbs Sampling
SVM
K-Means
Dynamic Block Gibbs Sampling
Linear Solvers

…Many others…

Matrix
Factorization
Never Ending Learner Project (CoEM)
Hadoop

95 Cores

7.5 hrs

Distributed
GraphLab

32 EC2 machines 80 secs

0.3% of Hadoop time
2 orders of mag faster 
2 orders of mag cheaper
– ML algorithms as vertex programs
– Asynchronous execution and consistency
models
Thus far…

GraphLab 1 provided exciting
scaling performance
But…

We couldn’t scale up to
Altavista Webgraph 2002
1.4B vertices, 6.7B edges
Natural Graphs

[Image from WikiCommons]
Problem:
Existing distributed graph
computation systems perform
poorly on Natural Graphs
Achilles Heel: Idealized Graph Assumption
Assumed…

Small degree 
Easy to partition

But, Natural Graphs…

Many high degree vertices
(power-law degree distribution)

Very hard to partition
Power-Law Degree Distribution
10

Number of Vertices
count

10

8

10

High-Degree Vertices:
1% vertices adjacent to
50% of edges

6

10

4

10

2

10

AltaVista WebGraph
1.4B Vertices, 6.6B Edges

0

10

0

10

2

10

4

10
degree

Degree

6

10

8

10
High Degree Vertices are Common
Popular Movies
Users

“Social” People

Netflix
Movies

Hyper Parameters

θθ
ZZ θ θ
ZZ
Z
Z

ww Z Z Z
ww Z Z
ww
ww Z
ww Z
ww
ww
ww

b

Docs

α

Common Words

LDA
Words

Obama
Power-Law Degree Distribution

“Star Like” Motif
President
Obama

Followers
Problem: High Degree Vertices  High
Communication for Distributed Updates
Data transmitted
across network
Y
O(# cut edges)

Natural graphs do not have low-cost balanced cuts

Machine 1

[Leskovec et al. 08, Lang 04]

Machine 2

Popular partitioning tools (Metis, Chaco,…) perform poorly
[Abou-Rjeili et al. 06]
Extremely slow and require substantial memory
Random Partitioning
• Both GraphLab 1, Pregel, Twitter, Facebook,… rely on
Random (hashed) partitioning for Natural Graphs

For p Machines:

10 Machines  90% of edges cut
Machine 1 Machine 2
100 Machines  99% of edges cut!
All data is communicated… Little advantage over MapReduce
In Summary
GraphLab 1 and Pregel are not well
suited for natural graphs
• Poor performance on high-degree vertices
• Low Quality Partitioning
PowerGraph

SCALABILITY

2
Common Pattern for Update Fncs.
R[j]
GraphLab_PageRank(i)
// Compute sum over neighbors
total = 0
foreach( j in in_neighbors(i)):
total = total + R[j] * wji
// Update the PageRank
R[i] = 0.1 + total
// Trigger neighbors to run again
if R[i] not converged then
foreach( j in out_neighbors(i))
signal vertex-program on j

wji
R[i]
Gather Information
About Neighborhood
Apply Update to Vertex

Scatter Signal to Neighbors
& Modify Edge Data
GAS Decomposition
Gather (Reduce)

Apply

Scatter

Accumulate information
about neighborhood

Apply the accumulated
value to center vertex

Update adjacent edges
and vertices.

Σ

+…+


Y

+

Y

Parallel
“Sum”

Y

Y

Y

Y
’

Y’
Many ML Algorithms fit
into GAS Model
graph analytics, inference in graphical
models, matrix factorization,
collaborative filtering, clustering, LDA, …
Minimizing Communication in GL2
PowerGraph: Vertex Cuts
GL2 PowerGraph includes novel vertex cut algorithms
Y
Communication linear

Provides order in #magnitude gains in performance
of spanned machines

A vertex-cut minimizes # machines per vertex
Percolation theory suggests Power Law graphs can be split by
removing only a small set of vertices [Albert et al. 2000]


Small vertex cuts possible!
2

From the Abstraction
to a System
Triangle Counting on Twitter Graph
34.8 Billion Triangles

Hadoop
[WWW’11]

1636 Machines
423 Minutes

64 Machines
15 Seconds
Why? Wrong Abstraction 
Broadcast O(degree2) messages per Vertex
S. Suri and S. Vassilvitskii, “Counting triangles and the curse of the last reducer,” WWW’11
Topic Modeling (LDA)

Specifically engineered for this task
64 cc2.8xlarge EC2 Nodes
200 lines of code & 4 human hours

• English language Wikipedia
– 2.6M Documents, 8.3M Words, 500M Tokens
– Computationally intensive algorithm
How well does GraphLab scale?
Yahoo Altavista Web Graph (2002):
One of the largest publicly available webgraphs

1.4B Webpages, 6.7 Billion Links
7 seconds per iter.
1B links processed per second
30 lines of Nodes code
64 HPC user
1024 Cores (2048 HT)

4.4 TB RAM
GraphChi: Going small with GraphLab

Solve huge problems on
small or embedded devices?

Key: Exploit non-volatile memory
(starting with SSDs and HDs)
GraphChi – disk-based GraphLab
Challenge:
Random Accesses

Novel GraphChi solution:
Parallel sliding windows method 
minimizes number of random accesses
Triangle Counting on Twitter Graph
40M Users
Total: 34.8 Billion Triangles
1.2B Edges

1636 Machines
423 Minutes
59 Minutes, 1 Mac Mini!

64 Machines, 1024 Cores
15 Seconds
Hadoop results from [Suri
& Vassilvitskii '11]
– ML algorithms as vertex programs
– Asynchronous execution and
consistency models

PowerGraph 2

– Natural graphs change the nature of
computation
– Vertex cuts and gather/apply/scatter
model
GL2 PowerGraph
focused on Scalability
at the loss of Usability
GraphLab 1
PageRank(i, scope){
acc = 0
for (j in InNeighbors) {
acc += pr[j] * edge[j].weight
}
pr[i] = 0.15 + 0.85 * acc
}
Explicitly described operations

Code is intuitive
GraphLab 1

GL2 PowerGraph
Implicit operation

PageRank(i, scope){
acc = 0
for (j in InNeighbors) {
acc += pr[j] * edge[j].weight
}
pr[i] = 0.15 + 0.85 * acc
}

gather(edge) {
return edge.source.value *
edge.weight
}
merge(acc1, acc2) {
return accum1 + accum2
}
Implicit

Explicitly described operations

Code is intuitive

apply(v, accum) {
aggregation
v.pr = 0.15 + 0.85 * acc
}

Need to understand engine to
understand code
Great flexibility,
but hit scalability wall

Scalability,
but very rigid abstraction
(many contortions needed to implement
SVD++, Restricted Boltzmann Machines)

What now?
WarpGraph

USABILITY

3
GL3 WarpGraph Goals
Program
Like GraphLab 1

Run Like
GraphLab 2
Machine 1

Machine 2
Fine-Grained Primitives
Expose Neighborhood Operations through Parallelizable Iterators

Y

PageRankUpdateFunction(Y) {
Y.pagerank = 0.15 + 0.85 *
MapReduceNeighbors(
lambda nbr: nbr.pagerank*nbr.weight,
(aggregate sum over neighbors)
lambda (a,b): a + b
)
}
Expressive, Extensible Neighborhood API0
MapReduce over
Neighbors

Parallel Transform
Adjacent Edges

Broadcast

Y

Y

Y

Modify adjacent edges

Schedule a selected subset
of adjacent vertices

Y

+ …+
Y

+
Y

Parallel
Sum
Can express every GL2 PowerGraph program
(more easily) in GL3 WarpGraph
But GL3 is
more
expressive

Multiple
gathers

UpdateFunction(v) {
if (v.data == 1)
accum = MapReduceNeighs(g,m)
else ...
}

Scatter before
gather

Conditional
execution
Graph Coloring
GL2 PowerGraph

Twitter Graph: 41M Vertices 1.4B Edges

227 seconds

GL3 WarpGraph 89 seconds

2.5x Faster

WarpGraph outperforms PowerGraph
with simpler code
32 Nodes x 16 Cores (EC2 HPC cc2.8x)
– ML algorithms as vertex programs
– Asynchronous execution and consistency models

PowerGraph 2

WarpGraph

– Natural graphs change the nature of computation
– Vertex cuts and gather/apply/scatter model

– Usability is key
– Access neighborhood through parallelizable
iterators and latency hiding
Usability
RECENT RELEASE: GRAPHLAB 2.2,
INCLUDING WARPGRAPH ENGINE
And support for
streaming/dynamic graphs!

Consensus that WarpGraph is much
easier to use than PowerGraph
“User study” group biased… :-)
Usability for Whom???
GL2
PowerGraph

GL3
WarpGraph

…
Machine Learning
PHASE 3
USABILITY
Exciting Time to Work in ML
With Big Data,
I’ll take over
the world!!!

We met
because of
Big Data

Why won’t
Big Data read
my mind???

Unique opportunities to change the world!! 
But, every deployed system is an one-off solution,
and requires PhDs to make work… 
But…
ML key to any
new service we
want to build

Even basics of scalable ML
can be challenging
6 months from R/Matlab to
production, at best
State-of-art ML algorithms
trapped in research papers

Goal of GraphLab 3:
Make huge-scale machine learning accessible to all! 
Step 1
Learning ML in Practice
with GraphLab Notebook
Step 2
GraphLab+Python:
ML Prototype to Production
Learn:
GraphLab
Notebook

Prototype:
pip install graphlab

local prototyping

Production:
Same code scales execute on EC2
cluster
Step 3
GraphLab Toolkits:
Integrated State-of-the-Art
ML in Production
GraphLab Toolkits
Highly scalable, state-of-the-art
machine learning straight from python
Graph
Analytics

Graphical
Models

Computer
Vision

Clustering

Topic
Modeling

Collaborative
Filtering
Now with GraphLab: Learn/Prototype/Deploy
Even basics of scalable ML
can be challenging

Learn ML with
GraphLab Notebook

6 months from R/Matlab to
production, at best

pip install graphlab
then deploy on EC2

State-of-art ML algorithms
trapped in research papers

Fully integrated
via GraphLab Toolkits
We’re selecting strategic partners

Help define our strategy & priorities
And, get the value of GraphLab in your company

partners@graphlab.com
Possibility
Scalability
Usability
GraphLab 2.2 available now: graphlab.com
Define our future: partners@graphlab.com
Needless to say: jobs@graphlab.com
Please give us your feedback on this
presentation

BDT204
As a thank you, we will select prize
winners daily for completed surveys!

More Related Content

What's hot

Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big dataSigmoid
 
Deep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry LarkoDeep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry LarkoSri Ambati
 
Data Science in Future Tense
Data Science in Future TenseData Science in Future Tense
Data Science in Future TensePaco Nathan
 
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold XinUnifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold XinDatabricks
 
Deep Learning for Autonomous Driving
Deep Learning for Autonomous DrivingDeep Learning for Autonomous Driving
Deep Learning for Autonomous DrivingJan Wiegelmann
 
Benchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detectionBenchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detectionSymeon Papadopoulos
 
Big Data is changing abruptly, and where it is likely heading
Big Data is changing abruptly, and where it is likely headingBig Data is changing abruptly, and where it is likely heading
Big Data is changing abruptly, and where it is likely headingPaco Nathan
 
High Performance Processing of Streaming Data
High Performance Processing of Streaming DataHigh Performance Processing of Streaming Data
High Performance Processing of Streaming DataGeoffrey Fox
 
Graph Data: a New Data Management Frontier
Graph Data: a New Data Management FrontierGraph Data: a New Data Management Frontier
Graph Data: a New Data Management FrontierDemai Ni
 
Gao cong geospatial social media data management and context-aware recommenda...
Gao cong geospatial social media data management and context-aware recommenda...Gao cong geospatial social media data management and context-aware recommenda...
Gao cong geospatial social media data management and context-aware recommenda...jins0618
 
GalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataGalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataPaco Nathan
 
Big Data Trend with Open Platform
Big Data Trend with Open PlatformBig Data Trend with Open Platform
Big Data Trend with Open PlatformJongwook Woo
 
Flux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / PipelineFlux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / PipelineJan Wiegelmann
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...MLconf
 
What the Bleep is Big Data? A Holistic View of Data and Algorithms
What the Bleep is Big Data? A Holistic View of Data and AlgorithmsWhat the Bleep is Big Data? A Holistic View of Data and Algorithms
What the Bleep is Big Data? A Holistic View of Data and AlgorithmsAlice Zheng
 
High Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersHigh Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersSaliya Ekanayake
 
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MoreStrata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MorePaco Nathan
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-Systeminside-BigData.com
 
Scalable machine learning
Scalable machine learningScalable machine learning
Scalable machine learningArnaud Rachez
 

What's hot (20)

Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big data
 
Deep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry LarkoDeep Learning with MXNet - Dmitry Larko
Deep Learning with MXNet - Dmitry Larko
 
Data Science in Future Tense
Data Science in Future TenseData Science in Future Tense
Data Science in Future Tense
 
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold XinUnifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
 
Deep Learning for Autonomous Driving
Deep Learning for Autonomous DrivingDeep Learning for Autonomous Driving
Deep Learning for Autonomous Driving
 
Benchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detectionBenchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detection
 
Big Data is changing abruptly, and where it is likely heading
Big Data is changing abruptly, and where it is likely headingBig Data is changing abruptly, and where it is likely heading
Big Data is changing abruptly, and where it is likely heading
 
High Performance Processing of Streaming Data
High Performance Processing of Streaming DataHigh Performance Processing of Streaming Data
High Performance Processing of Streaming Data
 
Graph Data: a New Data Management Frontier
Graph Data: a New Data Management FrontierGraph Data: a New Data Management Frontier
Graph Data: a New Data Management Frontier
 
Gao cong geospatial social media data management and context-aware recommenda...
Gao cong geospatial social media data management and context-aware recommenda...Gao cong geospatial social media data management and context-aware recommenda...
Gao cong geospatial social media data management and context-aware recommenda...
 
GalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataGalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About Data
 
(Big) Data Science
(Big) Data Science(Big) Data Science
(Big) Data Science
 
Big Data Trend with Open Platform
Big Data Trend with Open PlatformBig Data Trend with Open Platform
Big Data Trend with Open Platform
 
Flux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / PipelineFlux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / Pipeline
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
 
What the Bleep is Big Data? A Holistic View of Data and Algorithms
What the Bleep is Big Data? A Holistic View of Data and AlgorithmsWhat the Bleep is Big Data? A Holistic View of Data and Algorithms
What the Bleep is Big Data? A Holistic View of Data and Algorithms
 
High Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersHigh Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC Clusters
 
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MoreStrata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-System
 
Scalable machine learning
Scalable machine learningScalable machine learning
Scalable machine learning
 

Viewers also liked

Graph processing - Powergraph and GraphX
Graph processing - Powergraph and GraphXGraph processing - Powergraph and GraphX
Graph processing - Powergraph and GraphXAmir Payberah
 
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big DataAWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big DataAmazon Web Services
 
Overview of Windows on AWS (CPN206) | AWS re:Invent 2013
Overview of Windows on AWS (CPN206) | AWS re:Invent 2013Overview of Windows on AWS (CPN206) | AWS re:Invent 2013
Overview of Windows on AWS (CPN206) | AWS re:Invent 2013Amazon Web Services
 
Graphlab under the hood
Graphlab under the hoodGraphlab under the hood
Graphlab under the hoodZuhair khayyat
 
Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013MLconf
 
GraphChi big graph processing
GraphChi big graph processingGraphChi big graph processing
GraphChi big graph processinghuguk
 
Large-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defense
Large-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defenseLarge-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defense
Large-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defenseAapo Kyrölä
 
Ling liu part 01:big graph processing
Ling liu part 01:big graph processingLing liu part 01:big graph processing
Ling liu part 01:big graph processingjins0618
 
Mastering the AWS SDK for PHP (TLS306) | AWS re:Invent 2013
Mastering the AWS SDK for PHP (TLS306) | AWS re:Invent 2013Mastering the AWS SDK for PHP (TLS306) | AWS re:Invent 2013
Mastering the AWS SDK for PHP (TLS306) | AWS re:Invent 2013Amazon Web Services
 
AWS Webcast - Implementing SAP Solutions on the AWS Cloud
AWS Webcast - Implementing SAP Solutions on the AWS CloudAWS Webcast - Implementing SAP Solutions on the AWS Cloud
AWS Webcast - Implementing SAP Solutions on the AWS CloudAmazon Web Services
 
(ADV403) Dynamic Ad Perf. Reporting w/ Redshift: Data Science, Queries at Sca...
(ADV403) Dynamic Ad Perf. Reporting w/ Redshift: Data Science, Queries at Sca...(ADV403) Dynamic Ad Perf. Reporting w/ Redshift: Data Science, Queries at Sca...
(ADV403) Dynamic Ad Perf. Reporting w/ Redshift: Data Science, Queries at Sca...Amazon Web Services
 
Billions and Billions: Machines, Algorithms, and Growing Business in Programa...
Billions and Billions: Machines, Algorithms, and Growing Business in Programa...Billions and Billions: Machines, Algorithms, and Growing Business in Programa...
Billions and Billions: Machines, Algorithms, and Growing Business in Programa...MediaMath
 
Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013
Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013
Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013Amazon Web Services
 
Realizing Customer-Centric Marketing with Programmatic Technology
Realizing Customer-Centric Marketing with Programmatic TechnologyRealizing Customer-Centric Marketing with Programmatic Technology
Realizing Customer-Centric Marketing with Programmatic TechnologyMediaMath
 
How MediaMath Built Faster, Scalable Attribution Reporting with Hadoop-Impala
How MediaMath Built Faster, Scalable Attribution Reporting with Hadoop-ImpalaHow MediaMath Built Faster, Scalable Attribution Reporting with Hadoop-Impala
How MediaMath Built Faster, Scalable Attribution Reporting with Hadoop-ImpalaMediaMath
 
(PFC303) Milliseconds Matter: Design, Deploy, and Operate Your Application fo...
(PFC303) Milliseconds Matter: Design, Deploy, and Operate Your Application fo...(PFC303) Milliseconds Matter: Design, Deploy, and Operate Your Application fo...
(PFC303) Milliseconds Matter: Design, Deploy, and Operate Your Application fo...Amazon Web Services
 
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013Amazon Web Services
 

Viewers also liked (20)

PowerGraph
PowerGraphPowerGraph
PowerGraph
 
Graph processing - Powergraph and GraphX
Graph processing - Powergraph and GraphXGraph processing - Powergraph and GraphX
Graph processing - Powergraph and GraphX
 
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big DataAWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
 
Overview of Windows on AWS (CPN206) | AWS re:Invent 2013
Overview of Windows on AWS (CPN206) | AWS re:Invent 2013Overview of Windows on AWS (CPN206) | AWS re:Invent 2013
Overview of Windows on AWS (CPN206) | AWS re:Invent 2013
 
Graphlab under the hood
Graphlab under the hoodGraphlab under the hood
Graphlab under the hood
 
Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013
 
GraphChi big graph processing
GraphChi big graph processingGraphChi big graph processing
GraphChi big graph processing
 
CS267_Graph_Lab
CS267_Graph_LabCS267_Graph_Lab
CS267_Graph_Lab
 
Large-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defense
Large-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defenseLarge-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defense
Large-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defense
 
Ling liu part 01:big graph processing
Ling liu part 01:big graph processingLing liu part 01:big graph processing
Ling liu part 01:big graph processing
 
Mastering the AWS SDK for PHP (TLS306) | AWS re:Invent 2013
Mastering the AWS SDK for PHP (TLS306) | AWS re:Invent 2013Mastering the AWS SDK for PHP (TLS306) | AWS re:Invent 2013
Mastering the AWS SDK for PHP (TLS306) | AWS re:Invent 2013
 
AWS Webcast - Implementing SAP Solutions on the AWS Cloud
AWS Webcast - Implementing SAP Solutions on the AWS CloudAWS Webcast - Implementing SAP Solutions on the AWS Cloud
AWS Webcast - Implementing SAP Solutions on the AWS Cloud
 
(ADV403) Dynamic Ad Perf. Reporting w/ Redshift: Data Science, Queries at Sca...
(ADV403) Dynamic Ad Perf. Reporting w/ Redshift: Data Science, Queries at Sca...(ADV403) Dynamic Ad Perf. Reporting w/ Redshift: Data Science, Queries at Sca...
(ADV403) Dynamic Ad Perf. Reporting w/ Redshift: Data Science, Queries at Sca...
 
Billions and Billions: Machines, Algorithms, and Growing Business in Programa...
Billions and Billions: Machines, Algorithms, and Growing Business in Programa...Billions and Billions: Machines, Algorithms, and Growing Business in Programa...
Billions and Billions: Machines, Algorithms, and Growing Business in Programa...
 
Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013
Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013
Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013
 
Realizing Customer-Centric Marketing with Programmatic Technology
Realizing Customer-Centric Marketing with Programmatic TechnologyRealizing Customer-Centric Marketing with Programmatic Technology
Realizing Customer-Centric Marketing with Programmatic Technology
 
How MediaMath Built Faster, Scalable Attribution Reporting with Hadoop-Impala
How MediaMath Built Faster, Scalable Attribution Reporting with Hadoop-ImpalaHow MediaMath Built Faster, Scalable Attribution Reporting with Hadoop-Impala
How MediaMath Built Faster, Scalable Attribution Reporting with Hadoop-Impala
 
(PFC303) Milliseconds Matter: Design, Deploy, and Operate Your Application fo...
(PFC303) Milliseconds Matter: Design, Deploy, and Operate Your Application fo...(PFC303) Milliseconds Matter: Design, Deploy, and Operate Your Application fo...
(PFC303) Milliseconds Matter: Design, Deploy, and Operate Your Application fo...
 
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
 
Graph Analytics
Graph AnalyticsGraph Analytics
Graph Analytics
 

Similar to GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedTuri, Inc.
 
CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...
CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...
CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...AMD Developer Central
 
High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingHigh-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingNesreen K. Ahmed
 
Large-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCLarge-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCAapo Kyrölä
 
GraphLab Conference 2014 Keynote - Carlos Guestrin
GraphLab Conference 2014 Keynote - Carlos GuestrinGraphLab Conference 2014 Keynote - Carlos Guestrin
GraphLab Conference 2014 Keynote - Carlos GuestrinTuri, Inc.
 
Chengqi zhang graph processing and mining in the era of big data
Chengqi zhang graph processing and mining in the era of big dataChengqi zhang graph processing and mining in the era of big data
Chengqi zhang graph processing and mining in the era of big datajins0618
 
Ling liu part 02:big graph processing
Ling liu part 02:big graph processingLing liu part 02:big graph processing
Ling liu part 02:big graph processingjins0618
 
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONSBIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONScscpconf
 
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions csandit
 
The Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataThe Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataInside Analysis
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsPetr Novotný
 
OrientDB - the 2nd generation of (Multi-Model) NoSQL
OrientDB - the 2nd generation  of  (Multi-Model) NoSQLOrientDB - the 2nd generation  of  (Multi-Model) NoSQL
OrientDB - the 2nd generation of (Multi-Model) NoSQLLuigi Dell'Aquila
 
DN18 | The Evolution and Future of Graph Technology: Intelligent Systems | Ax...
DN18 | The Evolution and Future of Graph Technology: Intelligent Systems | Ax...DN18 | The Evolution and Future of Graph Technology: Intelligent Systems | Ax...
DN18 | The Evolution and Future of Graph Technology: Intelligent Systems | Ax...Dataconomy Media
 
Follow the money with graphs
Follow the money with graphsFollow the money with graphs
Follow the money with graphsStanka Dalekova
 
Big Data : Bits of History, Words of Advice
Big Data : Bits of History, Words of AdviceBig Data : Bits of History, Words of Advice
Big Data : Bits of History, Words of AdviceVenu Vasudevan
 
How Graph Databases started the Multi Model revolution
How Graph Databases started the Multi Model revolutionHow Graph Databases started the Multi Model revolution
How Graph Databases started the Multi Model revolutionLuca Garulli
 
The Future of Computing is Distributed
The Future of Computing is DistributedThe Future of Computing is Distributed
The Future of Computing is DistributedAlluxio, Inc.
 
HILDA 2023 Keynote Bill Howe
HILDA 2023 Keynote Bill HoweHILDA 2023 Keynote Bill Howe
HILDA 2023 Keynote Bill Howedomoritz
 

Similar to GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013 (20)

F14 lec12graphs
F14 lec12graphsF14 lec12graphs
F14 lec12graphs
 
Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and Distributed
 
CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...
CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...
CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonza...
 
High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingHigh-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and Modeling
 
Large-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCLarge-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PC
 
GraphLab Conference 2014 Keynote - Carlos Guestrin
GraphLab Conference 2014 Keynote - Carlos GuestrinGraphLab Conference 2014 Keynote - Carlos Guestrin
GraphLab Conference 2014 Keynote - Carlos Guestrin
 
Chengqi zhang graph processing and mining in the era of big data
Chengqi zhang graph processing and mining in the era of big dataChengqi zhang graph processing and mining in the era of big data
Chengqi zhang graph processing and mining in the era of big data
 
Ling liu part 02:big graph processing
Ling liu part 02:big graph processingLing liu part 02:big graph processing
Ling liu part 02:big graph processing
 
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONSBIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
 
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
 
The Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataThe Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big Data
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big Graphs
 
OrientDB - the 2nd generation of (Multi-Model) NoSQL
OrientDB - the 2nd generation  of  (Multi-Model) NoSQLOrientDB - the 2nd generation  of  (Multi-Model) NoSQL
OrientDB - the 2nd generation of (Multi-Model) NoSQL
 
DN18 | The Evolution and Future of Graph Technology: Intelligent Systems | Ax...
DN18 | The Evolution and Future of Graph Technology: Intelligent Systems | Ax...DN18 | The Evolution and Future of Graph Technology: Intelligent Systems | Ax...
DN18 | The Evolution and Future of Graph Technology: Intelligent Systems | Ax...
 
Follow the money with graphs
Follow the money with graphsFollow the money with graphs
Follow the money with graphs
 
Big Data : Bits of History, Words of Advice
Big Data : Bits of History, Words of AdviceBig Data : Bits of History, Words of Advice
Big Data : Bits of History, Words of Advice
 
Addressing dm-cloud
Addressing dm-cloudAddressing dm-cloud
Addressing dm-cloud
 
How Graph Databases started the Multi Model revolution
How Graph Databases started the Multi Model revolutionHow Graph Databases started the Multi Model revolution
How Graph Databases started the Multi Model revolution
 
The Future of Computing is Distributed
The Future of Computing is DistributedThe Future of Computing is Distributed
The Future of Computing is Distributed
 
HILDA 2023 Keynote Bill Howe
HILDA 2023 Keynote Bill HoweHILDA 2023 Keynote Bill Howe
HILDA 2023 Keynote Bill Howe
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 

Recently uploaded (20)

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 

GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013

  • 1. Large-Scale Machine Learning and Graphs Carlos Guestrin November 15, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • 3.
  • 5.
  • 7.
  • 8. Three Phases in Technological Development 1. Possibility 2. Scalability 3. Usability Wide Adoption Beyond Experts & Enthusiast
  • 9. Machine Learning PHASE 1: POSSIBILITY
  • 11.
  • 12. Machine Learning PHASE 2: SCALABILITY
  • 13. Needless to Say, We Need Machine Learning for Big Data 6 Billion Flickr Photos 28 Million Wikipedia Pages 1 Billion Facebook Users 72 Hours a Minute YouTube “… data a new class of economic asset, like currency or gold.”
  • 14. Big Learning How will we design and implement parallel learning systems?
  • 15. MapReduce for Data-Parallel ML Excellent for large data-parallel tasks! Data-Parallel MapReduce Feature Extraction Cross Validation Computing Sufficient Statistics Graph-Parallel Is there more to Machine Learning ?
  • 16.
  • 17.
  • 18.
  • 20. Flashback to 1998 Why? First Google advantage: a Graph Algorithm & a System to Support it!
  • 21. It’s all about the graphs…
  • 22. Social Media • Science Advertising Web Graphs encode the relationships between: People Facts Products Interests • Big: 100 billions of vertices and edges and rich metadata – – Facebook (10/2012): 1B users, 144B friendships Twitter (2011): 15B follower edges Ideas
  • 24. Label a Face and Propagate
  • 25. Pairwise similarity not enough… Not similar enough to be sure
  • 26. Propagate Similarities & Co-occurrences for Accurate Predictions similarity edges Probabilistic Graphical Models co-occurring faces further evidence
  • 27. Collaborative Filtering: Exploiting Dependencies Women on the Verge of a Nervous Breakdown The Celebration Latent Factor Models City of God Non-negative Matrix Factorization What do I recommend??? Wild Strawberries La Dolce Vita
  • 28. Estimate Political Bias ? Liberal ? ? Post Post Semi-Supervised & ? Post Post Transductive Learning ? Post Post ? ? Conservative ? ? Post Post ? ? ? ? Post ? ? ? ? ? ? Post ? Post Post Post ? ? ? ? Post ? Post ? Post ? ? ? ?
  • 30. Machine Learning Pipeline Data Extract Features Graph Formation Structured Machine Learning Algorithm Value from Data Face labels images docs Movie ratings Social activity Doc topics movie recommend sentiment analysis
  • 31. ML Tasks Beyond Data-Parallelism Data-Parallel Graph-Parallel Map Reduce Feature Extraction Cross Validation Computing Sufficient Statistics Graphical Models Semi-Supervised Gibbs Sampling Learning Belief Propagation Variational Opt. Collaborative Filtering Tensor Factorization Label Propagation CoEM Graph Analysis PageRank Triangle Counting
  • 33. PageRank Depends on rank of who follows her Depends on rank of who follows them… What’s the rank of this user? Rank? Loops in graph  Must iterate!
  • 34. PageRank Iteration Iterate until convergence: R[j] “My rank is weighted average of my friends’ ranks” wji R[i] – α is the random reset probability – wji is the prob. transitioning (similarity) from j to i
  • 35. Properties of Graph Parallel Algorithms Dependency Graph Local Updates Iterative Computation My Rank Friends Rank
  • 36. The Need for a New Abstraction • Need: Asynchronous, Dynamic Parallel Computations Data-Parallel Graph-Parallel Map Reduce Feature Extraction Cross Validation Computing Sufficient Statistics Graphical Models Gibbs Sampling Belief Propagation Variational Opt. Collaborative Filtering Tensor Factorization Semi-Supervised Learning Label Propagation CoEM Data-Mining PageRank Triangle Counting
  • 37. The GraphLab Goals Know how to solve ML problem on 1 machine Efficient parallel predictions
  • 39. Data Graph Data associated with vertices and edges Graph: • Social Network Vertex Data: • User profile text • Current interests estimates Edge Data: • Similarity weights
  • 40. How do we program graph computation? “Think like a Vertex.” -Malewicz et al. [SIGMOD’10]
  • 41. Update Functions User-defined program: applied to vertex transforms data in scope of vertex pagerank(i, scope){ // Get Neighborhood data (R[i], wij, R[j]) scope; Update function applied (asynchronously) // R[i] ¬the + (1- a ) å w ´ R[ j]; Update a vertex data in parallel until convergence ji jÎN [i ] // Reschedule Neighbors if needed Many schedulers available tochanges thencomputation if R[i] prioritize reschedule_neighbors_of(i); } Dynamic computation
  • 42. The GraphLab Framework Graph Based Data Representation Scheduler Update Functions User Computation Consistency Model
  • 43. Alternating Least Squares SVD CoEM Lasso LDA Splash Sampler Bayesian Tensor Factorization Belief Propagation PageRank Gibbs Sampling SVM K-Means Dynamic Block Gibbs Sampling Linear Solvers …Many others… Matrix Factorization
  • 44. Never Ending Learner Project (CoEM) Hadoop 95 Cores 7.5 hrs Distributed GraphLab 32 EC2 machines 80 secs 0.3% of Hadoop time 2 orders of mag faster  2 orders of mag cheaper
  • 45. – ML algorithms as vertex programs – Asynchronous execution and consistency models
  • 46. Thus far… GraphLab 1 provided exciting scaling performance But… We couldn’t scale up to Altavista Webgraph 2002 1.4B vertices, 6.7B edges
  • 48. Problem: Existing distributed graph computation systems perform poorly on Natural Graphs
  • 49. Achilles Heel: Idealized Graph Assumption Assumed… Small degree  Easy to partition But, Natural Graphs… Many high degree vertices (power-law degree distribution)  Very hard to partition
  • 50. Power-Law Degree Distribution 10 Number of Vertices count 10 8 10 High-Degree Vertices: 1% vertices adjacent to 50% of edges 6 10 4 10 2 10 AltaVista WebGraph 1.4B Vertices, 6.6B Edges 0 10 0 10 2 10 4 10 degree Degree 6 10 8 10
  • 51. High Degree Vertices are Common Popular Movies Users “Social” People Netflix Movies Hyper Parameters θθ ZZ θ θ ZZ Z Z ww Z Z Z ww Z Z ww ww Z ww Z ww ww ww b Docs α Common Words LDA Words Obama
  • 52. Power-Law Degree Distribution “Star Like” Motif President Obama Followers
  • 53. Problem: High Degree Vertices  High Communication for Distributed Updates Data transmitted across network Y O(# cut edges) Natural graphs do not have low-cost balanced cuts Machine 1 [Leskovec et al. 08, Lang 04] Machine 2 Popular partitioning tools (Metis, Chaco,…) perform poorly [Abou-Rjeili et al. 06] Extremely slow and require substantial memory
  • 54. Random Partitioning • Both GraphLab 1, Pregel, Twitter, Facebook,… rely on Random (hashed) partitioning for Natural Graphs For p Machines: 10 Machines  90% of edges cut Machine 1 Machine 2 100 Machines  99% of edges cut! All data is communicated… Little advantage over MapReduce
  • 55. In Summary GraphLab 1 and Pregel are not well suited for natural graphs • Poor performance on high-degree vertices • Low Quality Partitioning
  • 57. Common Pattern for Update Fncs. R[j] GraphLab_PageRank(i) // Compute sum over neighbors total = 0 foreach( j in in_neighbors(i)): total = total + R[j] * wji // Update the PageRank R[i] = 0.1 + total // Trigger neighbors to run again if R[i] not converged then foreach( j in out_neighbors(i)) signal vertex-program on j wji R[i] Gather Information About Neighborhood Apply Update to Vertex Scatter Signal to Neighbors & Modify Edge Data
  • 58. GAS Decomposition Gather (Reduce) Apply Scatter Accumulate information about neighborhood Apply the accumulated value to center vertex Update adjacent edges and vertices. Σ +…+  Y + Y Parallel “Sum” Y Y Y Y ’ Y’
  • 59. Many ML Algorithms fit into GAS Model graph analytics, inference in graphical models, matrix factorization, collaborative filtering, clustering, LDA, …
  • 60. Minimizing Communication in GL2 PowerGraph: Vertex Cuts GL2 PowerGraph includes novel vertex cut algorithms Y Communication linear  Provides order in #magnitude gains in performance of spanned machines A vertex-cut minimizes # machines per vertex Percolation theory suggests Power Law graphs can be split by removing only a small set of vertices [Albert et al. 2000]  Small vertex cuts possible!
  • 62. Triangle Counting on Twitter Graph 34.8 Billion Triangles Hadoop [WWW’11] 1636 Machines 423 Minutes 64 Machines 15 Seconds Why? Wrong Abstraction  Broadcast O(degree2) messages per Vertex S. Suri and S. Vassilvitskii, “Counting triangles and the curse of the last reducer,” WWW’11
  • 63. Topic Modeling (LDA) Specifically engineered for this task 64 cc2.8xlarge EC2 Nodes 200 lines of code & 4 human hours • English language Wikipedia – 2.6M Documents, 8.3M Words, 500M Tokens – Computationally intensive algorithm
  • 64. How well does GraphLab scale? Yahoo Altavista Web Graph (2002): One of the largest publicly available webgraphs 1.4B Webpages, 6.7 Billion Links 7 seconds per iter. 1B links processed per second 30 lines of Nodes code 64 HPC user 1024 Cores (2048 HT) 4.4 TB RAM
  • 65. GraphChi: Going small with GraphLab Solve huge problems on small or embedded devices? Key: Exploit non-volatile memory (starting with SSDs and HDs)
  • 66. GraphChi – disk-based GraphLab Challenge: Random Accesses Novel GraphChi solution: Parallel sliding windows method  minimizes number of random accesses
  • 67. Triangle Counting on Twitter Graph 40M Users Total: 34.8 Billion Triangles 1.2B Edges 1636 Machines 423 Minutes 59 Minutes, 1 Mac Mini! 64 Machines, 1024 Cores 15 Seconds Hadoop results from [Suri & Vassilvitskii '11]
  • 68. – ML algorithms as vertex programs – Asynchronous execution and consistency models PowerGraph 2 – Natural graphs change the nature of computation – Vertex cuts and gather/apply/scatter model
  • 69. GL2 PowerGraph focused on Scalability at the loss of Usability
  • 70. GraphLab 1 PageRank(i, scope){ acc = 0 for (j in InNeighbors) { acc += pr[j] * edge[j].weight } pr[i] = 0.15 + 0.85 * acc } Explicitly described operations Code is intuitive
  • 71. GraphLab 1 GL2 PowerGraph Implicit operation PageRank(i, scope){ acc = 0 for (j in InNeighbors) { acc += pr[j] * edge[j].weight } pr[i] = 0.15 + 0.85 * acc } gather(edge) { return edge.source.value * edge.weight } merge(acc1, acc2) { return accum1 + accum2 } Implicit Explicitly described operations Code is intuitive apply(v, accum) { aggregation v.pr = 0.15 + 0.85 * acc } Need to understand engine to understand code
  • 72. Great flexibility, but hit scalability wall Scalability, but very rigid abstraction (many contortions needed to implement SVD++, Restricted Boltzmann Machines) What now?
  • 74. GL3 WarpGraph Goals Program Like GraphLab 1 Run Like GraphLab 2 Machine 1 Machine 2
  • 75. Fine-Grained Primitives Expose Neighborhood Operations through Parallelizable Iterators Y PageRankUpdateFunction(Y) { Y.pagerank = 0.15 + 0.85 * MapReduceNeighbors( lambda nbr: nbr.pagerank*nbr.weight, (aggregate sum over neighbors) lambda (a,b): a + b ) }
  • 76. Expressive, Extensible Neighborhood API0 MapReduce over Neighbors Parallel Transform Adjacent Edges Broadcast Y Y Y Modify adjacent edges Schedule a selected subset of adjacent vertices Y + …+ Y + Y Parallel Sum
  • 77. Can express every GL2 PowerGraph program (more easily) in GL3 WarpGraph But GL3 is more expressive Multiple gathers UpdateFunction(v) { if (v.data == 1) accum = MapReduceNeighs(g,m) else ... } Scatter before gather Conditional execution
  • 78. Graph Coloring GL2 PowerGraph Twitter Graph: 41M Vertices 1.4B Edges 227 seconds GL3 WarpGraph 89 seconds 2.5x Faster WarpGraph outperforms PowerGraph with simpler code 32 Nodes x 16 Cores (EC2 HPC cc2.8x)
  • 79. – ML algorithms as vertex programs – Asynchronous execution and consistency models PowerGraph 2 WarpGraph – Natural graphs change the nature of computation – Vertex cuts and gather/apply/scatter model – Usability is key – Access neighborhood through parallelizable iterators and latency hiding
  • 80. Usability RECENT RELEASE: GRAPHLAB 2.2, INCLUDING WARPGRAPH ENGINE And support for streaming/dynamic graphs! Consensus that WarpGraph is much easier to use than PowerGraph “User study” group biased… :-)
  • 83. Exciting Time to Work in ML With Big Data, I’ll take over the world!!! We met because of Big Data Why won’t Big Data read my mind??? Unique opportunities to change the world!!  But, every deployed system is an one-off solution, and requires PhDs to make work… 
  • 84. But… ML key to any new service we want to build Even basics of scalable ML can be challenging 6 months from R/Matlab to production, at best State-of-art ML algorithms trapped in research papers Goal of GraphLab 3: Make huge-scale machine learning accessible to all! 
  • 85. Step 1 Learning ML in Practice with GraphLab Notebook
  • 87. Learn: GraphLab Notebook Prototype: pip install graphlab  local prototyping Production: Same code scales execute on EC2 cluster
  • 88. Step 3 GraphLab Toolkits: Integrated State-of-the-Art ML in Production
  • 89. GraphLab Toolkits Highly scalable, state-of-the-art machine learning straight from python Graph Analytics Graphical Models Computer Vision Clustering Topic Modeling Collaborative Filtering
  • 90. Now with GraphLab: Learn/Prototype/Deploy Even basics of scalable ML can be challenging Learn ML with GraphLab Notebook 6 months from R/Matlab to production, at best pip install graphlab then deploy on EC2 State-of-art ML algorithms trapped in research papers Fully integrated via GraphLab Toolkits
  • 91. We’re selecting strategic partners Help define our strategy & priorities And, get the value of GraphLab in your company partners@graphlab.com
  • 92. Possibility Scalability Usability GraphLab 2.2 available now: graphlab.com Define our future: partners@graphlab.com Needless to say: jobs@graphlab.com
  • 93. Please give us your feedback on this presentation BDT204 As a thank you, we will select prize winners daily for completed surveys!