Graph libraries in Matlab: MatlabBGL and gaimc

A tale of two Matlab libraries for graph algorithms
MatlabBGL and gaimc

David F. Gleich
Purdue University

The Setting
recursive spectral graph partitioning

To store an m×n sparse matrix M, Matlab uses compressed column format [Gilbert et al., ]. Matlab never stores a 0 value in a sparse matrix. It always “re-compresses” the data structure in these cases. If M is the adjacency matrix of a graph, then storing the matrix by columns corresponds to storing the graph as an in-edge list.
We briefly illustrate compressed row and column storage schemes in figure ..

[Figure: a weighted directed graph on six vertices, its adjacency matrix, and the compressed sparse row (rp, ci, ai) and compressed sparse column (cp, ri, ai) arrays that store it.]
The Setting

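A = load_adjacency_matrix;
L = diag(sum(A,2)) - A;      % Laplacian: degree matrix minus adjacency
[V,D] = eigs(L,2,'SA');      % two smallest eigenpairs
f = V(:,2);                  % Fiedler vector
A1 = A(f>=0,f>=0); A2 = A(f<0,f<0);   % see the warning (*)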

*Warning: one can do much better than this split!
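To make the split concrete, here is a minimal sketch on a made-up six-vertex graph: two triangles joined by a single edge. The graph, the variable names, and the use of dense `eig` in place of `eigs` (reasonable only because the example is tiny) are illustrative assumptions, not part of the talk.

```matlab
% Two triangles {1,2,3} and {4,5,6} joined by the edge 3-4 (illustrative graph).
i = [1 1 2 3 4 4 5];
j = [2 3 3 4 5 6 6];
A = sparse(i, j, 1, 6, 6);
A = A + A';                          % make the adjacency matrix symmetric

L = diag(sum(A,2)) - A;              % graph Laplacian: degrees minus adjacency
[V, D] = eig(full(L));               % dense solve is fine at this size
[~, order] = sort(diag(D));          % make the ascending order explicit
f = V(:, order(2));                  % eigenvector of the second smallest eigenvalue

part1 = find(f >= 0)';               % one side of the split
part2 = find(f < 0)';                % the other side
A1 = A(part1, part1);                % induced subgraphs, as on the slide
A2 = A(part2, part2);
```

The sign of an eigenvector is arbitrary, so which triangle ends up in `part1` can differ between runs or Matlab versions; only the grouping is meaningful.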

The Problem
disconnected components

The Problem

C = components(A);
??? Undefined function or method
'components' for input arguments of type
'double'.

*Warning: strictly speaking, this isn’t a problem. However, it’s inefficient to solve larger eigenproblems than required.

The Rescue

MESHPART toolkit by
John Gilbert and Shang-Hua Teng

C = components(A);

Uses Matlab’s dmperm function
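As a rough illustration of how `dmperm` can label connected components, here is a sketch under stated assumptions: it is not the MESHPART source, the example graph and variable names are made up, and it relies on the matrix having a symmetric pattern with a zero-free diagonal so that the diagonal blocks of the block triangular form coincide with the components.

```matlab
% Illustrative graph: {1,2,3} form a path, {4,5} share an edge, 6 is isolated.
A = sparse([1 2 4], [2 3 5], 1, 6, 6);
A = A + A';                          % symmetric adjacency matrix
n = size(A,1);

% Add the identity so every diagonal entry is structurally nonzero.
S = spones(A) + speye(n);
[p, ~, r, ~] = dmperm(S);            % rows p(r(b):r(b+1)-1) form diagonal block b

c = zeros(n,1);                      % c(v) = component label of vertex v
for b = 1:numel(r)-1
    c(p(r(b):r(b+1)-1)) = b;
end
```

The numbering of the labels follows the block order chosen by `dmperm`, so only equality of labels carries meaning.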

The Failed Rescue

C = components(A);

caused Matlab to randomly crash

I wanted a fast max-flow routine too

Matlab and the Boost graph library
MatlabBGL

The Recoup
working recursive spectral partitioning
code using Boost graph library in C++
including a max-flow heuristic extension

Boost graph library has a components
function and many other graph
algorithms

Boost has a “generic” graph data-type

The Idea

add graph algorithms to Matlab
naturally using Boost graph library

The Plan

graph data type
= Matlab sparse matrix

results
= “natural” Matlab types

The Plan
A = load_adjacency_matrix;
d = bfs(A,1);
d = dijkstra(A,size(A,1));
T = mst(A);
c = components(A);
F = maxflow(A,s,t);
test_dag(A)
[flag,K] = test_planarity(A);

The Plan

suitable for large problems
= 10 million edges circa 2006
= avoid copying data

The Catch
Boost graph type          Matlab sparse type (compressed sparse column)

vertices(G)               1:n
edges(G)                  [i,j,w] = find(A);
num_vertices(G)           size(A,1)
out_edges(G,v)            [~,j,w] = find(A(v,:))
adjacent_vertices(G,v)    [~,j] = find(A(v,:))
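The correspondence can be tried directly on a small sparse matrix. This is a sketch with an illustrative six-vertex weighted digraph; the variable names (`verts`, `nbrs`, `wts`) are assumptions made for the example.

```matlab
% Adjacency matrix of a six-vertex weighted directed graph (illustrative values).
A = sparse([1 1 2 2 3 3 4 4 5 5], ...
           [2 3 3 4 2 5 3 6 4 6], ...
           [16 13 10 12 4 14 9 20 7 4], 6, 6);

n = size(A,1);                  % num_vertices(G)
verts = 1:n;                    % vertices(G)
[i, j, w] = find(A);            % edges(G): edge i(k) -> j(k) with weight w(k)

v = 3;
[~, nbrs, wts] = find(A(v,:));  % out_edges(G,v): targets and weights of edges leaving v
```

Note that `A(v,:)` extracts a row from a column-oriented structure, which is the inefficiency the rest of the talk works around.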

[Figure: compressed sparse row and compressed sparse column storage of the adjacency matrix of a weighted directed graph on six vertices.]

Adjacency matrix
0 16 13  0  0  0
0  0 10 12  0  0
0  4  0  0 14  0
0  0  9  0  0 20
0  0  0  7  0  4
0  0  0  0  0  0

Compressed sparse row
rp  1 3 5 7 9 11 11
ci  2 3 3 4 2 5 3 6 4 6
ai  16 13 10 12 4 14 9 20 7 4

Compressed sparse column
cp  1 1 3 6 8 9 11
ri  1 3 1 2 4 2 5 3 4 5
ai  16 4 13 10 9 12 7 14 20 4

Most graph algorithms are designed to work with out-edge lists instead of in-edge lists.
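Both sets of arrays can be rebuilt from a Matlab sparse matrix with `find`. The following is a sketch for the six-vertex example; the names rp, ci, ai, cp, ri follow the figure, while `ai_csc`, `u`, and `nbrs` are illustrative.

```matlab
A = sparse([1 1 2 2 3 3 4 4 5 5], ...
           [2 3 3 4 2 5 3 6 4 6], ...
           [16 13 10 12 4 14 9 20 7 4], 6, 6);
n = size(A,1);

% find(A') walks A' column by column, which is A row by row.
[col, row, val] = find(A');
rp = cumsum([1, accumarray(row, 1, [n 1])']);   % row pointers
ci = col';                                       % column index of each stored entry
ai = val';                                       % value of each stored entry

% find(A) walks A column by column, matching Matlab's own storage order.
[r2, c2, v2] = find(A);
cp = cumsum([1, accumarray(c2, 1, [n 1])']);     % column pointers
ri = r2';                                        % row index of each stored entry
ai_csc = v2';

% Out-neighbors of vertex u are a contiguous slice of ci.
u = 4;
nbrs = ci(rp(u):rp(u+1)-1);
```

The slice for vertex u ends at `rp(u+1)-1`, because `rp(u+1)` already points at the first entry of the next row.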

The Compromise

make a transpose when it’s required
but let “smart” users bypass it

The Details

BGL is largely irrelevant to MatlabBGL. There is no need for the copy_graph function from Boost, for example.
Next, figure . shows the high level architecture of MatlabBGL. There are four main components: m-files, mex-files, libmbgl, and BGL functions.

[Figure: high level architecture of MatlabBGL in two halves. Matlab: M code (dfs, bfs, mst), mex code, sparse matrix, CSR graph. libmbgl: extern C code (dfs, bfs, primmst), C++ code, CSR graph, Boost.]

MatlabBGL – Version 1.0
Released April 2006 on
Matlab File exchange

July ‘06 v2.0 added visitors
April ‘07 v2.1 64-bit Matlab
April ‘08 v3.0 performance improvement
Oct ‘08 v4.0 planarity testing, layout,
structural zeros

Jan ‘12 v5.0 update forthcoming?

Impact
Downloaded over 20,000 times

Used in over 10 publications by others!
including a PNAS article on brain topology

Identified numerous bugs in the
Boost graph library

Network Partitioning

… and now for a demo …

The Devil of the Details

Compile mex files on OSX/Linux/Win in 32-bit and 64-bit mode
Compile libmbgl on OSX/Linux/Win in 32-bit and 64-bit mode

Let’s illustrate a typical call to a MatlabBGL function: dfs for a depth-first search through the graph.

The Devil of the Details
Hard to keep up with changes in Matlab

Hard for users to compile themselves
(changes in Boost and changes in Matlab)

Hard to play around with new algorithms

Mathworks graph library in
bioinformatics toolbox

graph algorithms in matlab code
gaimc

A vision
function n=my1norm(x)
n = 0; for i=1:numel(x), n=n+abs(x(i)); end

x = randn(1e7,1);
tic, n1=my1norm(x); toc
Elapsed time is 0.16 seconds
tic, n1 = norm(x,1); toc;

Note: R2007b on 64-bit linux

A vision
function n=my1norm(x)
n = 0; for i=1:numel(x), n=n+abs(x(i)); end

x = randn(1e7,1);
tic, n1=my1norm(x); toc
tic, n1 = norm(x,1); toc;

Note: R2011a on 64-bit osx

Quite impressed

get within spitting distance of vectorized
performance using Matlab for loops

even faster than some things in python

Another idea

implement graph algorithms in pure
Matlab code

should only be “somewhat” slower

much more portable

More problems

function calls make things REALLY slow
(unless the function is built-in, e.g. abs)

mst and dijkstra need a heap,
a heap in Matlab?

Problem specifics
function n=my1normfunc(x)
n = 0; for i=1:numel(x), n=n+myabs(x(i)); end
function a=myabs(a), if a<0, a=-a; end

tic, n1=my1norm(x); toc
tic, n1 = my1normfunc(x); toc;

Note: R2011a on 64-bit osx

A heap in Matlab code

More generally speaking, algorithms written in Fortran are excellent candidates for the Matlab just-in-time compiler.

Old reference
D. K. Kahaner
Algorithm 561: Fortran implementation of heap programs.
ACM TOMS 1980

[Background page, partly covered by the slide: … implementation of a heap. … is inspired by Kahaner []. … a heap is a binary tree where smaller elements are … supports the following operations: … an element to the heap; … element from the heap with the smallest …; … value of an element in the heap. … arrays (or vectors), and a common way to store a … associate the tree node of index j with a left child … index 2j + 1. See figure . for an example. … Matlab heap will consist of four arrays and one … stores the identifiers of the items in the heap. … the element in tree node i and T(1) is the id … of the heap tree. … stores ids of elements in D so that D(T(i)) is …

Figure 6.3 – Binary trees as arrays.]
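The array layout of a binary heap can be sketched in a few lines of loop-only Matlab. This is a simplified illustration, not the gaimc implementation: it stores the keys directly in one array `T` instead of keeping separate id and lookup arrays, and the key values are made up. The parent of tree node k is node floor(k/2) and its children are nodes 2k and 2k+1.

```matlab
vals = [5 8 7 11 9 6 1];     % illustrative keys to insert
T = zeros(1, numel(vals));   % T(k) is the key stored at tree node k
n = 0;                       % current heap size

% Insert: append at the end, then sift up while smaller than the parent.
for v = vals
    n = n + 1; T(n) = v; k = n;
    while k > 1 && T(floor(k/2)) > T(k)
        p = floor(k/2);
        tmp = T(p); T(p) = T(k); T(k) = tmp;
        k = p;
    end
end

% Extract-min: take the root, move the last node to the root, sift down.
out = zeros(1, numel(vals));
for t = 1:numel(vals)
    out(t) = T(1);
    T(1) = T(n); n = n - 1; k = 1;
    while 2*k <= n
        c = 2*k;                                    % left child
        if c < n && T(c+1) < T(c), c = c + 1; end   % pick the smaller child
        if T(k) <= T(c), break; end                 % heap order restored
        tmp = T(k); T(k) = T(c); T(c) = tmp;
        k = c;
    end
end
```

Everything here is scalar indexing inside loops with no user-defined function calls, which is the style the talk argues the just-in-time compiler handles well.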

Graph access, take 1
Simple, efficient neighbor access

At = A';
[v,~,w] = find(At(:,u));

Graph access, take 2
Complicated neighbor access

[i,j,w] = find(A);
[ai,aj,a] = indexed2csr(i,j,w,size(A,1));

v = aj(ai(u):ai(u+1)-1);

Graph access
bfs, take 1

At=A'; for w=find(At(:,v))'
tic, d=bfs(A,1); toc
Elapsed time 0.05 seconds

bfs, take 2

indexed2csr(A); for ci=rp(v):rp(v+1)-1 …
tic, d=bfs(A,1); toc
Elapsed time 0.007 seconds
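The second variant amounts to a breadth-first search that touches only the CSR arrays. A minimal sketch, assuming row pointers `rp` and column indices `ci` for an illustrative six-vertex digraph; the queue layout and variable names are assumptions, not the gaimc source.

```matlab
% CSR arrays of an illustrative six-vertex directed graph.
rp = [1 3 5 7 9 11 11];
ci = [2 3 3 4 2 5 3 6 4 6];
n = numel(rp) - 1;

d = -ones(1, n);                 % d(v) = BFS distance from s, -1 if unreached
queue = zeros(1, n);             % preallocated queue; each vertex enters at most once
head = 1; tail = 1;

s = 1;
d(s) = 0; queue(tail) = s; tail = tail + 1;

while head < tail
    v = queue(head); head = head + 1;
    for k = rp(v):rp(v+1)-1      % out-edges of v are a contiguous slice of ci
        w = ci(k);
        if d(w) < 0
            d(w) = d(v) + 1;
            queue(tail) = w; tail = tail + 1;
        end
    end
end
```

The inner loop performs no sparse-matrix indexing at all, which is a plausible reason for the gap between the two timings on the slides.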

gaimc
convert input to CSR arrays
run graph algorithms on CSR arrays

bfs, clustering coefficients, core numbers,
cosine knn, dfs, dijkstra, floyd warshall,
mst, strong components

bipartite_matching (Thanks to Ying Wang)

The pudding

[Background text, truncated: … instances of a random symmetric graph with average degree … and 10000 vertices. The aggregated results of all these tests are shown in figure ..]

function s=mysumsq(x)
s = 0; for i=1:numel(x), s = s + x(i)^2; end

x = randn(1e7,1);
tic, s1 = mysumsq(x); toc;
tic, s2 = x'*x; toc

[Figure 6.4 – Performance of the gaimc library. Bar chart of slowdown (vertical axis, 0 to 14) for the Standard and Fast variants on dfs, scomponents, dijkstra, dirclustercoeffs, mst_prim, and clustercoeffs. Caption truncated: An experimental comparison of the perform…]

The pudding changes

[Figure 6.4 – Performance of the gaimc library. Two overlaid bar charts of slowdown for the Standard and Fast variants, one with a vertical axis from 0 to 14 and one from 0 to 35, over dfs, scomponents, dijkstra, dirclustercoeffs, mst_prim, and clustercoeffs. Caption truncated: An experimental comparison of the perform…]

Afterward
“putting the graph into Matlab”

Matlab could just as easily have been
called “Graphlab” with a few extra
functions

It’s a great environment to play with
graphs as matrices
