3. The Setting
recursive spectral graph partitioning

To store an m×n sparse matrix M, Matlab uses compressed column format [Gilbert et al., ]. Matlab never stores a 0 value in a sparse matrix; whenever an operation would create one, it "re-compresses" the data structure. If M is the adjacency matrix of a graph, then storing the matrix by columns corresponds to storing the graph as an in-edge list. We briefly illustrate compressed row and column storage schemes in figure ..

[Figure: a weighted directed graph on 6 vertices and its adjacency matrix, stored in compressed sparse row form (arrays rp, ci, ai) and compressed sparse column form (arrays cp, ri, ai).]

Most graph algorithms are designed to work with out-edge lists instead of in-edge lists. Before running an algo…
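5. The Setting
recursive spectral graph partitioning
A = load_adjacency_matrix;
L = diag(sum(A,2)) - A;              % graph Laplacian L = D - A
[V,D] = eigs(L,2,'SA');              % two smallest eigenpairs
[~,p] = sort(diag(D));               % sort to be safe about eigenvalue order
f = V(:,p(2));                       % Fiedler vector
A1 = A(f>=0,f>=0); A2 = A(f<0,f<0);  % split by sign (*)
*Warning: can do much better than this split!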
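The slide's recipe (form the Laplacian, take the eigenvector of the second-smallest eigenvalue, split by sign) can be sketched in pure Python. This is an illustration under assumptions, not the talk's code: power iteration on the shifted matrix (2·dmax)I − L, with the constant vector projected out each step, stands in for `eigs(L,2,'SA')`; the graph is assumed unweighted and connected; `fiedler_vector` and `spectral_split` are hypothetical names.

```python
# Sketch: spectral bisection by the sign of the Fiedler vector.
# Assumption: shifted power iteration replaces Matlab's eigs(L,2,'SA').

def fiedler_vector(adj, iters=1000):
    """Approximate the eigenvector of the second-smallest eigenvalue of L = D - A.

    adj: neighbor lists of an undirected, connected, unweighted graph.
    """
    n = len(adj)
    deg = [len(nbrs) for nbrs in adj]
    shift = 2 * max(deg)  # every eigenvalue of L lies in [0, 2 * max degree]

    def apply_shifted(v):
        # y = (shift*I - L) v, where (L v)_i = deg_i * v_i - sum of v over neighbors of i
        return [shift * v[i] - (deg[i] * v[i] - sum(v[j] for j in adj[i]))
                for i in range(n)]

    def deflate_and_normalize(v):
        # Remove the constant eigenvector (eigenvalue 0 of L), then rescale to unit length.
        mean = sum(v) / n
        v = [x - mean for x in v]
        norm = sum(x * x for x in v) ** 0.5
        return [x / norm for x in v]

    v = deflate_and_normalize([float(i + 1) for i in range(n)])
    for _ in range(iters):
        v = deflate_and_normalize(apply_shifted(v))
    return v


def spectral_split(adj):
    """Split vertices by sign, mirroring f>=0 / f<0 on the slide."""
    f = fiedler_vector(adj)
    part1 = [i for i, x in enumerate(f) if x >= 0]
    part2 = [i for i, x in enumerate(f) if x < 0]
    return part1, part2


# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
print(spectral_split(adj))
```

On two triangles joined by one edge, the sign pattern separates the triangles; which side comes out non-negative depends on the starting vector.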
8. The Problem
disconnected components
*Warning: strictly speaking, this isn't a problem. However, it's inefficient to solve larger eigenproblems than required.
C = components(A);
??? Undefined function or method 'components' for input arguments of type 'double'.
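What the missing `components` call would compute can be sketched as a breadth-first labeling over CSR arrays; each component could then be partitioned on its own, keeping the eigenproblems no larger than required. A minimal Python sketch, assuming 0-based CSR arrays `rp`/`ci` of a symmetric adjacency matrix; the function name is hypothetical and is not the MatlabBGL or Boost signature.

```python
from collections import deque


def components(rp, ci, n):
    """Label the connected components of an undirected graph stored in CSR.

    rp, ci: 0-based CSR row pointers and column indices (rp has n+1 entries).
    Returns c with c[v] = component id (0, 1, 2, ...) of vertex v.
    """
    c = [-1] * n
    ncomp = 0
    for s in range(n):
        if c[s] != -1:
            continue                      # already labeled by an earlier search
        c[s] = ncomp
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for k in range(rp[v], rp[v + 1]):   # neighbors of v
                w = ci[k]
                if c[w] == -1:
                    c[w] = ncomp
                    queue.append(w)
        ncomp += 1
    return c


# Undirected graph on 5 vertices, edges 0-1, 1-2, 3-4 (both directions stored).
rp = [0, 1, 3, 4, 5, 6]
ci = [1, 0, 2, 1, 4, 3]
print(components(rp, ci, 5))  # [0, 0, 0, 1, 1]
```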
12. The Recoup
working recursive spectral partitioning
code using Boost graph library in C++
including a max-flow heuristic extension
Boost graph library has a components
function and many other graph
algorithms
Boost has a “generic” graph data-type
13. The Idea
add graph algorithms to Matlab
naturally using Boost graph library
14. The Plan
graph data type
= Matlab sparse matrix
results
= “natural” Matlab types
15. The Plan
A = load_adjacency_matrix
d = bfs(A,1);
d = dijkstra(A,size(A,1));
T = mst(A);
c = components(A);
F = maxflow(A,s,t);
test_dag(A)
[flag,K] = test_planarity(A);
16. The Plan
suitable for large problems
= 10 million edges circa 2006
= avoid copying data
17. The Catch
Boost graph type          Matlab sparse type (compressed sparse column)
vertices(G)               1:n
edges(G)                  [i,j,w] = find(A);
num_vertices(G)           size(A,1)
out_edges(G,v)            [~,j,w] = find(A(v,:))
adjacent_vertices(G,v)    [~,j] = find(A(v,:))
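The catch is the access pattern. In compressed sparse column storage a column (the in-edges of a vertex) is one contiguous slice, but a row `A(v,:)` (the out-edges the Boost calls expect) has its entries scattered across every column. A small Python sketch of the two access paths, assuming 0-based arrays named after the `cp`/`ri`/`ai` convention of the storage figure; the helper names are hypothetical.

```python
def in_edges(cp, ri, ai, u):
    """Column u of a CSC matrix: one contiguous slice, O(in-degree) work."""
    return [(ri[k], ai[k]) for k in range(cp[u], cp[u + 1])]


def out_edges(cp, ri, ai, v):
    """Row v of a CSC matrix: every stored entry must be scanned, O(nnz) work."""
    n = len(cp) - 1
    return [(j, ai[k]) for j in range(n)
            for k in range(cp[j], cp[j + 1]) if ri[k] == v]


# The 6x6 example from the storage figure, shifted to 0-based indices.
cp = [0, 0, 2, 5, 7, 8, 10]
ri = [0, 2, 0, 1, 3, 1, 4, 2, 3, 4]
ai = [16, 4, 13, 10, 9, 12, 7, 14, 20, 4]

print(in_edges(cp, ri, ai, 2))   # [(0, 13), (1, 10), (3, 9)]
print(out_edges(cp, ri, ai, 0))  # [(1, 16), (2, 13)]
```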
18. …graph as an in-edge list. We briefly illustrate compressed row and column storage schemes in figure ..

Adjacency matrix (row = source vertex, column = target vertex):
      1   2   3   4   5   6
  1   0  16  13   0   0   0
  2   0   0  10  12   0   0
  3   0   4   0   0  14   0
  4   0   0   9   0   0  20
  5   0   0   0   7   0   4
  6   0   0   0   0   0   0

Compressed sparse row
  rp    1  3  5  7  9 11 11
  ci    2  3  3  4  2  5  3  6  4  6
  ai   16 13 10 12  4 14  9 20  7  4

Compressed sparse column
  cp    1  1  3  6  8  9 11
  ri    1  3  1  2  4  2  5  3  4  5
  ai   16  4 13 10  9 12  7 14 20  4

Most graph algorithms are designed to work with out-edge lists instead of in-edge lists.
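The arrays above can be regenerated mechanically. A Python sketch, assuming a dense list-of-lists input and keeping the figure's 1-based indices; compressed sparse column is built as compressed sparse row of the transpose, which is why storing a graph by columns yields in-edge lists. The function names are hypothetical.

```python
def dense_to_csr(M):
    """Compressed sparse row arrays (rp, ci, ai) with 1-based indices."""
    rp, ci, ai = [1], [], []
    for row in M:
        for j, x in enumerate(row, start=1):
            if x != 0:                 # never store explicit zeros
                ci.append(j)
                ai.append(x)
        rp.append(len(ai) + 1)         # start of the next row
    return rp, ci, ai


def dense_to_csc(M):
    """Compressed sparse column arrays (cp, ri, ai): the CSR arrays of the transpose."""
    transpose = [list(col) for col in zip(*M)]
    return dense_to_csr(transpose)


M = [[0, 16, 13, 0, 0, 0],
     [0, 0, 10, 12, 0, 0],
     [0, 4, 0, 0, 14, 0],
     [0, 0, 9, 0, 0, 20],
     [0, 0, 0, 7, 0, 4],
     [0, 0, 0, 0, 0, 0]]

print(dense_to_csr(M))  # rp, ci, ai as in the figure
print(dense_to_csc(M))  # cp, ri, ai as in the figure
```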
20. The Details
…BGL is largely irrelevant to MatlabBGL. There is no need for the copy_graph function from Boost, for example.
Next, figure . shows the high-level architecture of MatlabBGL. There are four main components: m-files, mex-files, libmbgl, and BGL functions.
[Figure: MatlabBGL architecture in four layers: Matlab (M code), mex code, libmbgl (extern C code), and Boost (C++ code). The Matlab functions dfs, bfs, and mst take a sparse matrix; the lower layers work on a CSR graph; mst maps to Boost's primmst.]
21. MatlabBGL – Version 1.0
Released April 2006 on
Matlab File Exchange
July ’06 v2.0 added visitors
April ’07 v2.1 64-bit Matlab
April ’08 v3.0 performance improvement
Oct ’08 v4.0 planarity testing, layout, structural zeros
Jan ’12 v5.0 update forthcoming?
22. Impact
Downloaded over 20,000 times
Used in over 10 publications by others!
including a PNAS article on brain topology
Identified numerous bugs in the
Boost graph library
25. The Devil of the Details
[Figure: the MatlabBGL architecture diagram from slide 20 (M code, mex code, libmbgl, Boost), annotated:]
Compile mex files on OSX/Linux/Win in 32-bit and 64-bit mode
Compile libmbgl on OSX/Linux/Win in 32-bit and 64-bit mode
Let’s illustrate a typical call to a MatlabBGL function: dfs for a depth-first search through the graph.
26. The Devil of the Details
Hard to keep up with changes in Matlab
Hard for users to compile themselves
(changes in Boost and changes in Matlab)
Hard to play around with new algorithms
MathWorks graph library in
Bioinformatics Toolbox
28. A vision
function n = my1norm(x)
n = 0; for i=1:numel(x), n = n + abs(x(i)); end

x = randn(1e7,1);
tic, n1 = my1norm(x); toc
Elapsed time is 0.16 seconds
tic, n1 = norm(x,1); toc
Elapsed time is 0.32 seconds
Note: R2007b on 64-bit Linux
30. A vision
function n = my1norm(x)
n = 0; for i=1:numel(x), n = n + abs(x(i)); end

x = randn(1e7,1);
tic, n1 = my1norm(x); toc
Elapsed time is 0.15 seconds
tic, n1 = norm(x,1); toc
Elapsed time is 0.1 seconds
Note: R2011a on 64-bit OS X
31. Quite impressed
get within spitting distance of vectorized
performance using Matlab for loops
even faster than some things in Python
33. More problems
function calls make things REALLY slow
(unless the function is built-in, e.g. abs)
mst and dijkstra need a heap,
a heap in Matlab?
34. Problem specifics
function n = my1normfunc(x)
n = 0; for i=1:numel(x), n = n + myabs(x(i)); end

function a = myabs(a), if a < 0, a = -a; end   % subfunction in the same file

tic, n1 = my1norm(x); toc
Elapsed time is 0.15 seconds
tic, n1 = my1normfunc(x); toc
Elapsed time is 3.16 seconds
Note: R2011a on 64-bit OS X
35. A heap in Matlab code
More generally speaking, algorithms written in Fortran are excellent candidates for the Matlab just-in-time compiler.
Old reference: D. K. Kahaner, Algorithm 561: Fortran implementation of heap programs, ACM TOMS 1980.

[Thesis excerpt, left margin cut off:] …tation of a heap. …ion is inspired by Kahaner []. … a heap is a binary tree where smaller elements are … It supports the following operations: add an element to the heap; remove the element with the smallest value from the heap; and change the value of an element in the heap. … (or vectors), and a common way to store a binary tree is to associate the tree node of index j with a left child of index 2j and a right child of index 2j + 1. See figure . for an example: the array 5 8 7 1 9 6 corresponds to the tree with root 5, children 8 and 7, and grandchildren 1 and 9 (under 8) and 6 (under 7).
The Matlab heap will consist of four arrays and one … T stores the identifiers of the items in the heap: T(i) is the id of the element in tree node i, and T(1) is the id of the root of the heap tree. … stores ids of elements in D so that D(T(i)) is …
Figure 6.3 – Binary trees as arrays.
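A sketch of an array-based heap along the lines of the excerpt, written in Python rather than Matlab. Assumptions: 0-based indices (children of node j at 2j+1 and 2j+2 instead of 2j and 2j+1); `T` and `D` follow the excerpt's names, while the position array `L` and the class and method names are hypothetical; the Python list grows on insert, so this does not reproduce the thesis's four-array layout.

```python
class IndexedMinHeap:
    """Binary min-heap stored in arrays and addressed by item id.

    T[i]   id of the item in tree node i (T[0] is the root)
    D[id]  value of item id, so D[T[i]] is the value at tree node i
    L[id]  tree node holding item id, or -1 if absent (hypothetical name)
    """

    def __init__(self, nitems):
        self.T = []
        self.D = [0.0] * nitems
        self.L = [-1] * nitems

    def _swap(self, i, j):
        T, L = self.T, self.L
        T[i], T[j] = T[j], T[i]
        L[T[i]], L[T[j]] = i, j          # keep id -> node positions in sync

    def _up(self, i):
        while i > 0:
            p = (i - 1) // 2             # parent of node i
            if self.D[self.T[p]] <= self.D[self.T[i]]:
                break
            self._swap(i, p)
            i = p

    def _down(self, i):
        n = len(self.T)
        while True:
            smallest = i
            for c in (2 * i + 1, 2 * i + 2):   # children of node i
                if c < n and self.D[self.T[c]] < self.D[self.T[smallest]]:
                    smallest = c
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

    def push(self, item, value):
        """Add an item to the heap."""
        self.D[item] = value
        self.T.append(item)
        self.L[item] = len(self.T) - 1
        self._up(len(self.T) - 1)

    def pop(self):
        """Remove and return (item, value) with the smallest value."""
        root = self.T[0]
        self._swap(0, len(self.T) - 1)
        self.T.pop()
        self.L[root] = -1
        if self.T:
            self._down(0)
        return root, self.D[root]

    def update(self, item, value):
        """Change the value of an item that is already in the heap."""
        old = self.D[item]
        self.D[item] = value
        if value < old:
            self._up(self.L[item])
        else:
            self._down(self.L[item])

    def __len__(self):
        return len(self.T)


h = IndexedMinHeap(5)
for item, value in enumerate([5.0, 8.0, 7.0, 1.0, 9.0]):
    h.push(item, value)
h.update(4, 0.5)                         # item 4 now has the smallest value
print([h.pop() for _ in range(len(h))])
```

The position array is what makes `update` cheap: it finds an item's tree node directly instead of searching `T`, which is the operation Dijkstra and Prim need.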
36. Graph access, take 1
Simple, efficient neighbor access
At = A';
[v,~,w] = find(At(:,u));
37. Graph access, take 2
Complicated neighbor access
[i,j,w] = find(A);
[ai,aj,a] = indexed2csr(i,j,w,size(A,1));
v = aj(ai(u):ai(u+1)-1);   % row u occupies ai(u) .. ai(u+1)-1
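The conversion step amounts to a counting sort of the triplets by row. A Python sketch of that idea with 0-based indices; in this convention row u occupies positions rp[u] through rp[u+1]-1, the same exclusive end that a 1-based Matlab range must express with `-1`. `triplets_to_csr` and `neighbors` are hypothetical names, not the `indexed2csr` or gaimc interface.

```python
def triplets_to_csr(i, j, w, n):
    """Build 0-based CSR arrays (rp, ci, ai) from 0-based triplets by counting sort."""
    rp = [0] * (n + 1)
    for r in i:                       # count the entries in each row
        rp[r + 1] += 1
    for r in range(n):                # prefix sums give row start pointers
        rp[r + 1] += rp[r]
    ci = [0] * len(i)
    ai = [0] * len(i)
    nxt = rp[:-1]                     # next free slot in each row (a copy)
    for r, c, x in zip(i, j, w):      # scatter each entry into its row
        k = nxt[r]
        ci[k], ai[k] = c, x
        nxt[r] += 1
    return rp, ci, ai


def neighbors(rp, ci, u):
    """Out-neighbors of u: positions rp[u] .. rp[u+1]-1."""
    return ci[rp[u]:rp[u + 1]]


# Column-major triplets, in the order find(A) returns them for the 6x6 example (0-based).
i = [0, 2, 0, 1, 3, 1, 4, 2, 3, 4]
j = [1, 1, 2, 2, 2, 3, 3, 4, 5, 5]
w = [16, 4, 13, 10, 9, 12, 7, 14, 20, 4]
rp, ci, ai = triplets_to_csr(i, j, w, 6)
print(rp, ci, ai)
print(neighbors(rp, ci, 1))  # [2, 3]
```

Because the scatter is stable and the input is column-major, each row's column indices come out sorted.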
38. Graph access
bfs, take 1
At = A'; for w = find(At(:,v))' …
tic, d = bfs(A,1); toc
Elapsed time is 0.05 seconds
bfs, take 2
indexed2csr(A); for ci = rp(v):rp(v+1)-1 …
tic, d = bfs(A,1); toc
Elapsed time is 0.007 seconds
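Take 2's loop over a row's pointer range, sketched in Python as a breadth-first search that returns hop distances. Assumptions: 0-based CSR arrays of the 6×6 example, -1 marks unreachable vertices, and `bfs` here is a hypothetical stand-in rather than the gaimc function; the slide's timings are not reproduced.

```python
from collections import deque


def bfs(rp, ci, s):
    """Hop distances from s over 0-based CSR arrays; -1 marks unreached vertices."""
    n = len(rp) - 1
    d = [-1] * n
    d[s] = 0
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for k in range(rp[v], rp[v + 1]):   # walk row v directly, no find() call
            w = ci[k]
            if d[w] == -1:
                d[w] = d[v] + 1
                queue.append(w)
    return d


# CSR arrays of the 6x6 example (0-based).
rp = [0, 2, 4, 6, 8, 10, 10]
ci = [1, 2, 2, 3, 1, 4, 2, 5, 3, 5]
print(bfs(rp, ci, 0))  # [0, 1, 1, 2, 2, 3]
```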
40. gaimc
convert input to CSR arrays
run graph algorithms on CSR arrays
bfs, clustering coefficients, core numbers,
cosine knn, dfs, dijkstra, floyd warshall,
mst, strong components
bipartite_matching (Thanks to Ying Wang)
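As a sketch of one algorithm from this list, here is Dijkstra's method over CSR arrays in Python. It swaps in a different technique from the indexed heap discussed earlier: Python's `heapq` with lazy deletion, where stale queue entries are skipped instead of updated in place. Nonnegative weights and 0-based arrays are assumed; the function is hypothetical, not gaimc's `dijkstra`.

```python
import heapq


def dijkstra(rp, ci, ai, s):
    """Shortest-path distances from s over 0-based CSR arrays with nonnegative weights."""
    n = len(rp) - 1
    dist = [float("inf")] * n
    dist[s] = 0
    heap = [(0, s)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue                      # stale entry: v was already settled
        for k in range(rp[v], rp[v + 1]):
            w, nd = ci[k], d + ai[k]
            if nd < dist[w]:
                dist[w] = nd
                heapq.heappush(heap, (nd, w))   # push a new entry instead of updating
    return dist


# CSR arrays of the 6x6 example (0-based).
rp = [0, 2, 4, 6, 8, 10, 10]
ci = [1, 2, 2, 3, 1, 4, 2, 5, 3, 5]
ai = [16, 13, 10, 12, 4, 14, 9, 20, 7, 4]
print(dijkstra(rp, ci, ai, 0))  # [0, 16, 13, 28, 27, 31]
```

Lazy deletion lets the queue hold several entries per vertex, trading memory for not needing an `update` operation.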
41. The pudding
function s = mysumsq(x)
s = 0; for i=1:numel(x), s = s + x(i)^2; end

x = randn(1e7,1);
tic, s1 = mysumsq(x); toc
tic, s2 = x'*x; toc

…instances of a random symmetric graph with average degree … and …0, and 10000 vertices. The aggregated results of all these tests are shown in figure ..
[Figure 6.4 – Performance of the gaimc library. An experimental comparison of the perform… Bar chart of slowdown (axis 0 to 14), with bars labeled Standard and Fast, for dfs, scomponents, dijkstra, dirclustercoeffs, mst_prim, and clustercoeffs.]
42. The pudding changes
[The same mysumsq timing code, with an updated version of Figure 6.4: slowdown for the Standard and Fast bars of dfs, scomponents, dijkstra, dirclustercoeffs, mst_prim, and clustercoeffs, where the slowdown axis now extends to 35.]
43. Afterword
“putting the graph into Matlab”
Matlab could just as easily have been
called “Graphlab” with a few extra
functions
It’s a great environment to play with
graphs as matrices