A tale of two Matlab libraries
for graph algorithms
MatlabBGL and gaimc

   David F. Gleich
   Purdue University
The Setting
recursive spectral graph partitioning

To store an m×n sparse matrix M, Matlab uses compressed column format [Gilbert et al., ]. Matlab never stores a 0 value in a sparse matrix. It always “re-compresses” the data structure in these cases. If M is the adjacency matrix of a graph, then storing the matrix by columns corresponds to storing the graph as an in-edge list.

We briefly illustrate compressed row and column storage schemes in the figure.

[Figure: an example graph, its adjacency matrix, and the compressed sparse row (rp, ci, ai) and compressed sparse column (cp, ri, ai) arrays]

Most graph algorithms are designed to work with out-edge lists instead of in-edge lists. Before running an algo…
The Setting
recursive spectral graph partitioning

 A = load_adjacency_matrix;
 L = diag(sum(A,2)) - A;
 [V,D] = eigs(L,2,'SA');
 f = V(:,2);
 A1 = A(f>=0,f>=0); A2 = A(f<0,f<0);*

                *Warning: Can do much better than this split!
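
To make one split concrete, here is a minimal sketch on a hypothetical six-vertex graph (two triangles joined by a single edge; this graph is not from the slides). It forms the Laplacian as the degree matrix minus the adjacency matrix and splits on the sign of the second eigenvector. It assumes a dense `eig` is acceptable because the example is tiny; a large sparse problem would use `eigs` as on the slide.

```matlab
% Hypothetical test graph: triangles {1,2,3} and {4,5,6} joined by edge 3-4.
i = [1 1 2 3 4 4 5];
j = [2 3 3 4 5 6 6];
w = ones(1, numel(i));
n = 6;
A = sparse([i j], [j i], [w w], n, n);   % symmetric 0/1 adjacency matrix

% Graph Laplacian: degree matrix minus adjacency matrix (dense, tiny example).
L = diag(full(sum(A,2))) - full(A);

% Dense symmetric eigensolve, sorted so column 2 is the second-smallest pair.
[V, D] = eig(L);
[~, order] = sort(diag(D));
f = V(:, order(2));

% Split the vertices by the sign of that eigenvector.
left  = find(f >= 0)';
right = find(f < 0)';
A1 = A(left, left);
A2 = A(right, right);
```

On this graph the two sides should be the two triangles; which one lands in `left` depends on the sign the eigensolver happens to return.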
The Problem
disconnected components

 C = components(A);
??? Undefined function or method
'components' for input arguments of type
'double'.

*Warning: Strictly speaking, this isn’t a problem. However, it’s inefficient to solve larger eigenproblems than required.
The Rescue
disconnected components

MESHPART toolkit by 
John Gilbert and Shang-Hua Teng

 C = components(A); 

Uses Matlab’s dmperm function
The Failed Rescue
disconnected components

 C = components(A); 

caused Matlab to randomly crash

I wanted a fast max-flow routine too
Matlab and the Boost graph library
MatlabBGL
The Recoup
working recursive spectral partitioning
code using Boost graph library in C++
including a max-flow heuristic extension

Boost graph library has a components
function and many other graph
algorithms

Boost has a “generic” graph data-type
The Idea


add graph algorithms to Matlab
naturally using Boost graph library
The Plan

graph data type 
= Matlab sparse matrix

results
= “natural” Matlab types
The Plan
 A = load_adjacency_matrix
 d = bfs(A,1); 
 d = dijkstra(A,size(A,1));
 T = mst(A);
 c = components(A);
 F = maxflow(A,s,t);
 test_dag(A)
 [flag,K] = test_planarity(A);
The Plan


suitable for large problems 
= 10 million edges circa 2006
= avoid copying data
The Catch

Boost graph type          Matlab sparse type (compressed sparse column)

vertices(G)               1:n
edges(G)                  [i,j,w] = find(A);
num_vertices(G)           size(A,1)
out_edges(G,v)            [~,j,w] = find(A(v,:))
adjacent_vertices(G,v)    [~,j] = find(A(v,:))

If M is the adjacency matrix of a graph, then storing the matrix by columns corresponds to storing the graph as an in-edge list. We briefly illustrate compressed row and column storage schemes in the figure.

Adjacency matrix of the example graph (6 vertices, 10 weighted edges):

     0   16   13    0    0    0
     0    0   10   12    0    0
     0    4    0    0   14    0
     0    0    9    0    0   20
     0    0    0    7    0    4
     0    0    0    0    0    0

Compressed sparse row
     rp   1 3 5 7 9 11 11
     ci   2 3 3 4 2 5 3 6 4 6
     ai   16 13 10 12 4 14 9 20 7 4

Compressed sparse column
     cp   1 1 3 6 8 9 11
     ri   1 3 1 2 4 2 5 3 4 5
     ai   16 4 13 10 9 12 7 14 20 4

Most graph algorithms are designed to work with out-edge lists instead of in-edge lists.
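
The arrays in the figure can be reproduced with built-in Matlab calls. The sketch below is one way to do it, not necessarily how MatlabBGL does it internally: it assumes `find` walks the native column-major storage, and it obtains the row-oriented arrays by running the same walk over an explicit transpose. The variable names are illustrative.

```matlab
% Adjacency matrix of the example graph in the figure (row = source vertex).
src = [1 1 2 2 3 3 4 4 5 5];
dst = [2 3 3 4 2 5 3 6 4 6];
wgt = [16 13 10 12 4 14 9 20 7 4];
n = 6;
A = sparse(src, dst, wgt, n, n);

% Compressed sparse column: find lists entries column by column.
[ri, cj, ai_csc] = find(A);
cp = cumsum([1; accumarray(cj, 1, [n 1])]);   % column pointers

% Compressed sparse row: the same walk over the transpose.
[ci, rj, ai_csr] = find(A');
rp = cumsum([1; accumarray(rj, 1, [n 1])]);   % row pointers

% Out-neighbors of vertex u are a contiguous slice of ci.
u = 4;
out_nbrs = ci(rp(u):rp(u+1)-1)';
```

The transpose is the cost of getting out-edges from a column-oriented format: it copies the matrix once, after which every neighbor lookup is a slice.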
The Compromise



make a transpose when it’s required
but let “smart” users bypass it
The Details

… BGL is largely irrelevant to MatlabBGL. There is no need for the copy_graph function from Boost, for example.
Next, the figure shows the high-level architecture of MatlabBGL. There are four main components: m-files, mex-files, libmbgl, and BGL functions.

[Figure: MatlabBGL architecture.
 Matlab side:  M code (dfs, bfs, mst) operates on a sparse matrix; mex code passes it on as a CSR graph.
 libmbgl side: extern C code (dfs, bfs, primmst) receives the CSR graph; C++ code calls Boost.]
MatlabBGL – Version 1.0
Released April 2006 on 
Matlab File exchange



July ‘06 v2.0 added visitors
April ‘07 v2.1 64-bit Matlab
April ‘08 v3.0 performance improvement
Oct ‘08 v4.0 planarity testing, layout,
structural zeros



Jan ‘12 v5.0 update forthcoming?
Impact
Downloaded over 20,000 times

Used in over 10 publications by others!
including a PNAS article on brain topology

Identified numerous bugs in the 
Boost graph library
Impact
Network Partitioning



… and now for a demo …
The Devil of the Details

[Figure: the MatlabBGL architecture diagram with two annotations.
 Matlab side:  compile mex files on OSX/Linux/Win in 32-bit and 64-bit mode.
 libmbgl side: compile libmbgl on OSX/Linux/Win in 32-bit and 64-bit mode.]

Let’s illustrate a typical call to a MatlabBGL function: dfs for a depth-first search through the graph.
The Devil of the Details
Hard to keep up with changes in Matlab

Hard for users to compile themselves
(changes in Boost and changes in Matlab)

Hard to play around with new algorithms 

Mathworks graph library in
bioinformatics toolbox
graph algorithms in matlab code
gaimc
A vision
function n=my1norm(x)
n = 0; for i=1:numel(x), n=n+abs(x(i)); end

 x = randn(1e7,1);
 tic, n1=my1norm(x); toc
Elapsed time is 0.16 seconds
 tic, n1 = norm(x,1); toc;
Elapsed time is 0.32 seconds

Note: R2007b on 64-bit linux
A vision
function n=my1norm(x)
n = 0; for i=1:numel(x), n=n+abs(x(i)); end

 x = randn(1e7,1);
 tic, n1=my1norm(x); toc
Elapsed time is 0.15 seconds
 tic, n1 = norm(x,1); toc;
Elapsed time is 0.1 seconds

Note: R2011a on 64-bit osx
Quite impressed


get within spitting distance of vectorized
performance using Matlab for loops

even faster than some things in python
Another idea

implement graph algorithms in pure
Matlab code

should only be “somewhat” slower

much more portable
More problems

function calls make things REALLY slow
(unless the function is built-in, e.g. abs)


mst and dijkstra need a heap, 
a heap in Matlab?
Problem specifics
function n=my1normfunc(x)
n = 0; for i=1:numel(x), n=n+myabs(x(i)); end
function a=myabs(a), if a<0, a=-a; end

 tic, n1=my1norm(x); toc
Elapsed time is 0.15 seconds
 tic, n1 = my1normfunc(x); toc;
Elapsed time is 3.16 seconds

Note: R2011a on 64-bit osx
A heap in Matlab code

Old reference
D. K. Kahaner
Algorithm 561: Fortran implementation of heap programs.
ACM TOMS 1980

More generally speaking, algorithms written in Fortran are excellent candidates for the Matlab just-in-time compiler.

[Background thesis page, partly cut off: the implementation of a heap is inspired by Kahaner []. A heap is a binary tree where smaller elements are … It supports the following operations: add an element to the heap; remove the element from the heap with the smallest …; update the value of an element in the heap. The tree is stored in arrays (or vectors): the tree node of index j is associated with a left child … index 2j + 1. The Matlab heap will consist of four arrays and one … T stores the identifiers of the items in the heap: T(i) is the id of the element in tree node i and T(1) is the id … of the heap tree. … stores ids of elements in D so that D(T(i)) is …
 Figure 6.3 – Binary trees as arrays.]
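
To show the array layout in action, here is a deliberately simplified min-heap written as plain Matlab loops. It is a sketch under assumptions, not the heap from the excerpt: it stores values only in a single array (the names `heap` and `hn` are illustrative), whereas the excerpt describes four arrays with item identifiers. It does use the same indexing, with the children of node j at 2j and 2j + 1. The input values are arbitrary sample data.

```matlab
% Simplified binary min-heap on one array (values only, no item ids).
x = [5 6 7 1 9 6];
heap = zeros(1, numel(x));
hn = 0;                                   % number of elements in the heap

% Insert: append at the end, then sift up toward the root.
for k = 1:numel(x)
    hn = hn + 1;
    heap(hn) = x(k);
    j = hn;
    while j > 1 && heap(floor(j/2)) > heap(j)
        p = floor(j/2);                   % parent of node j
        tmp = heap(p); heap(p) = heap(j); heap(j) = tmp;
        j = p;
    end
end

% Extract-min: take the root, move the last element up, then sift down.
sorted = zeros(1, numel(x));
for k = 1:numel(x)
    sorted(k) = heap(1);
    heap(1) = heap(hn);
    hn = hn - 1;
    j = 1;
    while true
        c = 2*j;                          % left child of node j
        if c > hn, break; end
        if c + 1 <= hn && heap(c+1) < heap(c)
            c = c + 1;                    % right child is the smaller one
        end
        if heap(j) <= heap(c), break; end
        tmp = heap(c); heap(c) = heap(j); heap(j) = tmp;
        j = c;
    end
end
```

Everything is inlined because, as the earlier timing slide shows, calls to user-defined functions inside a hot loop are what make Matlab code slow.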
Graph access, take 1
Simple, efficient neighbor access

At = A';
[v,~,w] = find(At(:,u));
Graph access, take 2
Complicated neighbor access

[i,j,w] = find(A);
[ai,aj,a] = indexed2csr(i,j,w,size(A,1))

v = aj(ai(u):ai(u+1)-1);
Graph access
bfs, take 1

At=A'; for w=find(At(:,v))'
 tic, d=bfs(A,1), toc
Elapsed time 0.05 seconds

bfs, take 2

indexed2csr(A); for ci=rp(v):rp(v+1)-1 …
 tic, d=bfs(A,1), toc
Elapsed time 0.007 seconds
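
The second variant can be sketched end to end as follows. This is an illustration under assumptions, not the gaimc source: the CSR arrays are built with `find` and `accumarray` rather than with the `indexed2csr` helper named on the slide, and the graph is the six-vertex example from the compressed-storage figure.

```matlab
% Example graph from the CSR figure: 6 vertices, directed weighted edges.
A = sparse([1 1 2 2 3 3 4 4 5 5], [2 3 3 4 2 5 3 6 4 6], ...
           [16 13 10 12 4 14 9 20 7 4], 6, 6);
n = size(A, 1);

% One-time conversion to CSR (row pointers rp, column indices ci).
[ci, rj] = find(A');
rp = cumsum([1; accumarray(rj, 1, [n 1])]);

% Breadth-first search from vertex s; d(v) = -1 marks unreached vertices.
s = 1;
d = -ones(n, 1);
queue = zeros(n, 1);
head = 1; tail = 1;
queue(tail) = s; d(s) = 0;
while head <= tail
    v = queue(head); head = head + 1;
    for k = rp(v):rp(v+1)-1               % out-neighbors of v, no find() call
        w = ci(k);
        if d(w) < 0
            d(w) = d(v) + 1;
            tail = tail + 1;
            queue(tail) = w;
        end
    end
end
```

The inner loop touches only preallocated arrays and scalar indices, which is the pattern the take 2 timing rewards.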
gaimc
convert input to CSR arrays
run graph algorithms on CSR arrays

bfs, clustering coefficients, core numbers,
cosine knn, dfs, dijkstra, floyd warshall,
mst, strong components

bipartite_matching (Thanks to Ying Wang)
The pudding
function s=mysumsq(x)
s = 0; for i=1:numel(x), s = s + x(i)^2; end

 x = randn(1e7,1);
 tic, s1 = mysumsq(x); toc;
 tic, s2 = x'*x; toc

[Figure 6.4 – Performance of the gaimc library. Bar chart of slowdown (axis 0 to 14), Standard vs. Fast, for dfs, scomponents, dijkstra, dirclustercoeffs, mst_prim, clustercoeffs. Caption and background text are cut off: “An experimental comparison of the perform…”; “… instances of a random symmetric graph with average degree … and … 10000 vertices. The aggregated results of all these tests are sh…”]
The pudding changes

[Figure 6.4 – Performance of the gaimc library, redrawn over the previous chart. Same algorithms (dfs, scomponents, dijkstra, dirclustercoeffs, mst_prim, clustercoeffs) and the same Standard vs. Fast series; the slowdown axis now runs 0 to 35 instead of 0 to 14.]
Afterward
“putting the graph into Matlab”

Matlab could just as easily have been
called “Graphlab” with a few extra
functions

It’s a great environment to play with
graphs as matrices

Graph libraries in Matlab: MatlabBGL and gaimc

  • 1. A tale of two Matlab libraries ! for graph algorithms! MatlabBGL and gaimc David F. Gleich Purdue University
  • 2. The Setting recursive spectral graph partitioning
  • 3. To store an m×n sparse matrix M, Matlab uses compressed column format Compr 2 12 4 The Setting [Gilbert et al., ]. Matlab never stores a 0 value in a sparse matrix. It always 16 20 rp “re-compresses” the data structure in these cases. If M is the adjacency matrix 1 10 corresponds to storing the of a graph, then storing the matrix by columns 4 9 7 6 graph as an in-edge list. recursive spectral graph partitioning 13 4 ci We briey illustrate compressed row and column storage schemes in g- 3 14 5 ai ure .. 1 2 3 4 5 6 1 0 16 13 Compressed sparse row0 0 0 Compr 0 rp2 1 3 0 5 10 9 11 0 0 cp 2 12 4 7 12 11 0 3 0 4 16 20 0 0 14 4 0 0 20 1 10 4 9 7 6 2 3 3 9 20 5 0 6 6 13 4 ci 0 0 0 7 0 5 4 3 4 ri 4 ai 13 10 12 4 14 9 16 20 4 7 6 0 0 0 ai 3 14 5 0 0 0 0 Compressed sparse column 0 16 13 0 0 0 0 cp 0 10 12 0 Most graph algorithms are designe 0 0 1 1 3 6 8 9 11 4 0 0 14 in-edge lists. Before running an algo 0 0 9 0 0 20
  • 5. The Setting: recursive spectral graph partitioning.
       A = load_adjacency_matrix;
       L = diag(sum(A,2)) - A;      % graph Laplacian: degree matrix minus adjacency
       [V,D] = eigs(L,2,'SA');
       f = V(:,2);
       A1 = A(f>=0,f>=0); A2 = A(f<0,f<0);*
       *Warning: Can do much better than this split!
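The split on slide 5 can be tried on a toy graph. The following is a minimal sketch, not the author's code: the six-vertex graph (two triangles joined by one edge) is made up for illustration, and a dense `eig` call stands in for `eigs` because the example is tiny.

```matlab
% Hypothetical toy graph: triangles {1,2,3} and {4,5,6} joined by the edge 3-4.
A = sparse([1 1 2 3 4 4 5], [2 3 3 4 5 6 6], 1, 6, 6);
A = A + A';                                  % symmetric adjacency matrix
n = size(A,1);

% Graph Laplacian L = D - A, with the vertex degrees on the diagonal.
L = spdiags(full(sum(A,2)), 0, n, n) - A;

% A dense eigensolve is fine for six vertices (assumption: small graph).
[V,D] = eig(full(L));
[~,order] = sort(diag(D));                   % eigenvalues in ascending order
f = V(:,order(2));                           % eigenvector of the 2nd smallest

% Split the vertices by the sign of f, as on the slide.
left  = find(f >= 0)';
right = find(f < 0)';
A1 = A(left,left);
A2 = A(right,right);
```

The sign of an eigenvector is arbitrary, so which triangle ends up in `left` can differ between runs or Matlab versions; only the grouping into the two triangles is determined.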
  • 8. The Problem: disconnected components.
       C = components(A);
       ??? Undefined function or method 'components' for input arguments of type 'double'.
       *Warning: Strictly speaking, this isn't a problem. However, it's inefficient to solve larger eigenproblems than required.
  • 9. The Rescue: disconnected components. MESHPART toolkit by John Gilbert and Shang-Hua Teng: C = components(A); uses Matlab's dmperm function.
  • 10. The Failed Rescue: disconnected components. C = components(A); caused Matlab to crash randomly. I wanted a fast max-flow routine too.
  • 11. MatlabBGL: Matlab and the Boost graph library.
  • 12. The Recoup: working recursive spectral partitioning code using the Boost graph library in C++, including a max-flow heuristic extension. The Boost graph library has a components function and many other graph algorithms. Boost has a “generic” graph data type.
  • 13. The Idea: add graph algorithms to Matlab naturally using the Boost graph library.
  • 14. The Plan: graph data type = Matlab sparse matrix; results = “natural” Matlab types.
  • 15. The Plan: A = load_adjacency_matrix; d = bfs(A,1); d = dijkstra(A,size(A,1)); T = mst(A); c = components(A); F = maxflow(A,s,t); test_dag(A); [flag,K] = test_planarity(A);
  • 16. The Plan: suitable for large problems = 10 million edges circa 2006 = avoid copying data.
  • 17. The Catch.
       Boost graph type       Matlab sparse type (compressed sparse column)
       vertices(G)            1:n
       edges(G)               [i,j,w] = find(A);
       num_vertices(G)        size(A,1)
       out_edges(G,v)         [~,j,w] = find(A(v,:))
       adjacent(G,v)          [~,j] = find(A(v,:))
  • 18. … graph as an in-edge list. We briefly illustrate compressed row and column storage schemes in the figure.
       Adjacency matrix of the six-vertex example graph:
          0 16 13  0  0  0
          0  0 10 12  0  0
          0  4  0  0 14  0
          0  0  9  0  0 20
          0  0  0  7  0  4
          0  0  0  0  0  0
       Compressed sparse row:
          rp = 1 3 5 7 9 11 11
          ci = 2 3 3 4 2 5 3 6 4 6
          ai = 16 13 10 12 4 14 9 20 7 4
       Compressed sparse column:
          cp = 1 1 3 6 8 9 11
          ri = 1 3 1 2 4 2 5 3 4 5
          ai = 16 4 13 10 9 12 7 14 20 4
       Most graph algorithms are designed to work with out-edge lists instead of in-edge lists.
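The compressed sparse row arrays on slide 18 can be rebuilt from a Matlab sparse matrix with one transpose and one `find`. This is a minimal sketch under that assumption; it uses only built-in functions and is not the conversion routine of either library.

```matlab
% The six-vertex example graph from slide 18, entered as (row, column, weight).
A = sparse([1 1 2 2 3 3 4 4 5 5], ...
           [2 3 3 4 2 5 3 6 4 6], ...
           [16 13 10 12 4 14 9 20 7 4], 6, 6);
n = size(A,1);

% Matlab stores A by columns, so find(A') walks the columns of A',
% which are the rows of A. The first output is then the column index in A.
[ci, ri, ai] = find(A');

% Row pointers: rp(v) is where row v starts in ci/ai; rp(n+1) is one past the end.
rp = cumsum([1; accumarray(ri, 1, [n 1])]);

% Out-neighbors of vertex 3 and the matching edge weights.
nbrs = ci(rp(3):rp(4)-1)';
wts  = ai(rp(3):rp(4)-1)';
```

Note the `-1` in the index range: `rp(v+1)` is the start of the next row, so the last entry of row `v` sits one position before it.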
  • 19. The Compromise: make a transpose when it's required, but let “smart” users bypass it.
  • 20. The Details. Background text: … BGL is largely irrelevant to MatlabBGL. There is no need for the copy_graph function from Boost, for example. Next, the figure shows the high-level architecture of MatlabBGL. There are four main components: m-files, mex-files, libmbgl, and BGL functions.
       [Figure: MatlabBGL architecture diagram with the labels M code, mex code, extern C code, C++ code; Sparse Matrix, CSR Graph; dfs, bfs, mst, primmst; Matlab, libmbgl, Boost.]
  • 21. MatlabBGL version history:
       April '06  v1.0  released on the Matlab File Exchange
       July '06   v2.0  added visitors
       April '07  v2.1  64-bit Matlab
       April '08  v3.0  performance improvement
       Oct '08    v4.0  planarity testing, layout, structural zeros
       Jan '12    v5.0  update forthcoming?
  • 22. Impact: downloaded over 20,000 times. Used in over 10 publications by others, including a PNAS article on brain topology. Identified numerous bugs in the Boost graph library.
  • 24. Network Partitioning … and now for a demo …
  • 25. The Devil of the Details. Compile mex files on OSX/Linux/Win in 32-bit and 64-bit mode. Compile libmbgl on OSX/Linux/Win in 32-bit and 64-bit mode. Background text: … BGL is largely irrelevant to MatlabBGL. There is no need for the copy_graph function from Boost, for example. Next, the figure shows the high-level architecture of MatlabBGL. There are four main components: m-files, mex-files, libmbgl, and BGL functions. Let's illustrate a typical call to a MatlabBGL function: dfs for a depth-first search through the graph.
       [Figure: MatlabBGL architecture diagram, as on slide 20.]
  • 26. The Devil of the Details. Hard to keep up with changes in Matlab. Hard for users to compile themselves (changes in Boost and changes in Matlab). Hard to play around with new algorithms. Mathworks graph library in the bioinformatics toolbox.
  • 27. gaimc: graph algorithms in Matlab code.
  • 28. A vision.
       function n=my1norm(x)
       n = 0; for i=1:numel(x), n=n+abs(x(i)); end
       x = randn(1e7,1);
       tic, n1=my1norm(x); toc       % Elapsed time is 0.16 seconds
       tic, n1 = norm(x,1); toc;     % Elapsed time is 0.32 seconds
       Note: R2007b on 64-bit linux.
  • 30. A vision.
       function n=my1norm(x)
       n = 0; for i=1:numel(x), n=n+abs(x(i)); end
       x = randn(1e7,1);
       tic, n1=my1norm(x); toc       % Elapsed time is 0.15 seconds
       tic, n1 = norm(x,1); toc;     % Elapsed time is 0.1 seconds
       Note: R2011a on 64-bit osx.
  • 31. Quite impressed: get within spitting distance of vectorized performance using Matlab for loops. Even faster than some things in Python.
  • 32. Another idea: implement graph algorithms in pure Matlab code. Should only be “somewhat” slower; much more portable.
  • 33. More problems: function calls make things REALLY slow (unless the function is built-in, e.g. abs). mst and dijkstra need a heap. A heap in Matlab?
  • 34. Problem specifics.
       function n=my1normfunc(x)
       n = 0; for i=1:numel(x), n=n+myabs(x(i)); end
       function a=myabs(a), if a<0, a=-a; end
       tic, n1=my1norm(x); toc          % Elapsed time is 0.15 seconds
       tic, n1=my1normfunc(x); toc;     % Elapsed time is 3.16 seconds
       Note: R2011a on 64-bit osx.
  • 35. A heap in Matlab code. More generally speaking, algorithms written in Fortran are excellent candidates for the Matlab just-in-time compiler. Old reference: D. K. Kahaner, Algorithm 561: Fortran implementation of heap programs, ACM TOMS, 1980.
       Background text (partly hidden on the slide): … implementation of a heap. … is inspired by Kahaner []. … a heap is a binary tree where smaller elements are … It supports the following operations: … an element to the heap; … element from the heap with the smallest …; … value of an element in the heap. … See the figure for an example. The Matlab heap will consist of four arrays and one … stores the identifiers of the items in the heap. … the element in tree node i and T(1) is the id … of the heap tree. … stores ids of elements in D so that D(T(i)) is …
       [Figure 6.3 – Binary trees as arrays.]
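To make the "binary tree as an array" idea concrete, here is a minimal sketch of a binary min-heap kept in one Matlab array, with node k's children at 2k and 2k+1. It is a simplified illustration, not the four-array heap described on the slide: it stores bare values, omits the update operation, and the input data are made up. The sift loops are written inline because, as slide 33 notes, function calls are slow.

```matlab
vals = [5 8 7 1 9 6];            % illustrative input values
T = zeros(1, numel(vals));       % T(k) is the value stored at tree node k
n = 0;                           % current number of elements in the heap

% Insert each value at the end, then sift it up toward the root.
for x = vals
    n = n + 1; T(n) = x; k = n;
    while k > 1
        p = floor(k/2);                          % parent of node k
        if T(p) <= T(k), break; end              % heap order already holds
        tmp = T(p); T(p) = T(k); T(k) = tmp;     % swap with the parent
        k = p;
    end
end

% Repeatedly remove the root, which is always the smallest remaining value.
out = zeros(1, numel(vals));
for i = 1:numel(vals)
    out(i) = T(1);
    T(1) = T(n); n = n - 1; k = 1;               % move the last leaf to the root
    while true
        c = 2*k;                                 % left child of node k
        if c > n, break; end                     % node k is a leaf
        if c < n && T(c+1) < T(c), c = c + 1; end    % pick the smaller child
        if T(k) <= T(c), break; end
        tmp = T(k); T(k) = T(c); T(c) = tmp;     % swap with that child
        k = c;
    end
end
```

After the second loop, `out` holds the inputs in ascending order, which is an easy check that both sift directions work.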
  • 36. Graph access, take 1. Simple, efficient neighbor access: At = A'; [v,~,w] = find(At(:,u));
  • 37. Graph access, take 2. Complicated neighbor access: [i,j,w] = find(A); [ai,aj,a] = indexed2csr(i,j,w,size(A,1)); v = aj(ai(u):ai(u+1)-1);
  • 38. Graph access.
       bfs, take 1: At=A'; for w=find(At(:,v))' …   tic, d=bfs(A,1), toc   Elapsed time 0.05 seconds
       bfs, take 2: indexed2csr(A); for ci=rp(v):rp(v+1)-1 …   tic, d=bfs(A,1), toc   Elapsed time 0.007 seconds
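The "take 2" loop can be fleshed out into a complete breadth-first search over CSR arrays. The sketch below is an illustration under stated assumptions, not the gaimc source: the CSR arrays are built with built-in `find` and `accumarray` rather than the `indexed2csr` helper named on the slide, and the graph is the six-vertex example from slide 18 with weights dropped.

```matlab
% Six-vertex example graph (edge weights are irrelevant for BFS).
A = sparse([1 1 2 2 3 3 4 4 5 5], [2 3 3 4 2 5 3 6 4 6], 1, 6, 6);
n = size(A,1);

% CSR arrays: ci lists out-neighbors row by row, rp points at each row's start.
[ci, ri] = find(A');
rp = cumsum([1; accumarray(ri, 1, [n 1])]);

% BFS from vertex 1 with an array-based queue; d(v) = -1 means "not reached".
d = -ones(n,1); d(1) = 0;
queue = zeros(n,1); head = 1; tail = 1; queue(1) = 1;
while head <= tail
    v = queue(head); head = head + 1;
    for ei = rp(v):rp(v+1)-1          % every out-edge of v
        w = ci(ei);
        if d(w) < 0                   % first visit to w
            d(w) = d(v) + 1;
            tail = tail + 1; queue(tail) = w;
        end
    end
end
```

The inner loop only indexes into plain arrays, which is the kind of code the Matlab just-in-time compiler handles well; that is the design point behind the timing gap on the slide.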
  • 40. gaimc: convert input to CSR arrays; run graph algorithms on CSR arrays. bfs, clustering coefficients, core numbers, cosine knn, dfs, dijkstra, floyd warshall, mst, strong components, bipartite_matching (thanks to Ying Wang).
  • 41. The pudding.
       function s=mysumsq(x)
       s = 0; for i=1:numel(x), s = s + x(i)^2; end
       x = randn(1e7,1);
       tic, s1 = mysumsq(x); toc;
       tic, s2 = x'*x; toc
       Background text: … instances of a random symmetric graph with average degree … and 10000 vertices. The aggregated results of all these tests are shown in the figure.
       [Figure 6.4 – Performance of the gaimc library. An experimental comparison of the perform… Bar chart of slowdown (axis 0 to 14) for the Standard and Fast variants of dfs, scomponents, dijkstra, dirclustercoeffs, mst_prim, and clustercoeffs.]
  • 42. The pudding changes. Same code and background text as slide 41.
       [Figure 6.4 – Performance of the gaimc library, with a second bar chart overlaid: slowdown axis 0 to 35 next to the earlier 0 to 14, Standard and Fast variants, same six algorithms.]
  • 43. Afterward: “putting the graph into Matlab”. Matlab could just as easily have been called “Graphlab” with a few extra functions. It's a great environment to play with graphs as matrices.