Fast Convolutional Neural Networks for Graph-Structured Data
Xavier Bresson
Swiss Federal Institute of Technology (EPFL)
Joint work with Michaël Defferrard (EPFL) and Pierre Vandergheynst (EPFL)
Organizers: Xue-Cheng Tai, Egil Bae, Marius Lysaker, Alexander Vasiliev
Aug 29th 2016
Nanyang Technological University (NTU) from Dec. 2016
Data Science and Artificial Neural Networks
• Data Science is an interdisciplinary field that aims at transforming raw data
into meaningful knowledge, to provide smart decisions for real-world problems.
• In the news: [figure of recent press headlines omitted]
A Brief History of Artificial Neural Networks
• Three eras: AI Hope → AI "Winter" [1960s-2012] → AI Resurgence.
• Timeline:
1958  Perceptron (Rosenblatt)
1959  Primary visual cortex (Hubel-Wiesel)
1962  Birth of Data Science, split from Statistics (Tukey)
1982  Backprop (Werbos)
1987  Neocognitron (Fukushima); first NIPS
1989  First KDD
1995  SVM/kernel techniques (Vapnik)
1997  RNN (Schmidhuber)
1998  CNN (LeCun)
1999  First NVIDIA GPU
2006  Auto-encoder (LeCun, Hinton, Bengio); first Amazon cloud center
2010  Kaggle platform
2012  Deep learning breakthrough (Hinton); data scientist becomes the 1st job in the US
2014  Facebook AI center (Torch)
2015  OpenAI center; Google AI (TensorFlow)
• The AI winter era was dominated by kernel techniques, handcrafted features,
and graphical models. The resurgence is driven by hardware (GPU speed doubles
every year) and big data (volume doubles every 1.5 years).
• 4th industrial revolution? A Digital Intelligence Revolution, or a new AI bubble?
What happened in 2012?
• ImageNet [L. Fei-Fei-et.al.'09]: International Image Classification
Challenge, with 1,000 object classes and 1,431,167 images
[Russakovsky-et.al.'14]. In 2012 the classification error was divided by 2!
• Before 2012: handcrafted filters + SVM classification, e.g. SIFT
[Lowe, 1999] (the most cited paper in computer vision) and Histogram of
Gradients (HoG) [Dalal & Triggs, 2005].
After 2012: filters learned with neural networks
[Fei-Fei Li-et.al.'15, T.Q.Hoan'13, Karpathy-Johnson'16].
Outline
Ø Data Science and ANN
Ø Convolutional Neural Networks
- Why are CNNs good?
- Local Stationarity and Multiscale Hierarchical Features
Ø Convolutional Neural Networks on Graphs
- CNNs Only Process Euclidean-Structured Data
- CNNs for Graph-Structured Data
- Graph Spectral Theory for Convolution on Graphs
- Balanced Cut Model for Graph Coarsening
- Fast Graph Pooling
Ø Numerical Experiments
Ø Conclusion
Outline
Ø Data Science and ANN
► Convolutional Neural Networks
- Why are CNNs good?
- Local Stationarity and Multiscale Hierarchical Features
Ø Convolutional Neural Networks on Graphs
Ø Numerical Experiments
Ø Conclusion
Convolutional Neural Networks
• CNNs [LeCun-et.al.'98] are very successful for Computer Vision tasks:
- Image/object recognition [Krizhevsky-Sutskever-Hinton'12]
- Image captioning [Karpathy-FeiFei'15]
- Image inpainting [Pathak-Efros-etal'16]
- Etc.
• CNNs are used by several (big) IT companies:
- Facebook (Torch software)
- Google (TensorFlow software, Google Brain, DeepMind)
- Microsoft
- Tesla (OpenAI)
- Amazon (DSSTNE software)
- Apple
- IBM
Why Are CNNs Good?
• CNNs are extremely efficient at extracting meaningful statistical patterns
in large-scale, high-dimensional datasets.
• Key idea: Learn local stationary structures and compose them to form
multiscale hierarchical patterns.
• Why? It is an open (math) question to prove the efficiency of CNNs.
Note: Despite the lack of theory, the entire ML and CV communities have
shifted to deep learning techniques! Ex: NIPS'16: 2,326 submissions, 328 on
deep learning (14%), 90 on convex optimization (3.8%).
Local Stationarity
• Assumption: Data are locally stationary ⇒ similar local patches are shared
across the data domain.
• How to extract local stationary patterns?
Convolutional filters (filters with compact support), as sketched below.
[Figure: an input x convolved with compact-support filters F1, F2, F3,
producing the feature maps x*F1, x*F2, x*F3.]
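A minimal sketch (not the talk's code) of this idea: one compact-support
filter is slid across the whole domain, extracting the same local pattern
everywhere; the filter values below are an illustrative assumption.

```python
import numpy as np
from scipy.signal import convolve2d

x = np.random.randn(28, 28)            # input signal (e.g. an image)
F1 = np.array([[1, 0, -1],             # illustrative 3x3 edge filter
               [2, 0, -2],             # (values are an assumption, not
               [1, 0, -1]], float)     #  a filter from the talk)
feature_map = convolve2d(x, F1, mode='same')   # x * F1: one feature map
```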
Multiscale Hierarchical Features
• Assumption: Local stationary patterns can be composed to form more abstract,
complex patterns.
• How to extract multiscale hierarchical patterns?
Downsampling of the data domain (such as the image grid) with pooling (such as
max or average pooling); a minimal sketch follows below.
• Other advantage: Reduces computational complexity while increasing the
number of filters.
[Figure: successive 2x2 max poolings between layers 1-4 produce deep,
hierarchical features, from simple to abstract [Karpathy-Johnson'16].]
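A minimal sketch (not the talk's code) of the 2x2 max pooling used between
layers: the grid is halved in each dimension while the strongest filter
responses are kept.

```python
import numpy as np

def max_pool_2x2(x):
    """x: feature map of shape (H, W) with H, W even."""
    H, W = x.shape
    # Group pixels into non-overlapping 2x2 blocks, then take the max.
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

x = np.arange(16.0).reshape(4, 4)
print(max_pool_2x2(x))   # [[ 5.,  7.], [13., 15.]]
```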
Classification Function
• Classifier: After extracting multiscale locally stationary features, use
them to design a classification function with the training labels.
• How to design a (linear) classifier?
Fully connected neural networks:
    x_class = W x_feat
[Figure: the features are mapped to the output signal of class labels
1, 2, ..., K.]
Full Architecture of CNNs
[Figure: full CNN architecture, e.g. for an image. The input signal
x = x^{l=0} passes through convolutional layers, which extract local
stationary features and compose them via downsampling and pooling: each layer
applies convolutional filters (x*F1, x*F2, x*F3), a ReLU activation, grid
downsampling, and pooling, producing x^{l=1}, ..., x^l. Fully connected
layers (the classification function) then map the features to the output
signal y ∈ R^{n_c} of class labels.]
CNNs Only Process Euclidean-Structured Data
• CNNs are designed for data lying on Euclidean spaces:
(1) Convolution on Euclidean grids (FFT)
(2) Downsampling on Euclidean grids
(3) Pooling on Euclidean grids
Everything is mathematically well defined and computationally fast!
• Examples: sound (1D), images (2D, 3D), videos (2+1D).
• But not all data lie on Euclidean grids!
Non-Euclidean Data
• Examples of irregular/graph-structured data:
(1) Social networks (Facebook, Twitter)
(2) Biological networks (genes, brain connectivity)
(3) Communication networks (wireless, traffic)
• Main challenges:
(1) How to define convolution, downsampling and pooling on graphs?
(2) And how to make them numerically fast?
• Current solution: Map graph-structured data to regular/Euclidean grids with
e.g. kernel methods and apply standard CNNs.
Limitation: Handcrafting the mapping is (in my opinion) against the CNN
principle!
[Figure: graph/network-structured data — social networks, brain structure,
telecommunication networks.]
Outline
Ø Data Science and ANN
Ø Convolutional Neural Networks
► Convolutional Neural Networks on Graphs
- CNNs Only Process Euclidean-Structured Data
- CNNs for Graph-Structured Data
- Graph Spectral Theory for Convolution on Graphs
- Balanced Cut Model for Graph Coarsening
- Fast Graph Pooling
Ø Numerical Experiments
Ø Conclusion
CNNs for Graph-Structured Data
• Our contribution: Generalizing CNNs to any graph-structured data with the
same computational complexity as standard CNNs!
• What tools for this generalization?
(1) Graph spectral theory for convolution on graphs,
(2) Balanced cut model for graph coarsening,
(3) Graph pooling with the binary tree structure of coarsened graphs.
Related Works
• Categories of graph CNNs:
(1) Spatial approach
(2) Spectral (Fourier) approach
• Spatial approach:
- Local receptive fields [Coates-Ng'11, Gregor-LeCun'10]:
Find compact groups of similar features, but no defined convolution.
- Locally Connected Networks [Bruna-Zaremba-Szlam-LeCun'13]:
Exploit the multiresolution structure of graphs, but no defined convolution.
- ShapeNet [Bronstein-et.al.'15'16]:
Generalization of CNNs to 3D meshes. Convolution is well defined in these
smooth low-dimensional non-Euclidean spaces. Handles multiple graphs.
Obtained state-of-the-art results for 3D shape recognition.
• Spectral approach:
- Deep Spectral Networks [Henaff-Bruna-LeCun'15]:
Related to this work. We will compare later.
Outline
Ø Data Science and ANN
Ø Convolutional Neural Networks
Ø Convolutional Neural Networks on Graphs
► Graph Spectral Theory for Convolution on Graphs
Ø Numerical Experiments
Ø Conclusion
Convolution on Graphs 1/3
• Graphs: G = (V, E, W), with V the set of vertices, E the set of edges,
W the similarity matrix, and |V| = n.
[Figure: a graph G with vertices i, j ∈ V, an edge e_ij ∈ E of weight
W_ij = 0.9, and the corresponding similarity matrix W.]
• Graph Laplacian (the core operator of spectral graph theory [1]):
a 2nd-order derivative operator on graphs, where D is the diagonal degree
matrix:
    L = D - W                       (unnormalized)
    L = I_n - D^{-1/2} W D^{-1/2}   (normalized)
[1] Chung, 1997
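A minimal sketch (not the talk's code) of both Laplacians with SciPy sparse
matrices, given a symmetric similarity matrix W:

```python
import numpy as np
import scipy.sparse as sp

def laplacian(W, normalized=True):
    d = np.asarray(W.sum(axis=1)).flatten()      # vertex degrees d_i
    if not normalized:
        return sp.diags(d) - W                   # L = D - W
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    D_inv_sqrt = sp.diags(d_inv_sqrt)
    # L = I_n - D^{-1/2} W D^{-1/2}
    return sp.identity(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt

# Example: 3-vertex path graph 0-1-2.
W = sp.csr_matrix(np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float))
L = laplacian(W)
```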
Convolution on Graphs 2/3
• Fourier transform on graphs [2]: L is a symmetric positive semidefinite
matrix ⇒ it has a set of orthonormal eigenvectors {u_l}, known as the graph
Fourier modes, associated to nonnegative eigenvalues {λ_l}, known as the
graph frequencies.
The Graph Fourier Transform (GFT) of f ∈ R^n is
    F_G f = ˆf = U^T f ∈ R^n,
whose value at frequency λ_l is
    ˆf(λ_l) = ˆf_l := ⟨f, u_l⟩ = Σ_{i=0}^{n-1} f(i) u_l(i).
The inverse GFT is defined as
    F_G^{-1} ˆf = U ˆf = U U^T f = f,
whose value at vertex i is
    (U ˆf)(i) = Σ_{l=0}^{n-1} ˆf_l u_l(i).
[2] Hammond, Vandergheynst, Gribonval, 2011
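A minimal sketch (not the talk's code) of the GFT via the Laplacian's
eigendecomposition, reusing L from the Laplacian sketch above; numpy.linalg.eigh
applies since L is symmetric.

```python
import numpy as np

L_dense = L.toarray()                 # small graph: dense eigendecomposition
lam, U = np.linalg.eigh(L_dense)      # graph frequencies and Fourier modes

f = np.random.randn(L_dense.shape[0])
f_hat = U.T @ f                       # forward GFT: f_hat = U^T f
f_rec = U @ f_hat                     # inverse GFT: f = U f_hat
assert np.allclose(f, f_rec)          # holds because U is orthonormal
```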
Convolution on Graphs 3/3
• Convolution on graphs (in the Fourier domain) [2]:
    f *_G g = F_G^{-1}(F_G f ⊙ F_G g) ∈ R^n,
whose value at vertex i is
    (f *_G g)(i) = Σ_{l=0}^{n-1} ˆf_l ˆg_l u_l(i).
It is also convenient to see that
    f *_G g = ˆg(L) f,
as
    f *_G g = U((U^T f) ⊙ (U^T g)) = U diag(ˆg(λ_0), ..., ˆg(λ_{n-1})) U^T f
            = U ˆg(Λ) U^T f = ˆg(L) f.
[2] Hammond, Vandergheynst, Gribonval, 2011
Translation on Graphs 1/2
• Translation on graphs: A signal f defined on the graph can be translated to
any vertex i using the graph convolution operator [3]:
    T_i f := f *_G δ_i,
where T_i is the graph translation operator with vertex shift i.
The function f, translated to vertex i, has the following value at vertex j:
    (T_i f)(j) = f(j -_G i) = (f *_G δ_i)(j) = Σ_{l=0}^{n-1} ˆf_l u_l(i) u_l(j).
This formula is the graph counterpart of the continuous translation operator:
    (T_s f)(x) = f(x - s) = (f * δ_s)(x) = ∫_R ˆf(ξ) e^{-2πiξs} e^{2πiξx} dξ,
where ˆf(ξ) = ⟨f, e^{2πiξx}⟩, and the e^{2πiξx} are the eigenfunctions of the
continuous Laplace-Beltrami operator, i.e. the continuous version of the
graph Fourier modes u_l.
[3] Shuman, Ricaud, Vandergheynst, 2016
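A minimal sketch (not the talk's code): translating f to vertex i is just
convolution with a Kronecker delta, so each Fourier coefficient ˆf_l is scaled
by u_l(i). It reuses U and f from the sketches above.

```python
import numpy as np

def translate(f, i, U):
    f_hat = U.T @ f
    return U @ (f_hat * U[i, :])   # (T_i f)(j) = sum_l f_hat_l u_l(i) u_l(j)

f_at_0 = translate(f, 0, U)        # f "re-centered" at vertex 0
```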
Translation on Graphs 2/2
• Note: Translations on graphs are easier to carry out with the spectral
approach than directly in the spatial/graph domain.
[Figure 1: Translated signals in the continuous R^2 domain (a-c), and in the
graph domain (d-f). The component of the translated signal at the center
vertex is highlighted in green. [Shuman-Ricaud-Vandergheynst'16]]
Localized Filters on Graphs 1/3
• Localized convolutional kernels: As in standard CNNs, we must define
localized filters on graphs.
• Laplacian polynomial kernels [2]: We consider the family of spectral
kernels defined as
    ˆg(λ_l) = ˆp_K(λ_l) := Σ_{k=0}^{K} a_k λ_l^k,    (1)
where ˆp_K is a K-th order polynomial function of the Laplacian eigenvalues
λ_l. This class of kernels defines spatially localized filters, as proved
below:
Theorem 1. Laplacian-based polynomial kernels (1) are K-localized in the
sense that
    (T_i p_K)(j) = 0 if d_G(i, j) > K,    (2)
where d_G(i, j) is the discrete geodesic distance on the graph, i.e. the
length of the shortest path between vertex i and vertex j.
[2] Hammond, Vandergheynst, Gribonval, 2011
Localized Filters on Graphs 2/3
Corollary 1. Consider the function γ_ij such that
    γ_ij = (T_i p_K)(j) = (p_K *_G δ_i)(j) = (ˆp_K(L) δ_i)(j)
         = Σ_{k=0}^{K} a_k (L^k δ_i)(j).
Then γ_ij = 0 if d_G(i, j) > K.
[Figure 2: Illustration of localized filters on graphs. The spatial profile
of the polynomial filter is given by γ_ij: Laplacian-based polynomial kernels
are exactly localized in a K-ball B_i^K centered at vertex i, the support of
the polynomial filter at vertex i.]
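A minimal sketch (not the talk's code) checking Theorem 1 numerically on a
path graph: p_K(L) δ_i vanishes outside the K-ball around i, because
(L^k)_ij = 0 whenever d_G(i, j) > k. The coefficients a_k are arbitrary.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import shortest_path

n, K, i = 10, 2, 0
rows = np.arange(n - 1)
W = sp.coo_matrix((np.ones(n - 1), (rows, rows + 1)), shape=(n, n))
W = (W + W.T).tocsr()                                  # path graph 0-1-...-9
L = (sp.diags(np.asarray(W.sum(axis=1)).flatten()) - W).tocsr()

a = np.array([1.0, -0.5, 0.25])                        # coefficients a_0..a_K
delta = np.zeros(n); delta[i] = 1.0
y, v = np.zeros(n), delta.copy()
for k in range(K + 1):
    y += a[k] * v                                      # accumulate a_k L^k δ_i
    v = L @ v

hops = shortest_path(W, unweighted=True)[i]            # d_G(i, ·)
assert np.allclose(y[hops > K], 0.0)                   # zero outside the K-ball
```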
Localized Filters on Graphs 3/3
Corollary 2. Localized filters on graphs are defined according to the
principle:
    Frequency smoothness ⇔ Spatial graph localization.
This is Heisenberg's uncertainty principle extended to the graph setting;
recent papers have studied the uncertainty principle on graphs.
Chebyshev polynomials: Let T_k(x) be the Chebyshev polynomial of order k,
generated by the fundamental recurrence
    T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x),  with T_0 = 1 and T_1 = x.
The Chebyshev basis {T_0, T_1, ..., T_K} forms an orthogonal basis of [-1, 1].
Fast Chebyshev Polynomial Kernels 1/2
• Graph filtering: Let y be a signal x filtered by a Laplacian-based
polynomial kernel:
    y = x *_G p_K = ˆp_K(L) x = Σ_{k=0}^{K} a_k L^k x.
The monomial basis {1, x, x^2, ..., x^K} provides localized spatial filters,
but does not form an orthogonal basis (e.g. ⟨1, x⟩ = ∫_0^1 x dx
= [x^2/2]_0^1 = 1/2 ≠ 0), which limits its ability to learn good spectral
filters.
[Figure 3: The first six Chebyshev polynomials.]
Fast Chebyshev Polynomial Kernels 2/2
• Graph filtering with Chebyshev [2]: The filtered signal y defined with the
Chebyshev polynomials is
    y = x *_G q_K = Σ_{k=0}^{K} θ_k T_k(L) x,
with the Chebyshev spectral kernel
    ˆq_K(λ) = Σ_{k=0}^{K} θ_k T_k(λ).
• Fast filtering: Denote X_k := T_k(L) x and rewrite y = Σ_{k=0}^{K} θ_k X_k.
Then all {X_k} are generated with the recurrence X_k = 2L X_{k-1} - X_{k-2}.
As L is sparse, all matrix multiplications are between a sparse matrix and a
vector. The computational complexity is O(K|E|), which reduces to linear
complexity O(n) for k-NN graphs.
• GPU parallel implementation: These linear algebra operations can be done in
parallel, allowing a fast GPU implementation of Chebyshev filtering.
[2] Hammond, Vandergheynst, Gribonval, 2011
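A minimal sketch (not the talk's code) of fast Chebyshev filtering. One
assumption made explicit here: the spectrum must lie in [-1, 1], the Chebyshev
domain, so the Laplacian is first rescaled to Lt = 2L/λ_max - I.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def chebyshev_filter(L, x, theta):
    """y = sum_k theta_k T_k(Lt) x via X_k = 2 Lt X_{k-1} - X_{k-2}."""
    lmax = eigsh(L, k=1, return_eigenvectors=False)[0]
    Lt = (2.0 / lmax) * L - sp.identity(L.shape[0], format='csr')
    X_prev, X = x, Lt @ x                      # X_0 = x, X_1 = Lt x
    y = theta[0] * X_prev + theta[1] * X
    for k in range(2, len(theta)):
        X_prev, X = X, 2.0 * (Lt @ X) - X_prev  # sparse mat-vecs only: O(K|E|)
        y += theta[k] * X
    return y

theta = np.random.randn(5)   # K = 4: the learnable filter coefficients
# y = chebyshev_filter(L, x, theta) for any sparse Laplacian L and signal x.
```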
Outline
Ø Data Science and ANN
Ø Convolutional Neural Networks
Ø Convolutional Neural Networks on Graphs
► Balanced Cut Model for Graph Coarsening
Ø Numerical Experiments
Ø Conclusion
Graph Coarsening
• Graph coarsening: As in standard CNNs, we must define a grid coarsening
process for graphs. It will be essential for pooling similar features
together.
• Graph partitioning: Graph coarsening is equivalent to graph clustering,
which is an NP-hard combinatorial problem.
[Figure 4: Illustration of graph coarsening: G^{l=0} = G is successively
coarsened/clustered into G^{l=1} and G^{l=2}.]
Graph Partitioning
• Balanced Cuts [4]: Two powerful measures of graph clustering are the
Normalized Cut and the Normalized Association, defined as:
    Normalized Cut (partitioning by min edge cuts):
        min_{C_1,...,C_K} Σ_{k=1}^{K} Cut(C_k, C_k^c) / Vol(C_k)
    Normalized Association (partitioning by max vertex matching):
        max_{C_1,...,C_K} Σ_{k=1}^{K} Assoc(C_k) / Vol(C_k)
where Cut(A, B) := Σ_{i∈A, j∈B} W_ij, Assoc(A) := Σ_{i,j∈A} W_ij,
Vol(A) := Σ_{i∈A} d_i, and d_i := Σ_{j∈V} W_ij is the degree of vertex i.
The two criteria are equivalent by complementarity.
[Figure 5: Equivalence between NCut and NAssoc partitioning.]
[4] Shi, Malik, 2000
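A minimal sketch (not the talk's code) evaluating both criteria for a given
partition, using the identity Cut(C, C^c) = Vol(C) - Assoc(C); labels[i] is
the cluster of vertex i.

```python
import numpy as np

def ncut_nassoc(W, labels):
    W = np.asarray(W)
    d = W.sum(axis=1)                   # vertex degrees
    ncut = nassoc = 0.0
    for c in np.unique(labels):
        in_c = labels == c
        vol = d[in_c].sum()             # Vol(C)
        assoc = W[np.ix_(in_c, in_c)].sum()   # Assoc(C)
        ncut += (vol - assoc) / vol     # Cut(C, C^c) / Vol(C)
        nassoc += assoc / vol
    return ncut, nassoc                 # complementarity: ncut + nassoc = K

W_demo = np.array([[0, 1, 0.1], [1, 0, 0.1], [0.1, 0.1, 0]])
print(ncut_nassoc(W_demo, np.array([0, 0, 1])))
```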
Graclus Graph Partitioning
• Graclus [5]: A greedy (fast) technique that computes clusters that locally
maximize the Normalized Association. It proceeds by two successive steps:
(P1) Vertex matching:
    {i, j} = argmax_j (W^l_ii + 2 W^l_ij + W^l_jj) / (d^l_i + d^l_j).
Matched vertices {i, j} are merged into a super-vertex at the next coarsening
level.
(P2) Graph coarsening: G^{l+1} is defined by
    W^{l+1}_ij = Cut(C^l_i, C^l_j),
    W^{l+1}_ii = Assoc(C^l_i).
The partition energy at level l is
    Σ_{matched {i,j}} (W^l_ii + 2 W^l_ij + W^l_jj) / (d^l_i + d^l_j)
        = Σ_{k=1}^{K_l} Assoc(C^l_k) / Vol(C^l_k),
where C^l_k is a super-vertex computed by (P1), i.e. C^l_k := matched {i, j}.
It is also a cluster with at most 2^{l+1} original vertices.
[Figure 6: Graph coarsening with Graclus across levels G^{l-1}, G^l, G^{l+1},
G^{l+2}. The two steps (P1) vertex matching and (P2) graph coarsening provide
a local solution to the Normalized Association clustering problem at each
coarsening level l. A minimal matching sketch follows below.]
[5] Dhillon, Guan, Kulis, 2007
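A minimal sketch (not the talk's code) of one greedy coarsening level in the
spirit of Graclus, simplified here to a single pass in vertex order: each
unmarked vertex is matched with the unmarked neighbor maximizing the local
Normalized Association score, and (P2) is realized with an assignment matrix.

```python
import numpy as np

def coarsen_one_level(W):
    n = W.shape[0]
    d = W.sum(axis=1)
    cluster = -np.ones(n, dtype=int)    # super-vertex id of each vertex
    k = 0
    for i in range(n):
        if cluster[i] >= 0:
            continue
        nbrs = [j for j in np.nonzero(W[i])[0] if cluster[j] < 0 and j != i]
        if nbrs:
            # (P1) score (W_ii + 2 W_ij + W_jj)/(d_i + d_j); W_ii = W_jj = 0
            # at the finest level, so only the 2 W_ij term remains here.
            j = max(nbrs, key=lambda j: 2 * W[i, j] / (d[i] + d[j]))
            cluster[i] = cluster[j] = k
        else:
            cluster[i] = k              # unmatched vertex: singleton cluster
        k += 1
    # (P2): P.T @ W @ P has off-diagonal Cut(C_i, C_j), diagonal Assoc(C_i).
    P = np.eye(k)[cluster]              # n x k assignment matrix
    return P.T @ W @ P, cluster

W = np.array([[0, 1, 0.1], [1, 0, 0.1], [0.1, 0.1, 0.0]])
W_coarse, cluster = coarsen_one_level(W)
```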
Outline
Ø Data Science and ANN
Ø Convolutional Neural Networks
Ø Convolutional Neural Networks on Graphs
► Fast Graph Pooling
Ø Numerical Experiments
Ø Conclusion
Fast Graph Pooling 1/2
• Graph pooling: As in standard CNNs, we must define a pooling process such
as max pooling or average pooling. This operation will be done many times
during the optimization task.
• Unstructured pooling is inefficient: The graph and its coarsened versions
indexed by Graclus require storing a table with all matched vertices
⇒ memory-consuming and not (easily) parallelizable.
• Structured pooling is as efficient as 1D grid pooling: Start from the
coarsest level, then propagate the ordering to the next finer level such that
node k has nodes 2k and 2k+1 as children ⇒ a binary tree arrangement of the
nodes such that adjacent nodes are hierarchically merged at the next coarser
level; a minimal sketch follows below.
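A minimal sketch (not the talk's code): once vertices are reindexed so that
node k at level l+1 has children 2k and 2k+1 at level l, graph max pooling is
an ordinary 1D pooling of size 2. Padding singleton children with fake nodes
holding -inf is an assumption made here for simplicity.

```python
import numpy as np

def graph_max_pool(x):
    """x: signal of shape (n,) in binary-tree vertex order, n even."""
    return x.reshape(-1, 2).max(axis=1)    # pool consecutive pairs

x = np.array([0.5, 1.0, -0.2, 0.1, 2.0, -np.inf])  # last node is fake
print(graph_max_pool(x))   # [1.0, 0.1, 2.0]
```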
Fast Graph Pooling 2/2
[Figure 7: Fast graph pooling using the graph coarsening structure. Left:
with an unstructured arrangement of vertices, pooling the signal on
G^{l=0} = G (vertices 0-7) through G^{l=1} (super-vertices {0,2}, {1,4},
{5,7}, {3,6}) to G^{l=2} requires a lookup table of matched vertices. Right:
after reindexing w.r.t. the coarsening structure, the super-vertices become
{0,1}, {2,3}, {4,5}, {6,7}, and graph pooling reduces to pooling consecutive
pairs of a 1D array. The binary tree arrangement of vertices allows a very
efficient pooling on graphs, as fast as regular 1D Euclidean grid pooling.]
Full Architecture of Graph CNNs
[Figure 8: Architecture of the proposed CNNs on graphs. The input signal
x ∈ R^n lives on a graph G = G^{l=0} (e.g. a social, biological, or
telecommunication graph) whose coarsened versions G^{l=1}, G^{l=2}, ... are
pre-computed. Graph convolutional layers extract multiscale local stationary
features on graphs: each layer applies convolutional filters with O(K)
parameters and O(K|E|) operations (GPUs), a ReLU activation, and graph
pooling by a factor 2^{p_l} (GPUs); e.g. x^{l=0} ∈ R^{n_0 F_1} is filtered
with θ^{l=1} ∈ R^{K_1 F_1} into x^{l=1} ∈ R^{n_1 F_1}, n_1 = n_0/2^{p_1}, and
so on up to x^{l=5} ∈ R^{n_5 F_5}. Fully connected layers
(θ^{l=6} ∈ R^{n_5 n_c}) map these features to the output signal y ∈ R^{n_c}
of class labels.]
Notation: l is the coarsening level, x^l are the downsampled signals at layer
l, G^l is the coarser graph, g_{θ^{K_l}} are the spectral filters at layer l,
x^l_g are the filtered signals, p_l is the coarsening exponent, n_c is the
number of classes, y is the output signal, and θ^l are the parameters to
learn at layer l.
Optimization
• Backpropagation [6] = the chain rule applied to the neurons at each layer.
Loss function (cross-entropy):
    E = - Σ_{s∈S} l_s log y_s
Gradient descent steps:
    θ_ij^{n+1} = θ_ij^n - τ ∂E/∂θ_ij
    x_i^{n+1} = x_i^n - τ ∂E/∂x_i
Local gradients, for layer outputs y_j = Σ_{i=1}^{F_in} g_{θ_ij}(L) x_i:
    ∂E/∂θ_ij = (∂E/∂y_j)(∂y_j/∂θ_ij) = Σ_{s∈S} [X_{0,s}, ..., X_{K-1,s}]^T ∂E/∂y_{j,s}
    ∂E/∂x_i = Σ_{j=1}^{F_out} (∂E/∂y_j)(∂y_j/∂x_i) = Σ_{j=1}^{F_out} g_{θ_ij}(L) ∂E/∂y_j
[Figure: local and accumulative computations of backpropagation between the
inputs x_i, parameters θ_ij, and outputs y_j.]
[6] Werbos, 1982; Rumelhart, Hinton, Williams, 1985
Outline
Ø Data Science and ANN
Ø Convolutional Neural Networks
Ø Convolutional Neural Networks on Graphs
► Numerical Experiments
Ø Conclusion
Numerical Experiments
• Platforms and hardware: All experiments are carried out with:
(1) TensorFlow (Google AI software) [7]
(2) NVIDIA K40c GPU (CUDA)
• Types of experiments:
(1) Euclidean CNNs
(2) Non-Euclidean CNNs
• Code will be released soon!
[7] Abadi-et.al. 2015
Revisiting Euclidean CNNs 1/2
• Sanity check: MNIST is the most popular dataset in deep learning [8]. It is
a dataset of 70,000 images of handwritten digits, from 0 to 9, represented on
a 2D grid of size 28x28 (data dimension = 28^2 = 784).
• Graph: A k-NN graph (k = 8) of the Euclidean grid, with Gaussian edge
weights
    W_ij = exp(-||x_i - x_j||_2^2 / σ^2),
where x_i is the 2D coordinate of pixel i; a sketch of the construction is
given below.
[8] LeCun, Bottou, Bengio, 1998
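A minimal sketch (not the talk's code) of the k-NN grid graph with
scikit-learn; taking σ as the mean neighbor distance is a bandwidth
assumption, not a choice stated in the talk.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

grid = np.array([(i, j) for i in range(28) for j in range(28)], dtype=float)
dist = kneighbors_graph(grid, n_neighbors=8, mode='distance')  # sparse k-NN
sigma = dist.data.mean()                           # bandwidth (assumption)
dist.data = np.exp(-dist.data ** 2 / sigma ** 2)   # Gaussian edge weights
W = 0.5 * (dist + dist.T)                          # symmetrize the graph
```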
Revisiting Euclidean CNNs 2/2
• Results: Classification accuracies

    Algorithm                                      Accuracy (%)
    Linear SVM                                     91.76
    Softmax                                        92.36
    CNNs [LeNet5]                                  99.33
    Graph CNNs: CN32-P4-CN64-P4-FC512-softmax      99.18

• CPU vs. GPU: A GPU implementation of graph CNNs is 8x faster than a CPU
implementation, the same order as standard CNNs. Graph CNNs only use matrix
multiplications, which are efficiently implemented in CUDA BLAS.

    Architecture                          Time     Speedup
    CNNs with CPU                         210ms    -
    CNNs with GPU NVIDIA K40c             31ms     6.77x
    Graph CNNs with CPU                   200ms    -
    Graph CNNs with GPU NVIDIA K40c       25ms     8.00x
Non-Euclidean CNNs 1/2
• Text categorization with 20NEWS: A benchmark dataset introduced at CMU [9].
It has 20,000 text documents across 20 topics; the data dimension is 33,000,
the number of words in the dictionary.
[Table 1: the 20 topics of 20NEWS, with sample documents from the topics
"Auto" and "Medicine".]
[9] Lang, 1995
Non-Euclidean CNNs 2/2
• Results: Classification accuracies (word2vec features)

    Algorithm                            Accuracy (%)
    Linear SVM                           65.90
    Multinomial NB                       68.51
    Softmax                              66.28
    FC + softmax + dropout               64.64
    FC + FC + softmax + dropout          65.76
    Graph CNNs: CN32-softmax             68.26

• Influence of graph quality (architecture CN32-softmax):

    Graph           G1 learned   G2 normalized   G3 pre-trained   G4 ANN   G5 random
                    [word2vec]   word count
    Accuracy (%)    68.26        67.50           66.98            67.86    67.75

⇒ The recognition rate depends on the graph quality.
Comparison: Our Model vs. [Henaff-Bruna-LeCun'15]
• Advantages over the (inspiring) pioneering work [HBL15]:
(1) Computational complexity O(n) vs. O(n^2): [HBL15] filters as
    y = x *_G h_K = ˆh_K(L) x = U(ˆh_K(Λ)(U^T x)),
where the ˆh_K are spectral filters based on K-th order splines, Λ and U are
the eigenvalue and eigenvector matrices of the graph Laplacian L, and U^T, U
are full O(n^2) matrices.
(2) No eigenvalue decomposition (EVD), which costs O(n^3), unlike [HBL15].
(3) Accuracy:

    Architecture                          Accuracy (%)
    CN10 [HBL15]                          97.26
    CN10 [Ours]                           97.48
    CN32-P4-CN64-P4-FC512 [HBL15]         97.75
    CN32-P4-CN64-P4-FC512 [Ours]          99.18

(4) Faster learning. [Figure: learning curves not reproduced here.]
Outline
Ø Data Science and ANN
Ø Convolutional Neural Networks
Ø Convolutional Neural Networks on Graphs
Ø Numerical Experiments
► Conclusion
Conclusion
• Proposed contributions:
(1) Generalization of CNNs to non-Euclidean domains/graph-structured data
(2) Localized filters on graphs
(3) Same learning complexity as CNNs while being universal to any graph
(4) GPU implementation
• Future applications:
(1) Social networks (Facebook, Twitter)
(2) Biological networks (genes, brain connectivity)
(3) Communication networks (Internet, wireless, traffic)
• Paper (accepted at NIPS'16): M. Defferrard, X. Bresson, P. Vandergheynst,
"Convolutional Neural Networks on Graphs with Fast Localized Spectral
Filtering", arXiv:1606.09375, 2016
Thank you.
Xavier	Bresson	 47

Contenu connexe

Tendances

Modern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationModern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationGioele Ciaparrone
 
Deep image generating models
Deep image generating modelsDeep image generating models
Deep image generating modelsLuba Elliott
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksJeremy Nixon
 
Deep learning for image super resolution
Deep learning for image super resolutionDeep learning for image super resolution
Deep learning for image super resolutionPrudhvi Raj
 
Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Muhammad Haroon
 
CONVOLUTIONAL NEURAL NETWORK
CONVOLUTIONAL NEURAL NETWORKCONVOLUTIONAL NEURAL NETWORK
CONVOLUTIONAL NEURAL NETWORKMd Rajib Bhuiyan
 
PR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image RecognitionPR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image RecognitionJinwon Lee
 
Image classification with Deep Neural Networks
Image classification with Deep Neural NetworksImage classification with Deep Neural Networks
Image classification with Deep Neural NetworksYogendra Tamang
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksHannes Hapke
 
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationDeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationNamHyuk Ahn
 
CNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent AdvancesCNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent AdvancesDmytro Mishkin
 
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...IOSR Journals
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural networkFerdous ahmed
 
Efficient Neural Architecture Search via Parameter Sharing
Efficient Neural Architecture Search via Parameter SharingEfficient Neural Architecture Search via Parameter Sharing
Efficient Neural Architecture Search via Parameter SharingJinwon Lee
 
PR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning TasksPR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning TasksJinwon Lee
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architecturesananth
 
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignPR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignJinwon Lee
 
Introduction to 3D Computer Vision and Differentiable Rendering
Introduction to 3D Computer Vision and Differentiable RenderingIntroduction to 3D Computer Vision and Differentiable Rendering
Introduction to 3D Computer Vision and Differentiable RenderingPreferred Networks
 

Tendances (20)

Modern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationModern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentation
 
Deep image generating models
Deep image generating modelsDeep image generating models
Deep image generating models
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural Networks
 
Deep learning for image super resolution
Deep learning for image super resolutionDeep learning for image super resolution
Deep learning for image super resolution
 
Deep learning
Deep learningDeep learning
Deep learning
 
Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)
 
Cnn
CnnCnn
Cnn
 
CONVOLUTIONAL NEURAL NETWORK
CONVOLUTIONAL NEURAL NETWORKCONVOLUTIONAL NEURAL NETWORK
CONVOLUTIONAL NEURAL NETWORK
 
PR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image RecognitionPR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image Recognition
 
Image classification with Deep Neural Networks
Image classification with Deep Neural NetworksImage classification with Deep Neural Networks
Image classification with Deep Neural Networks
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural Networks
 
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationDeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
 
CNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent AdvancesCNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent Advances
 
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
 
Efficient Neural Architecture Search via Parameter Sharing
Efficient Neural Architecture Search via Parameter SharingEfficient Neural Architecture Search via Parameter Sharing
Efficient Neural Architecture Search via Parameter Sharing
 
PR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning TasksPR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architectures
 
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignPR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
 
Introduction to 3D Computer Vision and Differentiable Rendering
Introduction to 3D Computer Vision and Differentiable RenderingIntroduction to 3D Computer Vision and Differentiable Rendering
Introduction to 3D Computer Vision and Differentiable Rendering
 

Similaire à Talk Norway Aug2016

Exploring Randomly Wired Neural Networks for Image Recognition
Exploring Randomly Wired Neural Networks for Image RecognitionExploring Randomly Wired Neural Networks for Image Recognition
Exploring Randomly Wired Neural Networks for Image RecognitionYongsu Baek
 
KDD17Tutorial_final (1).pdf
KDD17Tutorial_final (1).pdfKDD17Tutorial_final (1).pdf
KDD17Tutorial_final (1).pdfssuserf2f0fe
 
AI is Impacting HPC Everywhere
AI is Impacting HPC EverywhereAI is Impacting HPC Everywhere
AI is Impacting HPC Everywhereinside-BigData.com
 
CVPR 2012 Review Seminar - Multi-View Hair Capture using Orientation Fields
CVPR 2012 Review Seminar - Multi-View Hair Capture using Orientation FieldsCVPR 2012 Review Seminar - Multi-View Hair Capture using Orientation Fields
CVPR 2012 Review Seminar - Multi-View Hair Capture using Orientation FieldsJun Saito
 
A Neural Network that Understands Handwriting
A Neural Network that Understands HandwritingA Neural Network that Understands Handwriting
A Neural Network that Understands HandwritingShivam Sawhney
 
Bio-inspired Algorithms for Evolving the Architecture of Convolutional Neural...
Bio-inspired Algorithms for Evolving the Architecture of Convolutional Neural...Bio-inspired Algorithms for Evolving the Architecture of Convolutional Neural...
Bio-inspired Algorithms for Evolving the Architecture of Convolutional Neural...Ashray Bhandare
 
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...Tulipp. Eu
 
Intro to Neural Networks
Intro to Neural NetworksIntro to Neural Networks
Intro to Neural NetworksDean Wyatte
 
lecture_RNN Autoencoder.pdf
lecture_RNN Autoencoder.pdflecture_RNN Autoencoder.pdf
lecture_RNN Autoencoder.pdfssusercc3ff71
 
Web services based workflows to deal with 3D data
Web services based workflows to deal with 3D dataWeb services based workflows to deal with 3D data
Web services based workflows to deal with 3D dataJose Enrique Ruiz
 
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...
Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...Xavier Llorà
 
Skytree big data london meetup - may 2013
Skytree   big data london meetup - may 2013Skytree   big data london meetup - may 2013
Skytree big data london meetup - may 2013bigdatalondon
 
High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingHigh-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingNesreen K. Ahmed
 
State-Of-The Art Machine Learning Algorithms and How They Are Affected By Nea...
State-Of-The Art Machine Learning Algorithms and How They Are Affected By Nea...State-Of-The Art Machine Learning Algorithms and How They Are Affected By Nea...
State-Of-The Art Machine Learning Algorithms and How They Are Affected By Nea...inside-BigData.com
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksBICA Labs
 
Intro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksIntro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksMark Scully
 
IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hans...
IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hans...IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hans...
IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hans...npinto
 
Workshop NGS data analysis - 1
Workshop NGS data analysis - 1Workshop NGS data analysis - 1
Workshop NGS data analysis - 1Maté Ongenaert
 

Similaire à Talk Norway Aug2016 (20)

Tutorial on Deep Learning
Tutorial on Deep LearningTutorial on Deep Learning
Tutorial on Deep Learning
 
Exploring Randomly Wired Neural Networks for Image Recognition
Exploring Randomly Wired Neural Networks for Image RecognitionExploring Randomly Wired Neural Networks for Image Recognition
Exploring Randomly Wired Neural Networks for Image Recognition
 
KDD17Tutorial_final (1).pdf
KDD17Tutorial_final (1).pdfKDD17Tutorial_final (1).pdf
KDD17Tutorial_final (1).pdf
 
AI is Impacting HPC Everywhere
AI is Impacting HPC EverywhereAI is Impacting HPC Everywhere
AI is Impacting HPC Everywhere
 
CVPR 2012 Review Seminar - Multi-View Hair Capture using Orientation Fields
CVPR 2012 Review Seminar - Multi-View Hair Capture using Orientation FieldsCVPR 2012 Review Seminar - Multi-View Hair Capture using Orientation Fields
CVPR 2012 Review Seminar - Multi-View Hair Capture using Orientation Fields
 
A Neural Network that Understands Handwriting
A Neural Network that Understands HandwritingA Neural Network that Understands Handwriting
A Neural Network that Understands Handwriting
 
Bio-inspired Algorithms for Evolving the Architecture of Convolutional Neural...
Bio-inspired Algorithms for Evolving the Architecture of Convolutional Neural...Bio-inspired Algorithms for Evolving the Architecture of Convolutional Neural...
Bio-inspired Algorithms for Evolving the Architecture of Convolutional Neural...
 
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...
 
Intro to Neural Networks
Intro to Neural NetworksIntro to Neural Networks
Intro to Neural Networks
 
lecture_RNN Autoencoder.pdf
lecture_RNN Autoencoder.pdflecture_RNN Autoencoder.pdf
lecture_RNN Autoencoder.pdf
 
Web services based workflows to deal with 3D data
Web services based workflows to deal with 3D dataWeb services based workflows to deal with 3D data
Web services based workflows to deal with 3D data
 
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...
Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...
 
Skytree big data london meetup - may 2013
Skytree   big data london meetup - may 2013Skytree   big data london meetup - may 2013
Skytree big data london meetup - may 2013
 
High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingHigh-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and Modeling
 
State-Of-The Art Machine Learning Algorithms and How They Are Affected By Nea...
State-Of-The Art Machine Learning Algorithms and How They Are Affected By Nea...State-Of-The Art Machine Learning Algorithms and How They Are Affected By Nea...
State-Of-The Art Machine Learning Algorithms and How They Are Affected By Nea...
 
Neural
NeuralNeural
Neural
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural Networks
 
Intro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksIntro To Convolutional Neural Networks
Intro To Convolutional Neural Networks
 
IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hans...
IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hans...IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hans...
IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hans...
 
Workshop NGS data analysis - 1
Workshop NGS data analysis - 1Workshop NGS data analysis - 1
Workshop NGS data analysis - 1
 

Dernier

FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxRitchAndruAgustin
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXDole Philippines School
 
Servosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicServosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicAditi Jain
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
basic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomybasic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomyDrAnita Sharma
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptxpallavirawat456
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxmaryFF1
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024Jene van der Heide
 
trihybrid cross , test cross chi squares
trihybrid cross , test cross chi squarestrihybrid cross , test cross chi squares
trihybrid cross , test cross chi squaresusmanzain586
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx023NiWayanAnggiSriWa
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalMAESTRELLAMesa2
 

Dernier (20)

Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
 
Servosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicServosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by Petrovic
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
basic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomybasic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomy
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptx
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
 
trihybrid cross , test cross chi squares
trihybrid cross , test cross chi squarestrihybrid cross , test cross chi squares
trihybrid cross , test cross chi squares
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and Vertical
 

Talk Norway Aug2016

  • 1. Fast Convolutional Neural Networks ! for Graph-Structured Data! Xavier Bresson! !"#$%&'(&%))*+' ,' Swiss Federal Institute of Technology (EPFL) ! Joint work with Michaël De"errard (EPFL) ! and Pierre Vandergheynst (EPFL) ! ! Organizers: Xue-Cheng Tai, Egil Bae, ! Marius Lysaker, Alexander Vasiliev !! Aug 29th 2016! Nanyang Technological University (NTU) from Dec. 2016 !
  • 2. Data Science and Artificial Neural Networks ! •! Data Science is an interdisciplinary field that aims at transforming raw data into meaningful knowledge to provide smart decisions for real-world problems:! !"#$%&'(&%))*+' -' •! In the news:!
  • 3. A Brief History of Artificial Neural Networks ! !"#$%&'(&%))*+' .' 1958' Perceptron! Rosenblatt! 1982' Backprop ! Werbos! 1959' Visual primary cortex! Hubel-Wiesel! 1987' Neocognitron! Fukushima! SVM/Kernel techniques! Vapnik! 1995' 1997'1998' 1999'1999 2006' 2012' 2014' 2015' Data scientist! 1st Job in US! Facebook Center! OpenAI Center! Deep Learning! Breakthough ! Hinton! Auto-encoder! LeCun, Hinton, Bengio! First NVIDIA ! GPU! First Amazon! Cloud Center! 2012 RNN! Schmidhuber! 1997 CNN! LeCun! 1997 19981997 1999 First NVIDIA AI “Winter” [1960’s-2012]!AI Hope! AI Resurgence! Hardware! GPU speed doubles/ year! Kernel techniques! Handcrafted Features! Graphical models! 4th industrial revolution?! Digital Intelligence ! Revolution! or new AI bubble?! Big Data! Volume doubles/1.5 year! Google AI ! TensorFlow! Facebook AI! Torch! 1962'1962 Birth of! Data Science! Split from Statistics! Tukey! Perceptron Rosenblatt 1962 First ! NIPS! 1989' First ! KDD! 19891989 First KDD 2010' Kaggle! Platform!
  • 4. What happened in 2012?! !"#$%&'(&%))*+' /' •! ImageNet [Fei Fei-et.al.’09]: International Image Classification Challenge ! 1,000 object classes and 1,431,167 images! Error decreased by 2!! •! Before 2012: Handcrafted filters + SVM classification! ! ! ! ! ! After 2012: Learned filters ! with Neural Networks! SIFT ! Lowe, 1999! (most cited paper in CV)' Histogram of Gradients (HoG)! Dalal & Triggs, 2005! Handcrafted filters + SVM classificationHandcrafted filters + SVM classificationHandcrafted filters + SVM classificationHandcrafted filters + SVM classification [L. Fei-Fei-et.al.’09]' [Russakovsky-et.al.’14]' [Fei-Fei Li-et.al.’15]' [T.Q.Hoan’13]' [Karpathy-Johnson’16]'
  • 5. Outline! Ø  Data Science and ANN! Ø  Convolutional Neural Networks ! - Why CNNs are good? ! - Local Stationarity and Multiscale Hierarchical Features! ! Ø  Convolutional Neural Networks on Graphs ! - CNNs Only Process Euclidean-Structured Data! - CNNs for Graph-Structured Data ! - Graph Spectral Theory for Convolution on Graphs! - Balanced Cut Model for Graph Coarsening! - Fast Graph Pooling! ! Ø  Numerical Experiments! ! Ø  Conclusion! Xavier Bresson 5
  • 6. Outline! ! Data Science !! Convolutional Neural Networks ! - Why CNNs are good? ! - Local Stationarity and Multiscale Hierarchical Features! ! ! Convolutional Neural Networks on Graphs - CNNs Only Process Euclidean-Structured Data - CNNs for Graph-Structured Data - Graph Spectral Theory for Convolution on Graphs - Balanced Cut Model for Graph Coarsening - Fast Graph Pooling ! Numerical Experiments ! Conclusion !"#$%&'(&%))*+' 1'
  • 7. Convolutional Neural Networks ! •! CNNs [LeCun-et.al.’98] are very successful for Computer Vision tasks:! !! Image/object recognition [Krizhevsky-Sutskever-Hinton’12] ! !! Image captioning [Karpathy-FeiFei’15]! !! Image inpainting [Pathak-Efros-etal’16]! !! Etc! !"#$%&'(&%))*+' 2' •! CNNs are used by several (big) IT companies:! !! Facebook (Torch software)! !! Google (TensorFlow software, Google Brain, Deepmind)! !! Microsoft ! !! Tesla (AI Open)! !! Amazon (DSSTNE software)! !! Apple! !! IBM! Amazon (DSSTNE software)
  • 8. Why CNNs are Good?! •  CNNs are extremely efficient at extracting meaningful statistical patterns in large-scale and high-dimensional datasets.! Xavier Bresson 8 •  Key idea: Learn local stationary structures and compose them to form multiscale hierarchical patterns.! •  Why? It is open (math) question to prove the efficiency of CNNs.! Note: Despite the lack of theory, the entire ML and CV communities have shifted to deep learning techniques! Ex: NIPS’16: 2326 submissions, 328 DL (14%), Convex Optimization 90 (3.8%). !
  • 9. Local Stationarity! •! Assumption: Data are locally stationary " similar local patches are shared across the data domain:! !"#$%&'(&%))*+' 4' •! How to extract local stationary patterns? ! Convolutional filters (filters with ! compact support)! x ⇤ F3 x ⇤ F2 x ⇤ F1"#$%#&'(#$)&! *+&,-./! F1 F2 F3 x
  • 10. Multiscale Hierarchical Features! •! Assumption: Local stationary patterns can be composed to form more abstract complex patterns:! !"#$%&'(&%))*+' ,5' •! How to extract multiscale hierarchical patterns? ! Downsampling of data domain (s.a. image grid) with Pooling (s.a. max, average).! •! Other advantage: Reduce computational complexity while increasing #filters.! 2x2 max! pooling' 2x2 max ! pooling' Layer 1' Layer 2' Layer 3' Layer 4'Layer 1 Layer 2 Layer 3 Layer 4 Deep/hierarchical ! Features (simple to abstract)' …' [Karpathy-! Johnson’16]'
  • 11. Classification Function! •  Classifier: After extracting multiscale locally stationary features, use them to design a classification function with the training labels.! Xavier Bresson 11 •  How to design a (linear) classifier? ! Fully connected neural networks.! Output signal Class labels Class 1 Class 2 Class K Features xclass = Wxfeat
  • 12. Full Architecture of CNNs! !"#$%&'(&%))*+' ,-' 7-8'!)9(%)(#$! :! ;.+<!<#=$/)>1&+$2! :! ?##&+$2!! :! ?##&+$2! "#$%#&'(#$)&! *+&,-./! @' A$1',!/+2$)&! "#$%#&'(#$)&!&)B-./! C-D,.)9,!&#9)&!/,)(#$).B!E-),'.-/!)$<!9#>1#/-!,F->!! %+)!<#=$/)>1&+$2!)$<!1##&+$2G! *'&&B!9#$$-9,-<!&)B-./! C"&)//+H9)(#$!E'$9(#$G! 0',1',!/+2$)&! "&)//!&)3-&/! y 2 Rnc F1 F2 F3 x ⇤ F3 x ⇤ F2 x ⇤ F1 x xl=0 = x xl=1 xl=1 convxl=0 conv xl IJ2J!+>)2-!
  • 13. CNNs Only Process Euclidean-Structured Data! •! CNNs are designed for Data lying on Euclidean spaces:! (1) Convolution on Euclidean grids (FFT)! (2) Downsampling on Euclidean grids! (3) Pooling on Euclidean grids! Everything mathematically well defined and computationally fast!! !"#$%&'(&%))*+' ,.' •! But not all data lie on Euclidean grids!! Images (2D, 3D) ! videos (2+1D)' Sound (1D)'
  • 14. Non-Euclidean Data!
Xavier Bresson 14
•  Examples of irregular/graph-structured data: !
(1) Social networks (Facebook, Twitter)!
(2) Biological networks (genes, brain connectivity)!
(3) Communication networks (wireless, traffic)!
•  Main challenges: !
(1) How to define convolution, downsampling and pooling on graphs?!
(2) And how to make them numerically fast?!
•  Current solution: Map graph-structured data to regular/Euclidean grids with e.g. kernel methods and apply standard CNNs. !
Limitation: Handcrafting the mapping is (in my opinion) against the CNN principle! !
[Figure: graph/network-structured data: social networks, brain structure, telecommunication networks]
  • 15. Outline!
Ø  Data Science and ANN!
Ø  Convolutional Neural Networks !
- Why CNNs are good? !
- Local Stationarity and Multiscale Hierarchical Features!
Ø  Convolutional Neural Networks on Graphs !
- CNNs Only Process Euclidean-Structured Data!
- CNNs for Graph-Structured Data !
- Graph Spectral Theory for Convolution on Graphs!
- Balanced Cut Model for Graph Coarsening!
- Fast Graph Pooling!
Ø  Numerical Experiments!
Ø  Conclusion!
Xavier Bresson 15
  • 16. CNNs for Graph-Structured Data!
•  Our contribution: Generalizing CNNs to any graph-structured data with the same computational complexity as standard CNNs!!
Xavier Bresson 16
•  What tools for this generalization?!
(1) Graph spectral theory for convolution on graphs,!
(2) Balanced cut model for graph coarsening,!
(3) Graph pooling with the binary tree structure of coarsened graphs.!
  • 17. Related Works!
Xavier Bresson 17
•  Categories of graph CNNs: !
(1) Spatial approach!
(2) Spectral (Fourier) approach !
•  Spatial approach: !
- Local receptive fields [Coates-Ng'11, Gregor-LeCun'10]: Find compact groups of similar features, but no defined convolution.!
- Locally Connected Networks [Bruna-Zaremba-Szlam-LeCun'13]: Exploit the multiresolution structure of graphs, but no defined convolution.!
- ShapeNet [Bronstein-et.al.'15'16]: Generalization of CNNs to 3D meshes. Convolution is well defined in these smooth low-dimensional non-Euclidean spaces. Handles multiple graphs. Obtained state-of-the-art results for 3D shape recognition.!
•  Spectral approach: !
- Deep Spectral Networks [Henaff-Bruna-LeCun'15]: Related to this work. We will compare later.!
  • 18. Outline!
Ø  Data Science and ANN!
Ø  Convolutional Neural Networks !
- Why CNNs are good? !
- Local Stationarity and Multiscale Hierarchical Features!
Ø  Convolutional Neural Networks on Graphs !
- CNNs Only Process Euclidean-Structured Data!
- CNNs for Graph-Structured Data !
- Graph Spectral Theory for Convolution on Graphs!
- Balanced Cut Model for Graph Coarsening!
- Fast Graph Pooling!
Ø  Numerical Experiments!
Ø  Conclusion!
Xavier Bresson 18
  • 19. Convolution on Graphs 1/3!
•  Graphs: G = (V, E, W), with V the set of vertices, E the set of edges, W the similarity matrix, and |V| = n.!
Xavier Bresson 19
[Figure: graph G with vertices i, j ∈ V, edge e_ij ∈ E, weight W_ij = 0.9]
•  Graph Laplacian (core operator of spectral graph theory [1]): 2nd-order derivative operator on graphs!
L = D − W (unnormalized), or L = I_n − D^{−1/2} W D^{−1/2} (normalized).!
[1] Chung, 1997
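As an illustration, both Laplacian variants can be assembled from a sparse similarity matrix W with SciPy; a minimal sketch, not the released code:

import numpy as np
import scipy.sparse as sp

def laplacian(W, normalized=True):
    # W: sparse n x n similarity matrix of the graph G = (V, E, W).
    d = np.asarray(W.sum(axis=1)).flatten()           # vertex degrees d_i
    if not normalized:
        return sp.diags(d) - W                        # L = D - W
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard isolated vertices
    D_inv_sqrt = sp.diags(d_inv_sqrt)
    I = sp.identity(W.shape[0])
    return I - D_inv_sqrt @ W @ D_inv_sqrt            # L = I - D^{-1/2} W D^{-1/2}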
  • 20. Convolution on Graphs 2/3!
•  Fourier transform on graphs [2]: L is a symmetric positive semidefinite matrix ⇒ it has a set of orthonormal eigenvectors {u_l}, known as the graph Fourier modes, associated to nonnegative eigenvalues {λ_l}, known as the graph frequencies.!
Xavier Bresson 20
The Graph Fourier Transform of f ∈ R^n is F_G f = f̂ = U^T f ∈ R^n, whose value at frequency λ_l is f̂(λ_l) = f̂_l := ⟨f, u_l⟩ = ∑_{i=0}^{n−1} f(i) u_l(i).!
The inverse GFT is defined as F_G^{−1} f̂ = U f̂ = U U^T f = f, whose value at vertex i is (U f̂)(i) = ∑_{l=0}^{n−1} f̂_l u_l(i).!
[2] Hammond, Vandergheynst, Gribonval, 2011
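A small self-contained sketch of the GFT on a toy 4-vertex graph; the dense eigendecomposition is O(n^3) and is used here purely for illustration (the fast filtering later avoids it):

import numpy as np

# Tiny 4-vertex path graph, weights chosen arbitrarily for illustration.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W      # unnormalized Laplacian L = D - W

lam, U = np.linalg.eigh(L)          # frequencies lam, Fourier modes U (columns)

def gft(f):
    return U.T @ f                  # F_G f = f_hat = U^T f

def igft(f_hat):
    return U @ f_hat                # U f_hat = U U^T f = f

f = np.random.rand(4)
assert np.allclose(igft(gft(f)), f) # invertible since U is orthonormal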
  • 21. Convolution on Graphs 3/3!
•  Convolution on graphs (in the Fourier domain) [2]:!
f ∗_G g = F_G^{−1}(F_G f ⊙ F_G g) ∈ R^n, whose value at vertex i is (f ∗_G g)(i) = ∑_{l=0}^{n−1} f̂_l ĝ_l u_l(i).!
Xavier Bresson 21
It is also convenient to see that f ∗_G g = ĝ(L) f, as f ∗_G g = U((U^T f) ⊙ (U^T g)) = U diag(ĝ(λ_0), ..., ĝ(λ_{n−1})) U^T f = U ĝ(Λ) U^T f = ĝ(L) f.!
[2] Hammond, Vandergheynst, Gribonval, 2011
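Continuing the toy example above (reusing lam and U), graph convolution is an elementwise product in the spectral domain; the kernel ĝ chosen here is an arbitrary smooth function, purely for illustration:

import numpy as np

def graph_conv(f, g):
    # f *_G g = U ((U^T f) .* (U^T g)): elementwise product of the spectra.
    return U @ ((U.T @ f) * (U.T @ g))

def apply_kernel(f, g_hat):
    # Equivalent form f *_G g = g_hat(L) f = U g_hat(Lambda) U^T f.
    return U @ (g_hat(lam) * (U.T @ f))

g_hat = lambda lam: np.exp(-lam)      # a hypothetical smooth spectral kernel
y = apply_kernel(np.random.rand(4), g_hat)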
  • 22. Translation on Graphs 1/2!
•  Translation on graphs: A signal f defined on the graph can be translated to any vertex i using the graph convolution operator [3]:!
T_i f := f ∗_G δ_i, where T_i is the graph translation operator with vertex shift i.!
Xavier Bresson 22
The function f, translated to vertex i, has the following value at vertex j: (T_i f)(j) = f(j −_G i) = (f ∗_G δ_i)(j) = ∑_{l=0}^{n−1} f̂_l u_l(i) u_l(j).!
This formula is the graph counterpart of the continuous translation operator: (T_s f)(x) = f(x − s) = (f ∗ δ_s)(x) = ∫_R f̂(ξ) e^{−2πiξs} e^{2πiξx} dξ, where f̂(ξ) = ⟨f, e^{2πiξx}⟩, and the e^{2πiξx} are the eigenfunctions of the continuous Laplace-Beltrami operator Δ, i.e. the continuum version of the graph Fourier modes u_l.!
[3] Shuman, Ricaud, Vandergheynst, 2016
  • 23. Translation on Graphs 2/2!
Xavier Bresson 23
•  Note: Translations on graphs are easier to carry out with the spectral approach than directly in the spatial/graph domain.!
Figure 1: Translated signals in the continuous R2 domain (a-c), and in the graph domain (d-f). The component of the translated signal at the center vertex is highlighted in green. [Shuman-Ricaud-Vandergheynst'16]
  • 24. Localized Filters on Graphs 1/3!
•  Localized convolutional kernels: As in standard CNNs, we must define localized filters on graphs.!
Xavier Bresson 24
•  Laplacian polynomial kernels [2]: We consider a family of spectral kernels defined as
ĝ(λ_l) = p̂_K(λ_l) := ∑_{k=0}^{K} a_k λ_l^k,   (1)
where p̂_K is a Kth-order polynomial function of the Laplacian eigenvalues λ_l. This class of kernels defines spatially localized filters, as proved below:!
Theorem 1: Laplacian-based polynomial kernels (1) are K-localized in the sense that
(T_i p_K)(j) = 0 if d_G(i, j) > K,   (2)
where d_G(i, j) is the discrete geodesic distance on graphs, that is, the length of the shortest path between vertex i and vertex j.!
[2] Hammond, Vandergheynst, Gribonval, 2011
  • 25. Localized Filters on Graphs 2/3!
Xavier Bresson 25
Corollary 1: Consider the function ψ_ij such that
ψ_ij = (T_i p_K)(j) = (p_K ∗_G δ_i)(j) = (p̂_K(L) δ_i)(j) = ∑_{k=0}^{K} a_k (L^k δ_i)(j).
Then ψ_ij = 0 if d_G(i, j) > K.!
Figure 2: Illustration of localized filters on graphs. Laplacian-based polynomial kernels are exactly localized in a K-ball B_i^K centered at vertex i; the spatial profile of the polynomial filter is given by ψ_ij, and B_i^K is the support of the polynomial filter at vertex i.
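Theorem 1 and Corollary 1 can be checked numerically on a path graph, where the geodesic distance from vertex 0 to vertex j is exactly j; a sketch:

import numpy as np

# Path graph on 8 vertices, so that d_G(0, j) = j.
n = 8
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W                # unnormalized Laplacian

K = 3
delta0 = np.zeros(n); delta0[0] = 1.0         # Dirac delta_i at vertex i = 0
psi = np.linalg.matrix_power(L, K) @ delta0   # the K-th monomial term L^K delta_i
print(np.nonzero(psi)[0])                     # support stays within the 3-ball of vertex 0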
  • 26. Localized Filters on Graphs 3/3!
Xavier Bresson 26
Corollary 2: Localized filters on graphs are defined according to the principle:
Frequency smoothness ⇔ Spatial graph localization.
This is the Heisenberg uncertainty principle extended to the graph setting; recent papers have studied the uncertainty principle on graphs.!
[Figure: B_i^K = support of the polynomial filter at vertex i]
  • 27. Fast Chebyshev Polynomial Kernels 1/2!
•  Graph filtering: Let y be a signal x filtered by a Laplacian-based polynomial kernel:
y = x ∗_G p_K = p̂_K(L) x = ∑_{k=0}^{K} a_k L^k x.!
Xavier Bresson 27
The monomial basis {1, x, x², x³, ..., x^K} provides localized spatial filters, but does not form an orthogonal basis (e.g. ⟨1, x⟩ = ∫_0^1 x dx = [x²/2]_0^1 = 1/2 ≠ 0), which limits its ability to learn good spectral filters.!
•  Chebyshev polynomials: Let T_k(x) be the Chebyshev polynomial of order k, generated by the fundamental recurrence T_k(x) = 2x T_{k−1}(x) − T_{k−2}(x) with T_0 = 1 and T_1 = x. The Chebyshev basis {T_0, T_1, ..., T_K} forms an orthogonal basis on [−1, 1].!
Figure 3: First six Chebyshev polynomials.
  • 28. Fast Chebyshev Polynomial Kernels 2/2!
•  Graph filtering with Chebyshev [2]: The filtered signal y defined with the Chebyshev polynomials is
y = x ∗_G q_K = ∑_{k=0}^{K} θ_k T_k(L) x, with the Chebyshev spectral kernel q̂_K(λ) = ∑_{k=0}^{K} θ_k T_k(λ).!
Xavier Bresson 28
•  Fast filtering: Denote X_k := T_k(L) x and rewrite y = ∑_{k=0}^{K} θ_k X_k. All the {X_k} are generated with the recurrence X_k = 2 L X_{k−1} − X_{k−2}. As L is sparse, all matrix multiplications are between a sparse matrix and a vector. The computational complexity is O(|E| K), which reduces to linear complexity O(n) for k-NN graphs.!
•  GPU parallel implementation: These linear algebra operations can be done in parallel, allowing a fast GPU implementation of Chebyshev filtering.!
[2] Hammond, Vandergheynst, Gribonval, 2011
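A minimal SciPy sketch of the recurrence. The rescaling of L into [−1, 1] via an upper bound lmax on the spectrum is an assumption of this sketch (lmax = 2 holds for the normalized Laplacian):

import numpy as np
import scipy.sparse as sp

def chebyshev_filter(L, x, theta, lmax=2.0):
    # y = sum_k theta_k T_k(L_s) x via X_k = 2 L_s X_{k-1} - X_{k-2}:
    # only sparse matrix-vector products, hence O(|E| K) operations.
    L = sp.csr_matrix(L)
    n = L.shape[0]
    L_s = (2.0 / lmax) * L - sp.identity(n)   # rescale the spectrum into [-1, 1]
    X0 = x
    y = theta[0] * X0
    if len(theta) > 1:
        X1 = L_s @ x
        y = y + theta[1] * X1
        for k in range(2, len(theta)):
            X2 = 2 * (L_s @ X1) - X0          # Chebyshev recurrence
            y = y + theta[k] * X2
            X0, X1 = X1, X2
    return y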
  • 29. Outline!
Ø  Data Science and ANN!
Ø  Convolutional Neural Networks !
- Why CNNs are good? !
- Local Stationarity and Multiscale Hierarchical Features!
Ø  Convolutional Neural Networks on Graphs !
- CNNs Only Process Euclidean-Structured Data!
- CNNs for Graph-Structured Data !
- Graph Spectral Theory for Convolution on Graphs!
- Balanced Cut Model for Graph Coarsening!
- Fast Graph Pooling!
Ø  Numerical Experiments!
Ø  Conclusion!
Xavier Bresson 29
  • 30. Graph Coarsening!
Xavier Bresson 30
•  Graph coarsening: As in standard CNNs, we must define a grid coarsening process for graphs. It will be essential for pooling similar features together.!
•  Graph partitioning: Graph coarsening is equivalent to graph clustering, which is an NP-hard combinatorial problem.!
Figure 4: Illustration of graph coarsening (G^{l=0} = G → G^{l=1} → G^{l=2} by successive graph coarsening/clustering).
  • 31. Graph Partitioning!
Xavier Bresson 31
•  Balanced Cuts [4]: Two powerful measures of graph clustering are the Normalized Cut and the Normalized Association, defined as:!
Normalized Cut (partitioning by minimal edge cuts): min_{C_1,...,C_K} ∑_{k=1}^{K} Cut(C_k, C_k^c) / Vol(C_k).!
Normalized Association (partitioning by maximal vertex matching): max_{C_1,...,C_K} ∑_{k=1}^{K} Assoc(C_k) / Vol(C_k).!
The two are equivalent by complementarity. Here Cut(A, B) := ∑_{i∈A, j∈B} W_ij, Assoc(A) := ∑_{i∈A, j∈A} W_ij, Vol(A) := ∑_{i∈A} d_i, and d_i := ∑_{j∈V} W_ij is the degree of vertex i.!
Figure 5: Equivalence between NCut and NAssoc partitioning.
[4] Shi, Malik, 2000
  • 32. Graclus Graph Partitioning!
Xavier Bresson 32
•  Graclus [5]: A greedy (fast) technique that computes clusters which locally maximize the Normalized Association.!
(P1) Vertex matching: (i, j) = argmax_j { (W^l_ii + 2 W^l_ij + W^l_jj) / (d^l_i + d^l_j) }; the matched vertices {i, j} are merged into a super-vertex at the next coarsening level.!
(P2) Graph coarsening: G^{l+1} is defined by W^{l+1}_ij = Cut(C^l_i, C^l_j) and W^{l+1}_ii = Assoc(C^l_i).!
Partition energy at level l: ∑_{matched {i,j}} (W^l_ii + 2 W^l_ij + W^l_jj) / (d^l_i + d^l_j) = ∑_{k=1}^{K_l} Assoc(C^l_k) / Vol(C^l_k), where C^l_k := matched {i, j} is a super-vertex computed by (P1); it is also a cluster with at most 2^{l+1} original vertices.!
Figure 6: Graph coarsening with Graclus. Graclus proceeds by two successive steps, (P1) vertex matching and (P2) graph coarsening, which provide a local solution to the Normalized Association clustering problem at each coarsening level l.
[5] Dhillon, Guan, Kulis, 2007
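A simplified sketch of the (P1) matching step in Python; the real Graclus implementation differs in details (vertex visiting order, weight handling), so treat this as an illustration of the greedy rule only:

import numpy as np
import scipy.sparse as sp

def graclus_matching(W, d):
    # One (P1) pass: visit vertices in random order; match each unmatched vertex i
    # with the unmatched neighbor j maximizing (W_ii + 2 W_ij + W_jj)/(d_i + d_j).
    W = sp.csr_matrix(W)
    n = W.shape[0]
    cluster_of = np.full(n, -1)
    clusters = []
    for i in np.random.permutation(n):
        if cluster_of[i] >= 0:
            continue
        best_j, best_val = -1, -np.inf
        for j in W[i].nonzero()[1]:                   # neighbors of i
            if j != i and cluster_of[j] < 0:
                val = (W[i, i] + 2 * W[i, j] + W[j, j]) / (d[i] + d[j])
                if val > best_val:
                    best_j, best_val = j, val
        pair = [i] if best_j < 0 else [i, best_j]     # leftovers stay singletons
        for v in pair:
            cluster_of[v] = len(clusters)
        clusters.append(pair)
    return clusters                                   # super-vertices of G^{l+1}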
  • 33. Outline!
Ø  Data Science and ANN!
Ø  Convolutional Neural Networks !
- Why CNNs are good? !
- Local Stationarity and Multiscale Hierarchical Features!
Ø  Convolutional Neural Networks on Graphs !
- CNNs Only Process Euclidean-Structured Data!
- CNNs for Graph-Structured Data !
- Graph Spectral Theory for Convolution on Graphs!
- Balanced Cut Model for Graph Coarsening!
- Fast Graph Pooling!
Ø  Numerical Experiments!
Ø  Conclusion!
Xavier Bresson 33
  • 34. Fast Graph Pooling 1/2!
•  Graph pooling: As in standard CNNs, we must define a pooling process such as max pooling or average pooling. This operation is performed many times during the optimization task.!
Xavier Bresson 34
•  Unstructured pooling is inefficient: The graph and its coarsened versions indexed by Graclus require storing a table of all matched vertices ⇒ memory consuming and not (easily) parallelizable.!
•  Structured pooling is as efficient as 1D grid pooling: Start from the coarsest level, then propagate the ordering to the next finer level such that node k has nodes 2k and 2k+1 as children ⇒ a binary tree arrangement of the nodes in which adjacent nodes are hierarchically merged at the next coarser level.!
  • 35. Fast Graph Pooling 2/2!
Xavier Bresson 35
Figure 7: Fast graph pooling using the graph coarsening structure. Left: with an unstructured arrangement of vertices, pooling requires a table of matched vertices. Right: after reindexing w.r.t. the coarsening structure, node k of G^{l+1} has nodes 2k and 2k+1 of G^l as children; this binary tree arrangement of vertices makes pooling on graphs as fast as regular 1D Euclidean grid pooling.
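With the binary-tree reindexing, pooling reduces to a 1D reshape and a max, as in the sketch below; padding singleton clusters with "fake" vertices (e.g. value −inf for max pooling) so that n is divisible by 2^p is assumed:

import numpy as np

def graph_max_pool(x, p=1):
    # x: signal reindexed by the binary tree (children 2k, 2k+1 of node k);
    # pooling over p coarsening levels is a reshape + max, as on a grid.
    n = x.shape[0]
    return x.reshape(n // 2**p, 2**p).max(axis=1)

x = np.array([3., 1., 4., 1., 5., 9., 2., 6.])   # signal on G^{l=0}, reindexed
print(graph_max_pool(x))                          # signal on G^{l=1}: [3. 4. 9. 6.]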
  • 36. Full Architecture of Graph CNNs!
Xavier Bresson 36
[Figure: ReLU activation + graph coarsening by a factor 2^p (pre-computed) + pooling (GPUs); graph convolutional filters with O(K) parameters and O(EK) operations (GPUs); input signal x ∈ R^n on a graph G = G^{l=0} (e.g. social, biological, telecommunication graphs); graph convolutional layers (extract multiscale local stationary features on graphs); fully connected layers; output signal y ∈ R^{nc}: class labels.]
Figure 8: Architecture of the proposed CNNs on graphs. Notation: l is the coarsening level, x^l are the downsampled signals at layer l, G^l is the coarser graph, g_{θ,K_l} are the spectral filters at layer l, x^l_g are the filtered signals, p_l is the coarsening exponent, n_c is the number of classes, y is the output signal, and θ^l are the parameters to learn at layer l.
  • 37. Optimization!
•  Backpropagation [6] = the chain rule applied to the neurons at each layer, with y_j = ∑_{i=1}^{F_in} g_{θ_ij}(L) x_i.!
Xavier Bresson 37
Loss function: E = −∑_{s∈S} l_s log y_s.!
Gradient descents: θ^{n+1}_ij = θ^n_ij − τ ∂E/∂θ_ij, and x^{n+1}_i = x^n_i − τ ∂E/∂x_i.!
Local gradients: ∂E/∂θ_ij = (∂E/∂y_j)(∂y_j/∂θ_ij) = ∑_{s∈S} [X_{0,s}, ..., X_{K−1,s}]^T ∂E/∂y_{j,s}, and ∂E/∂x_i = (∂E/∂y_j)(∂y_j/∂x_i) = ∑_{j=1}^{F_out} g_{θ_ij}(L) ∂E/∂y_j.!
[Figure: local and accumulative computations of the gradients ∂E/∂θ_ij and ∂E/∂x_i during backpropagation]
[6] Werbos, 1982; Rumelhart, Hinton, Williams, 1985
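For intuition, the local gradient of one filter's Chebyshev coefficients can be written in a few lines. This sketch uses a squared loss instead of the cross-entropy above, purely for brevity; in practice TensorFlow derives these gradients automatically:

import numpy as np

def filter_grad(Xbar, y, target):
    # Xbar: n x (K+1) matrix [X_0, ..., X_K] with X_k = T_k(L) x precomputed,
    # so that y = Xbar @ theta. Chain rule: dE/dtheta = Xbar^T (dE/dy),
    # here with the squared loss E = 0.5 * ||y - target||^2.
    dE_dy = y - target
    return Xbar.T @ dE_dy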
  • 38. Outline!
Ø  Data Science and ANN!
Ø  Convolutional Neural Networks !
- Why CNNs are good? !
- Local Stationarity and Multiscale Hierarchical Features!
Ø  Convolutional Neural Networks on Graphs !
- CNNs Only Process Euclidean-Structured Data!
- CNNs for Graph-Structured Data !
- Graph Spectral Theory for Convolution on Graphs!
- Balanced Cut Model for Graph Coarsening!
- Fast Graph Pooling!
Ø  Numerical Experiments!
Ø  Conclusion!
Xavier Bresson 38
  • 39. Numerical Experiments! •  Platforms and hardware: All experiments are carried out with:! (1) TensorFlow (Google AI software) [7]! (2) GPU NVIDIA K40c (CUDA)! Xavier Bresson 39 •  Types of experiments:! (1) Euclidean CNNs! (2) Non-Euclidean CNNs! [7] Abadi-et.al. 2015 •  Code will be released soon!!
  • 40. Revisiting Euclidean CNNs 1/2!
•  Sanity check: MNIST is the most popular dataset in deep learning [8]. It is a dataset of 70,000 images of handwritten digits, from 0 to 9, represented on a 2D grid of size 28x28 (dim data = 28² = 784).!
Xavier Bresson 40
•  Graph: A k-NN graph (k = 8) of the Euclidean grid, with weights W_ij = e^{−‖x_i − x_j‖₂² / σ²}.!
[Figure: grid vertices i, j at Euclidean distance ‖x_i − x_j‖₂]
[8] LeCun, Bottou, Bengio, 1998
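A sketch of the k-NN graph construction with Gaussian weights; the bandwidth heuristic (mean squared neighbor distance) is an assumption of this sketch, not necessarily the one used in the experiments:

import numpy as np
import scipy.sparse as sp
from scipy.spatial.distance import cdist

def knn_graph(X, k=8):
    # X: n x d vertex coordinates (e.g. the 784 pixel positions of the 28x28 grid).
    n = X.shape[0]
    D = cdist(X, X, 'sqeuclidean')
    idx = np.argsort(D, axis=1)[:, 1:k+1]             # k nearest neighbors of each i
    sigma2 = np.mean(D[np.arange(n)[:, None], idx])   # assumed bandwidth heuristic
    rows = np.repeat(np.arange(n), k)
    cols = idx.flatten()
    vals = np.exp(-D[rows, cols] / sigma2)            # W_ij = exp(-||x_i - x_j||^2 / sigma^2)
    W = sp.coo_matrix((vals, (rows, cols)), shape=(n, n))
    return ((W + W.T) / 2).tocsr()                    # symmetrize the k-NN graph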
  • 41. Revisiting Euclidean CNNs 2/2!
Xavier Bresson 41
•  Results: Classification rates:!
Algorithm                                      Accuracy
Linear SVM                                     91.76
Softmax                                        92.36
CNNs [LeNet5]                                  99.33
graph CNNs: CN32-P4-CN64-P4-FC512-softmax      99.18
•  CPU vs. GPU: A GPU implementation of graph CNNs is 8x faster than a CPU implementation, the same order as for standard CNNs. Graph CNNs only use matrix multiplications, which are efficiently implemented in CUDA BLAS.!
Architecture                                   Time     Speedup
CNNs with CPU                                  210ms    -
CNNs with GPU NVIDIA K40c                      31ms     6.77x
graph CNNs with CPU                            200ms    -
graph CNNs with GPU NVIDIA K40c                25ms     8.00x
  • 42. Non-Euclidean CNNs 1/2!
•  Text categorization with 20NEWS: It is a benchmark dataset introduced at CMU [9]. It has 20,000 text documents (dim data = 33,000, the number of words in the dictionary) across 20 topics.!
Xavier Bresson 42
Table 1: The 20 topics of 20NEWS.
[Figure: instances of documents in the topics Auto and Medicine]
[9] Lang, 1995
  • 43. Non-Euclidean CNNs 2/2!
Xavier Bresson 43
•  Results: Classification rates (word2vec features):!
Algorithm                        Accuracy
Linear SVM                       65.90
Multinomial NB                   68.51
Softmax                          66.28
FC + softmax + dropout           64.64
FC + FC + softmax + dropout      65.76
graph CNNs: CN32-softmax         68.26
•  Influence of graph quality (architecture: CN32-softmax):!
G1 = learned graph [word2vec]       68.26
G2 = normalized word count graph    67.50
G3 = pre-trained graph              66.98
G4 = ANN graph                      67.86
G5 = random graph                   67.75
⇒ The recognition rate depends on the graph quality.!
  • 44. Comparison: Our Model vs. [Henaff-Bruna-LeCun'15]!
•  Advantages over the (inspiring) pioneering work [HBL15]:!
(1) Computational complexity: O(n) vs. O(n²). In [HBL15], y = x ∗_G h_K = ĥ_K(L) x = U(ĥ_K(Λ)(U^T x)), where ĥ_K are spectral filters based on Kth-order splines, Λ, U are the eigenvalue and eigenvector matrices of the graph Laplacian L, and U^T, U are full O(n²) matrices.!
(2) No EVD: we avoid the O(n³) eigenvalue decomposition required by [HBL15].!
(3) Accuracy:!
Architecture                          Accuracy
CN10 [HBL15]                          97.26
CN10 [Ours]                           97.48
CN32-P4-CN64-P4-FC512 [HBL15]         97.75
CN32-P4-CN64-P4-FC512 [Ours]          99.18
(4) Learns faster.!
Xavier Bresson 44
  • 45. Outline!
Ø  Data Science and ANN!
Ø  Convolutional Neural Networks !
- Why CNNs are good? !
- Local Stationarity and Multiscale Hierarchical Features!
Ø  Convolutional Neural Networks on Graphs !
- CNNs Only Process Euclidean-Structured Data!
- CNNs for Graph-Structured Data !
- Graph Spectral Theory for Convolution on Graphs!
- Balanced Cut Model for Graph Coarsening!
- Fast Graph Pooling!
Ø  Numerical Experiments!
Ø  Conclusion!
Xavier Bresson 45
  • 46. Conclusion!
•  Proposed contributions:!
(1) Generalization of CNNs to non-Euclidean domains/graph-structured data!
(2) Localized filters on graphs!
(3) Same learning complexity as CNNs while being universal to any graph!
(4) GPU implementation!
Xavier Bresson 46
•  Future applications:!
(i) Social networks (Facebook, Twitter)!
(ii) Biological networks (genes, brain connectivity)!
(iii) Communication networks (Internet, wireless, traffic)!
[Figure: social networks, brain structure, telecommunication networks]
•  Paper (accepted at NIPS'16): M. Defferrard, X. Bresson, P. Vandergheynst, "Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering", arXiv:1606.09375, 2016!