Convolutional Neural Networks
on Graphs
Xavier Bresson
Xavier	Bresson 2
School of Information Systems
Singapore Management University (SMU)
Oct 19th 2017
School of Computer Science and Engineering
Nanyang Technological University
Joint work with M. Defferrard (EPFL), P. Vandergheynst (EPFL),
M. Bronstein (USI), F. Monti (USI), R. Levie (TAU)
Outline
Ø Deep Learning
- A brief history
- High-dimensional learning
Ø Convolutional Neural Networks
- Architecture
- Non-Euclidean Data
Ø Spectral Graph Theory
- Graphs and operators
- Spectral decomposition
- Fourier analysis
- Convolution
- Coarsening
Ø Spectral ConvNets
- SplineNets
- ChebNets* [NIPS’16]
- GraphConvNets
- CayleyNets*
Ø Spectral ConvNets on Multiple Graphs
- Recommender Systems* [NIPS’17]
Ø Conclusion
Xavier	Bresson 3
Outline
Ø Deep Learning
- A brief history
Xavier	Bresson 4
A brief history of artificial neural networks
Xavier	Bresson 5
Timeline of milestones (reconstructed from the slide figure):
1958: Perceptron (Rosenblatt)
1959: Visual primary cortex (Hubel-Wiesel)
1962: Birth of Data Science, split from Statistics (Tukey)
1980: Neocognitron (Fukushima)
1982: Backprop (Werbos)
1987: First NIPS
1989: First KDD
1995: SVM/Kernel techniques (Vapnik)
1997: RNN/LSTM (Schmidhuber)
1998: CNN (LeCun)
1999: First NVIDIA GPU
2006: First Amazon cloud center
2010: Kaggle platform
2012: Deep Learning breakthrough (Hinton); data scientist becomes 1st job in US
2014: GAN (Goodfellow); Facebook AI center
2015: ResNet (He); Deep RL (Mnih); OpenAI center
Eras: AI Birth, AI Winter [1960's-2012], AI Renaissance.
Enablers: hardware (GPU speed 2x/year), Big Data (volume 2x/1.5 year), frameworks (Google AI: TensorFlow; Facebook AI: Torch).
Before the renaissance: kernel techniques, graphical models, wavelets, numerical PDEs, handcrafted features.
4th industrial revolution? Digital intelligence revolution or new AI bubble?
Breakthrough in image recognition
Xavier	Bresson 6
[Figure: ImageNet recognition results over the years - handcrafted features (SIFT) vs. learned features (end-to-end systems); human accuracy: 4.1%.]
Convolutional neural networks: LeNet-5[1]
2 convolutional + 2 fully connected layers
tanh non-linearity
1M parameters
Training set: MNIST 70K images
Trained on CPU
Xavier	Bresson 7
[1] LeCun et al. 1998
1998
Convolutional neural networks: AlexNet[2]
Xavier	Bresson 8
5 convolutional + 3 fully connected layers
ReLU non-linearity
Dropout regularization
60M parameters
Training set: ImageNet 1.5M images
Trained on GPU
[2] Krizhevsky, Sutskever, Hinton 2012
2012
More learning capacity
More data
More computational power
Outline
Ø Deep Learning
- High-dimensional learning
Xavier	Bresson 9
High-dimensional learning
Xavier	Bresson 10
NNs are powerful at solving high-dimensional learning problems.
Prototypical learning problem: evaluate a function f(x) given n samples {xi, f(xi)}. How? If f is regular, interpolate between close points.
[Figure: 1-dimensional learning - f(x) interpolated from the samples f(xi) at points xi.]
Curse of dimensionality
Xavier	Bresson 11
What about high dimensions? No interpolation is possible, because we can never have enough close points to make a good interpolation.
With N samples per dimension on [0,1], the spacing is $\epsilon = N^{-1}$. Covering the 2D plane requires $\epsilon^{-2}$ points; a d-dim cube $[0,1]^d$ requires $\epsilon^{-d}$ points.
dim(image) = 512 x 512 ≈ 10^6, so N = 10 samples/dim would require $10^{1{,}000{,}000}$ points!
Curse of dimensionality:
The number of samples needed to evaluate f is exponential in the dimension.
All samples are far from each other in high dimension[3].
Optimization: finding the solution of $\min_f L(f) = \mathbb{E}[\ell(f(x_i), y_i)]$ also needs $\epsilon^{-d}$ points.
[3] Beyer 1998
Blessing of structure
Xavier	Bresson 12
In our universe:
Data have structures, they concentrate in (non-)smooth low-dim spaces.
These structures can be expressed as mathematical properties
(statistics, geometry, groups, etc).
[Figure: data in a high-dim space concentrating on a low-dim space.]
Learning exponential functions
Neural networks can learn parametric exponential functions with linear
number of parameters.
Xavier	Bresson 13
A very large number of parameters can be learned by backpropagation,
order of billions.
Huge capacity to learn complex data[4].
Example: $n_{in} = 2$, $n_{out} = 3$ ⇒ $|W| = 2 \cdot 3$ parameters, but $\#\text{configurations} = 2^{n_{out}} = 2^3 = 8$.
The number of parameters grows linearly, but the function capacity grows exponentially.
[4] Zhang, Bengio, Hardt, Recht, Vinyals, 2016
Universal function approximation[5]
Xavier Bresson 14
[5] Cybenko 1989, Hornik 1991
Universal Approximation Theorem. Let $\xi$ be a non-constant, bounded, and monotonically-increasing continuous activation function, $y : [0,1]^p \to \mathbb{R}$ a continuous function, and $\epsilon > 0$. Then $\exists\, n$ and parameters $a, b \in \mathbb{R}^n$, $W \in \mathbb{R}^{n \times p}$ s.t. $\big| \sum_{i=1}^{n} a_i\, \xi(w_i^\top f + b_i) - y(f) \big| < \epsilon \quad \forall f \in [0,1]^p$
Shallow neural nets as universal approximators?
Any continuous function can be approximated arbitrarily well by
a neural network with a single hidden layer. Then,
How many neurons?
How long is the optimization?
Does it generalize well/overfit?
Xavier	Bresson 15
Shallow NNs: data (such as images) require an exponential number of neurons in a single layer, optimize badly, and do not generalize.
Deep NNs: Multiple layers and data structure can alleviate the
above issues[6].
[6] Montufar, Pascanu, Cho, Bengio 2014
Fully connected networks
Xavier	Bresson 16
Deep neural network consists of multiple layers.
$f_l$ = $l$-th image feature (R,G,B channels), $\dim(f_l) = n \times 1$
$g_l^{(k)}$ = $l$-th feature map, $\dim(g_l^{(k)}) = n_l^{(k)} \times 1$
Linear layer: $g_l^{(k)} = \xi\Big( \sum_{l'=1}^{q_{k-1}} W_{l,l'}^{(k)}\, \xi\Big( \sum_{l'=1}^{q_{k-2}} W_{l,l'}^{(k-1)}\, \xi\big( \cdots f_{l'} \big) \Big) \Big)$
Activation, e.g. $\xi(x) = \max\{x, 0\}$ (rectified linear unit, ReLU)
Fully connected networks
Xavier	Bresson 17
FC networks capture non-linear factorizations of data.
They overfit: Too many parameters to learn.
NN architectures require additional data structures to successfully
analyze image, sound, document, etc.
Outline
Ø Convolutional Neural Networks
- Architecture
Xavier	Bresson 18
Convolutional neural networks[1]
Data (image, video, sound) are compositional, i.e. they are formed
of patterns that are:
Ø Local
Ø Stationary
Ø Multi-scale (/hierarchical)
Xavier	Bresson 19
ConvNets leverage the compositionality structure: They extract
compositional features and feed them to classifier, recommender,
etc (end-to-end).
Computer Vision NLP Drug discovery Games
[1] LeCun et al. 1998
Key properties
Locality: Property inspired by visual cortex neurons[7].
Xavier	Bresson 20
Local receptive fields activate in the presence of (learned) local features.
Neocognitron[8]
[7] Hubel, Wiesel 1959, [8] Fukushima 1980
Key properties
Stationarity ⇒ translation invariance
Xavier Bresson 21
Local stationarity ⇒ similar patches are shared across the data domain.
Key properties
Multi-scale: Simple structures combine to compose slightly more
abstract structures, and so on, in a hierarchical way.
Xavier	Bresson 22
Also inspired by brain visual primary cortex (V1 and V2 neurons).
Features learned by a ConvNet become increasingly complex at deeper layers.[9]
[9] Zeiler, Fergus 2013
Implementation
Locality: compact support kernels
O(1) parameters per filter.
Xavier	Bresson 23
Stationarity: convolutional operators
O(nlogn) in general (FFT) and
O(n) for compact kernels.
Multi-scale: downsampling +
pooling O(n)
Compositional features
Xavier	Bresson 24
$f_l$ = $l$-th image feature (R,G,B channels), $\dim(f_l) = n \times 1$
$g_l^{(k)}$ = $l$-th feature map, $\dim(g_l^{(k)}) = n_l^{(k)} \times 1$
Compositional features consist of multiple convolutional + pooling layers.
Linear layer: $g_l^{(k)} = \xi\Big( \sum_{l'=1}^{q_{k-1}} W_{l,l'}^{(k)} \star\, \xi\Big( \sum_{l'=1}^{q_{k-2}} W_{l,l'}^{(k-1)} \star\, \xi\big( \cdots f_{l'} \big) \Big) \Big)$
Activation, e.g. $\xi(x) = \max\{x, 0\}$ (rectified linear unit, ReLU)
Pooling: $g_l^{(k)}(x) = \big\| g_l^{(k-1)}(x') : x' \in \mathcal{N}(x) \big\|_p$, with $p = 1, 2,$ or $\infty$
ConvNets[1]
Xavier	Bresson 25
Filters localized in space (Locality)
Convolutional filters (Stationarity)
Multiple layers (Multi-scale)
O(1) parameters per filter (independent of input image size n)
O(n) complexity per layer (filtering done in the spatial domain)
[1] LeCun et al. 1998
ConvNets in computer vision
Xavier	Bresson 26
Image/object recognition
[Krizhevsky-Sutskever-Hinton’12]
Image inpainting
[Pathak-Efros-etal’16]
Image captioning
[Karpathy-Li’15]
Self-driving cars
Highly successful - ConvNets are the Swiss Army knife of Computer Vision!
Outline
Ø Convolutional Neural Networks
- Non-Euclidean Data
Xavier	Bresson 27
Data domain for ConvNets
Xavier	Bresson 28
Image, volume, video: 2D,
3D, 2D+1 Euclidean
domains
Sentence, word, character,
sound: 1D Euclidean domain
These domains have nice regular spatial structures.
All CNN operations are mathematically well defined and fast (convolution, pooling).
2D grids
1D grid
Non-Euclidean data
Also NLP, physics, social science, communication networks, etc.
Xavier	Bresson 29
Graphs/Networks
Challenges of graph deep learning
Extend neural network techniques to graph-structured data.
Assumption: Non-Euclidean data are locally stationary and manifest
hierarchical structures.
How to define compositionality on graphs? (convolution and pooling
on graphs)
How to make them fast? (linear complexity)
Xavier	Bresson 30
Outline
Ø Spectral Graph Theory
- Graphs and operators
Xavier	Bresson 31
Graphs[10]
Xavier	Bresson 32
Graph $G = (V, E)$
Vertices $V = \{1, \ldots, n\}$
Edges $E \subseteq V \times V$
Vertex weights $b_i > 0$ for $i \in V$
Edge weights $a_{ij} \ge 0$ for $(i,j) \in E$
Vertex fields $f = (f_1, \ldots, f_n)$, represented as $L^2(V) = \{f : V \to \mathbb{R}\}$, a Hilbert space with inner product $\langle f, g \rangle_{L^2(V)} = \sum_{i \in V} b_i f_i g_i$
[10] Chung 1994
Calculus on graphs
Xavier	Bresson 33
Gradient operator $\nabla : L^2(V) \to L^2(E)$:
$(\nabla f)_{ij} = \sqrt{a_{ij}}\,(f_i - f_j)$
Divergence operator $\mathrm{div} : L^2(E) \to L^2(V)$, adjoint to the gradient operator:
$\langle F, \nabla f \rangle_{L^2(E)} = \langle \nabla^* F, f \rangle_{L^2(V)} = \langle -\mathrm{div}\, F, f \rangle_{L^2(V)}$
$(\mathrm{div}\, F)_i = \frac{1}{b_i} \sum_{j:(i,j) \in E} \sqrt{a_{ij}}\,(F_{ij} - F_{ji})$
Graph Laplacian
Xavier	Bresson 34
Laplacian operator $\Delta : L^2(V) \to L^2(V)$:
$(\Delta f)_i = \frac{1}{b_i} \sum_{j:(i,j) \in E} a_{ij}\,(f_i - f_j)$
= difference between $f$ and its local average (2nd derivative on graphs).
Core operator in spectral graph theory.
Represented as a positive semi-definite $n \times n$ matrix, where $A = (a_{ij})$ and $D = \mathrm{diag}\big(\sum_{j \ne i} a_{ij}\big)$:
Unnormalized Laplacian $\Delta = D - A$
Normalized Laplacian $\Delta = I - D^{-1/2} A D^{-1/2}$
Random walk Laplacian $\Delta = I - D^{-1} A$
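As a concrete illustration (mine, not from the slides), here is a minimal sketch of how these Laplacians could be assembled from a weight matrix with scipy; the function name and the small epsilon guard are my own.

```python
import numpy as np
import scipy.sparse as sp

def graph_laplacians(A):
    """A: symmetric sparse n x n weight matrix with a_ij >= 0 and zero diagonal."""
    d = np.asarray(A.sum(axis=1)).ravel()                        # degrees d_i = sum_j a_ij
    L_unnorm = sp.diags(d) - A                                   # Delta = D - A
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L_norm = sp.eye(A.shape[0]) - d_inv_sqrt @ A @ d_inv_sqrt    # Delta = I - D^{-1/2} A D^{-1/2}
    L_rw = sp.eye(A.shape[0]) - sp.diags(1.0 / np.maximum(d, 1e-12)) @ A  # Delta = I - D^{-1} A
    return L_unnorm, L_norm, L_rw
```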
Outline
Ø Spectral Graph Theory
- Spectral decomposition
Xavier	Bresson 35
Spectral decomposition
Xavier	Bresson 36
A Laplacian of a graph with $n$ vertices admits $n$ eigenvectors:
$\Delta \phi_k = \lambda_k \phi_k, \quad k = 1, 2, \ldots, n$
Eigenvectors are real and orthonormal (due to self-adjointness): $\langle \phi_k, \phi_{k'} \rangle_{L^2(V)} = \delta_{kk'}$
Eigenvalues are non-negative (due to positive semi-definiteness): $0 = \lambda_1 \le \lambda_2 \le \ldots \le \lambda_n$
Eigendecomposition of a graph Laplacian: $\Delta = \Phi \Lambda \Phi^\top$,
where $\Phi = (\phi_1, \ldots, \phi_n)$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$.
Interpretation
Xavier	Bresson 37
Find the smoothest orthonormal basis on a graph:
$\min_{\phi_k} E_\mathrm{Dir}(\phi_k) \ \text{s.t.}\ \|\phi_k\| = 1,\ k = 2, 3, \ldots, n, \quad \phi_k \perp \mathrm{span}\{\phi_1, \ldots, \phi_{k-1}\}$
where $E_\mathrm{Dir}(f) = f^\top \Delta f$ is the Dirichlet energy = measure of smoothness of a function.
Equivalently, $\min_{\Phi \in \mathbb{R}^{n \times n}} \underbrace{\mathrm{trace}(\Phi^\top \Delta \Phi)}_{\|\Phi\|_G\ \text{Dirichlet norm}} \ \text{s.t.}\ \Phi^\top \Phi = I$
Solution: the first $n$ Laplacian eigenvectors $\Phi = (\phi_1, \ldots, \phi_n)$.
Laplacian eigenvectors
Xavier	Bresson 38
Euclidean domain:
Graph domain:
Laplacian eigenvectors are related to graph geometry (e.g. communities, hubs); cf. spectral clustering[10]
[10] Von Luxburg 2007
Outline
Ø Spectral Graph Theory
- Fourier analysis
Xavier	Bresson 39
Fourier analysis on Euclidean spaces
Xavier	Bresson 40
A function $f : [-\pi, \pi] \to \mathbb{R}$ can be written as a Fourier series:
$f(x) = \sum_{k \ge 0} \underbrace{\frac{1}{2\pi} \int_{-\pi}^{\pi} f(x')\, e^{-ikx'}\, dx'}_{\hat f_k = \langle f, e^{ikx} \rangle_{L^2([-\pi,\pi])}} e^{ikx}$
$e^{ikx}$ = Fourier mode, $k$ = frequency of the Fourier mode.
Fourier basis = Laplace-Beltrami eigenfunctions: $-\Delta\, e^{ikx} = k^2\, e^{ikx}$
Fourier analysis on graphs[11]
Xavier	Bresson 41
[11] Hammond, Vandergheynst, Gribonval, 2011
A function $f : V \to \mathbb{R}$ can be written as a Fourier series:
$f_i = \sum_{k=1}^{n} \underbrace{\langle f, \phi_k \rangle_{L^2(V)}}_{\hat f_k}\, \phi_{k,i}$
$\hat f_k$ is the $k$-th graph Fourier coefficient.
In matrix-vector notation, with the $n \times n$ Fourier matrix $\Phi = [\phi_1, \ldots, \phi_n]$:
$\hat f = \Phi^\top f$ (Fourier transform) and $f = \Phi \hat f$ (inverse Fourier transform)
Graph Fourier basis = Laplacian eigenvectors $\phi_k$: $\Delta \phi_k = \lambda_k \phi_k$
$\phi_k$ = graph Fourier mode, $\lambda_k$ = (square) frequency
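A minimal sketch (mine, not from the slides) of the graph Fourier transform for a small dense Laplacian: the eigendecomposition gives Φ and Λ, and f̂ = Φᵀf.

```python
import numpy as np

def graph_fourier_transform(L, f):
    """L: dense symmetric n x n Laplacian, f: signal on the n vertices."""
    lam, Phi = np.linalg.eigh(L)   # eigenvalues (ascending) and orthonormal eigenvectors
    f_hat = Phi.T @ f              # forward transform: f_hat = Phi^T f
    f_back = Phi @ f_hat           # inverse transform recovers f
    assert np.allclose(f_back, f)
    return lam, Phi, f_hat
```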
Outline
Ø Spectral Graph Theory
- Convolution
Xavier	Bresson 42
Convolution on Euclidean spaces
Xavier	Bresson 43
Given two functions $f, g : [-\pi, \pi] \to \mathbb{R}$, their convolution is the function
$(f \star g)(x) = \int_{-\pi}^{\pi} f(x')\, g(x - x')\, dx'$
Shift-invariance: $f(x - x_0) \star g(x) = (f \star g)(x - x_0)$
Convolution operator commutes with the Laplacian: $(\Delta f) \star g = \Delta (f \star g)$
Convolution theorem: convolution can be computed in the Fourier domain as $\widehat{(f \star g)} = \hat f \cdot \hat g$
Efficient computation using FFT: $O(n \log n)$
Convolution on discrete Euclidean spaces
Xavier	Bresson 44
Convolution of two vectors $f = (f_1, \ldots, f_n)^\top$ and $g = (g_1, \ldots, g_n)^\top$:
$(f \star g)_i = \sum_m g_{(i-m) \bmod n} \cdot f_m$
i.e. $f \star g = \underbrace{\begin{pmatrix} g_1 & g_2 & \ldots & \ldots & g_n \\ g_n & g_1 & g_2 & \ldots & g_{n-1} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ g_3 & g_4 & \ldots & g_1 & g_2 \\ g_2 & g_3 & \ldots & \ldots & g_1 \end{pmatrix}}_{\text{Circulant matrix (Toeplitz), diagonalised by the Fourier basis}} \begin{pmatrix} f_1 \\ \vdots \\ f_n \end{pmatrix}$
In the Fourier domain:
$\Phi^\top (f \star g) = \begin{pmatrix} \hat g_1 \\ \vdots \\ \hat g_n \end{pmatrix} \circ \Phi^\top f = (\Phi^\top g) \circ (\Phi^\top f)$
Hence
$f \star g = \Phi\big( (\Phi^\top g) \circ (\Phi^\top f) \big) = \underbrace{\Phi\, \mathrm{diag}(\hat g_1, \ldots, \hat g_n)\, \Phi^\top}_{G} f = \Phi\, \hat g(\Lambda)\, \Phi^\top f = \hat g(\Phi \Lambda \Phi^\top) f = \hat g(\Delta) f$
Convolution on graphs[11]
Xavier	Bresson 45
Spectral convolution of $f, g \in L^2(V)$ can be defined by analogy:
$(f \star g)_i = \sum_{k \ge 1} \underbrace{\langle f, \phi_k \rangle_{L^2(V)}\, \langle g, \phi_k \rangle_{L^2(V)}}_{\text{product in the Fourier domain}}\, \phi_{k,i}$ (the sum over $k$ is the inverse Fourier transform).
In matrix-vector notation: $f \star g = G f$
Not shift-invariant! ($G$ has no circulant structure)
Commutes with the Laplacian: $\Delta G f = G \Delta f$
Filter coefficients depend on the basis $\phi_1, \ldots, \phi_n$
Expensive computation (no FFT): $O(n^2)$
[11] Hammond, Vandergheynst, Gribonval, 2011
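A minimal sketch of the spectral graph convolution defined above, using a dense eigendecomposition for clarity (O(n²) per product, O(n³) for the decomposition); names are illustrative.

```python
import numpy as np

def spectral_graph_conv(L, f, g):
    """f * g = Phi ((Phi^T g) . (Phi^T f)) for signals f, g on the graph with Laplacian L."""
    _, Phi = np.linalg.eigh(L)
    return Phi @ ((Phi.T @ g) * (Phi.T @ f))   # pointwise product in the graph Fourier domain
```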
No shift invariance on graphs
Xavier Bresson 46
A signal $f$ on a graph can be translated to vertex $i$: $T_i f = f \star \delta_i$
Euclidean domain: (a) $T_s f$, (b) $T_{s'} f$, (c) $T_{s''} f$
Graph domain[12]: (d) $T_i f$, (e) $T_{i'} f$, (f) $T_{i''} f$
[12] Shuman et al. 2016
Outline
Ø Spectral Graph Theory
- Coarsening
Xavier	Bresson 47
Graph coarsening for graph pooling
Goals:
Pool similar local features (max pooling).
Series of pooling layers create invariance to global geometric
deformations.
Challenges:
Design a multi-scale coarsening algorithm that preserves non-
linear graph structures.
How to make graph pooling fast?
Xavier	Bresson 48
Graph coarsening = graph partitioning
Xavier	Bresson 49
Graph partitioning decomposes G into smaller meaningful clusters.
NP-hard ⇒ approximation
Powerful combinatorial graph partitioning models:
Normalized cut[13]
Normalized association
Both models are equivalent, but lead to different algorithms.
Balanced cuts
Xavier	Bresson 50
Partitioning by min edge cuts (Normalized cut[13]):
$\min_{C_1, \ldots, C_K} \sum_{k=1}^{K} \frac{\mathrm{Cut}(C_k, C_k^c)}{\mathrm{Vol}(C_k)}$
Partitioning by max vertex matching (Normalized association):
$\max_{C_1, \ldots, C_K} \sum_{k=1}^{K} \frac{\mathrm{Assoc}(C_k)}{\mathrm{Vol}(C_k)}$
where $\mathrm{Cut}(A,B) := \sum_{i \in A, j \in B} a_{ij}$, $\mathrm{Assoc}(A) := \sum_{i,j \in A} a_{ij}$, $\mathrm{Vol}(A) := \sum_{i \in A} d_i$, and $d_i := \sum_{j \in V} a_{ij}$.
[13] Shi, Malik, 2000
Balanced cuts
Xavier	Bresson 51
Balanced cuts are NP-hard ⇒ the most popular approximation techniques focus on a linear spectral relaxation (eigenproblem with a global solution).
Graph geometry is generally not linear ⇒ the Graclus[14] algorithm computes non-linear clusters that locally maximize the normalized association.
The Graclus algorithm offers control of the coarsening ratio (≈ 2, like an image grid) using heavy-edge matching[15].
[14] Dhillon, Guan, Kulis 2007
[15] Karypis, Kumar 1995
Heavy-Edge Matching (HEM)
Xavier	Bresson 52
HEM proceeds by two successive steps, vertex matching and graph coarsening (which guarantees a local solution of the normalized association):
(1) Vertex matching: $\{i, j\}$, with $j = \arg\max_{j} \Big\{ \frac{a^l_{ii} + 2a^l_{ij} + a^l_{jj}}{d^l_i + d^l_j} \Big\}$
(2) Graph coarsening: $G^{l+1}$ is defined by $a^{l+1}_{ij} = \mathrm{Cut}(C^l_i, C^l_j)$ and $a^{l+1}_{ii} = \mathrm{Assoc}(C^l_i)$
Matched vertices $\{i, j\}$ are merged into a super-vertex at the next coarsening level.
[Figure: sequence of coarsened graphs $G^{l-1}, G^l, G^{l+1}, G^{l+2}$ produced by graph coarsening/clustering.]
Unstructured pooling
Sequence of coarsened graphs produced by HEM:
Xavier	Bresson 53
Stores a table of indices for the graph and all its coarsened versions ⇒ computationally inefficient
Fast graph pooling[18]
Structured pooling: Arrangement of the node indexing such that
adjacent nodes are hierarchically merged at the next coarser level.
Xavier	Bresson 54
As efficient as 1D-Euclidean grid pooling.
[18] Defferrard, Bresson, Vandergheynst 2016
Outline
Ø Spectral ConvNets
- SplineNets
Xavier	Bresson 55
Vanilla spectral graph ConvNets[16]
Graph convolutional layer
Xavier	Bresson 56
[16] Bruna, Zaremba, Szlam, LeCun 2014; Henaff, Bruna, LeCun 2015
$f_l$ = $l$-th data feature on the graph, $\dim(f_l) = n \times 1$
$g_l$ = $l$-th feature map, $\dim(g_l) = n \times 1$
Conv. layer: $g_l = \xi\big( \sum_{l'=1}^{p} W_{l,l'} \star f_{l'} \big)$
Activation, e.g. $\xi(x) = \max\{x, 0\}$ (rectified linear unit, ReLU)
Spectral graph convolution
Xavier	Bresson 57
Convolutional layer in the spatial domain:
$g_l = \xi\big( \sum_{l'=1}^{p} W_{l,l'} \star f_{l'} \big)$, where $W_{l,l'}$ = matrix of the graph spatial filter,
can also be expressed in the spectral domain (using $g \star f = \Phi\, \hat g(\Lambda)\, \Phi^\top f$):
$g_l = \xi\big( \sum_{l'=1}^{p} \Phi\, \hat W_{l,l'}\, \Phi^\top f_{l'} \big)$, where $\hat W_{l,l'}$ = $n \times n$ diagonal matrix of the graph spectral filter.
We will denote the spectral filter without the hat symbol, i.e. $W_{l,l'}$.
Vanilla spectral graph ConvNets[16]
Xavier	Bresson 58
Series of spectral convolutional layers
$g_l^{(k)} = \xi\Big( \sum_{l'=1}^{q_{k-1}} \Phi\, W_{l,l'}^{(k)}\, \Phi^\top g_{l'}^{(k-1)} \Big)$,
with spectral coefficients $W_{l,l'}^{(k)}$ to be learned at each layer.
First graph CNN architecture
No guarantee of spatial localization of filters
O(n) parameters per layer
O(n²) computation of the forward and inverse Fourier transforms Φ, Φᵀ (no FFT on graphs)
Filters are basis-dependent ⇒ does not generalize across graphs
[16] Bruna, Zaremba, Szlam, LeCun 2014; Henaff, Bruna, LeCun 2015
Spatial localization and spectral smoothness
Xavier	Bresson 59
In the Euclidean setting (by Parseval's identity):
$\int_{-\infty}^{+\infty} |x|^{2k}\, |w(x)|^2\, dx = \int_{-\infty}^{+\infty} \Big| \frac{\partial^k \hat w(\lambda)}{\partial \lambda^k} \Big|^2 d\lambda$
Localization in space = smoothness in the frequency domain.
Smooth spectral filter function $\hat w(\lambda)$ ⇔ spatial localization of $w(x)$.
Smooth parametric spectral filter
Xavier	Bresson 60
[17] (Litman, Bronstein, 2014); Henaff, Bruna, LeCun 2015
Parametrize the smooth spectral filter function with a linear combination of smooth kernel functions $\beta_1(\lambda), \ldots, \beta_r(\lambda)$, e.g. splines[17]:
$w_\alpha(\lambda) = \sum_{j=1}^{r} \alpha_j\, \beta_j(\lambda)$,
where $\alpha = (\alpha_1, \ldots, \alpha_r)^\top$ is the vector of filter parameters.
At the eigenvalues: $w_\alpha(\lambda_i) = \sum_{j=1}^{r} \alpha_j\, \beta_j(\lambda_i) = (B\alpha)_i \ \Rightarrow\ W = \mathrm{Diag}(B\alpha)$
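A minimal sketch of such a parametric filter response at the Laplacian eigenvalues, $w_\alpha(\lambda_i) = (B\alpha)_i$ with $B_{ij} = \beta_j(\lambda_i)$. For simplicity the kernels here are Gaussian bumps rather than the splines used in [17]; names and the width parameter are my own.

```python
import numpy as np

def parametric_filter_response(lam, alpha, centers, width=0.5):
    """lam: eigenvalues (n,), alpha: filter parameters (r,), centers: kernel centers (r,)."""
    B = np.exp(-((lam[:, None] - centers[None, :]) ** 2) / (2.0 * width ** 2))  # B_ij = beta_j(lambda_i)
    return B @ alpha   # diagonal entries of W = Diag(B alpha)
```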
Smooth parametric spectral filter
Xavier	Bresson 61
Graph convolutional layer: apply the spectral filter to a feature signal $f$:
$w(\Delta) f = \Phi\, w(\Lambda)\, \Phi^\top f = \Phi \begin{pmatrix} w(\lambda_1) & & \\ & \ddots & \\ & & w(\lambda_n) \end{pmatrix} \Phi^\top f$
Application of the spectral filter to a feature signal with learnable parameters $\alpha$:
$w_\alpha(\Delta) f = \Phi \begin{pmatrix} w_\alpha(\lambda_1) & & \\ & \ddots & \\ & & w_\alpha(\lambda_n) \end{pmatrix} \Phi^\top f$
SplineNets[17]
Xavier	Bresson 62
Series of spectral convolutional layers
$g_l^{(k)} = \xi\Big( \sum_{l'=1}^{q_{k-1}} \Phi\, W_{l,l'}^{(k)}\, \Phi^\top g_{l'}^{(k-1)} \Big)$,
with smooth spectral parametric coefficients $W_{l,l'}^{(k)}$ to be learned at each layer.
Fast-decaying filters in space
O(n) parameters per layer
O(n²) computation of the forward and inverse Fourier transforms Φ, Φᵀ (no FFT on graphs)
Filters are basis-dependent ⇒ does not generalize across graphs
[17] Henaff, Bruna, LeCun 2015
Outline
Ø Spectral ConvNets
- ChebNets* [NIPS’16]
Xavier	Bresson 63
Spectral polynomial filters[18]
Xavier	Bresson 64
Represent smooth spectral functions with polynomials of the Laplacian eigenvalues:
$w_\alpha(\lambda) = \sum_{j=0}^{r} \alpha_j \lambda^j$,
where $\alpha = (\alpha_0, \ldots, \alpha_r)^\top$ is the vector of filter parameters.
Convolutional layer: apply the spectral filter to a feature signal $f$:
$w_\alpha(\Delta) f = \sum_{j=0}^{r} \alpha_j \Delta^j f$
Key observation: each Laplacian multiplication increases the support of a function by 1 hop ⇒ exact control of the size of Laplacian-based filters.
[Figure: heat source s spreading over its 1-hop, 2-hop, ..., j-hop neighborhoods.]
[18] Defferrard, Bresson, Vandergheynst 2016
Linear complexity
Xavier	Bresson 65
Application of the filter to a feature signal $f$: $w_\alpha(\Delta) f = \sum_{j=0}^{r} \alpha_j \Delta^j f$
Denote $X_j = \Delta^j f$ and define the sequence $X_0 = f$, $X_j = \Delta X_{j-1}$; then $w_\alpha(\Delta) f = \sum_{j=0}^{r} \alpha_j X_j$.
Two important observations:
1. *No* need to compute the eigendecomposition of the Laplacian (Φ, Λ).
2. The $\{X_j\}$ are generated by multiplication of a sparse matrix and a vector ⇒ complexity is O(Er) = O(n) for sparse graphs.
Graph convolutional layers are GPU friendly.
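A minimal sketch of this recursion (illustrative): only sparse matrix-vector products, no eigendecomposition.

```python
import numpy as np
import scipy.sparse as sp

def polynomial_filter(L, f, alpha):
    """w_alpha(L) f = sum_j alpha_j L^j f, computed with X_0 = f, X_j = L X_{j-1}."""
    X = f
    out = alpha[0] * X
    for a in alpha[1:]:
        X = L @ X              # each multiplication grows the support by one hop
        out = out + a * X
    return out
```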
Spectral graph ConvNets with polynomial filters
Xavier	Bresson 66
Series of spectral convolutional layers
$g_l^{(k)} = \xi\Big( \sum_{l'=1}^{q_{k-1}} \Phi\, W_{l,l'}^{(k)}\, \Phi^\top g_{l'}^{(k-1)} \Big)$,
with spectral polynomial coefficients to be learned at each layer.
Filters are exactly localized in an r-hop support
O(1) parameters per layer
No computation of Φ, Φᵀ ⇒ O(n) computational complexity (assuming sparsely-connected graphs)
Unstable under coefficient perturbation (hard to optimize)
Filters are basis-dependent ⇒ does not generalize across graphs
[18] Defferrard, Bresson, Vandergheynst 2016
Chebyshev polynomials
Xavier	Bresson 67
Graph convolution with the (non-orthogonal) monomial basis $1, x, x^2, x^3, \ldots$:
$w_\alpha(\Delta) f = \sum_{j=0}^{r} \alpha_j \Delta^j f$
Graph convolution with (orthogonal) Chebyshev polynomials $T_0, T_1, T_2, T_3, \ldots$:
$w_\alpha(\tilde\Delta) f = \sum_{j=0}^{r} \alpha_j T_j(\tilde\Delta) f$
Chebyshev polynomials are orthonormal on $L^2([-1,+1])$ w.r.t. $\langle f, g \rangle = \int_{-1}^{+1} \frac{f(\tilde\lambda)\, g(\tilde\lambda)}{\sqrt{1 - \tilde\lambda^2}}\, d\tilde\lambda$, and stable under perturbation of the coefficients.
ChebNets[18]
Xavier	Bresson 68
[18] Defferrard, Bresson, Vandergheynst 2016
Application of the filter with the scaled Laplacian $\tilde\Delta = 2\lambda_n^{-1}\Delta - I$:
$w_\alpha(\tilde\Delta) f = \sum_{j=0}^{r} \alpha_j T_j(\tilde\Delta) f = \sum_{j=0}^{r} \alpha_j X^{(j)}$
with $X^{(j)} = T_j(\tilde\Delta) f = 2\tilde\Delta X^{(j-1)} - X^{(j-2)}$, $X^{(0)} = f$, $X^{(1)} = \tilde\Delta f$.
Filters are exactly localized in an r-hop support
O(1) parameters per layer
No computation of Φ, Φᵀ ⇒ O(n) computational complexity (assuming sparsely-connected graphs)
Stable under coefficient perturbation
Filters are basis-dependent ⇒ does not generalize across graphs
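A minimal sketch of the Chebyshev recursion above (illustrative; a real ChebNet layer also learns the coefficients and handles multiple input/output channels).

```python
import numpy as np
import scipy.sparse as sp

def chebyshev_filter(L, lmax, f, alpha):
    """w_alpha(L~) f = sum_j alpha_j T_j(L~) f, with L~ = 2 L / lmax - I."""
    L_tilde = (2.0 / lmax) * L - sp.eye(L.shape[0])
    X_prev = f                                            # X^(0) = f
    out = alpha[0] * X_prev
    if len(alpha) > 1:
        X = L_tilde @ f                                   # X^(1) = L~ f
        out = out + alpha[1] * X
        for a in alpha[2:]:
            X_prev, X = X, 2.0 * (L_tilde @ X) - X_prev   # X^(j) = 2 L~ X^(j-1) - X^(j-2)
            out = out + a * X
    return out
```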
Numerical experiments
Xavier	Bresson 69
[Figures: running time, accuracy, and optimization curves. Graph: an 8-NN graph of the Euclidean grid.]
Outline
Ø Spectral ConvNets
- GraphConvNets
Xavier	Bresson 70
Graph convolutional nets[19]: simplified ChebNets
Xavier	Bresson 71
[19] Kipf, Welling 2016
Use Chebyshev polynomials of degree r = 2 and assume $\lambda_n \approx 2$:
$w_\alpha(\Delta) f = \alpha_0 f + \alpha_1 (\Delta - I) f = \alpha_0 f - \alpha_1 D^{-1/2} W D^{-1/2} f$
Further constrain $\alpha = \alpha_0 = -\alpha_1$ to obtain a single-parameter filter:
$w_\alpha(\Delta) f = \alpha\, (I + D^{-1/2} W D^{-1/2})\, f$
Caveat: the eigenvalues of $I + D^{-1/2} W D^{-1/2}$ are now in [0, 2] ⇒ repeated application of the filter results in numerical instability.
Fix: apply a renormalization
$w_\alpha(\Delta) f = \alpha\, \tilde D^{-1/2} \tilde W \tilde D^{-1/2} f$, with $\tilde W = W + I$ and $\tilde D = \mathrm{diag}\big(\sum_{j \ne i} \tilde w_{ij}\big)$.
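A minimal sketch of one such renormalized propagation step (my own; the scalar α of the slide is generalized here to a p×q weight matrix Θ, as in the usual multi-channel layer).

```python
import numpy as np
import scipy.sparse as sp

def gcn_layer(W, F, Theta):
    """W: sparse n x n adjacency, F: n x p input features, Theta: p x q weights."""
    W_tilde = W + sp.eye(W.shape[0])                   # W~ = W + I (add self-loops)
    d = np.asarray(W_tilde.sum(axis=1)).ravel()
    D_inv_sqrt = sp.diags(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ W_tilde @ D_inv_sqrt          # D~^{-1/2} W~ D~^{-1/2}
    return np.maximum(A_hat @ (F @ Theta), 0.0)        # ReLU activation
```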
Example: citation networks
Xavier	Bresson 72
Outline
Ø Spectral ConvNets
- CayleyNets*
Xavier	Bresson 73
Community graphs
Xavier	Bresson 74
[Figures: synthetic graph with 15 communities; normalized Laplacian eigenvalues; Chebyshev filters learned on the 15-communities graph.]
ChebNet is unstable at producing filters with the frequency bands of interest.
Spectral graph ConvNets with Cayley polynomials[20]
Xavier	Bresson 75
Cayley polynomials are a family of real-valued rational functions (with complex coefficients):
$p(\lambda) = c_0 + 2\,\mathrm{Re}\Big\{ \sum_{j=1}^{r} c_j \Big( \underbrace{\tfrac{\lambda - i}{\lambda + i}}_{C(\lambda)} \Big)^{j} \Big\}$
Cayley transform: $C(\lambda) = \frac{\lambda - i}{\lambda + i}$ is a smooth bijection from $\mathbb{R}$ to $e^{i\mathbb{R}} \setminus \{1\}$.
Expressivity: since $z^{-1} = \bar z$ for $z \in e^{i\mathbb{R}}$ and $2\,\mathrm{Re}\{z\} = z + \bar z$, we can rewrite $p(\lambda)$ as a real trigonometric polynomial w.r.t. $C(\lambda) = e^{i\omega}$:
$p(\lambda) = c_0 + \sum_{j=1}^{r} \big( c_j \underbrace{C^j(\lambda)}_{e^{ij\omega}} + \bar c_j \underbrace{C^{-j}(\lambda)}_{e^{-ij\omega}} \big) = c_0 + 2 \sum_{j=1}^{r} \big( \mathrm{Re}\{c_j\} \cos(j\omega) - \mathrm{Im}\{c_j\} \sin(j\omega) \big)$
[20] (Cayley 1846); Levie, Monti, Bresson, Bronstein 2017
Spectral zoom[20]
Xavier	Bresson 76
Applying the Cayley transform to the scaled Laplacian $h\Delta$,
$C(h\Delta) = (h\Delta - iI)(h\Delta + iI)^{-1}$,
results in a non-linear transformation of the eigenvalues controlled by the parameter $h$ (spectral zoom).
[Figure: Cayley transform $C(h\Delta)$ of the 15-communities graph Laplacian spectrum for h = 0.1, 1, 10.]
[20] Levie, Monti, Bresson, Bronstein 2017
Fast inversion
Xavier	Bresson 77
Application of the Cayley filter $w(h\Delta)f$ requires the solution of a recursive linear system:
$y_0 = f, \qquad (h\Delta + iI)\, y_j = (h\Delta - iI)\, y_{j-1}, \quad j = 1, \ldots, r$
Exact (stable) solution: O(n³) complexity.
Approximate solution using K Jacobi iterations:
$\tilde y_j^{(k+1)} = J\, \tilde y_j^{(k)} + \mathrm{Diag}^{-1}(h\Delta + iI)\,(h\Delta - iI)\, \tilde y_{j-1}$, with $\tilde y_j^{(0)} = \mathrm{Diag}^{-1}(h\Delta + iI)\,(h\Delta - iI)\, \tilde y_{j-1}$ and $J = -\mathrm{Diag}^{-1}(h\Delta + iI)\, \mathrm{Off}(h\Delta + iI)$.
Application of the approximate Cayley filter: $\sum_{j=0}^{r} c_j\, \tilde y_j \approx w(h\Delta) f$
Complexity: O(ErK) = O(n)
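A minimal sketch of applying a Cayley filter with exact dense solves (the O(n³) route; the Jacobi approximation above replaces each solve with K diagonal-preconditioned iterations). Names are illustrative.

```python
import numpy as np

def cayley_filter(L, f, c, h):
    """w_{c,h}(L) f = c[0] f + 2 Re{ sum_j c[j] y_j }, with y_j = C(hL) y_{j-1}, y_0 = f."""
    n = L.shape[0]
    A_plus = h * L + 1j * np.eye(n)        # (h L + i I)
    A_minus = h * L - 1j * np.eye(n)       # (h L - i I)
    y = f.astype(complex)
    out = c[0].real * f.astype(complex)
    for cj in c[1:]:
        y = np.linalg.solve(A_plus, A_minus @ y)   # y_j solves (hL + iI) y_j = (hL - iI) y_{j-1}
        out = out + 2.0 * cj * y
    return out.real
```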
CayleyNets[20]
Xavier Bresson 78
[20] Levie, Monti, Bresson, Bronstein 2017
Represent the spectral transfer function as a Cayley polynomial of order r:
$w_{c,h}(\lambda) = c_0 + 2\,\mathrm{Re}\Big\{ \sum_{j=1}^{r} c_j\, (h\lambda - i)^j\, (h\lambda + i)^{-j} \Big\}$,
where the filter parameters are the vector of complex coefficients $c = (c_0, \ldots, c_r)^\top$ and the spectral zoom $h$.
Filters have guaranteed exponential spatial decay
O(1) parameters per layer
O(n) computational complexity with Jacobi approximate inversion (assuming a sparsely-connected graph)
Spectral zoom property allowing better localization in frequency
Richer class of filters than Chebyshev for the same order
Filters are basis-dependent ⇒ does not generalize across graphs
Chebyshev vs Cayley
Xavier	Bresson 79
Chebyshev vs Cayley: community graphs example
Xavier	Bresson 80
Chebyshev vs Cayley: Cora example
Xavier	Bresson 81
Accuracy of ChebNet (blue) and CayleyNet (orange)
on Cora dataset
Approximate inversion
Xavier	Bresson 82
Accuracy of
approximate inversion
Speed of approximate
inversion
Outline
Ø Spectral ConvNets on Multiple Graphs
- Recommender Systems* [NIPS’17]
Xavier	Bresson 83
Matrix completion - Netflix challenge
Xavier	Bresson 84
$\min_{X \in \mathbb{R}^{m \times n}} \mathrm{rank}(X)$ s.t. $x_{ij} = a_{ij}\ \forall ij \in \Omega$
Convex relaxation[21]:
$\min_{X \in \mathbb{R}^{m \times n}} \|X\|_*$ s.t. $x_{ij} = a_{ij}\ \forall ij \in \Omega \quad \Rightarrow \quad \min_{X \in \mathbb{R}^{m \times n}} \|X\|_* + \mu\, \|\Omega \circ (X - A)\|_F^2$
[21] Candes et al. 2008
Matrix completion on single graph[22]
Xavier	Bresson 85
$\min_{X \in \mathbb{R}^{m \times n}} \|X\|_* + \mu\, \|\Omega \circ (X - A)\|_F^2 + \mu_c \underbrace{\mathrm{tr}(X \Delta_c X^\top)}_{\|X\|^2_{\mathcal{G}_c}}$
[22] Kalofolias, Bresson, Bronstein, Vandergheynst, 2014
Matrix completion on multiple graphs[22]
Xavier	Bresson 86
[22] Kalofolias, Bresson, Bronstein, Vandergheynst, 2014
$\min_{X \in \mathbb{R}^{m \times n}} \|X\|_* + \mu\, \|\Omega \circ (X - A)\|_F^2 + \mu_c \underbrace{\mathrm{tr}(X \Delta_c X^\top)}_{\|X\|^2_{\mathcal{G}_c}} + \mu_r \underbrace{\mathrm{tr}(X^\top \Delta_r X)}_{\|X\|^2_{\mathcal{G}_r}}$
Factorized matrix completion models[23]
Xavier	Bresson 87
$\min_{W \in \mathbb{R}^{m \times s},\, H \in \mathbb{R}^{n \times s}} \mu\, \|\Omega \circ (X - A)\|_F^2 + \mu_c \|W\|_F^2 + \mu_r \|H\|_F^2$, with $X = W H^\top$
O(n+m) variables instead of O(nm)
rank(X) = rank(WHᵀ) ≤ s by construction
[23] Srebro, Rennie, Jaakkola 2004
Factorized matrix completion on graphs[24]
Xavier	Bresson 88
$\min_{W \in \mathbb{R}^{m \times s},\, H \in \mathbb{R}^{n \times s}} \mu\, \|\Omega \circ (X - A)\|_F^2 + \mu_c\, \mathrm{tr}(H^\top \Delta_c H) + \mu_r\, \mathrm{tr}(W^\top \Delta_r W)$, with $X = W H^\top$
O(n+m) variables instead of O(nm)
rank(X) = rank(WHᵀ) ≤ s by construction
[24] Rao et al. 2015
1D Fourier transform
Xavier	Bresson 89
Column-wise transform
2D Fourier transform
Xavier	Bresson 90
Column-wise transform + Row-wise transform = 2D transform
From 1D signal processing to 2D image processing
Multi-graph Fourier transform and convolution[25]
Xavier	Bresson 91
Multi-graph Fourier transform:
$\hat X = \Phi_r^\top X \Phi_c$,
where $\Phi_c, \Phi_r$ are the eigenvectors of the column- and row-graph Laplacians $\Delta_c, \Delta_r$, respectively.
Multi-graph spectral convolution:
$X \star Y = \Phi_r \big( (\Phi_r^\top X \Phi_c) \circ (\Phi_r^\top Y \Phi_c) \big) \Phi_c^\top = \Phi_r (\hat X \circ \hat Y) \Phi_c^\top$
[25] Monti, Bresson, Bronstein 2017
Multi-graph spectral filters
Xavier Bresson 92
Parametrize the multi-graph filter as a smooth function of the row and column frequencies, e.g. with a bivariate Chebyshev polynomial:
$w_\Theta(\tilde\lambda_r, \tilde\lambda_c) = \sum_{j,j'=0}^{r} \theta_{jj'}\, T_j(\tilde\lambda_r)\, T_{j'}(\tilde\lambda_c)$,
where $\Theta = (\theta_{jj'})$ is the matrix of coefficients and $-1 \le \tilde\lambda_r, \tilde\lambda_c \le 1$.
Multi-graph convolutional layer: apply the filter to the matrix X:
$Y = \sum_{j,j'=0}^{r} \theta_{jj'}\, T_j(\tilde\Delta_r)\, X\, T_{j'}(\tilde\Delta_c)$,
where the scaled Laplacians are $\tilde\Delta_r = 2\lambda_{r,m}^{-1}\Delta_r - I$ and $\tilde\Delta_c = 2\lambda_{c,n}^{-1}\Delta_c - I$.
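A minimal dense sketch of this bivariate filter (mine, not the paper's implementation): the row-graph Chebyshev polynomials act on the left of X and the column-graph ones on the right.

```python
import numpy as np

def cheb_basis(L, lmax, r):
    """Return [T_0(L~), ..., T_r(L~)] as dense matrices, with L~ = 2 L / lmax - I."""
    Lt = (2.0 / lmax) * L - np.eye(L.shape[0])
    Ts = [np.eye(L.shape[0]), Lt]
    for _ in range(2, r + 1):
        Ts.append(2.0 * Lt @ Ts[-1] - Ts[-2])
    return Ts[: r + 1]

def multigraph_filter(X, L_r, lmax_r, L_c, lmax_c, Theta):
    """X: m x n matrix, L_r: m x m row-graph Laplacian, L_c: n x n column-graph Laplacian,
    Theta: (r+1) x (r+1) coefficient matrix theta_{j j'}."""
    r = Theta.shape[0] - 1
    Tr = cheb_basis(L_r, lmax_r, r)
    Tc = cheb_basis(L_c, lmax_c, r)
    Y = np.zeros_like(X)
    for j in range(r + 1):
        for jp in range(r + 1):
            Y = Y + Theta[j, jp] * (Tr[j] @ X @ Tc[jp])
    return Y
```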
Multi-graph ConvNets[25]
Xavier	Bresson 93
[25] Monti, Bresson, Bronstein 2017
Multi-graph spectral filters applied to p input channels ($m \times n$ matrices $Y_1^{(k-1)}, \ldots, Y_p^{(k-1)}$) and producing q output channels $Y_1^{(k)}, \ldots, Y_q^{(k)}$:
$Y_l^{(k)} = \xi\Big( \sum_{l'=1}^{p} \sum_{j,j'=0}^{r} \theta_{jj'll'}\, T_j(\tilde\Delta_r)\, Y_{l'}^{(k-1)}\, T_{j'}(\tilde\Delta_c) \Big), \quad l = 1, \ldots, q$
Filters have guaranteed r-hop support on both graphs
O(1) parameters per layer
O(nm) computational complexity (assuming sparse graphs)
Factorized multi-graph ConvNets
Xavier	Bresson 94
Two spectral convolutional layers applied to each of the factors W, H:
$u_l = \xi\Big( \sum_{l'=1}^{p} \sum_{j=0}^{r} \theta_{r,jll'}\, T_j(\tilde\Delta_r)\, w_{l'} \Big), \qquad v_l = \xi\Big( \sum_{l'=1}^{p} \sum_{j'=0}^{r} \theta_{c,j'll'}\, T_{j'}(\tilde\Delta_c)\, h_{l'} \Big), \quad l = 1, \ldots, q$
Filters have guaranteed r-hop support on both graphs
O(1) parameters per layer
O(n+m) computational complexity (assuming sparse graphs)
May lose some coupling information
Matrix completion with Recurrent Multi-Graph ConvNets[25]
Xavier	Bresson 95
Complexity: O(nm)
Extract
compositional
features from
graphs of users
and items
[25] Monti, Bresson, Bronstein 2017
Factorized version
Xavier	Bresson 96
Complexity: O(n+m)
[25] Monti, Bresson, Bronstein 2017
Incremental updates with RNNs
The RNN/LSTM[26] casts the completion problem as a non-linear diffusion process:
$X(t+1) = X(t) + dX(t)$, where the update $dX(t)$ is predicted by the RNN from $X(t)$.
Replacing multi-layer CNNs by a simple RNN + diffusion gives:
far fewer parameters to learn (reduces overfitting);
more favorable behavior for highly sparse entries (better information diffusion);
a number of parameters fixed independently of the entry sparsity;
the ability to learn temporal dynamics of the entries if such information is available.
Xavier Bresson 97
[26] Hochreiter, Schmidhuber 1997
Matrix completion methods comparison: synthetic
Xavier	Bresson 98
Matrix completion methods comparison: real
Xavier	Bresson 99
Outline
Ø Conclusion
Xavier	Bresson 100
Related works
Xavier	Bresson 101
Classes of graph ConvNets:
(1) Spectral approach
(2) Spatial approach
Spatial approaches
Ø Geodesic CNN: Masci, Boscaini, Bronstein, Vandergheynst 2015
Ø Anisotropic CNN: Boscaini, Masci, Rodola, Bronstein 2016
Ø MoNet: Monti, Boscaini, Masci, Rodola, Svoboda, Bronstein 2017
Graph Neural Networks:
Ø Gori, Monfardini, Scarselli 2005
Ø Li, Tarlow, Brockschmidt, Zemel 2015
Ø Sukhbaatar, Szlam, Fergus 2016
Several other works
Conclusion
Contributions:
Generalization of ConvNets to data on graphs (fixed)
Exact/tight localized filters on graphs
Linear complexity for sparse graphs
GPU implementation
Rich expressivity
Multiple graphs
Several potential applications in network sciences.
Xavier	Bresson 102
Papers & code
Papers
Convolutional neural networks on graphs with fast localized spectral
filtering, M Defferrard, X Bresson, P Vandergheynst, NIPS, 2016,
arXiv:1606.09375
Geometric matrix completion with recurrent multi-graph neural
networks, F Monti, MM Bronstein, X Bresson, NIPS, 2017,
arXiv:1704.06803
CayleyNets: Graph Convolutional Neural Networks with Complex
Rational Spectral Filters, R Levie, F Monti, X Bresson, MM Bronstein,
2017, arXiv:1705.07664
Xavier	Bresson 103
Code
https://github.com/xbresson/
graph_convnets_pytorch
Xavier	Bresson 104
Tutorials
Computer Vision and Pattern Recognition (CVPR)
Geometric deep learning on graphs and manifolds
M. Bronstein​​, J. Bruna​, A. Szlam, X. Bresson, Y. LeCun
July 21, 2017
Slides: https://www.dropbox.com/s/appafg1fumb6u7f/tutorial_CVPR17_DL_graphs.pdf?dl=0
Video will be posted on YouTube/ComputerVisionFoundation Videos
Neural Information Processing Systems (NIPS)
Geometric deep learning on graphs and manifolds
M. Bronstein​​, J. Bruna​, A. Szlam, X. Bresson, Y. LeCun
Dec 4, 2017
Xavier	Bresson 105
Workshop
New Deep Learning Techniques
Feb 5-9, 2018
Organizers
Xavier Bresson, NTU
Michael Bronstein, USI/Intel
Joan Bruna, NYU/Berkeley
Yann LeCun, NYU/Facebook
Stanley Osher, UCLA
Arthur Szlam, Facebook
http://www.ipam.ucla.edu/programs/work
shops/new-deep-learning-techniques
Thank you
Xavier	Bresson 106
http://www.ntu.edu.sg/home/xbresson
https://github.com/xbresson
https://twitter.com/xbresson

Contenu connexe

Tendances

今さら聞けないカーネル法とサポートベクターマシン
今さら聞けないカーネル法とサポートベクターマシン今さら聞けないカーネル法とサポートベクターマシン
今さら聞けないカーネル法とサポートベクターマシン
Shinya Shimizu
 

Tendances (20)

第52回SWO研究会チュートリアル資料
第52回SWO研究会チュートリアル資料第52回SWO研究会チュートリアル資料
第52回SWO研究会チュートリアル資料
 
PRML 5.5.6-5.6
PRML 5.5.6-5.6PRML 5.5.6-5.6
PRML 5.5.6-5.6
 
深層学習(講談社)のまとめ(1章~2章)
深層学習(講談社)のまとめ(1章~2章)深層学習(講談社)のまとめ(1章~2章)
深層学習(講談社)のまとめ(1章~2章)
 
High-impact Papers in Computer Vision: 歴史を変えた/トレンドを創る論文
High-impact Papers in Computer Vision: 歴史を変えた/トレンドを創る論文High-impact Papers in Computer Vision: 歴史を変えた/トレンドを創る論文
High-impact Papers in Computer Vision: 歴史を変えた/トレンドを創る論文
 
PRML11章
PRML11章PRML11章
PRML11章
 
[DL輪読会]SOM-VAE: Interpretable Discrete Representation Learning on Time Series
[DL輪読会]SOM-VAE: Interpretable Discrete Representation Learning on Time Series[DL輪読会]SOM-VAE: Interpretable Discrete Representation Learning on Time Series
[DL輪読会]SOM-VAE: Interpretable Discrete Representation Learning on Time Series
 
今さら聞けないカーネル法とサポートベクターマシン
今さら聞けないカーネル法とサポートベクターマシン今さら聞けないカーネル法とサポートベクターマシン
今さら聞けないカーネル法とサポートベクターマシン
 
Graph Neural Networks
Graph Neural NetworksGraph Neural Networks
Graph Neural Networks
 
20190706cvpr2019_3d_shape_representation
20190706cvpr2019_3d_shape_representation20190706cvpr2019_3d_shape_representation
20190706cvpr2019_3d_shape_representation
 
Prml 最尤推定からベイズ曲線フィッティング
Prml 最尤推定からベイズ曲線フィッティングPrml 最尤推定からベイズ曲線フィッティング
Prml 最尤推定からベイズ曲線フィッティング
 
情報統計力学のすすめ
情報統計力学のすすめ情報統計力学のすすめ
情報統計力学のすすめ
 
機械学習による統計的実験計画(ベイズ最適化を中心に)
機械学習による統計的実験計画(ベイズ最適化を中心に)機械学習による統計的実験計画(ベイズ最適化を中心に)
機械学習による統計的実験計画(ベイズ最適化を中心に)
 
【DL輪読会】Emergence of maps in the memories of blind navigation agents
【DL輪読会】Emergence of maps in the memories of blind navigation agents【DL輪読会】Emergence of maps in the memories of blind navigation agents
【DL輪読会】Emergence of maps in the memories of blind navigation agents
 
三次元表現まとめ(深層学習を中心に)
三次元表現まとめ(深層学習を中心に)三次元表現まとめ(深層学習を中心に)
三次元表現まとめ(深層学習を中心に)
 
Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...
 
Jokyokai
JokyokaiJokyokai
Jokyokai
 
PRMLの線形回帰モデル(線形基底関数モデル)
PRMLの線形回帰モデル(線形基底関数モデル)PRMLの線形回帰モデル(線形基底関数モデル)
PRMLの線形回帰モデル(線形基底関数モデル)
 
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recog...
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recog...Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recog...
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recog...
 
(DL輪読)Matching Networks for One Shot Learning
(DL輪読)Matching Networks for One Shot Learning(DL輪読)Matching Networks for One Shot Learning
(DL輪読)Matching Networks for One Shot Learning
 
Laplacian Pyramid of Generative Adversarial Networks (LAPGAN) - NIPS2015読み会 #...
Laplacian Pyramid of Generative Adversarial Networks (LAPGAN) - NIPS2015読み会 #...Laplacian Pyramid of Generative Adversarial Networks (LAPGAN) - NIPS2015読み会 #...
Laplacian Pyramid of Generative Adversarial Networks (LAPGAN) - NIPS2015読み会 #...
 

Similaire à Spectral convnets

VoxelNet
VoxelNetVoxelNet
VoxelNet
taeseon ryu
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
Pierre de Lacaze
 
Dl1 deep learning_algorithms
Dl1 deep learning_algorithmsDl1 deep learning_algorithms
Dl1 deep learning_algorithms
Armando Vieira
 

Similaire à Spectral convnets (20)

Towards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networksTowards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networks
 
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
 
Convolution neural networks
Convolution neural networksConvolution neural networks
Convolution neural networks
 
CNN for modeling sentence
CNN for modeling sentenceCNN for modeling sentence
CNN for modeling sentence
 
Talk from NVidia Developer Connect
Talk from NVidia Developer ConnectTalk from NVidia Developer Connect
Talk from NVidia Developer Connect
 
Lecture on Deep Learning
Lecture on Deep LearningLecture on Deep Learning
Lecture on Deep Learning
 
Deep learning for image super resolution
Deep learning for image super resolutionDeep learning for image super resolution
Deep learning for image super resolution
 
Deep learning for image super resolution
Deep learning for image super resolutionDeep learning for image super resolution
Deep learning for image super resolution
 
Neural networks and deep learning
Neural networks and deep learningNeural networks and deep learning
Neural networks and deep learning
 
VoxelNet
VoxelNetVoxelNet
VoxelNet
 
Objective Evaluation of a Deep Neural Network Approach for Single-Channel Spe...
Objective Evaluation of a Deep Neural Network Approach for Single-Channel Spe...Objective Evaluation of a Deep Neural Network Approach for Single-Channel Spe...
Objective Evaluation of a Deep Neural Network Approach for Single-Channel Spe...
 
Mnist report
Mnist reportMnist report
Mnist report
 
convolutional_neural_networks in deep learning
convolutional_neural_networks in deep learningconvolutional_neural_networks in deep learning
convolutional_neural_networks in deep learning
 
Mnist report ppt
Mnist report pptMnist report ppt
Mnist report ppt
 
CNN Structure: From LeNet to ShuffleNet
CNN Structure: From LeNet to ShuffleNetCNN Structure: From LeNet to ShuffleNet
CNN Structure: From LeNet to ShuffleNet
 
最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - 最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に -
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 
Dl1 deep learning_algorithms
Dl1 deep learning_algorithmsDl1 deep learning_algorithms
Dl1 deep learning_algorithms
 
Deep Learning for Computer Vision: Deep Networks (UPC 2016)
Deep Learning for Computer Vision: Deep Networks (UPC 2016)Deep Learning for Computer Vision: Deep Networks (UPC 2016)
Deep Learning for Computer Vision: Deep Networks (UPC 2016)
 

Dernier

Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
Silpa
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
NazaninKarimi6
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
Silpa
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Silpa
 

Dernier (20)

Cyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxCyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptx
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
 
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRLGwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICEPATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 

Spectral convnets

  • 1. Convolutional Neural Networks on Graphs Xavier Bresson Xavier Bresson 2 School of Information Systems Singapore Management University (SMU) Oct 19th 2017 School of Computer Science and Engineering Nanyang Technological University Joint work with M. Defferrard (EPFL), P. Vandergheynst (EPFL), M. Bronstein (USI), F. Monti (USI), R. Levie (TAU)
  • 2. Outline Ø Deep Learning - A brief history - High-dimensional learning Ø Convolutional Neural Networks - Architecture - Non-Euclidean Data Ø Spectral Graph Theory - Graphs and operators - Spectral decomposition - Fourier analysis - Convolution - Coarsening Ø Spectral ConvNets - SplineNets - ChebNets* [NIPS’16] - GraphConvNets - CayleyNets* Ø Spectral ConvNets on Multiple Graphs - Recommender Systems* [NIPS’17] Ø Conclusion Xavier Bresson 3
  • 3. Outline Ø Deep Learning - A brief history Xavier Bresson 4
  • 4. A brief history of artificial neural networks Xavier Bresson 5 1958 Perceptron Rosenblatt 1982 Backprop Werbos 1959 Visual primary cortex Hubel-Wiesel 1987 Neocognitron Fukushima SVM/Kernel techniques Vapnik 1995 1997 1998 1999 2006 2012 2014 2015 Data scientist 1st Job in US Facebook Center OpenAI Center Deep Learning Breakthough Hinton GAN, Goodfellow ResNet, He Deep RL, Mnih First NVIDIA GPU First Amazon Cloud Center RNN/LSTM Schmidhuber CNN LeCun AI Winter [1960’s-2012]AI Birth AI Renaissance Hardware GPU speed 2x/year Kernel techniques Graphical models Wavelets, numerical PDEs Handcrafted Features 4th industrial revolution? Digital intelligence revolution or new AI bubble? Big Data Volume 2x/1.5 year Google AI TensorFlow Facebook AI Torch 1962 Birth of Data Science Split from Statistics Tukey First NIPS 1989 First KDD 2010 Kaggle Platform
  • 5. Breakthrough in image recognition Xavier Bresson 6 4.1%Human accuracy Learned features (end-to-end systems)Handcraft features (SIFT)
  • 6. Convolutional neural networks: LeNet-5[1] 2 convolutional + 2 fully connected layers tanh non-linearity 1M parameters Training set: MNIST 70K images Trained on CPU Xavier Bresson 7 [1] LeCun et al. 1998 1998
  • 7. Convolutional neural networks: AlexNet[2] Xavier Bresson 8 5 convolutional + 3 fully connected layers ReLU non-linearity Dropout regularization 60M parameters Trained set: ImageNet 1.5M images Trained on GPU [2] Krizhevsky, Sutskever, Hinton 2012 2012 More learning capacity More data More computational power
  • 8. Outline Ø Deep Learning - High-dimensional learning Xavier Bresson 9
  • 9. High-dimensional learning Xavier Bresson 10 NNs are powerful to solve high-dimensional learning problems. Prototypal learning: Evaluate function f(x) given n samples {xi,f(xi)}. How? Suppose f is regular, use interpolation of close points. f(x) f(xi) xi 1-dimensional learning
  • 10. Curse of dimensionality Xavier Bresson 11 What about in high dimensions? No interpolation possible because we can never have enough close points to make a good interpolation. 1 · · · · · · · · · ·· · · · · · · · · ·· · · · · · · · · ·· · · · · · · · · ·· · · · · · · · · ·· · · · · · · · · ·· · · · · · · · · ·· · · · · · · · · ·· · · · · · · · · ·· · · · · · · · · · · · · · · · · · · ·· · · · · · · · · · 0 1 N samples/dim " = N 1 How many points to cover the 2D plane? ε-2 d-dim cube [0,1]d requires ε-d points. dim(image) = 512 x 512 ≈ 106 For N=10 samples/dim 101,000,000 points! Curse of dim: The number of x of f is exponential. All samples are far from each other in high-dim[3]. Optimization: Also need ε-d points to find solution. Nsamples/dim [3] Beyer 1998 min f L(f) = E[`(f(xi), yi)]
  • 11. Blessing of structure Xavier Bresson 12 In our universe: Data have structures, they concentrate in (non-)smooth low-dim spaces. These structures can be expressed as mathematical properties (statistics, geometry, groups, etc). Low-dim space High-dim space Data
  • 12. Learning exponential functions Neural networks can learn parametric exponential functions with linear number of parameters. Xavier Bresson 13 A very large number of parameters can be learned by backpropagation, order of billions. Huge capacity to learn complex data[4]. n_in = 2 n_out = 3 |W|= 2.3 Parameters #configurations= 2n_out =23=8 Number of parameters grow linearly, but function capacity grows exponentially. [4] Zhang, Bengio, Hardt, Recht, Vinyals, 2016
  • 13. Universal function approximation[4] Xavier Bresson 14 [5] Cybenko 1989, Hornik 1991 Universal Approximation Theorem Let ⇠ be a non-constant, bounded, and monotonically-increasing continuous activation function, y : [0, 1]p ! R continuous function, and ✏ > 0. Then, 9n and parameters a, b 2 Rn , W 2 Rn⇥p s.t. nX i=1 ai⇠(w> i f + bi) y(f) < ✏ 8f 2 [0, 1]p
  • 14. Shallow neural nets as universal approximators? Any continuous function can be approximated arbitrarily well by a neural network with a single hidden layer. Then, How many neurons? How long is the optimization? Does it generalize well/overfit? Xavier Bresson 15 Shallow NNs: Data (s.a. image) requires NNs with an exponential number of neurons for a single layer, optimizes badly, and does not generalize. Deep NNs: Multiple layers and data structure can alleviate the above issues[6]. [6] Montufar, Pascanu, Cho, Bengio 2014
  • 15. Fully connected networks Xavier Bresson 16 Deep neural network consists of multiple layers. fl = l-th image feature (R,G,B channels), dim(fl) = n ⇥ 1 g (k) l = l-th feature map, dim(g (k) l ) = n (k) l ⇥ 1 Linear layer g (k) l = ⇠ qk 1 X l0=1 W (k) l,l0 ⇠ qk 2 X l0=1 W (k 1) l,l0 ⇠ · · · fl0 !!! Activation, e.g. ⇠(x) = max{x, 0} rectified linear unit (ReLU)
  • 16. Fully connected networks Xavier Bresson 17 FC networks capture non-linear factorizations of data. They overfit: Too many parameters to learn. NN architectures require additional data structures to successfully analyze image, sound, document, etc.
  • 17. Outline Ø Convolutional Neural Networks - Architecture Xavier Bresson 18
  • 18. Convolutional neural networks[1] Data (image, video, sound) are compositional, i.e. they are formed of patterns that are: Ø Local Ø Stationary Ø Multi-scale (/hierarchical) Xavier Bresson 19 ConvNets leverage the compositionality structure: They extract compositional features and feed them to classifier, recommender, etc (end-to-end). Computer Vision NLP Drug discovery Games [1] LeCun et al. 1998
  • 19. Key properties Locality: Property inspired by visual cortex neurons[7]. Xavier Bresson 20 Local receptive fields activate in the presence of (learned) local features. Neocognitron[8] [7] Cybenko 1989, Hornik 1991, [8] Fukushima 1980
  • 20. Key properties Stationarity Translation invariance Xavier Bresson 21 Local stationarity Similar patches shared across the data domain.
  • 21. Key properties Multi-scale: Simple structures combine to compose slightly more abstract structures, and so on, in a hierarchical way. Xavier Bresson 22 Also inspired by brain visual primary cortex (V1 and V2 neurons). Features learned by a ConvNet become increasingly complex at deeper layers.[9] [9] Zeiler, Fergus 2013
  • 22. Implementation Locality: compact support kernels O(1) parameters per filter. Xavier Bresson 23 Stationarity: convolutional operators O(nlogn) in general (FFT) and O(n) for compact kernels. Multi-scale: downsampling + pooling O(n)
  • 23. Compositional features Xavier Bresson 24 fl = l-th image feature (R,G,B channels), dim(fl) = n ⇥ 1 g (k) l = l-th feature map, dim(g (k) l ) = n (k) l ⇥ 1 Compositional features consist of multiple convolutional + pooling layers. Linear layer g (k) l = ⇠ qk 1 X l0=1 W (k) l,l0 ? ⇠ qk 2 X l0=1 W (k 1) l,l0 ? ⇠ · · · fl0 !!! Activation, e.g. ⇠(x) = max{x, 0} rectified linear unit (ReLU) Pooling g (k) l (x) = kg (k 1) l (x0 ) : x0 2 N(x)kp p = 1, 2, or 1
  • 24. ConvNets[1] Xavier Bresson 25 Filters localized in space (Locality) Convolutional filters (Stationarity) Multiple layers (Multi-scale) O(1) parameters per filter (independent of input image size n) O(n) complexity per layer (filtering done in the spatial domain) [1] LeCun et al. 1998
  • 25. ConvNets in computer vision Xavier Bresson 26 Image/object recognition [Krizhevsky-Sutskever-Hinton’12] Image inpainting [Pathak-Efros-etal’16] Image captioning [Karpathy-Li’15] Self-driving cars Highly successful - ConvNets is the Swiss knife of Computer Vision!
  • 26. Outline Ø Convolutional Neural Networks - Non-Euclidean Data Xavier Bresson 27
  • 27. Data domain for ConvNets Xavier Bresson 28 Image, volume, video: 2D, 3D, 2D+1 Euclidean domains Sentence, word, character, sound: 1D Euclidean domain These domains have nice regular spatial structures. All CNN operations are math well defined and fast (convolution, pooling). 2D grids 1D grid
  • 28. Non-Euclidean data Also NLP, physics, social science, communication networks, etc. Xavier Bresson 29 Graphs/ Networks =
  • 29. Challenges of graph deep learning Extend neural network techniques to graph-structured data. Assumption: Non-Euclidean data are locally stationary and manifest hierarchical structures. How to define compositionality on graphs? (convolution and pooling on graphs) How to make them fast? (linear complexity) Xavier Bresson 30
  • 30. Outline Ø Spectral Graph Theory - Graphs and operators Xavier Bresson 31
  • 31. Graphs[10] Xavier Bresson 32 Graph Vertices Edges Vertex weights Edge weights Vertex fields Represented as Hilbert space with inner product G = (V, E) V = {1, . . . , n} E ✓ V ⇥ V f = (f1, . . . , fn) hf, giL2(V) = P i2V aifigi L2 (V) = {f : V ! Rh } [10] Chung 1994 bi > 0 for i 2 V aij 0 for (i, j) 2 E
  • 32. Calculus on graphs Xavier Bresson 33 Gradient operator Divergence operator adjoint to the gradient operator r : L2 (V) ! L2 (E) div : L2 (E) ! L2 (V) hF, rfiL2(E) = hr⇤ F, fiL2(V) = h divF, fiL2(V) (rf)ij = p aij(fi fj) (divF)i = 1 bi X j:(i,j)2E p aij(Fij Fji)
  • 33. Graph Laplacian Xavier Bresson 34 Laplacian operator difference between f and its local average (2nd derivative on graphs) Core operator in spectral graph theory. Represented as a positive semi-definite n × n matrix Unnormalized Laplacian Normalized Laplacian Random walk Laplacian : L2 (V) ! L2 (V) ( f)i = 1 bi X j:(i,j)2E aij(fi fj) where A = (aij) and D = diag( P j6=i aij) = D A = I D 1 A = I D 1/2 AD 1/2
  • 34. Outline Ø Spectral Graph Theory - Spectral decomposition Xavier Bresson 35
  • 35. Spectral decomposition Xavier Bresson 36 A Laplacian of a graph of n vertices admits n eigenvectors Eigenvectors are real and orthonormal (due to self-adjointness) Eigenvalues are non-negative (due to positive-semidefiniteness) Eigendecomposition of a graph Laplacian k = k k, k = 1, 2, . . . h k, k0 iL2(V) = kk0 0 = 1  2  . . .  n where = ( 1, . . . , n) and ⇤ = diag( 1, . . . , n) = T ⇤
  • 36. Interpretation Xavier Bresson 37 Find the smoothest orthonormal basis on a graph where EDir is the Dirichlet energy = measure of smoothness of a function Solution: first n Laplacian eigenvectors = ( 1, . . . , n) min k EDir( k) s.t. k kk = 1, k = 2, 3, . . . n k ? span{ 1, . . . , k 1} EDir(f) = f> f min 2Rn⇥n trace( > ) | {z } k kG Dirichlet norm s.t. > = I
  • 37. Laplacian eigenvectors Xavier Bresson 38 Euclidean domain: Graph domain: Lap eigenvectors related to graph geometry (s.a. communities, hubs, etc), spectral clustering[10] [10] Von Luxburg 2007
  • 38. Outline Ø Spectral Graph Theory - Fourier analysis Xavier Bresson 39
  • 39. Fourier analysis on Euclidean spaces Xavier Bresson 40 A function can be written as Fourier series Fourier basis = Laplace-Beltrami eigenfunctions: f : [ ⇡, ⇡] ! R f(x) = X k 0 1 2⇡ Z ⇡ ⇡ f(x0 )e ikx0 dx0 | {z } ˆfk=hf,e ikxiL2([ ⇡,⇡]) e ikx k = Fourier mode k = frequency of Fourier mode e ikx k = k2 k f
  • 40. Fourier analysis on graphs[11] Xavier Bresson 41 [11] Hammond, Vandergheynst, Gribonval, 2011 A function can be written as Fourier series is the k-th graph Fourier coefficient. In matrix-vector notation, with the n x n Fourier matrix Graph Fourier basis = Laplacian eigenvectors : ˆfk fi = nX k=1 hf, kiL2(V) | {z } ˆfk k,i = ⇥ 1, ..., n ⇤ ˆf = > f and f = ˆf k k = k k k = graph Fourier mode k = (square) frequency f : V ! R Fourier transform Inverse Fourier transform
  • 41. Outline Ø Spectral Graph Theory - Convolution Xavier Bresson 42
  • 42. Convolution on Euclidean spaces Xavier Bresson 43 Given two functions their convolution is a function Shift-invariance: Convolution operator commutes with Laplacian: Convolution theorem: Convolution can be computed in the Fourier domain as Efficient computation using FFT: O(nlogn) f, g : [ ⇡, ⇡] ! R (f ? g)(x) = Z ⇡ ⇡ f(x0 )g(x x0 )dx0 f(x x0) ? g(x) = (f ? g)(x x0) ( f) ? g = (f ? g) (f ? g) = ˆf · ˆg
  • 43. Convolution on discrete Euclidean spaces Xavier Bresson 44 Convolution of two vectors f = (f1, . . . , fn)> and g = (g1, . . . , gn)> (f ? g)i = X m g(i m) mod n · fm f ? g = 2 6 6 6 6 6 4 g1 g2 . . . . . . gn gn g1 g2 . . . gn 1 ... ... ... ... ... g3 g4 . . . g1 g2 g2 g3 . . . . . . g1 3 7 7 7 7 7 5 | {z } Circulant matrix diagonalised by Fourier basis (Toeplitz) 2 6 4 f1 ... fn 3 7 5 = 2 6 4 ˆg1 ... ˆgn 3 7 5 > f = > g > f
  • 44. f ? g = ( > g > f) = diag(ˆg1, . . . , ˆgn) > | {z } G f = ˆg(⇤) > f = ˆg( ⇤ > )f = ˆg( )f Convolution on graphs[11] Xavier Bresson 45 Spectral convolution of can be defined by analogy In matrix-vector notation Not shift-invariant! (G has no circulant structure) Commutes with Laplacian: Filter coefficients depend on basis Expensive computation (no FFT): O(n2) (f ? g)i = X k 1 hf, kiL2(V)hg, kiL2(V) | {z } product in the Fourier domain k,i | {z } inverse Fourier transform f, g 2 L2 (V) G f = Gf 1, . . . , n [11] Hammond, Vandergheynst, Gribonval, 2011
  • 45. (f) Ti00 f(e) Ti0 f(d) Tif No shift invariance on graphs Xavier Bresson 46 A signal f on graph can be translated to vertex i: Euclidean domain Graph domain[12] [12] Shuman et al. 2016 Tif = f ? i (a) Tsf (b) Ts0 f (c) Ts00 f
  • 46. Outline Ø Spectral Graph Theory - Coarsening Xavier Bresson 47
  • 47. Graph coarsening for graph pooling Goals: Pool similar local features (max pooling). Series of pooling layers create invariance to global geometric deformations. Challenges: Design a multi-scale coarsening algorithm that preserves non- linear graph structures. How to make graph pooling fast? Xavier Bresson 48
  • 48. Graph coarsening = graph partitioning Xavier Bresson 49 Graph partitioning decomposes G into smaller meaningful clusters. Exact partitioning is NP-hard ⇒ approximation algorithms are used.
  • 49. Powerful combinatorial graph partitioning models Xavier Bresson 50
Normalized cut[13] (partitioning by min edge cuts): min_{C_1,...,C_K} Σ_{k=1}^K Cut(C_k, C_kᶜ) / Vol(C_k)
Normalized association (partitioning by max vertex matching): max_{C_1,...,C_K} Σ_{k=1}^K Assoc(C_k) / Vol(C_k)
where Cut(A, B) := Σ_{i∈A, j∈B} a_ij, Assoc(A) := Σ_{i∈A, j∈A} a_ij, Vol(A) := Σ_{i∈A} d_i, and d_i := Σ_{j∈V} a_ij.
Both models are equivalent, but lead to different algorithms.
[13] Shi, Malik, 2000
  • 50. Balanced cuts Xavier Bresson 51
Balanced cuts are NP-hard ⇒ the most popular approximation techniques rely on a linear spectral relaxation (an eigenproblem with a global solution).
Graph geometry is generally not linear ⇒ the Graclus[14] algorithm computes non-linear clusters that locally maximize the normalized association.
Graclus offers control of the coarsening ratio (≈ 2, like an image grid) using heavy-edge matching[15].
[14] Dhillon, Guan, Kulis 2007 [15] Karypis, Kumar 1995
  • 51. Heavy-Edge Matching (HEM) Xavier Bresson 52
HEM proceeds by two successive steps, vertex matching and graph coarsening (which guarantees a local solution of the normalized association):
(1) Vertex matching: for an unmarked vertex i, pick j = argmax_j (a^l_ii + 2 a^l_ij + a^l_jj) / (d^l_i + d^l_j).
(2) Graph coarsening: build G^{l+1} with a^{l+1}_ij = Cut(C^l_i, C^l_j) and a^{l+1}_ii = Assoc(C^l_i).
Matched vertices {i, j} are merged into a super-vertex at the next coarsening level, producing the sequence of coarsened graphs G^{l}, G^{l+1}, G^{l+2}, ...
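A greedy sketch of one HEM level (illustrative only, not the Graclus implementation; W is a dense adjacency matrix and d the degree vector):

```python
import numpy as np

def heavy_edge_matching(W, d):
    """One greedy matching pass: each unmarked vertex i is matched with the unmarked
    neighbour j maximizing (w_ii + 2 w_ij + w_jj) / (d_i + d_j); matched pairs become
    super-vertices of the coarser graph."""
    n = W.shape[0]
    marked = np.zeros(n, dtype=bool)
    clusters = []                               # vertex groups of the coarser graph
    for i in range(n):
        if marked[i]:
            continue
        marked[i] = True
        candidates = [j for j in range(n) if W[i, j] > 0 and not marked[j]]
        if candidates:
            j = max(candidates, key=lambda j: (W[i, i] + 2 * W[i, j] + W[j, j]) / (d[i] + d[j]))
            marked[j] = True
            clusters.append([i, j])
        else:
            clusters.append([i])                # singleton vertex (kept as is, or padded with a fake node)
    return clusters
```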
  • 52. Unstructured pooling Xavier Bresson 53 Sequence of coarsened graphs produced by HEM: pooling requires storing a table of indices mapping the graph to all its coarsened versions ⇒ computationally inefficient.
  • 53. Fast graph pooling[18] Structured pooling: Arrangement of the node indexing such that adjacent nodes are hierarchically merged at the next coarser level. Xavier Bresson 54 As efficient as 1D-Euclidean grid pooling. [18] Defferrard, Bresson, Vandergheynst 2016
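A minimal sketch of the idea (illustrative values): once the vertices are re-indexed so that matched pairs are adjacent, with fake nodes padded where needed, graph max-pooling reduces to regular 1D pooling of the signal.

```python
import numpy as np

# Signal on 6 re-indexed vertices; the last vertex is a fake node padded with -inf.
x = np.array([3., 1., 4., 1., 5., -np.inf])
pooled = x.reshape(-1, 2).max(axis=1)          # -> [3., 4., 5.], as on a 1D Euclidean grid
```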
  • 54. Outline Ø Spectral ConvNets - SplineNets Xavier Bresson 55
  • 55. Vanilla spectral graph ConvNets[16] Xavier Bresson 56
Graph convolutional layer: g_l = ξ( Σ_{l'=1}^p W_{l,l'} ⋆ f_{l'} ),
where f_{l'} = l'-th data feature on the graph (dim(f_{l'}) = n × 1), g_l = l-th feature map (dim(g_l) = n × 1), and ξ is the activation, e.g. ξ(x) = max{x, 0} (rectified linear unit, ReLU).
[16] Bruna, Zaremba, Szlam, LeCun 2014; Henaff, Bruna, LeCun 2015
  • 56. Spectral graph convolution Xavier Bresson 57
Convolutional layer in the spatial domain: g_l = ξ( Σ_{l'=1}^p W_{l,l'} ⋆ f_{l'} ), where W_{l,l'} is a graph spatial filter.
Using g ⋆ f = Φ ĝ(Λ) Φᵀ f, it can also be expressed in the spectral domain: g_l = ξ( Σ_{l'=1}^p Φ Ŵ_{l,l'} Φᵀ f_{l'} ), where Ŵ_{l,l'} is an n × n diagonal matrix of graph spectral filter coefficients.
We will denote the spectral filter without the hat symbol, i.e. W_{l,l'}.
  • 57. Vanilla spectral graph ConvNets[16] Xavier Bresson 58
Series of spectral convolutional layers with spectral coefficients W^{(k)}_{l,l'} to be learned at each layer:
g^{(k)}_l = ξ( Σ_{l'=1}^{q^{(k-1)}} Φ W^{(k)}_{l,l'} Φᵀ g^{(k-1)}_{l'} ).
First graph CNN architecture.
No guarantee of spatial localization of filters.
O(n) parameters per layer.
O(n²) computation of forward and inverse Fourier transforms Φ, Φᵀ (no FFT on graphs).
Filters are basis-dependent ⇒ do not generalize across graphs.
[16] Bruna, Zaremba, Szlam, LeCun 2014; Henaff, Bruna, LeCun 2015
  • 58. Spatial localization and spectral smoothness Xavier Bresson 59
In the Euclidean setting (by Parseval's identity): ∫_{-∞}^{+∞} |x|^{2k} |w(x)|² dx = ∫_{-∞}^{+∞} |∂^k ŵ(λ)/∂λ^k|² dλ.
Localization in space = smoothness in the frequency domain.
Smooth spectral filter function ŵ(λ) ⇔ spatially localized filter w(x).
  • 59. Smooth parametric spectral filter Xavier Bresson 60
Parametrize the smooth spectral filter function w(λ) by a linear combination of smooth kernel functions β_1(λ), ..., β_r(λ), e.g. splines[17]: w_α(λ) = Σ_{j=1}^r α_j β_j(λ), where α = (α_1, ..., α_r)ᵀ is the vector of filter parameters.
Evaluated at the eigenvalues: w_α(λ_i) = Σ_{j=1}^r α_j β_j(λ_i) = (Bα)_i ⇒ W = Diag(Bα).
[17] (Litman, Bronstein, 2014); Henaff, Bruna, LeCun 2015
  • 60. Smooth parametric spectral filter Xavier Bresson 61
Graph convolutional layer: apply the spectral filter w to a feature signal f: w(Δ) f = Φ w(Λ) Φᵀ f = Φ diag(w(λ_1), ..., w(λ_n)) Φᵀ f.
With learnable parameters α: w_α(Δ) f = Φ diag(w_α(λ_1), ..., w_α(λ_n)) Φᵀ f.
  • 61. SplineNets[17] Xavier Bresson 62
Series of spectral convolutional layers with smooth spectral parametric coefficients W^{(k)}_{l,l'} to be learned at each layer:
g^{(k)}_l = ξ( Σ_{l'=1}^{q^{(k-1)}} Φ W^{(k)}_{l,l'} Φᵀ g^{(k-1)}_{l'} ).
Fast-decaying filters in space.
O(1) parameters per layer (r spline coefficients per filter).
O(n²) computation of forward and inverse Fourier transforms Φ, Φᵀ (no FFT on graphs).
Filters are basis-dependent ⇒ do not generalize across graphs.
[17] Henaff, Bruna, LeCun 2015
  • 62. Outline Ø Spectral ConvNets - ChebNets* [NIPS’16] Xavier Bresson 63
  • 63. Spectral polynomial filters[18] Xavier Bresson 64
Represent smooth spectral functions with polynomials of the Laplacian eigenvalues: w_α(λ) = Σ_{j=0}^r α_j λ^j, where α = (α_0, ..., α_r)ᵀ is the vector of filter parameters.
Convolutional layer: apply the spectral filter to a feature signal f: w_α(Δ) f = Σ_{j=0}^r α_j Δ^j f.
Key observation: each Laplacian operation increases the support of a function by 1 hop (s = heat source, 1-hop, 2-hop, ...) ⇒ exact control of the size of Laplacian-based filters.
[18] Defferrard, Bresson, Vandergheynst 2016
  • 64. Linear complexity Xavier Bresson 65
Application of the filter to a feature signal f: w_α(Δ) f = Σ_{j=0}^r α_j Δ^j f.
Denote X_0 = f and define the sequence X_j = Δ X_{j-1} (so X_1 = Δ X_0 = Δ f); then w_α(Δ) f = Σ_{j=0}^r α_j X_j.
Two important observations:
1. *No* need to compute the eigendecomposition of the Laplacian (Φ, Λ).
2. The {X_j} are generated by multiplication of a sparse matrix and a vector ⇒ complexity is O(Er) = O(n) for sparse graphs.
Graph convolutional layers are GPU friendly.
  • 65. Spectral graph ConvNets with polynomial filters Xavier Bresson 66
Series of spectral convolutional layers with spectral polynomial coefficients W^{(k)}_{l,l'} to be learned at each layer:
g^{(k)}_l = ξ( Σ_{l'=1}^{q^{(k-1)}} Φ W^{(k)}_{l,l'} Φᵀ g^{(k-1)}_{l'} ).
Filters are exactly localized in an r-hop support.
O(1) parameters per layer.
No computation of Φ, Φᵀ.
O(n) computational complexity (assuming sparsely-connected graphs).
Unstable under coefficient perturbation (hard to optimize).
Filters are basis-dependent ⇒ do not generalize across graphs.
[18] Defferrard, Bresson, Vandergheynst 2016
  • 66. Chebyshev polynomials Xavier Bresson 67
Graph convolution with the (non-orthogonal) monomial basis 1, x, x², x³, ...: w_α(Δ) f = Σ_{j=0}^r α_j Δ^j f.
Graph convolution with (orthogonal) Chebyshev polynomials T_0, T_1, T_2, T_3, ...: w_α(Δ̃) f = Σ_{j=0}^r α_j T_j(Δ̃) f.
Chebyshev polynomials are orthonormal on L²([-1, +1]) w.r.t. ⟨f, g⟩ = ∫_{-1}^{+1} f(λ̃) g(λ̃) dλ̃ / √(1 - λ̃²) ⇒ stable under perturbation of the coefficients.
  • 67. ChebNets[18] Xavier Bresson 68
Application of the filter with the scaled Laplacian Δ̃ = 2 λ_n^{-1} Δ - I:
w_α(Δ̃) f = Σ_{j=0}^r α_j T_j(Δ̃) f = Σ_{j=0}^r α_j X^{(j)}, with the recurrence X^{(j)} = T_j(Δ̃) f = 2 Δ̃ X^{(j-1)} - X^{(j-2)}, X^{(0)} = f, X^{(1)} = Δ̃ f.
Filters are exactly localized in an r-hop support.
O(1) parameters per layer.
No computation of Φ, Φᵀ.
O(n) computational complexity (assuming sparsely-connected graphs).
Stable under coefficient perturbation.
Filters are basis-dependent ⇒ do not generalize across graphs.
[18] Defferrard, Bresson, Vandergheynst 2016
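A minimal numpy/scipy sketch of this filtering step (function name and the assumptions of a sparse Laplacian L, a known λ_max, and len(alpha) ≥ 2 are illustrative):

```python
import numpy as np
import scipy.sparse as sp

def chebyshev_filter(L, f, alpha, lmax):
    """Sketch of ChebNet filtering: w(L~) f = sum_j alpha_j T_j(L~) f with the
    scaled Laplacian L~ = 2 L / lmax - I, computed with the recurrence
    X^(j) = 2 L~ X^(j-1) - X^(j-2); only sparse mat-vec products are needed."""
    n = L.shape[0]
    Lt = (2.0 / lmax) * L - sp.identity(n, format='csr')   # scaled Laplacian L~
    X0, X1 = f, Lt @ f                                      # X^(0) = f, X^(1) = L~ f
    out = alpha[0] * X0 + alpha[1] * X1
    for j in range(2, len(alpha)):
        X0, X1 = X1, 2 * (Lt @ X1) - X0                     # Chebyshev recurrence
        out = out + alpha[j] * X1
    return out
```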
  • 68. Numerical experiments Xavier Bresson 69 [Figures: running time, accuracy, and optimization curves; graph built as an 8-NN graph of the Euclidean grid.]
  • 69. Outline Ø Spectral ConvNets - GraphConvNets Xavier Bresson 70
  • 70. Graph convolutional nets[19]: simplified ChebNets Xavier Bresson 71
Use Chebyshev polynomials with two terms (r = 2) and assume λ_n ≈ 2: w_α(Δ) f = α_0 f + α_1 (Δ - I) f = α_0 f - α_1 D^{-1/2} W D^{-1/2} f.
Further constrain α = α_0 = -α_1 to obtain a single-parameter filter: w_α(Δ) f = α (I + D^{-1/2} W D^{-1/2}) f.
Caveat: the eigenvalues of I + D^{-1/2} W D^{-1/2} are now in [0, 2] ⇒ repeated application of the filter results in numerical instability.
Fix: apply a renormalization w_α(Δ) f = α D̃^{-1/2} W̃ D̃^{-1/2} f with W̃ = W + I and D̃ = diag(Σ_j w̃_ij).
[19] Kipf, Welling 2016
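A minimal scipy sketch of this renormalized propagation step (function name illustrative; W sparse adjacency, X dense node features):

```python
import numpy as np
import scipy.sparse as sp

def gcn_propagate(W, X):
    """Renormalization trick: D~^{-1/2} W~ D~^{-1/2} X with W~ = W + I
    (self-loops added before the symmetric normalization)."""
    n = W.shape[0]
    Wt = W + sp.identity(n, format='csr')            # W~ = W + I
    d = np.asarray(Wt.sum(axis=1)).ravel()           # degrees of W~
    Dinv_sqrt = sp.diags(1.0 / np.sqrt(d))           # D~^{-1/2}
    return Dinv_sqrt @ (Wt @ (Dinv_sqrt @ X))        # one propagation step

# A full layer would then be xi(gcn_propagate(W, X) @ Theta) with a learned weight matrix Theta.
```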
  • 72. Outline Ø Spectral ConvNets - CayleyNets* Xavier Bresson 73
  • 73. Community graphs Xavier Bresson 74 [Figures: synthetic graph with 15 communities; normalized Laplacian eigenvalues; Chebyshev filters learned on the 15-communities graph.] ChebNet has difficulty producing filters with the narrow frequency bands of interest.
  • 74. Spectral graph ConvNets with Cayley polynomials[20] Xavier Bresson 75
Cayley polynomials are a family of real-valued rational functions with complex coefficients: p(λ) = c_0 + 2 Re{ Σ_{j=1}^r c_j C(λ)^j }, where C(λ) = (λ - i)/(λ + i) is the Cayley transform, a smooth bijection from R to e^{iR} \ {1}.
Expressivity: since z^{-1} = z̄ for z ∈ e^{iR} and 2 Re{z} = z + z̄, p(λ) can be rewritten as a real trigonometric polynomial in ω, writing C(λ) = e^{iω(λ)}:
p(λ) = c_0 + Σ_{j=1}^r (c_j C(λ)^j + c̄_j C(λ)^{-j}) = c_0 + 2 Σ_{j=1}^r [ Re{c_j} cos(j ω(λ)) - Im{c_j} sin(j ω(λ)) ].
[20] (Cayley 1846); Levie, Monti, Bresson, Bronstein 2017
  • 75. Spectral zoom[20] Xavier Bresson 76
Applying the Cayley transform to the scaled Laplacian hΔ, C(hΔ) = (hΔ - iI)(hΔ + iI)^{-1}, results in a non-linear transformation of the eigenvalues (spectral zoom) controlled by h.
[Figure: Cayley transform of the 15-communities graph Laplacian spectrum for h = 0.1, 1, 10.]
[20] Levie, Monti, Bresson, Bronstein 2017
  • 76. Fast inversion Xavier Bresson 77
Application of the Cayley filter w(hΔ) f requires the solution of the recursive linear systems y_0 = f, (hΔ + iI) y_j = (hΔ - iI) y_{j-1}, j = 1, ..., r.
Exact (stable) solution: O(n³) complexity.
Approximate solution using K Jacobi iterations: ỹ_j^{(k+1)} = J ỹ_j^{(k)} + Diag^{-1}(hΔ + iI)(hΔ - iI) ỹ_{j-1}, with initialization ỹ_j^{(0)} = Diag^{-1}(hΔ + iI)(hΔ - iI) ỹ_{j-1} and J = -Diag^{-1}(hΔ + iI) Off(hΔ + iI).
Application of the approximate Cayley filter: Σ_{j=0}^r c_j ỹ_j ≈ τ(hΔ) f.
Complexity: O(ErK) = O(n).
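A minimal scipy sketch of this approximate filtering (function name, iteration count K, and the assumptions of a sparse Laplacian L and of real c[0], complex c[1:] are illustrative); it uses the real-part form w(hΔ)f = c_0 f + 2 Re{Σ_j c_j y_j} of the Cayley polynomial above:

```python
import numpy as np
import scipy.sparse as sp

def cayley_filter(L, f, c, h, K=10):
    """Approximate Cayley filtering: each system (hL + iI) y_j = (hL - iI) y_{j-1},
    y_0 = f, is solved with K Jacobi iterations; output = c_0 f + 2 Re{sum_j c_j y_j}."""
    n = L.shape[0]
    A = (h * L + 1j * sp.identity(n)).tocsr()        # system matrix  hL + iI
    B = (h * L - 1j * sp.identity(n)).tocsr()        # rhs operator   hL - iI
    Dinv = sp.diags(1.0 / A.diagonal())              # inverse diagonal part of A
    R = A - sp.diags(A.diagonal())                   # off-diagonal part of A
    out = c[0].real * f.astype(complex)
    y = f.astype(complex)
    for j in range(1, len(c)):
        b = B @ y                                    # right-hand side (hL - iI) y_{j-1}
        y = Dinv @ b                                 # Jacobi initialization
        for _ in range(K):
            y = Dinv @ (b - R @ y)                   # Jacobi update
        out = out + 2 * c[j] * y
    return out.real
```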
  • 77. CayleyNets[20] Xavier Bresson 78
Represent the spectral transfer function as a Cayley polynomial of order r: w_{c,h}(λ) = c_0 + 2 Re{ Σ_{j=1}^r c_j (hλ - i)^j (hλ + i)^{-j} }, where the filter parameters are the vector of complex coefficients c = (c_0, ..., c_r)ᵀ and the spectral zoom h.
Filters have guaranteed exponential spatial decay.
O(1) parameters per layer.
O(n) computational complexity with approximate Jacobi inversion (assuming sparsely-connected graphs).
Spectral zoom property allowing better localization in frequency.
Richer class of filters than Chebyshev for the same order.
Filters are basis-dependent ⇒ do not generalize across graphs.
[20] Levie, Monti, Bresson, Bronstein 2017
  • 79. Chebyshev vs Cayley: community graphs example Xavier Bresson 80
  • 80. Chebyshev vs Cayley: Cora example Xavier Bresson 81 Accuracy of ChebNet (blue) and CayleyNet (orange) on Cora dataset
  • 81. Approximate inversion Xavier Bresson 82 Accuracy of approximate inversion Speed of approximate inversion
  • 82. Outline Ø Spectral ConvNets on Multiple Graphs - Recommender Systems* [NIPS’17] Xavier Bresson 83
  • 83. Matrix completion - Netflix challenge Xavier Bresson 84
min_{X∈R^{m×n}} rank(X) s.t. x_ij = a_ij ∀ ij ∈ Ω
Convex relaxation[21]: min_{X∈R^{m×n}} ‖X‖_* s.t. x_ij = a_ij ∀ ij ∈ Ω ⇒ min_{X∈R^{m×n}} ‖X‖_* + μ ‖Ω ⊙ (X - A)‖²_F
[21] Candes et al. 2008
  • 84. Matrix completion on a single graph[22] Xavier Bresson 85
min_{X∈R^{m×n}} ‖X‖_* + μ ‖Ω ⊙ (X - A)‖²_F + μ_c tr(X Δ_c Xᵀ), where tr(X Δ_c Xᵀ) = ‖X‖²_{G_c} is the column-graph Dirichlet norm.
[22] Kalofolias, Bresson, Bronstein, Vandergheynst, 2014
  • 85. Matrix completion on multiple graphs[22] Xavier Bresson 86
min_{X∈R^{m×n}} ‖X‖_* + μ ‖Ω ⊙ (X - A)‖²_F + μ_c tr(X Δ_c Xᵀ) (= ‖X‖²_{G_c}) + μ_r tr(Xᵀ Δ_r X) (= ‖X‖²_{G_r})
[22] Kalofolias, Bresson, Bronstein, Vandergheynst, 2014
  • 86. Factorized matrix completion models[23] Xavier Bresson 87
min_{W∈R^{m×s}, H∈R^{n×s}} μ ‖Ω ⊙ (W Hᵀ - A)‖²_F + μ_c ‖W‖²_F + μ_r ‖H‖²_F
O(n+m) variables instead of O(nm); rank(X) = rank(W Hᵀ) ≤ s by construction.
[23] Srebro, Rennie, Jaakkola 2004
  • 87. Factorized matrix completion on graphs[24] Xavier Bresson 88
min_{W∈R^{m×s}, H∈R^{n×s}} μ ‖Ω ⊙ (W Hᵀ - A)‖²_F + μ_c tr(Hᵀ Δ_c H) + μ_r tr(Wᵀ Δ_r W)
O(n+m) variables instead of O(nm); rank(X) = rank(W Hᵀ) ≤ s by construction.
[24] Rao et al. 2015
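A minimal numpy sketch evaluating this graph-regularized factorized objective (all names are illustrative assumptions):

```python
import numpy as np

def graph_factorized_loss(W, H, A, Omega, Lr, Lc, mu, mu_r, mu_c):
    """mu ||Omega o (W H^T - A)||_F^2 + mu_c tr(H^T Lc H) + mu_r tr(W^T Lr W),
    with W: m x s, H: n x s, A: m x n ratings, Omega: m x n binary mask of observed
    entries, Lr / Lc: row- and column-graph Laplacians."""
    X = W @ H.T                                        # low-rank reconstruction, rank <= s
    data = mu * np.sum((Omega * (X - A)) ** 2)         # squared error on observed entries
    reg_c = mu_c * np.trace(H.T @ Lc @ H)              # column-graph smoothness of H
    reg_r = mu_r * np.trace(W.T @ Lr @ W)              # row-graph smoothness of W
    return data + reg_c + reg_r
```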
  • 88. 1D Fourier transform Xavier Bresson 89 Column-wise transform
  • 89. 2D Fourier transform Xavier Bresson 90 Column-wise transform + Row-wise transform = 2D transform From 1D signal processing to 2D image processing
  • 90. Multi-graph Fourier transform and convolution[25] Xavier Bresson 91
Multi-graph Fourier transform: X̂ = Φ_rᵀ X Φ_c, where Φ_c, Φ_r are the eigenvectors of the column- and row-graph Laplacians Δ_c, Δ_r, respectively.
Multi-graph spectral convolution: X ⋆ Y = Φ_r ((Φ_rᵀ X Φ_c) ⊙ (Φ_rᵀ Y Φ_c)) Φ_cᵀ = Φ_r (X̂ ⊙ Ŷ) Φ_cᵀ.
[25] Monti, Bresson, Bronstein 2017
  • 91. Multi-graph spectral filters Xavier Bresson 92
Parametrize the multi-graph filter as a smooth function of the column and row frequencies, e.g. with a bivariate Chebyshev polynomial: w_Θ(λ̃_c, λ̃_r) = Σ_{j,j'=0}^r θ_{jj'} T_j(λ̃_c) T_{j'}(λ̃_r), where Θ = (θ_{jj'}) is the matrix of coefficients and -1 ≤ λ̃_c, λ̃_r ≤ 1.
Multi-graph convolutional layer: apply the filter to a matrix X: Y = Σ_{j,j'=0}^r θ_{jj'} T_j(Δ̃_r) X T_{j'}(Δ̃_c), where the scaled Laplacians are Δ̃_c = 2 λ_{c,n}^{-1} Δ_c - I and Δ̃_r = 2 λ_{r,m}^{-1} Δ_r - I.
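A minimal numpy sketch of this bivariate filter (function and variable names are illustrative; Lr_s and Lc_s stand for the scaled row- and column-graph Laplacians, Theta for the (r+1)×(r+1) coefficient matrix):

```python
import numpy as np

def cheb_basis(L, X, r, side='left'):
    """T_0(L)X, ..., T_r(L)X applied on the left (row graph) or the right (column graph)."""
    op = (lambda M: L @ M) if side == 'left' else (lambda M: M @ L)
    T = [X, op(X)]
    for _ in range(2, r + 1):
        T.append(2 * op(T[-1]) - T[-2])              # Chebyshev recurrence
    return T[:r + 1]

def multigraph_filter(X, Lr_s, Lc_s, Theta):
    """Bivariate filter Y = sum_{j,j'} theta_{jj'} T_j(Lr~) X T_{j'}(Lc~)."""
    r = Theta.shape[0] - 1
    Y = np.zeros(X.shape)
    Tr = cheb_basis(Lr_s, X, r, side='left')         # diffusion along the row graph
    for j in range(r + 1):
        Tc = cheb_basis(Lc_s, Tr[j], r, side='right')  # diffusion along the column graph
        for jp in range(r + 1):
            Y = Y + Theta[j, jp] * Tc[jp]
    return Y
```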
  • 92. Multi-graph ConvNets[25] Xavier Bresson 93
Multi-graph spectral filters applied to p input channels (m × n matrices Y^{(k-1)}_1, ..., Y^{(k-1)}_p) and producing q output channels Y^{(k)}_1, ..., Y^{(k)}_q:
Y^{(k)}_l = ξ( Σ_{l'=1}^p Σ_{j,j'=0}^r θ_{jj'll'} T_j(Δ̃_r) Y^{(k-1)}_{l'} T_{j'}(Δ̃_c) ), l = 1, ..., q, l' = 1, ..., p.
Filters have a guaranteed r-hop support on both graphs.
O(1) parameters per layer.
O(nm) computational complexity (assuming sparse graphs).
[25] Monti, Bresson, Bronstein 2017
  • 93. Factorized multi-graph ConvNets Xavier Bresson 94
Two spectral convolutional layers applied to each of the factors W, H:
u_l = ξ( Σ_{l'=1}^p Σ_{j=0}^r θ_{r,jll'} T_j(Δ̃_r) w_{l'} ), v_l = ξ( Σ_{l'=1}^p Σ_{j'=0}^r θ_{c,j'll'} T_{j'}(Δ̃_c) h_{l'} ), l = 1, ..., q, l' = 1, ..., p.
Filters have a guaranteed r-hop support on both graphs.
O(1) parameters per layer.
O(n+m) computational complexity (assuming sparse graphs).
May lose some coupling information.
  • 94. Matrix completion with Recurrent Multi-Graph ConvNets[25] Xavier Bresson 95 Complexity: O(nm) Extract compositional features from graphs of users and items [25] Monti, Bresson, Bronstein 2017
  • 95. Factorized version Xavier Bresson 96 Complexity: O(n+m) [25] Monti, Bresson, Bronstein 2017
  • 96. Incremental updates with RNNs Xavier Bresson 97
The RNN/LSTM[26] casts the completion problem as a non-linear diffusion process X(t+1) = X(t) + dX(t), where the increment dX(t) is predicted by the RNN from the current state X(t).
Replacing multi-layer CNNs by a simple RNN + diffusion:
far fewer parameters to learn (reduces overfitting);
more favorable for highly sparse entries (better information diffusion);
fixes the number of model parameters independently of entry sparsity;
allows learning the temporal dynamics of entries if such information is available.
[26] Hochreiter, Schmidhuber 1997
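A minimal PyTorch sketch of one such recurrent update (illustrative only; the class name, hidden size, and the placeholder `feats` standing for the per-entry features extracted by the multi-graph ConvNet are assumptions):

```python
import torch
import torch.nn as nn

class DiffusionRNN(nn.Module):
    """One recurrent completion step X(t+1) = X(t) + dX(t); `feats` is the
    per-entry feature tensor of shape [m*n, q] from the graph ConvNet."""
    def __init__(self, q, hidden=32):
        super().__init__()
        self.cell = nn.LSTMCell(q, hidden)
        self.out = nn.Linear(hidden, 1)          # one scalar increment per matrix entry

    def forward(self, X, feats, state):
        h, c = self.cell(feats, state)           # one LSTM diffusion step
        dX = self.out(h).view_as(X)              # predicted increment dX(t)
        return X + dX, (h, c)                    # X(t+1) = X(t) + dX(t)
```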
  • 97. Matrix completion methods comparison: synthetic Xavier Bresson 98
  • 98. Matrix completion methods comparison: real Xavier Bresson 99
  • 100. Related works Xavier Bresson 101 Classes of graph ConvNets: (1) Spectral approach (2) Spatial approach Spatial approaches Ø Geodesic CNN: Masci, Boscaini, Bronstein, Vandergheynst 2015 Ø Anisotropic CNN: Boscaini, Masci, Rodola, Bronstein 2016 Ø MoNet: Monti, Boscaini, Masci, Rodola, Svoboda, Bronstein 2017 Graph Neural Networks: Ø Gori, Monfardini, Scarselli 2005 Ø Li, Tarlow, Brockschmidt, Zemel 2015 Ø Sukhbaatar, Szlam, Fergus 2016 Several other works
  • 101. Conclusion Contributions: Generalization of ConvNets to data on (fixed) graphs Exact/tight localized filters on graphs Linear complexity for sparse graphs GPU implementation Rich expressivity Multiple graphs Several potential applications in network sciences. Xavier Bresson 102
  • 102. Papers & code Xavier Bresson 103
Papers
Convolutional neural networks on graphs with fast localized spectral filtering, M. Defferrard, X. Bresson, P. Vandergheynst, NIPS, 2016, arXiv:1606.09375
Geometric matrix completion with recurrent multi-graph neural networks, F. Monti, M.M. Bronstein, X. Bresson, NIPS, 2017, arXiv:1704.06803
CayleyNets: Graph Convolutional Neural Networks with Complex Rational Spectral Filters, R. Levie, F. Monti, X. Bresson, M.M. Bronstein, 2017, arXiv:1705.07664
Code
https://github.com/xbresson/graph_convnets_pytorch
  • 103. Xavier Bresson 104 Tutorials Computer Vision and Pattern Recognition (CVPR) Geometric deep learning on graphs and manifolds M. Bronstein​​, J. Bruna​, A. Szlam, X. Bresson, Y. LeCun July 21, 2017 Slides: https://www.dropbox.com/s/appafg1fumb6u7f/tutorial_CVPR17_DL_graphs.pdf?dl=0 Video will be posted on YouTube/ComputerVisionFoundation Videos Neural Information Processing Systems (NIPS) Geometric deep learning on graphs and manifolds M. Bronstein​​, J. Bruna​, A. Szlam, X. Bresson, Y. LeCun Dec 4, 2017
  • 104. Xavier Bresson 105 Workshop New Deep Learning Techniques Feb 5-9, 2018 Organizers Xavier Bresson, NTU Michael Bronstein, USI/Intel Joan Bruna, NYU/Berkeley Yann LeCun, NYU/Facebook Stanley Osher, UCLA Arthur Szlam, Facebook http://www.ipam.ucla.edu/programs/workshops/new-deep-learning-techniques