Tensor Train in machine learning
Alexander Novikov
October 11, 2016
Alexander Novikov Tensor Train in machine learning October 11, 2016 1 / 26
Recommender systems
Assume low-rank structure.
Tensor Train summary
Tensor Train (TT) decomposition [Oseledets 2011]:
A compact representation for tensors (= multidimensional arrays);
Allows for efficient application of linear algebra operations.
Low-rank decomposition
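$A = G_1 G_2$, where $G_1$ is a collection of rows and $G_2$ is a collection of columns:
$$A_{i_1 i_2} = \underbrace{G_1[i_1]}_{1 \times r}\; \underbrace{G_2[i_2]}_{r \times 1}$$
For example, for $i_1 = 2$, $i_2 = 3$:
$$A_{23} = G_1[2]\, G_2[3]$$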
Tensor Train decomposition
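$$A_{i_1 \dots i_d} = \underbrace{G_1[i_1]}_{1 \times r}\; \underbrace{G_2[i_2]}_{r \times r} \cdots \underbrace{G_d[i_d]}_{r \times 1}$$
An example of computing one element of a 4-dimensional tensor, with $i_1 = 2$, $i_2 = 4$, $i_3 = 2$, $i_4 = 3$:
$$A_{2423} = G_1[2]\, G_2[4]\, G_3[2]\, G_4[3]$$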
Tensor Train decomposition Cont’d
A tensor $A$ is said to be in the TT-format if
$$A_{i_1, \dots, i_d} = G_1[i_1]\, G_2[i_2] \cdots G_d[i_d], \quad i_k \in \{1, \dots, n\},$$
where $G_k[i_k]$ is a matrix of size $r_{k-1} \times r_k$ and $r_0 = r_d = 1$.
Notation & terminology:
$G_k$: TT-cores;
$r_k$: TT-ranks;
$r = \max_{k = 0, \dots, d} r_k$: the maximal TT-rank.
The TT-format uses $O(n d r^2)$ memory to store $n^d$ elements. It is efficient only if the TT-rank is small.
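As a rough illustration of the definition above, the following NumPy sketch evaluates one element as a product of core slices and counts the stored parameters. The storage layout (each core as an array of shape `(r_prev, n, r_next)`), the 0-based indices, and the names `tt_element` and `tt_num_params` are assumptions made for this example, not something taken from the talk.

```python
import numpy as np

def tt_element(cores, index):
    """Return A[index] as the product G_1[i_1] G_2[i_2] ... G_d[i_d]."""
    result = np.ones((1, 1))
    for core, i in zip(cores, index):
        # core has shape (r_prev, n, r_next); core[:, i, :] is the matrix G_k[i_k].
        result = result @ core[:, i, :]
    return result.item()

def tt_num_params(cores):
    """Number of stored values, O(n d r^2) in total."""
    return sum(core.size for core in cores)

# Random cores for d = 4 modes of size n = 5 with TT-ranks (1, 3, 3, 3, 1).
rng = np.random.default_rng(0)
n, ranks = 5, [1, 3, 3, 3, 1]
cores = [rng.standard_normal((ranks[k], n, ranks[k + 1])) for k in range(4)]

# Reference: contract all cores into the full n^d array (feasible only for small tensors).
full = cores[0]
for core in cores[1:]:
    full = np.tensordot(full, core, axes=([-1], [0]))
full = full.reshape(n, n, n, n)

print(tt_element(cores, (1, 3, 1, 2)), full[1, 3, 1, 2])
print(tt_num_params(cores), "stored values versus", full.size, "elements")
```

With these illustrative sizes the cores hold 120 numbers while the full array holds 625; the gap widens quickly as $d$ grows.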
TT-format: example
$$A_{i_1, i_2, i_3} = i_1 + i_2 + i_3,$$
$$i_1 \in \{1, 2, 3\}, \quad i_2 \in \{1, 2, 3, 4\}, \quad i_3 \in \{1, 2, 3, 4, 5\}.$$
$$A_{i_1, i_2, i_3} = G_1[i_1]\, G_2[i_2]\, G_3[i_3],$$
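$$G_1[i_1] = \begin{pmatrix} i_1 & 1 \end{pmatrix}, \quad G_2[i_2] = \begin{pmatrix} 1 & 0 \\ i_2 & 1 \end{pmatrix}, \quad G_3[i_3] = \begin{pmatrix} 1 \\ i_3 \end{pmatrix}$$
Let's check:
$$A(i_1, i_2, i_3) = \begin{pmatrix} i_1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ i_2 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ i_3 \end{pmatrix} = \begin{pmatrix} i_1 + i_2 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ i_3 \end{pmatrix} = i_1 + i_2 + i_3.$$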
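Written out for every index value:
$$G_1 = \left( \begin{pmatrix} 1 & 1 \end{pmatrix}, \begin{pmatrix} 2 & 1 \end{pmatrix}, \begin{pmatrix} 3 & 1 \end{pmatrix} \right)$$
$$G_2 = \left( \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 3 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 4 & 1 \end{pmatrix} \right)$$
$$G_3 = \left( \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 3 \end{pmatrix}, \begin{pmatrix} 1 \\ 4 \end{pmatrix}, \begin{pmatrix} 1 \\ 5 \end{pmatrix} \right)$$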
The tensor has 3 · 4 · 5 = 60 elements.
The TT-format uses 32 parameters to describe it.
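The example can be checked numerically. The sketch below rebuilds the three cores from the slide as lists of small matrices (a storage choice made only for this illustration), verifies every element, and counts the parameters.

```python
import numpy as np
from itertools import product

# Cores from the example, stored as one matrix per index value.
G1 = [np.array([[i1, 1]]) for i1 in (1, 2, 3)]               # each 1 x 2
G2 = [np.array([[1, 0], [i2, 1]]) for i2 in (1, 2, 3, 4)]    # each 2 x 2
G3 = [np.array([[1], [i3]]) for i3 in (1, 2, 3, 4, 5)]       # each 2 x 1

# a, b, c are 0-based list positions; the index values themselves are 1-based.
all_match = all(
    (G1[a] @ G2[b] @ G3[c]).item() == (a + 1) + (b + 1) + (c + 1)
    for a, b, c in product(range(3), range(4), range(5))
)

num_elements = 3 * 4 * 5
num_params = sum(m.size for core in (G1, G2, G3) for m in core)
print(all_match, num_elements, num_params)  # True 60 32
```

The 32 parameters split as 6 for $G_1$, 16 for $G_2$ and 10 for $G_3$.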
Sum of tensors
Tensors $A$ and $B$ are in the TT-format:
$$A_{i_1 \dots i_d} = G^A_1[i_1] \cdots G^A_d[i_d], \quad B_{i_1 \dots i_d} = G^B_1[i_1] \cdots G^B_d[i_d].$$
Find the TT-format of $C = A + B$,
$$C_{i_1 \dots i_d} = A_{i_1 \dots i_d} + B_{i_1 \dots i_d}.$$
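TT-cores of the result:
$$G^C_k[i_k] = \begin{pmatrix} G^A_k[i_k] & 0 \\ 0 & G^B_k[i_k] \end{pmatrix}, \quad k = 2, \dots, d - 1,$$
$$G^C_1[i_1] = \begin{pmatrix} G^A_1[i_1] & G^B_1[i_1] \end{pmatrix}, \quad G^C_d[i_d] = \begin{pmatrix} G^A_d[i_d] \\ G^B_d[i_d] \end{pmatrix}.$$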
TT-ranks of the result are sums of the TT-ranks.
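The block structure above translates almost line by line into code. This is a minimal sketch assuming each core is stored as a NumPy array of shape `(r_prev, n, r_next)` and that $d \ge 2$; `tt_add` and `tt_full` are names chosen for this example.

```python
import numpy as np

def tt_add(cores_a, cores_b):
    """Cores of C = A + B, built from the cores of A and B."""
    d = len(cores_a)
    cores_c = []
    for k, (ga, gb) in enumerate(zip(cores_a, cores_b)):
        if k == 0:
            core = np.concatenate([ga, gb], axis=2)    # first core: blocks side by side
        elif k == d - 1:
            core = np.concatenate([ga, gb], axis=0)    # last core: blocks stacked
        else:
            ra0, n, ra1 = ga.shape
            rb0, _, rb1 = gb.shape
            core = np.zeros((ra0 + rb0, n, ra1 + rb1))
            core[:ra0, :, :ra1] = ga                   # block-diagonal middle cores
            core[ra0:, :, ra1:] = gb
        cores_c.append(core)
    return cores_c

def tt_full(cores):
    """Contract the cores into the full array (small tensors only)."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.squeeze(axis=(0, -1))

rng = np.random.default_rng(0)
shape, ra, rb = (3, 4, 5), [1, 2, 2, 1], [1, 3, 3, 1]
A = [rng.standard_normal((ra[k], shape[k], ra[k + 1])) for k in range(3)]
B = [rng.standard_normal((rb[k], shape[k], rb[k + 1])) for k in range(3)]
C = tt_add(A, B)
print([core.shape for core in C])  # the inner ranks are 2 + 3 = 5
```

No arithmetic on the entries is needed, only copying into larger cores, which is why the ranks add up.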
TT-rounding
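Given a tensor $A$ in the TT-format with rank $r$, the TT-rounding [Oseledets, 2011]
$$\tilde{A} = \text{tt-round}(A, \varepsilon), \quad \varepsilon > 0$$
finds a tensor $\tilde{A}$ such that
1. $\lVert A - \tilde{A} \rVert_F \le \varepsilon \lVert A \rVert_F$;
2. the TT-rank of $\tilde{A}$ is minimal among all $B$ with
$$\lVert A - B \rVert_F \le \frac{\varepsilon}{\sqrt{d - 1}} \lVert A \rVert_F.$$
Here $\lVert A \rVert_F = \sqrt{\sum_{i_1, \dots, i_d} A^2_{i_1, \dots, i_d}}$.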
How to find TT-decomposition of a given tensor
Analytical formulas for special cases;
An exact algorithm based on SVD for medium-sized tensors. E.g. for a tensor with $5^8 \approx 400\,000$ elements it takes 8 ms on my laptop;
For large tensors (e.g. $2^{50}$ elements), approximate algorithms that look at only a fraction of the tensor elements: DMRG-cross [Savostyanov and Oseledets, 2011], AMEn-cross [Dolgov and Savostyanov, 2013].
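To make the SVD-based option concrete, here is a minimal sketch of a sequential-SVD decomposition in the spirit of [Oseledets 2011]. The function name `tt_svd`, the relative threshold `eps` used to pick the numerical rank, and the core layout `(r_prev, n, r_next)` are assumptions of this illustration; it is not the implementation timed in the talk.

```python
import numpy as np

def tt_svd(tensor, eps=1e-10):
    """Decompose a full array into TT-cores by a sequence of SVDs."""
    shape = tensor.shape
    d = len(shape)
    cores = []
    r_prev = 1
    unfolding = tensor.reshape(r_prev * shape[0], -1)
    for k in range(d - 1):
        u, s, vt = np.linalg.svd(unfolding, full_matrices=False)
        # Numerical rank: keep singular values above a relative threshold.
        r = max(1, int(np.sum(s > eps * s[0])))
        cores.append(u[:, :r].reshape(r_prev, shape[k], r))
        # Carry the remainder forward and fold the next mode into the rows.
        unfolding = (s[:r, None] * vt[:r]).reshape(r * shape[k + 1], -1)
        r_prev = r
    cores.append(unfolding.reshape(r_prev, shape[-1], 1))
    return cores

# The tensor A[i1, i2, i3] = i1 + i2 + i3 from the earlier example.
i1, i2, i3 = np.ogrid[1:4, 1:5, 1:6]
A = (i1 + i2 + i3).astype(float)
cores = tt_svd(A)

# Contract the cores back into a full array to check the result.
full = cores[0]
for core in cores[1:]:
    full = np.tensordot(full, core, axes=([-1], [0]))
full = full.reshape(A.shape)
print([core.shape for core in cores], np.allclose(full, A))
```

On this tensor the sketch recovers TT-ranks of 2, matching the hand-built cores, although the numerical values of the cores differ because the TT-format is not unique.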
TT-format operations
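| Operation | Rank of the result |
| --- | --- |
| $C = c \cdot A$ | $r(C) = r(A)$ |
| $C = A + c$ | $r(C) = r(A) + 1$ |
| $C = A + B$ | $r(C) \le r(A) + r(B)$ |
| $C = A \odot B$ (elementwise product) | $r(C) \le r(A)\, r(B)$ |
| $C = \mathrm{round}(A, \varepsilon)$ | $r(C) \le r(A)$ |
| $\operatorname{sum} A$ | – |
| $\lVert A \rVert_F$ | – |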
(Ask me about differential equations)
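Two rows of the table can be sketched in a few lines, assuming the product row refers to the elementwise (Hadamard) product and that cores are stored as arrays of shape `(r_prev, n, r_next)`. The names `tt_hadamard`, `tt_sum_all` and `tt_full` are illustrative.

```python
import numpy as np

def tt_hadamard(cores_a, cores_b):
    """Elementwise product: each result slice is kron(G_k^A[i], G_k^B[i])."""
    cores_c = []
    for ga, gb in zip(cores_a, cores_b):
        ra0, n, ra1 = ga.shape
        rb0, _, rb1 = gb.shape
        # einsum forms the Kronecker product of the two slices for every i at once.
        core = np.einsum('aib,cid->acibd', ga, gb).reshape(ra0 * rb0, n, ra1 * rb1)
        cores_c.append(core)
    return cores_c

def tt_sum_all(cores):
    """Sum of all elements: sum each core over its index, then multiply the matrices."""
    result = np.ones((1, 1))
    for core in cores:
        result = result @ core.sum(axis=1)
    return result.item()

def tt_full(cores):
    """Contract the cores into the full array (small tensors only)."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.squeeze(axis=(0, -1))

rng = np.random.default_rng(1)
shape, ra, rb = (2, 3, 4), [1, 2, 3, 1], [1, 2, 2, 1]
A = [rng.standard_normal((ra[k], shape[k], ra[k + 1])) for k in range(3)]
B = [rng.standard_normal((rb[k], shape[k], rb[k + 1])) for k in range(3)]
C = tt_hadamard(A, B)
print([core.shape for core in C])  # inner ranks 2*2 = 4 and 3*2 = 6
```

The ranks of the product are the products of the ranks, which is why a rounding step is usually applied afterwards.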
Example application: TensorNet
1. Neural networks use fully-connected layers: $y = f(W x + b)$.
2. The matrix $W$ has millions of parameters.
3. Let's store and train the matrix $W$ in the TT-format.
This cannot work for general matrices, but for the VGG-16 net we compressed a 4048 × 4048 matrix to 320 parameters without loss of accuracy.
Linear model
Model
$$y(x) = w^\top x + b, \quad b \in \mathbb{R}, \; w \in \mathbb{R}^d$$
Loss function
$$\sum_{k=1}^{N} \ell\left(w^\top x^{(k)} + b,\; y^{(k)}\right).$$
Linear regression
Logistic regression
Linear SVM
...
Need for interactions
Linear models give everyone the same recommendations
The same story holds e.g. in bag-of-words text tasks
Use interactions (products of features)!
Models with interactions
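$$y(x) = b + w^\top x + \sum_{i,j} P_{ij} x_i x_j, \quad b \in \mathbb{R}, \; w \in \mathbb{R}^d, \; P \in \mathbb{R}^{d \times d}$$
For $d$ features there are $d^2$ parameters: overfitting on sparse data
Complexity is also $d^2$
For recommender systems $d$ is in the millions
SVM with a polynomial kernel has the same drawbacks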
Factorization machines
$$y(x) = b + w^\top x + \sum_{i,j} P_{ij} x_i x_j$$
Factorization machines [Rendle 2010] use rank $r$ for $P$:
$$y(x) = b + w^\top x + \sum_{i,j} \sum_{f=1}^{r} V_{if} V_{jf}\, x_i x_j, \quad b \in \mathbb{R}, \; w \in \mathbb{R}^d, \; V \in \mathbb{R}^{d \times r}$$
The matrix $P = V V^\top$ is not sparse, but structured (low rank)
Control the number of parameters with $r$
Can represent almost any matrix with large $r$
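The low-rank structure also makes the model cheap to evaluate: with the sum over all pairs $(i, j)$ as written above, $\sum_{i,j} \sum_f V_{if} V_{jf} x_i x_j = \sum_f \big(\sum_i V_{if} x_i\big)^2$. The sketch below compares this $O(dr)$ evaluation with the explicit $d \times d$ matrix; the function names and sizes are made up for the illustration.

```python
import numpy as np

def fm_predict(x, b, w, V):
    """y(x) = b + w.x + sum_{i,j} sum_f V[i,f] V[j,f] x[i] x[j], in O(d r)."""
    z = V.T @ x                  # z[f] = sum_i V[i, f] * x[i]
    return b + w @ x + z @ z

def fm_predict_naive(x, b, w, V):
    """Same model through the explicit matrix P = V V^T, O(d^2) work and memory."""
    P = V @ V.T
    return b + w @ x + x @ P @ x

rng = np.random.default_rng(0)
d, r = 50, 4
x = rng.standard_normal(d)
w = rng.standard_normal(d)
V = rng.standard_normal((d, r))
b = 0.3

print(fm_predict(x, b, w, V), fm_predict_naive(x, b, w, V))
print(V.size, "parameters in V versus", d * d, "in P")
```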
High order analysis
Factorization machines model (3rd order)
$$y(x) = b + w^\top x + \sum_{i,j} \sum_{f=1}^{r} V_{if} V_{jf}\, x_i x_j + \sum_{i,j,k} \sum_{f=1}^{r} U_{if} U_{jf} U_{kf}\, x_i x_j x_k.$$
In fact, factorization machines just use the CP-decomposition for the weight tensor $P_{ijk}$:
$$P_{ijk} = \sum_{f=1}^{r} U_{if} U_{jf} U_{kf}$$
But:
They converge poorly at high order
High complexity of inference and learning
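The third-order term can be checked the same way. This sketch builds the CP weight tensor explicitly for a small $d$ and compares it with the factored evaluation $\sum_f \big(\sum_i U_{if} x_i\big)^3$, which follows from the formula as written (sum over all triples); names and sizes are illustrative.

```python
import numpy as np

def cp_third_order(x, U):
    """sum_{i,j,k} P[i,j,k] x[i] x[j] x[k] without forming P."""
    z = U.T @ x                  # z[f] = sum_i U[i, f] * x[i]
    return np.sum(z ** 3)

def cp_third_order_naive(x, U):
    """Same quantity with the explicit tensor P[i,j,k] = sum_f U[i,f] U[j,f] U[k,f]."""
    P = np.einsum('if,jf,kf->ijk', U, U, U)
    return np.einsum('ijk,i,j,k->', P, x, x, x)

rng = np.random.default_rng(0)
d, r = 6, 3
x = rng.standard_normal(d)
U = rng.standard_normal((d, r))
print(cp_third_order(x, U), cp_third_order_naive(x, U))
```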
Exponential machines
Let's encode interactions with a binary code: each bit indicates whether the corresponding feature is included in the interaction.
Exponential machines example ($d = 3$):
$$y(x) = W_{000} + W_{100}\, x_1 + W_{010}\, x_2 + W_{001}\, x_3 + W_{110}\, x_1 x_2 + W_{101}\, x_1 x_3 + W_{011}\, x_2 x_3 + W_{111}\, x_1 x_2 x_3.$$
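In general:
$$y(x) = \sum_{i_1=0}^{1} \dots \sum_{i_d=0}^{1} W_{i_1, \dots, i_d}\, x_1^{i_1} \cdots x_d^{i_d},$$
$W \in \mathbb{R}^{2 \times \dots \times 2}$ with TT-rank $r$
Captures all $2^d$ interactions
Control the number of parameters with the TT-rank $r$
Can represent any such polynomial (each feature to at most the first power) with large enough $r$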
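For small $d$ the general formula can be evaluated directly by enumerating all $2^d$ binary codes. This sketch stores $W$ as a full array, which is exactly what the TT-format is meant to avoid; the function name and the toy weights are assumptions of the example.

```python
import numpy as np
from itertools import product

def exm_predict_bruteforce(W, x):
    """y(x) = sum over binary codes (i_1..i_d) of W[i_1..i_d] * prod_k x_k^{i_k}."""
    d = len(x)
    y = 0.0
    for code in product((0, 1), repeat=d):
        # A bit equal to 1 means the corresponding feature enters this interaction.
        term = np.prod([x[k] for k in range(d) if code[k] == 1])
        y += W[code] * term
    return y

# Toy weights for d = 3: y(x) = 1 - x3 + 2 * x1 * x2.
W = np.zeros((2, 2, 2))
W[0, 0, 0] = 1.0
W[0, 0, 1] = -1.0
W[1, 1, 0] = 2.0
x = np.array([2.0, 3.0, 5.0])
print(exm_predict_bruteforce(W, x))  # 1 - 5 + 2 * 2 * 3 = 8.0
```

The loop has $2^d$ iterations, so this direct form is only usable for a handful of features.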
Exponential machines inference
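Inference is linear in $d$, $O(r^2 d)$:
$$\begin{aligned}
y(x) &= \sum_{i_1, \dots, i_d} G_1[i_1] \cdots G_d[i_d] \prod_{k=1}^{d} x_k^{i_k} \\
&= \sum_{i_1, \dots, i_d} x_1^{i_1} G_1[i_1] \cdots x_d^{i_d} G_d[i_d] \\
&= \left( \sum_{i_1=0}^{1} x_1^{i_1} G_1[i_1] \right) \cdots \left( \sum_{i_d=0}^{1} x_d^{i_d} G_d[i_d] \right) \\
&= \underbrace{A_1}_{1 \times r}\; \underbrace{A_2}_{r \times r} \cdots \underbrace{A_d}_{r \times 1},
\end{aligned}$$
where $A_k = \sum_{i_k=0}^{1} x_k^{i_k} G_k[i_k] = G_k[0] + x_k G_k[1]$.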
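The derivation can be verified numerically for a small model. In this sketch the cores are random arrays of shape `(r_prev, 2, r_next)` (an assumed layout), `exm_predict` multiplies the matrices $G_k[0] + x_k G_k[1]$ from left to right, and a brute-force sum over all $2^d$ codes serves as the reference.

```python
import numpy as np
from itertools import product

def exm_predict(cores, x):
    """O(r^2 d) inference: product over k of (G_k[0] + x_k * G_k[1])."""
    result = np.ones((1, 1))
    for core, xk in zip(cores, x):
        # The running product stays a row vector, so each step costs O(r^2).
        result = result @ (core[:, 0, :] + xk * core[:, 1, :])
    return result.item()

def exm_predict_bruteforce(cores, x):
    """Reference: explicit sum over all 2^d binary codes (small d only)."""
    y = 0.0
    for code in product((0, 1), repeat=len(x)):
        w = np.ones((1, 1))
        for core, i in zip(cores, code):
            w = w @ core[:, i, :]          # one element of W from its cores
        y += w.item() * np.prod([xk ** i for xk, i in zip(x, code)])
    return y

rng = np.random.default_rng(0)
d, r = 8, 3
ranks = [1] + [r] * (d - 1) + [1]
cores = [rng.standard_normal((ranks[k], 2, ranks[k + 1])) for k in range(d)]
x = rng.uniform(-1, 1, size=d)
print(exm_predict(cores, x), exm_predict_bruteforce(cores, x))
```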
Exponential machines learning
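$$\underset{W}{\text{minimize}} \; \sum_{k=1}^{N} \ell\left(\langle W, X^{(k)} \rangle,\; y^{(k)}\right), \quad \text{subject to } \text{TT-rank}(W) = r_0$$
1. Autodiff to compute gradients with respect to the TT-cores $G$
2. OR Riemannian optimization
Theorem [Holtz, 2012]
The set of all $d$-dimensional tensors with fixed TT-rank $r$
$$\mathcal{M}_r = \{W \in \mathbb{R}^{2 \times \dots \times 2} : \text{TT-rank}(W) = r\}$$
forms a Riemannian manifold.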
Riemannian optimization
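[Figure: one step of Riemannian optimization on the manifold $\mathcal{M}_r$. The negative gradient $-\partial L / \partial W_t$ is projected onto the tangent space $T_W \mathcal{M}_r$, giving $-G_t$; TT-round then maps the step from $W_t$ back onto $\mathcal{M}_r$, giving $W_{t+1}$.]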
Riemannian optimization Cont’d
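Loss function
$$L(W) = \sum_{k=1}^{N} \ell\left(\langle W, X^{(k)} \rangle,\; y^{(k)}\right)$$
Gradient
$$\frac{\partial L}{\partial W} = \sum_{k=1}^{N} \frac{\partial \ell}{\partial \hat{y}}\, X^{(k)},$$
where $\hat{y} = \langle W, X^{(k)} \rangle$ is the model prediction and $X$ is of TT-rank 1!
$$X_{i_1 \dots i_d} = \prod_{k=1}^{d} x_k^{i_k}.$$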
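The rank-1 claim is easy to see in code: every core of $X$ is just the pair $(1, x_k)$. The sketch below, with illustrative names and $d = 3$, builds those cores, forms the full tensor as an outer product, and checks that $\langle W, X \rangle$ matches the explicit eight-term polynomial for an arbitrary weight tensor $W$.

```python
import numpy as np

def rank_one_cores(x):
    """TT-cores of X with X[i_1..i_d] = prod_k x_k^{i_k}; every TT-rank equals 1."""
    return [np.array([1.0, xk]).reshape(1, 2, 1) for xk in x]

x = np.array([0.5, -2.0, 3.0])
cores = rank_one_cores(x)

# Full 2 x 2 x 2 tensor as an outer product of the vectors (1, x_k).
X = np.einsum('i,j,k->ijk', *[np.array([1.0, xk]) for xk in x])

# <W, X> for an arbitrary weight tensor W ...
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 2, 2))
inner = np.sum(W * X)

# ... equals the polynomial written out term by term.
y_explicit = (W[0, 0, 0] + W[1, 0, 0] * x[0] + W[0, 1, 0] * x[1] + W[0, 0, 1] * x[2]
              + W[1, 1, 0] * x[0] * x[1] + W[1, 0, 1] * x[0] * x[2]
              + W[0, 1, 1] * x[1] * x[2] + W[1, 1, 1] * x[0] * x[1] * x[2])
print(inner, y_explicit)
```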
Experiments: optimization
[Figure: training loss versus time (s), both on logarithmic axes, for Cores GD, Cores SGD 100, Cores SGD 500, Riemann GD, Riemann 100, Riemann 500, and Riemann GD rand init. Panels: (a) Car dataset; (b) HIV dataset.]
Experiments: classification
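1. We generated $10^5$ train and $10^5$ test objects with $d = 30$ features.
2. $X_{ij} \sim U\{-1, +1\}$.
3. Ground truth, shown here for 3 interactions of order 2:
$$y(x) = \varepsilon_1 x_1 x_5 + \varepsilon_2 x_3 x_8 + \varepsilon_3 x_4 x_5; \quad \varepsilon_1, \varepsilon_2, \varepsilon_3 \sim U(-1, 1).$$
4. We used 20 interactions of order 6.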
| Method | Test AUC | Training time (s) | Inference time (s) |
| --- | --- | --- | --- |
| Log. reg. | 0.50 ± 0.0 | 0.4 | 0.0 |
| RF | 0.55 ± 0.0 | 21.4 | 1.3 |
| SVM RBF | 0.50 ± 0.0 | 2262.6 | 1076.1 |
| SVM poly. 2 | 0.50 ± 0.0 | 1152.6 | 852.0 |
| SVM poly. 6 | 0.56 ± 0.0 | 4090.9 | 754.8 |
| 2-nd order FM | 0.50 ± 0.0 | 638.2 | 0.1 |
| 6-th order FM | 0.57 ± 0.05 | 1412.0 | 0.2 |
| ExM rank 2 | 0.54 ± 0.05 | 198.4 | 0.1 |
| ExM rank 4 | 0.69 ± 0.02 | 443.0 | 0.1 |
| ExM rank 8 | 0.75 ± 0.02 | 998.3 | 0.2 |
Conclusion
Tensor Train decomposition represents tensors compactly.
Machine learning models can be parametrized with TT-tensors,
e.g. the weights of a neural network,
or a model of all $2^d$ interactions (products of features).
Control the number of underlying parameters via the TT-rank.
Learning with Riemannian optimization sometimes outperforms SGD.
There is Python code for everything: TT, TensorNet, and Exponential Machines.
Tensor Train decomposition in machine learning

  • 1. Tensor Train in machine learning Alexander Novikov October 11, 2016 Alexander Novikov Tensor Train in machine learning October 11, 2016 1 / 26
  • 2. Recommender systems Assume low-rank structure. Alexander Novikov Tensor Train in machine learning October 11, 2016 2 / 26
  • 3. Tensor Train summary Tensor Train (TT) decomposition [Oseledets 2011]: A compact representation for tensors (=multidimensional array); Allows for efficient application of linear algebra operations. Alexander Novikov Tensor Train in machine learning October 11, 2016 3 / 26
  • 4. Low-rank decomposition A23 = G1 G2 i2 = 3i1 = 2 Ai1i2 = G1[i1] 1×r G2[i2] r×1 A = G1G2 G1 – collection of rows, G2 – collection of columns: Alexander Novikov Tensor Train in machine learning October 11, 2016 4 / 26
  • 5. Tensor Train decomposition A2423 = G1 G2 G3 G4 i2 = 4 i3 = 2 i4 = 3 i1 = 2 Ai1...id = G1[i1] 1×r G2[i2] r×r . . . Gd [id ] r×1 An example of computing one element of 4-dimensional tensor: Alexander Novikov Tensor Train in machine learning October 11, 2016 5 / 26
  • 6. Tensor Train decomposition Cont’d Tensor A is said to be in the TT-format, if Ai1,...,id = G1[i1] G2[i2] · · · Gd [id ], ik ∈ {1, . . . , n}, where Gk[ik] — is a matrix of size rk−1 × rk, r0 = rd = 1. Notation & terminology: Gk — TT-cores; rk — TT-ranks; r = max k=0,...,d rk — the maximal TT-rank. The TT-format uses O ndr2 memory to store nd elements. Efficient only if the TT-rank is small. Alexander Novikov Tensor Train in machine learning October 11, 2016 6 / 26
  • 7. TT-format: example Ai1,i2,i3 = i1 + i2 + i3, i1 ∈ {1, 2, 3}, i2 ∈ {1, 2, 3, 4}, i3 ∈ {1, 2, 3, 4, 5}. Ai1,i2,i3 = G1[i1]G2[i2]G3[i3], Alexander Novikov Tensor Train in machine learning October 11, 2016 7 / 26
  • 8. TT-format: example Ai1,i2,i3 = i1 + i2 + i3, i1 ∈ {1, 2, 3}, i2 ∈ {1, 2, 3, 4}, i3 ∈ {1, 2, 3, 4, 5}. Ai1,i2,i3 = G1[i1]G2[i2]G3[i3], G1[i1] = i1 1 G2[i2] = 1 0 i2 1 G3[i3] = 1 i3 Lets check: A(i1, i2, i3) = i1 1 1 0 i2 1 1 i3 = = i1 + i2 1 1 i3 = i1 + i2 + i3. Alexander Novikov Tensor Train in machine learning October 11, 2016 7 / 26
  • 9. TT-format: example Ai1,i2,i3 = i1 + i2 + i3, i1 ∈ {1, 2, 3}, i2 ∈ {1, 2, 3, 4}, i3 ∈ {1, 2, 3, 4, 5}. Ai1,i2,i3 = G1[i1]G2[i2]G3[i3], G1 = 1 1 , 2 1 , 3 1 G2 = 1 0 1 1 , 1 0 2 1 , 1 0 3 1 , 1 0 4 1 G3 = 1 1 , 1 2 , 1 3 , 1 4 , 1 5 The tensor has 3 · 4 · 5 = 60 elements. The TT-format use 32 parameters to describe it. Alexander Novikov Tensor Train in machine learning October 11, 2016 8 / 26
  • 10. Sum of tensors Tensors A and B are in the TT-format: Ai1...id = GA 1 [i1] · · · GA d [id ], Bi1...id = GB 1 [i1] · · · GB d [id ]. Find the TT-format of C = A + B, Ci1...id = Ai1...id + Bi1...id . Alexander Novikov Tensor Train in machine learning October 11, 2016 9 / 26
  • 11. Sum of tensors Tensors A and B are in the TT-format: Ai1...id = GA 1 [i1] · · · GA d [id ], Bi1...id = GB 1 [i1] · · · GB d [id ]. Find the TT-format of C = A + B, Ci1...id = Ai1...id + Bi1...id . TT-cores of the result: GC k [ik] = GA k [ik] 0 0 GB k [ik] , k = 2, . . . , d − 1, GC 1 [i1] = GA 1 [i1] GB 1 [i1] , GC d [id ] = GA d [id ] GB d [id ] . TT-ranks of the result are sums of the TT-ranks. Alexander Novikov Tensor Train in machine learning October 11, 2016 9 / 26
  • 12. TT-rounding Given a tensor A in the TT-format with rank r, the TT-rounding [Oseledets, 2011]: A = tt-round(A, ε), ε > 0 finds the tensor A such that 1 A − A F ≤ ε A F ; 2 TT-rank of A is minimal among all B: A − B F ≤ ε√ d−1 A F . Where A F = i1,...,id A2 i1,...,id . Alexander Novikov Tensor Train in machine learning October 11, 2016 10 / 26
  • 13. How to find TT-decomposition of a given tensor Analytical formulas for special cases; An exact algorithm based on SVD for medium tensor. E.g. for a 58 ≈ 400 000 tensor takes 8 ms on my laptop; For large tensors (e.g. 250), approximate algorithms that look at a fraction of the tensor elements: DMRG-cross [Savostyanov and Oseledets, 2011], AMEn-cross [Dolgov and Savostyanov, 2013]. Alexander Novikov Tensor Train in machine learning October 11, 2016 11 / 26
  • 14. TT-format operations Operation Rank of the result C = c · A r(C) = r(A) C = A + c r(C) = r(A)+1 C = A + B r(C) ≤ r(A)+r(B) C = A B r(C) ≤ r(A)r(B) C = round(A, ε) r(C) ≤ r(A) sum A – A F – (Ask me about differential equations) Alexander Novikov Tensor Train in machine learning October 11, 2016 12 / 26
  • 15. Example application: TensorNet 1 Neural networks use fully-connected layers: y = f (W x + b). 2 The matrix W is of millions parameters. 3 Lets store and train the matrix W in the TT-format. Can’t work for general matrices, but for VGG-16 net we compressed 4048 × 4048 matrix to 320 params without loss of accuracy. Alexander Novikov Tensor Train in machine learning October 11, 2016 13 / 26
  • 16. Linear model Model y(x) = w x + b, b ∈ R, w ∈ Rd Loss function N k=1 w x(k) + b, y(k) . Linear regression Logistic regression Linear SVM ... Alexander Novikov Tensor Train in machine learning October 11, 2016 14 / 26
  • 17. Need for interactions Linear models give everyone same recommendations Same story e.g. in bag-of-words text tasks Use interactions (products of features)! Alexander Novikov Tensor Train in machine learning October 11, 2016 15 / 26
  • 18. Models with interactions y(x) = b + w x + i,j Pijxi xj, b ∈ R, w ∈ Rd , P ∈ Rd×d For d features d2 parameters: overfitting on sparse data Complexity is also d2 For recommender systems d is millions SVM with polynomial kernel has same drawbacks Alexander Novikov Tensor Train in machine learning October 11, 2016 16 / 26
  • 19. Factorization machines
    $y(x) = b + w^\top x + \sum_{i,j} P_{ij} x_i x_j$
    Factorization machines [Rendle 2010] use rank $r$ for $P$:
    $y(x) = b + w^\top x + \sum_{i,j} \sum_{f=1}^{r} V_{if} V_{jf} x_i x_j$, with $b \in \mathbb{R}$, $w \in \mathbb{R}^d$, $V \in \mathbb{R}^{d \times r}$.
    The matrix $P = VV^\top$ is not sparse, but structured (low rank).
    Control the number of parameters with $r$.
    Can represent almost any matrix with large $r$.
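With the sum taken over all pairs $(i, j)$ exactly as written above, the interaction term collapses to $\|V^\top x\|^2$, so it can be evaluated in $O(dr)$ without forming $P$. The sketch below checks this numerically; the function names are hypothetical, and note that some formulations of factorization machines restrict the sum to $i < j$, which this example does not do.

```python
import numpy as np

def quadratic_predict(x, b, w, P):
    """Full second-order model: b + w.x + sum_ij P_ij x_i x_j (O(d^2))."""
    return b + w @ x + x @ P @ x

def fm_predict(x, b, w, V):
    """Low-rank version with P = V V^T, evaluated in O(d r)."""
    z = V.T @ x          # r numbers: z_f = sum_i V_if x_i
    return b + w @ x + z @ z

rng = np.random.default_rng(0)
d, r = 50, 4
x = rng.standard_normal(d)
w = rng.standard_normal(d)
V = rng.standard_normal((d, r))
b = 0.3

y_full = quadratic_predict(x, b, w, V @ V.T)   # uses d*d = 2500 weights
y_fm = fm_predict(x, b, w, V)                  # uses d*r = 200 weights
```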
  • 20. High order analysis
    Factorization machines model (3rd order):
    $y(x) = b + w^\top x + \sum_{i,j} \sum_{f=1}^{r} V_{if} V_{jf} x_i x_j + \sum_{i,j,k} \sum_{f=1}^{r} U_{if} U_{jf} U_{kf} x_i x_j x_k$.
    In fact, factorization machines just use the CP-decomposition for the weight tensor $P_{ijk}$: $P_{ijk} = \sum_{f=1}^{r} U_{if} U_{jf} U_{kf}$.
    Drawbacks: poor convergence at high order; complexity of inference and learning.
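As a small sanity check of the CP form of the third-order weight tensor, the sketch below builds $P_{ijk}$ from a factor matrix $U$ and contracts it with $x$ in two ways. It covers only the sum over all index triples as written above, on a toy problem; the variable names are assumptions, and it says nothing about the convergence issues mentioned on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 6, 3
U = rng.standard_normal((d, r))
x = rng.standard_normal(d)

# Dense weight tensor from its CP factors: P_ijk = sum_f U_if U_jf U_kf.
P = np.einsum("if,jf,kf->ijk", U, U, U)

# Third-order term computed from the dense tensor (d^3 weights) ...
term_dense = np.einsum("ijk,i,j,k->", P, x, x, x)

# ... and directly from the factors: sum_f (sum_i U_if x_i)^3.
term_cp = np.sum((U.T @ x) ** 3)
```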
  • 22. Exponential machines
    Let's encode interactions by a binary code: every bit indicates whether the corresponding feature is included in the current interaction.
    Exponential machines example ($d = 3$):
    $y(x) = W_{000} + W_{100} x_1 + W_{010} x_2 + W_{001} x_3 + W_{110} x_1 x_2 + W_{101} x_1 x_3 + W_{011} x_2 x_3 + W_{111} x_1 x_2 x_3$.
    In general:
    $y(x) = \sum_{i_1=0}^{1} \cdots \sum_{i_d=0}^{1} W_{i_1,\dots,i_d}\, x_1^{i_1} \cdots x_d^{i_d}$, with $W \in \mathbb{R}^{2 \times \dots \times 2}$ of TT-rank $r$.
    Captures all $2^d$ interactions.
    Control the number of parameters with the TT-rank $r$.
    Can represent any polynomial function with large $r$.
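The general formula can be evaluated literally by enumerating all $2^d$ bit patterns. The sketch below does this with a dense weight tensor; it is only meant to make the indexing concrete (it is exponential in $d$), and the function name `exm_predict_dense` is an assumption for this example.

```python
import itertools
import numpy as np

def exm_predict_dense(W, x):
    """Sum W[i1,...,id] * x1^i1 * ... * xd^id over all 2^d bit patterns."""
    d = W.ndim
    y = 0.0
    for bits in itertools.product((0, 1), repeat=d):
        term = W[bits]
        for k, bit in enumerate(bits):
            if bit:                 # feature k takes part in this interaction
                term = term * x[k]
        y += term
    return y

# d = 3: a bias W_000 = 1 and a single interaction W_110 = 2 (x1 * x2).
W = np.zeros((2, 2, 2))
W[0, 0, 0] = 1.0
W[1, 1, 0] = 2.0
x = np.array([3.0, 5.0, 7.0])
y = exm_predict_dense(W, x)   # 1 + 2 * 3 * 5
```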
  • 23. Exponential machines inference
    Linear $O(r^2 d)$ inference:
    $y(x) = \sum_{i_1,\dots,i_d} G_1[i_1] \cdots G_d[i_d] \prod_{k=1}^{d} x_k^{i_k}$
    $\quad = \sum_{i_1,\dots,i_d} x_1^{i_1} G_1[i_1] \cdots x_d^{i_d} G_d[i_d]$
    $\quad = \Big(\sum_{i_1=0}^{1} x_1^{i_1} G_1[i_1]\Big) \cdots \Big(\sum_{i_d=0}^{1} x_d^{i_d} G_d[i_d]\Big)$
    $\quad = \underbrace{A_1}_{1 \times r}\, \underbrace{A_2}_{r \times r} \cdots \underbrace{A_d}_{r \times 1}$,
    where $A_k = \sum_{i_k=0}^{1} x_k^{i_k} G_k[i_k]$.
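The factorized form translates into a short loop: form each $A_k = G_k[0] + x_k G_k[1]$ and multiply the matrices left to right. The sketch below compares it with a dense evaluation on a small random model; the function names and the core layout `(r_prev, 2, r_next)` are assumptions made for this example.

```python
import numpy as np

def exm_predict_tt(cores, x):
    """y(x) = A_1 A_2 ... A_d with A_k = G_k[0] + x_k * G_k[1]; O(r^2 d)."""
    row = np.ones((1, 1))
    for k, core in enumerate(cores):
        A_k = core[:, 0, :] + x[k] * core[:, 1, :]
        row = row @ A_k              # keep a 1 x r row vector throughout
    return row[0, 0]

def tt_full(cores):
    """Dense 2 x ... x 2 weight tensor (for checking only)."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out[0, ..., 0]

def exm_predict_dense(W, x):
    """Contract every axis of W with the vector (1, x_k)."""
    out = W
    for xk in x:
        out = np.tensordot(np.array([1.0, xk]), out, axes=([0], [0]))
    return float(out)

rng = np.random.default_rng(0)
d, r = 6, 3
ranks = [1] + [r] * (d - 1) + [1]
cores = [rng.standard_normal((ranks[k], 2, ranks[k + 1])) for k in range(d)]
x = rng.standard_normal(d)

y_tt = exm_predict_tt(cores, x)
y_dense = exm_predict_dense(tt_full(cores), x)
```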
  • 24. Exponential machines learning
    $\underset{W}{\text{minimize}} \; \sum_{k=1}^{N} \ell\big(\langle W, X^{(k)} \rangle,\; y^{(k)}\big)$, subject to $\text{TT-rank}(W) = r_0$.
    1. Autodiff to compute gradients with respect to the TT-cores $G$,
    2. OR Riemannian optimization.
    Theorem [Holtz, 2012]: the set of all $d$-dimensional tensors with fixed TT-rank $r$, $M_r = \{W \in \mathbb{R}^{2 \times \dots \times 2} : \text{TT-rank}(W) = r\}$, forms a Riemannian manifold.
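For the first option, the quantity an autodiff tool would return is the gradient of the prediction with respect to each core. The sketch below computes it by hand instead of using autodiff, since $y(x)$ is linear in every single core; the function name, the core layout `(r_prev, 2, r_next)`, and the finite-difference check are all assumptions of this illustration.

```python
import numpy as np

def predict_and_core_grads(cores, x):
    """Return y(x) and dy/dG_k for every TT-core G_k."""
    d = len(cores)
    A = [c[:, 0, :] + x[k] * c[:, 1, :] for k, c in enumerate(cores)]
    # left[k] = A_1 ... A_k (a row vector), left[0] = 1.
    left = [np.ones((1, 1))]
    for k in range(d):
        left.append(left[-1] @ A[k])
    # right[k] = A_{k+1} ... A_d (a column vector), right[d] = 1.
    right = [np.ones((1, 1))]
    for k in reversed(range(d)):
        right.append(A[k] @ right[-1])
    right = right[::-1]
    y = left[d][0, 0]
    grads = []
    for k in range(d):
        # y = left[k] @ A_k @ right[k+1], so dy/dA_k is an outer product.
        outer = left[k].T @ right[k + 1].T
        # A_k = G_k[0] + x_k * G_k[1], hence the two slices below.
        grads.append(np.stack([outer, x[k] * outer], axis=1))
    return y, grads

rng = np.random.default_rng(0)
d, r = 4, 3
ranks = [1] + [r] * (d - 1) + [1]
cores = [rng.standard_normal((ranks[k], 2, ranks[k + 1])) for k in range(d)]
x = rng.standard_normal(d)
y, grads = predict_and_core_grads(cores, x)
```

The gradient of the loss then follows from the chain rule: multiply each `grads[k]` by the derivative of $\ell$ with respect to the prediction and sum over the training objects.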
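  • 25. Riemannian optimization
    [Figure: one step of Riemannian optimization on the manifold $M_r$. Labels: the current point $W_t$; the negative gradient $-\partial L / \partial W_t$; its projection $-G_t$ onto the tangent space $T_W M_r$; TT-round, which maps the step back onto $M_r$ to give $W_{t+1}$.]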
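  • 26. Riemannian optimization Cont'd
    Loss function: $L(W) = \sum_{k=1}^{N} \ell\big(\langle W, X^{(k)} \rangle,\; y^{(k)}\big)$.
    Gradient: $\frac{\partial L}{\partial W} = \sum_{k=1}^{N} \frac{\partial \ell}{\partial \hat{y}} X^{(k)}$, where $\hat{y} = \langle W, X^{(k)} \rangle$ is the model's prediction.
    Here $X$ is of TT-rank 1: $X_{i_1 \dots i_d} = \prod_{k=1}^{d} x_k^{i_k}$.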
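The rank-1 claim can be checked directly: each TT-core of $X$ is just the pair $(1, x_k)$ with both ranks equal to 1. In the sketch below the function names and the core layout `(r_prev, 2, r_next)` are assumptions for this example.

```python
import numpy as np

def feature_tensor_cores(x):
    """TT-cores of X with X[i1,...,id] = prod_k x_k^{i_k}; all TT-ranks are 1."""
    return [np.array([1.0, xk]).reshape(1, 2, 1) for xk in x]

def tt_full(cores):
    """Contract TT-cores into a dense array."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out[0, ..., 0]

x = np.array([2.0, 3.0, 5.0])
X = tt_full(feature_tensor_cores(x))     # dense 2 x 2 x 2 tensor

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 2, 2))
y_hat = np.sum(W * X)                    # the inner product <W, X>
```

Since every $X^{(k)}$ has TT-rank 1, the gradient is a sum of $N$ rank-1 tensors weighted by scalars.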
  • 27. Experiments: optimization
    [Figure: training loss versus time (s), log-log axes. Panels: (a) Car dataset, (b) HIV dataset. Methods compared: Cores GD, Cores SGD 100, Cores SGD 500, Riemann GD, Riemann 100, Riemann 500, Riemann GD rand init.]
  • 28. Experiments: classification
    1. We generated $10^5$ train and $10^5$ test objects with $d = 30$ features.
    2. $X_{ij} \sim U\{-1, +1\}$.
    3. Ground truth for 3 interactions of order 2: $y(x) = \varepsilon_1 x_1 x_5 + \varepsilon_2 x_3 x_8 + \varepsilon_3 x_4 x_5$; $\varepsilon_1, \varepsilon_2, \varepsilon_3 \sim U(-1, 1)$.
    4. We used 20 interactions of order 6.

    | Method | Test AUC | Training time (s) | Inference time (s) |
    |---|---|---|---|
    | Log. reg. | 0.50 ± 0.0 | 0.4 | 0.0 |
    | RF | 0.55 ± 0.0 | 21.4 | 1.3 |
    | SVM RBF | 0.50 ± 0.0 | 2262.6 | 1076.1 |
    | SVM poly. 2 | 0.50 ± 0.0 | 1152.6 | 852.0 |
    | SVM poly. 6 | 0.56 ± 0.0 | 4090.9 | 754.8 |
    | 2-nd order FM | 0.50 ± 0.0 | 638.2 | 0.1 |
    | 6-th order FM | 0.57 ± 0.05 | 1412.0 | 0.2 |
    | ExM rank 2 | 0.54 ± 0.05 | 198.4 | 0.1 |
    | ExM rank 4 | 0.69 ± 0.02 | 443.0 | 0.1 |
    | ExM rank 8 | 0.75 ± 0.02 | 998.3 | 0.2 |
  • 29. Conclusion
    Tensor Train decomposition compactly represents tensors.
    Machine learning models can be parametrized with TT-tensors, e.g. the weights of a neural network, or a model of all $2^d$ interactions (products of features).
    Control the number of underlying parameters via the TT-rank.
    Learning with Riemannian optimization sometimes outperforms SGD.
    There is Python code for everything: TT, TensorNet, and Exponential Machines.