This document presents two tensor factorization methods: Exponential Family Tensor Factorization (ETF) and Full-Rank Tensor Completion (FTC). ETF generalizes Tucker decomposition by allowing a different noise distribution for each tensor element, so it can handle tensors with mixed discrete and continuous values. FTC kernelizes Tucker decomposition to complete missing tensor values without reducing dimensionality. The document outlines these methods and their motivations, discusses Tucker decomposition, and provides an example applying ETF to anomaly detection in time-series sensor data.
Generalization of Tensor Factorization and Applications for Anomaly Detection
1. Generalization of Tensor Factorization and Applications
Kohei Hayashi
Collaborators:
T. Takenouchi, T. Shibata, Y. Kamiya, D. Kato, K. Kunieda, K. Yamada, K. Ikeda,
R. Tomioka, H. Kashima
May 14, 2012
2–4. Relational data
Relational data is a collection of relationships among multiple objects.
• relationships of a pair ⇔ matrix
• relationships of a 3-tuple ⇔ 3-dimensional array (tensor)
Relational data can be represented as a tensor with missing values.
5. Issue of tensor representation
Tensor representation is generally high-dimensional and large-scale.
• e.g. 1,000 users × 1,000 items × 1,000 time points = 1,000,000,000 relationships in total
Dimensionality-reduction techniques such as tensor factorization are therefore used.
6. Tensor factorization
Tucker decomposition [Tucker 1966]: a tensor factorization method assuming
1. the observation noise is Gaussian
2. the underlying tensor is low-dimensional (low-rank)
These assumptions are general, but they are not always true.
7. Contributions
Generalize Tucker decomposition and propose two new models:
1. Exponential family tensor factorization (ETF)
   [Joint work with Takenouchi, Shibata, Kamiya, Kunieda, Yamada, and Ikeda]
   • Generalize the noise distribution.
   • Can handle a tensor containing mixed discrete and continuous values.
2. Full-rank tensor completion (FTC)
   [Joint work with Tomioka and Kashima]
   • Kernelize Tucker decomposition.
   • Complete missing values without reducing the dimensionality.
13. Vectorized form
Let x ∈ RD and z ∈ RK denote vectorized X and Z,
resp., then
x = Wz + ε where W ≡ U(3) ⊗ U(2) ⊗ U(1) .
• D ≡ D1 D2 D3 , K ≡ K1 K2 K3
• ⊗: the Kronecker product
Tucker decomposition = A linear Gaussian model
x ∼ N (x | Wz, σ 2 I)
13 / 29
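To make the vectorized form concrete, here is a small numpy sketch (my own illustration, not from the slides) that builds a random Tucker-decomposed tensor and checks $\mathrm{vec}(X) = (U^{(3)} \otimes U^{(2)} \otimes U^{(1)})\,\mathrm{vec}(Z)$ under column-major vectorization; all sizes are arbitrary.

```python
# Minimal check of the vectorized form of Tucker decomposition (illustrative sizes).
import numpy as np

D1, D2, D3 = 4, 5, 6      # tensor dimensions
K1, K2, K3 = 2, 3, 2      # core (rank) dimensions

rng = np.random.default_rng(0)
Z  = rng.standard_normal((K1, K2, K3))   # core tensor
U1 = rng.standard_normal((D1, K1))       # factor matrices
U2 = rng.standard_normal((D2, K2))
U3 = rng.standard_normal((D3, K3))

# Tucker decomposition via mode-n products: X = Z x1 U1 x2 U2 x3 U3
X = np.einsum('abc,ia,jb,kc->ijk', Z, U1, U2, U3)

# Equivalent linear form: vec(X) = W vec(Z), with W = U3 ⊗ U2 ⊗ U1
W = np.kron(U3, np.kron(U2, U1))
x = X.flatten(order='F')                 # column-major vectorization
z = Z.flatten(order='F')
assert np.allclose(x, W @ z)

# Adding isotropic Gaussian noise gives the linear Gaussian model x ~ N(Wz, σ² I)
sigma = 0.1
x_noisy = W @ z + sigma * rng.standard_normal(x.shape)
```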
14. Rank of tensor
The dimensionalities of the core tensor are called the rank of the tensor.
• The rank of X is $(K_1, K_2, K_3)$.
The rank of a tensor represents its complexity:
• a low-rank tensor carries less information.
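To make this concrete (an illustration; the rank $(10, 10, 10)$ is chosen arbitrarily): for the $1{,}000 \times 1{,}000 \times 1{,}000$ tensor from slide 5, a rank-$(10,10,10)$ Tucker decomposition stores only

```latex
% illustrative parameter count for rank (10, 10, 10), D_m = 1,000
\underbrace{K_1 K_2 K_3}_{\text{core}} + \underbrace{\textstyle\sum_{m=1}^{3} D_m K_m}_{\text{factors}}
  = 10^3 + 3 \times 10^4 = 31{,}000
\quad \text{vs.} \quad D_1 D_2 D_3 = 10^9 \text{ raw entries.}
```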
17. Motivation: heterogeneous tensor
Real data are often heterogeneous, e.g. a tensor containing mixed discrete and continuous values.
Tucker decomposition assumes Gaussian noise, which is not appropriate for such data.
Approach: generalize Tucker decomposition, assuming a different distribution for each element.
18. Exponential family tensor factorization
Likelihood:
$$x \sim \prod_{d=1}^{D} \mathrm{Expon}_d(x_d \mid \theta_d)\;\;(\text{exponential family}), \qquad \theta \equiv Wz\;\;(\text{Tucker decomposition}),$$
$$\text{where } \mathrm{Expon}(x \mid \theta) \equiv \exp\left[x\theta - \psi(\theta) + F(x)\right].$$
Exponential family: a class of distributions.
[Figure: example exponential-family densities with θ = 0 — Gaussian with ψ(θ) = θ²/2, Poisson with ψ(θ) = exp[θ], and Bernoulli with ψ(θ) = ln(1 + exp[θ]).]
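For concreteness (this derivation is not on the slide, but it is standard), two of the plotted distributions written in the form $\mathrm{Expon}(x \mid \theta) = \exp[x\theta - \psi(\theta) + F(x)]$:

```latex
\begin{align*}
\text{Bernoulli } \bigl(\theta = \ln\tfrac{p}{1-p}\bigr):\quad
  p(x) &= p^x (1-p)^{1-x} = \exp\!\bigl[x\theta - \ln(1+e^{\theta})\bigr],
  & \psi(\theta) &= \ln(1+e^{\theta}), \quad F(x) = 0,\\
\text{Poisson } \bigl(\theta = \ln\lambda\bigr):\quad
  p(x) &= \frac{\lambda^x e^{-\lambda}}{x!} = \exp\!\bigl[x\theta - e^{\theta} - \ln x!\bigr],
  & \psi(\theta) &= e^{\theta}, \quad F(x) = -\ln x!.
\end{align*}
```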
19. Priors and joint log-likelihood
Priors:
• for $z$: a Gaussian prior $\mathcal{N}(0, I)$
• for $U^{(m)}$: a Gaussian prior $\mathcal{N}(0, \alpha_m^{-1} I)$
Joint log-likelihood:
$$\mathcal{L} = x^\top W z - \sum_{d=1}^{D} \psi_d(w_d z) - \frac{1}{2}\|z\|^2 - \sum_{m=1}^{M} \frac{\alpha_m}{2}\bigl\|U^{(m)}\bigr\|_{\mathrm{Fro}}^2 + \mathrm{const.}$$
• $w_d$: the $d$-th row of $W$
Estimate the parameters by Bayesian inference.
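As an illustration (not the authors' code), a minimal numpy sketch of evaluating this joint log-likelihood when every element is Poisson, i.e. $\psi_d(\theta) = e^{\theta}$ and $F(x) = -\ln x!$; the shapes and hyperparameter values are arbitrary.

```python
# Sketch: evaluate the ETF joint log-likelihood L for an all-Poisson tensor.
import numpy as np
from scipy.special import gammaln

def joint_log_likelihood(x, z, Us, alphas):
    """x: vectorized observed tensor, z: vectorized core, Us: factor matrices U^(m)."""
    W = Us[0]
    for U in Us[1:]:
        W = np.kron(U, W)           # W = U^(M) ⊗ ... ⊗ U^(1)
    theta = W @ z                   # natural parameters, theta = W z
    psi = np.exp(theta)             # Poisson cumulant function psi(theta) = e^theta
    log_lik = x @ theta - psi.sum() - gammaln(x + 1).sum()   # sum_d [x_d θ_d - ψ(θ_d) + F(x_d)]
    log_prior_z = -0.5 * z @ z                                # Gaussian prior N(0, I) on z
    log_prior_U = -sum(a / 2 * np.sum(U**2) for a, U in zip(alphas, Us))  # N(0, α_m^{-1} I) on U^(m)
    return log_lik + log_prior_z + log_prior_U                # normalizing constants dropped

# toy usage
rng = np.random.default_rng(0)
Us = [rng.standard_normal((d, k)) * 0.1 for d, k in [(4, 2), (5, 2), (6, 3)]]
z = rng.standard_normal(2 * 2 * 3)
x = rng.poisson(1.0, size=4 * 5 * 6).astype(float)
print(joint_log_likelihood(x, z, Us, alphas=[1.0, 1.0, 1.0]))
```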
20. Bayesian inference
Marginal-MAP estimator:
$$\operatorname*{argmax}_{U^{(1)},\dots,U^{(M)}} \int \exp\!\bigl[\mathcal{L}(z, U^{(1)},\dots,U^{(M)})\bigr]\, dz$$
• The integral is not analytically tractable.
We develop an efficient yet accurate approximation based on the Laplace approximation and a Gaussian process.
• The computational cost is still higher than that of Tucker decomposition.
• For a long and thin tensor (e.g. a time series), an online algorithm is applicable (see the thesis).
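The slides do not spell the approximation out; for reference, the generic Laplace approximation of the marginal (not necessarily the exact variant used in the thesis) replaces $\mathcal{L}$ by its second-order expansion around the maximizer $\hat z$ and integrates the resulting Gaussian:

```latex
% generic Laplace approximation of the marginal over z
\int \exp[\mathcal{L}(z)]\, dz
  \;\approx\; \exp[\mathcal{L}(\hat z)]\,(2\pi)^{K/2}\,
  \bigl|-\nabla^2_z \mathcal{L}(\hat z)\bigr|^{-1/2},
\qquad \hat z = \operatorname*{argmax}_z \mathcal{L}(z).
```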
22. Anomaly detection by ETF
Purpose: find irregular parts of the tensor data.
Method: apply the distance-based outlier criterion DB(p, D) [Knorr+ VLDB'00] to the estimated factor matrix $U^{(m)}$.
Definition of DB(p, D):
“An object O is a DB(p, D) outlier if at least fraction p of the objects lies at a distance greater than D from O.”
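A minimal sketch (my own illustration) of the DB(p, D) criterion applied to the rows of an estimated factor matrix; the function name and parameter values are hypothetical.

```python
# DB(p, D) distance-based outliers [Knorr+ VLDB'00] over the rows of a factor matrix U^(m).
import numpy as np

def db_outliers(U, p, D):
    """Return indices i such that row U[i] is a DB(p, D) outlier:
    at least a fraction p of the other rows lie at distance > D from U[i]."""
    n = U.shape[0]
    dists = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=-1)  # pairwise Euclidean distances
    far_fraction = (dists > D).sum(axis=1) / (n - 1)                # self-distance 0 is excluded
    return np.where(far_fraction >= p)[0]

# toy usage: 20 ordinary rows plus one obvious outlier
rng = np.random.default_rng(0)
U = np.vstack([rng.standard_normal((20, 3)), [[10.0, 10.0, 10.0]]])
print(db_outliers(U, p=0.9, D=5.0))   # -> [20]
```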
23. Data set
A time series of multisensor measurements:
• Each sensor recorded the human behavior (e.g. position) of researchers in an NEC laboratory for 8 months.
• 6 (sensors) × 21 (persons) × 1927 (hours)

Sensor  Description            Type          Min       Max
X1      # of sent emails       Count         0.00      14.00
X2      # of received emails   Count         0.00      15.00
X3      # of typed keys        Count         0.00      50422.00
X4      X coordinate           Real          −550.17   3444.15
X5      Y coordinate           Real          128.71    2353.55
X6      Movement distance      Non-negative  0.00      203136.96
24. Evaluation
A list of irregular events is provided.
Examples of irregular events:

Date    Time         Description
Dec 21  All day      Private incident
Dec 22  15:00–16:00  Monthly seminar
Jun 15  13:00        Visiting tour
...

• Evaluate as a binary classification task.
• 50% of the entries are missing; 10 trials.
• Apply ETF with the online algorithm.