EXPLORING TEMPORAL GRAPH DATA WITH PYTHON

A STUDY ON TENSOR DECOMPOSITION OF WEARABLE SENSOR DATA
ANDRÉ PANISSON
@apanisson
ISI Foundation, Torino, Italy & New York City
WHY TENSOR FACTORIZATION + PYTHON?
▸ Matrix Factorization is already used in many fields
▸ Tensor Factorization (TF) is becoming very popular for multiway data analysis
▸ TF is very useful to explore temporal graph data
▸ But still, the most used tool is Matlab
▸ There’s room for improvement in the Python libraries for TF
▸ Study: Non-negative Tensor Factorization (NTF) of wearable sensor data
TENSORS AND TENSOR DECOMPOSITION
FACTOR ANALYSIS
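Spearman, ~1900
$X \approx WH$
$X_{\text{tests} \times \text{subjects}} \approx W_{\text{tests} \times \text{intelligences}} \, H_{\text{intelligences} \times \text{subjects}}$
Spearman, 1927: The abilities of man.
[Figure: X (tests × subjects) ≈ W (tests × intelligences) times H (intelligences × subjects)]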
TOPIC MODELING / LATENT SEMANTIC ANALYSIS
Blei, David M. "Probabilistic topic models." Communications of the ACM 55.4 (2012): 77-84.
[Figure from Blei (2012): topics as word lists with probabilities (gene, dna, genetic; life, evolve, organism; brain, neuron, nerve; data, number, computer), documents, and topic proportions and assignments]
TOPIC MODELING / LATENT SEMANTIC ANALYSIS
X≈WH
Non-negative Matrix Factorization (NMF):
(~1970 Lawson, ~1995 Paatero, ~2000 Lee & Seung)
2005 Gaussier et al. "Relation between PLSA and NMF and implications."
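$\arg\min_{W,H} \|X - WH\| \quad \text{s.t. } W, H \ge 0$
[Figure: the documents/terms matrix X (a sparse matrix!) approximated by the product of two smaller matrices that share a topic dimension]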
NON-NEGATIVE MATRIX FACTORIZATION (NMF)
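NMF gives a parts-based representation (Lee & Seung, Nature 1999)
[Figure: an original image and its NMF and PCA representations, each shown as basis images × encoding = reconstruction]
NMF is equivalent to Spectral Clustering (Ding et al., SDM 2005)
Multiplicative updates (∘ and the fractions are element-wise; the slide writes the data matrix X as V):
$W \leftarrow W \circ \dfrac{X H^\top}{W H H^\top}$
$H \leftarrow H \circ \dfrac{W^\top X}{W^\top W H}$
$\arg\min_{W,H} \|X - WH\| \quad \text{s.t. } W, H \ge 0$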
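The multiplicative update rules can be sketched directly in NumPy. This is a minimal illustration, not the scikit-learn implementation: the function name `nmf_multiplicative`, the small constant `eps` and the random toy matrix are assumptions made for the example.

```python
import numpy as np

def nmf_multiplicative(X, n_components, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative X as W @ H with multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Strictly positive random start: the updates only rescale entries,
    # so an entry that starts at zero would stay at zero.
    W = rng.random((n, n_components)) + eps
    H = rng.random((n_components, m)) + eps
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # W <- W * (X H^T) / (W H H^T), element-wise
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy usage on a small non-negative matrix of rank 2 (made-up data)
rng = np.random.default_rng(1)
X = rng.random((6, 2)) @ rng.random((2, 5))
W, H = nmf_multiplicative(X, n_components=2)
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Because every factor in the update is non-negative, W and H stay non-negative without any explicit projection step.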
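import matplotlib.pyplot as plt
from sklearn import datasets, decomposition

# 8x8 images of handwritten digits, one flattened image per row
digits = datasets.load_digits()
A = digits.data
nmf = decomposition.NMF(n_components=10)
W = nmf.fit_transform(A)   # weights of the 10 components for each image
H = nmf.components_        # the 10 non-negative basis images
plt.rc("image", cmap="binary")
plt.figure(figsize=(8,4))
for i in range(10):
    plt.subplot(2,5,i+1)
    plt.imshow(H[i].reshape(8,8))
    plt.xticks(())
    plt.yticks(())
plt.tight_layout()
plt.show()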
BEYOND MATRICES: HIGH DIMENSIONAL DATASETS
Cichocki et al. Nonnegative Matrix and Tensor Factorizations
Environmental analysis
▸ Measurement as a function of (Location, Time, Variable)
Sensory analysis
▸ Score as a function of (Food sample, Judge, Attribute)
Process analysis
▸ Measurement as a function of (Batch, Variable, Time)
Spectroscopy
▸ Intensity as a function of (Wavelength, Retention, Sample, Time, Location, …)
…
MULTIWAY DATA ANALYSIS
DIGITAL TRACES FROM SENSORS AND IOT
USER, POSITION, TIME, …
Sidiropoulos, Giannakis and Bro, IEEE Trans. Signal Processing, 2000.
Mørup, Hansen and Arnfred, Journal of Neuroscience Methods, 2007.
Hazan, Polak and Shashua, ICCV 2005.
Bader, Berry, Browne, Survey of Text Mining: Clustering, Classification, and Retrieval, 2nd Ed., 2007.
Doostan and Iaccarino, Journal of Computational Physics, 2009.
Andersen and Bro, Journal of Chemometrics, 2003.
• Chemometrics
– Fluorescence Spectroscopy
– Chromatographic Data Analysis
• Neuroscience
– Epileptic Seizure Localization
– Analysis of EEG and ERP
• Signal Processing
• Computer Vision
– Image compression, classification
– Texture analysis
• Social Network Analysis
– Web link analysis
– Conversation detection in emails
– Text analysis
• Approximation of PDEs
data reconstruction, cluster analysis, compression, dimensionality reduction, latent semantic analysis, …
TENSORS
WHAT IS A TENSOR?
A tensor is a multidimensional array
E.g., three-way tensor:
[Figure: a three-way tensor with axes Mode-1, Mode-2, Mode-3]
FIBERS AND SLICES
Cichocki et al. Nonnegative Matrix and Tensor Factorizations
Column (Mode-1) Fibers: A[:, 4, 1]
Row (Mode-2) Fibers: A[1, :, 4]
Tube (Mode-3) Fibers: A[1, 3, :]
Horizontal Slices: A[1, :, :]
Lateral Slices: A[:, 1, :]
Frontal Slices: A[:, :, 1]
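As a small illustration of fibers and slices, the NumPy indexing below extracts each kind from a dense array. The array shape and the index values are arbitrary choices for the example, not taken from the slide.

```python
import numpy as np

# A small 3-way tensor with shape (3, 4, 5); entry A[i, j, k] = 20*i + 5*j + k
A = np.arange(60).reshape(3, 4, 5)

# Fibers: fix every index but one
column_fiber = A[:, 1, 2]   # mode-1 fiber, length 3
row_fiber = A[0, :, 2]      # mode-2 fiber, length 4
tube_fiber = A[0, 1, :]     # mode-3 fiber, length 5

# Slices: fix exactly one index
horizontal_slice = A[0, :, :]   # shape (4, 5)
lateral_slice = A[:, 1, :]      # shape (3, 5)
frontal_slice = A[:, :, 2]      # shape (3, 4)
```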
TENSOR UNFOLDINGS: MATRICIZATION AND VECTORIZATION
Matricization: convert a tensor to a matrix
Vectorization: convert a tensor to a vector
>>> import numpy as np
>>> T = np.arange(0, 24).reshape((3, 4, 2))
>>> T
array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7]],

       [[ 8,  9],
        [10, 11],
        [12, 13],
        [14, 15]],

       [[16, 17],
        [18, 19],
        [20, 21],
        [22, 23]]])
OK for dense tensors: use a combination of transpose() and reshape()
Not simple for sparse datasets (e.g.: <authors, terms, time>)
>>> for j in range(2):
...     for i in range(4):
...         print(T[:, i, j])
...
[ 0  8 16]
[ 2 10 18]
[ 4 12 20]
[ 6 14 22]
[ 1  9 17]
[ 3 11 19]
[ 5 13 21]
[ 7 15 23]
# supposing the existence of unfold
>>> T.unfold(0)
array([[ 0,  2,  4,  6,  1,  3,  5,  7],
       [ 8, 10, 12, 14,  9, 11, 13, 15],
       [16, 18, 20, 22, 17, 19, 21, 23]])
>>> T.unfold(1)
array([[ 0,  8, 16,  1,  9, 17],
       [ 2, 10, 18,  3, 11, 19],
       [ 4, 12, 20,  5, 13, 21],
       [ 6, 14, 22,  7, 15, 23]])
>>> T.unfold(2)
array([[ 0,  8, 16,  2, 10, 18,  4, 12, 20,  6, 14, 22],
       [ 1,  9, 17,  3, 11, 19,  5, 13, 21,  7, 15, 23]])
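NumPy arrays have no `unfold` method, so the calls above are hypothetical. One possible sketch for dense tensors, combining an axis move with a Fortran-order reshape, is shown below; the helper names `unfold` and `vectorize` are assumptions for this example.

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` matricization of a dense tensor (hypothetical helper)."""
    # Bring the chosen mode to the front, then flatten the remaining modes
    # in Fortran order so that the earliest remaining mode varies fastest.
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order='F')

def vectorize(T):
    """Vectorization with the first index varying fastest."""
    return T.ravel(order='F')

T = np.arange(24).reshape((3, 4, 2))
T0, T1, T2 = unfold(T, 0), unfold(T, 1), unfold(T, 2)
v = vectorize(T)
```

With this column ordering, `T0`, `T1` and `T2` reproduce the three matrices listed for `T.unfold(0)`, `T.unfold(1)` and `T.unfold(2)`.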
RANK-1 TENSOR
The outer product of N vectors results in a rank-1 tensor:
$T = a^{(1)} \circ \cdots \circ a^{(N)}$
For three vectors a, b, c: $T_{i,j,k} = a_i b_j c_k$
import numpy as np
a = np.array([1, 2, 3])
b = np.array([1, 2, 3, 4])
c = np.array([1, 2])
T = np.zeros((a.shape[0], b.shape[0], c.shape[0]))
for i in range(a.shape[0]):
    for j in range(b.shape[0]):
        for k in range(c.shape[0]):
            T[i, j, k] = a[i] * b[j] * c[k]
Resulting T:
array([[[ 1.,  2.],
        [ 2.,  4.],
        [ 3.,  6.],
        [ 4.,  8.]],

       [[ 2.,  4.],
        [ 4.,  8.],
        [ 6., 12.],
        [ 8., 16.]],

       [[ 3.,  6.],
        [ 6., 12.],
        [ 9., 18.],
        [12., 24.]]])
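The triple loop can also be written without explicit Python loops. The sketch below shows two equivalent NumPy formulations; the variable names are chosen for this example.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 3, 4])
c = np.array([1, 2])

# Outer product of three vectors: T[i, j, k] = a[i] * b[j] * c[k]
T_einsum = np.einsum('i,j,k->ijk', a, b, c)

# The same tensor through broadcasting
T_broadcast = a[:, None, None] * b[None, :, None] * c[None, None, :]

# The mode-1 unfolding of a rank-1 tensor is a rank-1 matrix
rank_of_unfolding = np.linalg.matrix_rank(T_einsum.reshape(3, -1))
```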
TENSOR RANK
▸ Every tensor can be written as a sum of rank-1 tensors
[Figure: a tensor drawn as a sum of rank-1 tensors, $a_1 \circ b_1 \circ c_1 + \cdots + a_J \circ b_J \circ c_J$]
▸ Tensor rank: smallest number of rank-1 tensors that can generate it by summing up
$\mathcal{X} \approx \sum_{r=1}^{R} a_r^{(1)} \circ a_r^{(2)} \circ \cdots \circ a_r^{(N)} \equiv [\![A^{(1)}, A^{(2)}, \cdots, A^{(N)}]\!]$
$T \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r \equiv [\![A, B, C]\!]$
$[\![A, B, C]\!]$: Kruskal Tensor
import numpy as np
A = np.array([[1, 2, 3],
              [4, 5, 6]]).T
B = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]]).T
C = np.array([[1, 2],
              [3, 4]]).T
T = np.zeros((A.shape[0], B.shape[0], C.shape[0]))
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        for k in range(C.shape[0]):
            for r in range(A.shape[1]):
                T[i, j, k] += A[i, r] * B[j, r] * C[k, r]
# equivalent one-liner
T = np.einsum('ir,jr,kr->ijk', A, B, C)
Resulting T:
array([[[ 61.,  82.],
        [ 74., 100.],
        [ 87., 118.],
        [100., 136.]],

       [[ 77., 104.],
        [ 94., 128.],
        [111., 152.],
        [128., 176.]],

       [[ 93., 126.],
        [114., 156.],
        [135., 186.],
        [156., 216.]]])
TENSOR FACTORIZATION
▸ CANDECOMP/PARAFAC factorization (CP)
▸ extensions of SVD / PCA / NMF of matrices
NON-NEGATIVE TENSOR FACTORIZATION
▸ Decompose a non-negative tensor into a sum of R non-negative rank-1 tensors
$\arg\min_{A,B,C} \| T - [\![A, B, C]\!] \|$
with $[\![A, B, C]\!] \equiv \sum_{r=1}^{R} a_r \circ b_r \circ c_r$
subject to $A \ge 0, B \ge 0, C \ge 0$
TENSOR FACTORIZATION: HOW TO
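Alternating Least Squares (ALS): fix all but one factor matrix, to which LS is applied
$\min_{A \ge 0} \| T_{(1)} - A (C \odot B)^\top \|$
$\min_{B \ge 0} \| T_{(2)} - B (C \odot A)^\top \|$
$\min_{C \ge 0} \| T_{(3)} - C (B \odot A)^\top \|$
$\odot$ denotes the Khatri-Rao product, which is a column-wise Kronecker product, i.e., $C \odot B = [c_1 \otimes b_1, c_2 \otimes b_2, \ldots, c_R \otimes b_R]$
$T_{(1)} = \hat{A} (\hat{C} \odot \hat{B})^\top$
$T_{(2)} = \hat{B} (\hat{C} \odot \hat{A})^\top$
$T_{(3)} = \hat{C} (\hat{B} \odot \hat{A})^\top$
$T_{(k)}$: unfolded tensor on the kth mode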
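The unfolded forms can be checked numerically. The sketch below assumes an unfolding in which the earliest remaining mode varies fastest along the columns; the helpers `khatri_rao` and `unfold` are written for this example and the factor matrices are random.

```python
import numpy as np

def khatri_rao(C, B):
    """Column-wise Kronecker product: column r is kron(C[:, r], B[:, r])."""
    K, R = C.shape
    J, R2 = B.shape
    assert R == R2, "both matrices need the same number of columns"
    return np.einsum('kr,jr->kjr', C, B).reshape(K * J, R)

def unfold(T, mode):
    """Mode-`mode` unfolding with the earliest remaining mode varying fastest."""
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order='F')

# Small random factor matrices with R = 2 components (made-up values)
rng = np.random.default_rng(0)
A, B, C = rng.random((3, 2)), rng.random((4, 2)), rng.random((5, 2))
T = np.einsum('ir,jr,kr->ijk', A, B, C)   # Kruskal tensor [[A, B, C]]

ok_mode1 = np.allclose(unfold(T, 0), A @ khatri_rao(C, B).T)
ok_mode2 = np.allclose(unfold(T, 1), B @ khatri_rao(C, A).T)
ok_mode3 = np.allclose(unfold(T, 2), C @ khatri_rao(B, A).T)
```

Each ALS step is then an ordinary (non-negative) least-squares problem on a matrix, which is what makes the alternating scheme practical.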
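import numpy as np

n, m, t, r = 5, 4, 3, 2   # example sizes; r is the number of components
rng = np.random.default_rng(0)
# Random positive start (an all-zero F would give 0/0 in the update below)
F = [rng.random((n, r)), rng.random((m, r)), rng.random((t, r))]
FF_init = np.array([f.T.dot(f) for f in F])   # Gram matrices F[k]^T F[k]

def iter_solver(T, F, FF_init):
    # Update each factor
    for k in range(len(F)):
        # Compute the inner-product matrix
        FF = np.ones((r, r))
        for i in list(range(k)) + list(range(k+1, len(F))):
            FF = FF * FF_init[i]
        # unfolded tensor times Khatri-Rao product
        # (T is assumed to provide uttkrp, as scikit-tensor tensors do)
        XF = T.uttkrp(F, k)
        # multiplicative update
        F[k] = F[k]*XF/(F[k].dot(FF))
        # alternative, non-negative least squares: F[k] = nnls(FF, XF.T).T
        FF_init[k] = (F[k].T.dot(F[k]))
    return F, FF_init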
J. Kim and H. Park. Fast Nonnegative Tensor Factorization with an Active-set-like Method. In High-Performance Scientific Computing: Algorithms and Applications, Springer, 2012, pp. 311-326.
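For readers without scikit-tensor, the multiplicative-update solver can be sketched with NumPy only. This is an illustrative 3-way version, not the author's gist and not the active-set method of Kim and Park; the names `mttkrp`, `ntf` and `relative_error`, the constant `eps` and the synthetic tensor are assumptions for the example.

```python
import numpy as np

def mttkrp(T, F, k):
    """Unfolded 3-way tensor times the Khatri-Rao product of the other two factors."""
    A, B, C = F
    if k == 0:
        return np.einsum('ijk,jr,kr->ir', T, B, C)
    if k == 1:
        return np.einsum('ijk,ir,kr->jr', T, A, C)
    return np.einsum('ijk,ir,jr->kr', T, A, B)

def ntf(T, r, n_iter=200, eps=1e-9, seed=0):
    """Non-negative CP factorization of a 3-way tensor with multiplicative updates."""
    rng = np.random.default_rng(seed)
    F = [rng.random((dim, r)) for dim in T.shape]   # positive random start
    for _ in range(n_iter):
        for k in range(3):
            # Hadamard product of the Gram matrices of the other two factors
            FF = np.ones((r, r))
            for i in range(3):
                if i != k:
                    FF *= F[i].T @ F[i]
            # eps keeps the denominator away from zero
            F[k] = F[k] * mttkrp(T, F, k) / (F[k] @ FF + eps)
    return F

def relative_error(T, F):
    A, B, C = F
    T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
    return np.linalg.norm(T - T_hat) / np.linalg.norm(T)

# Synthetic non-negative tensor of rank 2 (made-up data)
rng = np.random.default_rng(1)
T = np.einsum('ir,jr,kr->ijk',
              rng.random((6, 2)), rng.random((5, 2)), rng.random((4, 2)))
F = ntf(T, r=2)
```

The `einsum` calls play the role of `uttkrp` for dense tensors; for large sparse tensors a dedicated sparse implementation would be needed instead.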
HOW TO INTERPRET: USER X TERM X TIME
X is a 3-way tensor in which $x_{nmt}$ is 1 if the term m was used by user n at interval t, 0 otherwise
$A_{N \times K}$ is the association of each user n to a factor k
$B_{M \times K}$ is the association of each term m to a factor k
$C_{T \times K}$ shows the time activity of each factor
[Figure: X (N×M×T, users × terms × time) = factors A (N×K), B (M×K), C (T×K)]
http://www.datainterfaces.org/2013/06/twitter-topic-explorer/
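A binary user × term × time tensor of this kind could be assembled as follows. The event list, the user and term names, and the variable names are invented for illustration.

```python
import numpy as np

# Hypothetical (user, term, interval) observations
events = [
    ("alice", "python", 0),
    ("alice", "tensor", 1),
    ("bob", "python", 1),
    ("bob", "python", 1),    # repeated use in the same interval still gives 1
    ("carol", "graph", 2),
]

users = sorted({u for u, _, _ in events})
terms = sorted({m for _, m, _ in events})
n_intervals = max(t for _, _, t in events) + 1
user_index = {u: n for n, u in enumerate(users)}
term_index = {m: j for j, m in enumerate(terms)}

# X[n, m, t] = 1 if term m was used by user n at interval t, 0 otherwise
X = np.zeros((len(users), len(terms), n_intervals), dtype=np.int8)
for u, m, t in events:
    X[user_index[u], term_index[m], t] = 1
```

A dense array is only workable for small vocabularies; real user × term × time data is usually very sparse and better kept as a coordinate list.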
TOOLS FOR TENSOR DECOMPOSITION
TOOLS FOR TENSOR FACTORIZATION
TOOLS: THE PYTHON WORLD
NumPy, SciPy
Scikit-Tensor (under development): github.com/mnick/scikit-tensor
NTF: gist.github.com/panisson/7719245
TENSOR DECOMPOSITION OF WEARABLE SENSOR DATA
recorded proximity data: direct proximity sensing
Lyon, France: primary school, 231 students, 10 teachers
Hong Kong: primary school, 900 students, 65 teachers
SocioPatterns.org
7 years, 30+ deployments, 10 countries, 50,000+ persons
• Mongan Institute for Health Policy, Boston
• US Army Medical Component of the Armed Forces, Bangkok
• School of Public Health of the University of Hong Kong
• KEMRI Wellcome Trust, Kenya
• London School for Hygiene and Tropical Medicine, London
• Public Health England, London
• Saw Swee Hock School of Public Health, Singapore
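FROM TEMPORAL GRAPHS TO 3-WAY TENSORS
[Figure: each snapshot of the temporal graph is an adjacency matrix, e.g. [[0, 1, 0], [1, 0, 1], [0, 1, 0]]; the snapshots stacked along time form a 3-way tensor]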
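One way to build such a tensor from a list of timestamped contacts is sketched below. The function name `contacts_to_tensor`, the 60-second interval and the contact list are assumptions for this example, not the SocioPatterns data format.

```python
import numpy as np

def contacts_to_tensor(contacts, n_nodes, interval):
    """Stack one symmetric adjacency matrix per time interval into a 3-way tensor."""
    t_min = min(t for t, _, _ in contacts)
    t_max = max(t for t, _, _ in contacts)
    n_slices = (t_max - t_min) // interval + 1
    T = np.zeros((n_nodes, n_nodes, n_slices))
    for t, i, j in contacts:
        s = (t - t_min) // interval
        T[i, j, s] = 1      # undirected contact: fill both (i, j) and (j, i)
        T[j, i, s] = 1
    return T

# Made-up contact list: (timestamp in seconds, node i, node j)
contacts = [(0, 0, 1), (20, 1, 2), (40, 0, 1), (70, 1, 2), (130, 0, 2)]
T = contacts_to_tensor(contacts, n_nodes=3, interval=60)
```

The interval length sets the time resolution of the third mode: longer intervals give fewer, denser slices.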
temporal network → tensorial representation → tensor factorization → factors (A, B: communities; C: temporal activity) → factorization quality → tuning the complexity of the model
[Figure: structures in temporal networks. Panel labels: nodes, communities (classes 1A to 5B), components, time, time interval, quality metrics, component]
TENSOR DECOMPOSITION OF SCHOOL NETWORK
L. Gauvin et al., PLoS ONE 9(1), e86028 (2014)
[Figure: components labelled by class: 1A, 1B, 2A, 2B, 3A, 3B, 4A, 4B, 5A, 5B]
https://github.com/panisson/ntf-school
ANOMALY DETECTION IN TEMPORAL NETWORKS
A. Sapienza et al., "Detecting anomalies in time-varying networks using tensor decomposition", ICDM Data Mining in Networks
Laetitia Gauvin, Ciro Cattuto, Anna Sapienza
.fit().predict()
@apanisson
panisson@gmail.com
thank you

Exploring temporal graph data with Python: 
a study on tensor decomposition of wearable sensor data (PyData NYC 2015)
