Extracting relevant Metrics with Spectral Clustering
Evelyn Trautmann
PyData Berlin
6-8 July, 2018
Evelyn Trautmann PyData Berlin Extracting relevant Metrics with Spectral Clustering 6-8 July, 2018 1 / 24
Introduction
platform monitoring means keeping track of various metrics
some effects are visible only in sub-dimensions such as single countries or device types
goal: dimension reduction without losing important details
Algorithm
Clustering
metrics are considered as points in a vector space
similarity defines an (inverse) distance measure
find clusters of closely related points
Toy Example - Gaming App
metric 0: game a, count of games played by paying users
metric 1: game a, count of games played by non-paying users
metric 2: game b, count of games played by paying users
metric 3: game b, count of games played by non-paying users
...
Similarity Graph
Figure: Similarity Graph
Matrix Representation
Figure: Matrix Representation
Toy Example
Toy Example.ipynb
https://github.com/metterlein/spectral_clustering
Spectral Clustering
procedure SpectralClustering(X, k) [vL07]
    S = similarity matrix of pairwise similarities between the n rows of X
    L = graph Laplacian of S
    v1, ..., vk = eig(L, k)              (first k eigenvectors)
    U = V^T, where the rows of V are v1, ..., vk
                                         (k n-dimensional vectors → n k-dimensional vectors, the rows of U)
    clusterAssignment = kmeans(U, k)
    return clusterAssignment
end procedure
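The procedure above can be sketched in Python roughly as follows. This is a minimal illustration, not the code from the notebook: the RBF similarity, the value of `gamma`, the toy data and the helper names `rbf_similarity` and `spectral_clustering` are assumptions. The Laplacian is built with similarities off the diagonal and negative row sums on the diagonal, so its eigenvalues are non-positive and the "first" k eigenvectors are taken to be those whose eigenvalues are closest to zero.

```python
import numpy as np
from sklearn.cluster import KMeans


def rbf_similarity(X, gamma=1.0):
    """Pairwise similarities exp(-gamma * ||x_i - x_j||^2) between the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def spectral_clustering(X, k, gamma=1.0, seed=0):
    W = rbf_similarity(X, gamma)
    np.fill_diagonal(W, 0.0)                # no self-loops in the similarity graph
    L = W - np.diag(W.sum(axis=1))          # similarities off-diagonal, negative row sums on diagonal
    eigvals, eigvecs = np.linalg.eigh(L)    # eigenvalues in ascending order, all <= 0
    U = eigvecs[:, -k:]                     # n rows, each a k-dimensional embedding
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(U)


# Hypothetical toy data: two well-separated groups of ten points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(10, 2)),
               rng.normal(5.0, 0.1, size=(10, 2))])
labels = spectral_clustering(X, k=2)
```

With a nearly block-diagonal similarity matrix like this one, the rows of `U` are almost constant within each block, which is why plain k-means on `U` is enough to recover the groups.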
Parameters
Similarity Function
Similarity Function $\varphi : \mathbb{R}^n \times \mathbb{R}^n \to [0, 1]$
radial basis function [PVG+11]: $\varphi(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$
correlation based similarity
handle negative correlations
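Both similarity functions can be illustrated with a short sketch. The slide does not say how negative correlations are handled, so the three options below (`abs`, `clip`, `shift`) are assumptions for illustration, as are the function names and the example series.

```python
import numpy as np


def rbf_similarity(x_i, x_j, gamma=1.0):
    """Radial basis function similarity exp(-gamma * ||x_i - x_j||^2), in (0, 1]."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def correlation_similarity(X, mode="abs"):
    """Pairwise correlation-based similarity between the rows of X, in [0, 1].

    mode="abs":   |rho|, anti-correlated series count as similar (assumption)
    mode="clip":  max(rho, 0), anti-correlated series count as unrelated
    mode="shift": (1 + rho) / 2, perfect anti-correlation maps to 0
    """
    rho = np.corrcoef(X)
    if mode == "abs":
        return np.abs(rho)
    if mode == "clip":
        return np.clip(rho, 0.0, 1.0)
    if mode == "shift":
        return (1.0 + rho) / 2.0
    raise ValueError(f"unknown mode: {mode}")


t = np.linspace(0.0, 1.0, 50)
X = np.vstack([t, 2.0 * t + 1.0, -t])   # rows 0 and 1 correlate, row 2 is anti-correlated
S_abs = correlation_similarity(X, mode="abs")
S_clip = correlation_similarity(X, mode="clip")
```

Which option is appropriate depends on whether a metric that mirrors another one should land in the same cluster (`abs`) or not (`clip`, `shift`).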
Number of Clusters
with a clear block structure, the eigenvalue gap is obvious
if the connectivity between blocks is too large, determining the cluster count becomes difficult
possible approach:
- PCA explained variance
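The eigenvalue gap for a clear block structure can be made concrete with a small sketch. The similarity matrix below is hypothetical (three blocks of five metrics, similarity 0.9 inside a block and 0.01 across blocks), and `estimate_cluster_count` is an illustrative helper, not part of any library. The Laplacian again has similarities off the diagonal and negative row sums on the diagonal.

```python
import numpy as np


def estimate_cluster_count(W, max_k=10):
    """Eigengap heuristic: count the eigenvalues that precede the largest gap."""
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    L = W - np.diag(W.sum(axis=1))
    eigvals = np.linalg.eigvalsh(L)[::-1]   # descending: 0 >= lambda_1 >= lambda_2 >= ...
    gaps = eigvals[:-1] - eigvals[1:]       # gap between consecutive eigenvalues
    max_k = min(max_k, len(gaps))
    return int(np.argmax(gaps[:max_k])) + 1


# Hypothetical similarity matrix: strong similarity inside blocks, weak across blocks.
n_blocks, block_size = 3, 5
W = np.full((n_blocks * block_size,) * 2, 0.01)
for b in range(n_blocks):
    block = slice(b * block_size, (b + 1) * block_size)
    W[block, block] = 0.9
k = estimate_cluster_count(W)
```

Here three eigenvalues lie close to zero and the rest are far below, so the largest gap appears after the third eigenvalue. As the cross-block similarity grows, that gap shrinks and the heuristic becomes unreliable.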
Application
Data Description
metrics represent user activities, payments, etc.
each metric is split along several dimensions such as country, gender, etc.
the combination of metrics and dimensions yields around 300 time series in our example
Data Preparation
aggregation per day
normalization $\tilde{X} = \frac{1}{\sigma}(X - \mu)$, with $\mu$ mean value and $\sigma$ standard deviation
rolling mean of 7 days to smooth weekly periodicity
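The three preparation steps could look like this in pandas. The event table, its column names (`timestamp`, `metric`, `value`) and the two metric names are made up for illustration; the real data layout is not described in the slides.

```python
import numpy as np
import pandas as pd

# Hypothetical raw events: hourly values for two metrics over four weeks.
rng = np.random.default_rng(0)
timestamps = pd.Timestamp("2018-01-01") + pd.to_timedelta(np.arange(24 * 28), unit="h")
events = pd.DataFrame({
    "timestamp": np.tile(timestamps.to_numpy(), 2),
    "metric": np.repeat(["games_paying", "games_non_paying"], len(timestamps)),
    "value": rng.poisson(20, size=2 * len(timestamps)),
})

# 1. Aggregation per day: one row per day, one column per metric.
daily = (events
         .groupby([pd.Grouper(key="timestamp", freq="D"), "metric"])["value"]
         .sum()
         .unstack("metric"))

# 2. Normalization: subtract the mean and divide by the standard deviation, per metric.
normalized = (daily - daily.mean()) / daily.std()

# 3. Rolling mean over 7 days to smooth the weekly periodicity.
smoothed = normalized.rolling(window=7).mean().dropna()
```

The first six days drop out after smoothing because a full 7-day window is not yet available for them.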
Measure Clustering Quality
for each cluster, compute the variance at each timepoint over all cluster members
the optimal clustering minimizes this variance for each cluster
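One way to turn this criterion into a number is sketched below. Averaging the per-timepoint variance over time to get a single score per cluster is an assumption, as are the function name and the example series.

```python
import numpy as np


def within_cluster_variance(series, labels):
    """Per cluster: variance across members at each timepoint, averaged over time.

    series: array of shape (n_metrics, n_timepoints)
    labels: one cluster id per metric
    """
    series = np.asarray(series, dtype=float)
    labels = np.asarray(labels)
    return {int(c): float(series[labels == c].var(axis=0).mean())
            for c in np.unique(labels)}


# Hypothetical example: metrics 0 and 1 move together, metrics 2 and 3 move together.
t = np.linspace(0.0, 2.0 * np.pi, 30)
series = np.vstack([np.sin(t), np.sin(t) + 0.1, np.cos(t), np.cos(t) - 0.1])
good = within_cluster_variance(series, [0, 0, 1, 1])
bad = within_cluster_variance(series, [0, 1, 0, 1])
```

The assignment that groups the co-moving series yields a much smaller score in every cluster than the assignment that mixes them.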
Demo
Real Data Example
Real Data Example.ipynb
https://github.com/metterlein/spectral_clustering
Conclusion
clustering metrics reduces the dimensionality of the observation space
a small number of interpretable cluster representatives helps to keep track of the main platform dynamics
cluster assignments can reveal unexpected relations between metrics
Outlook
investigate how cluster assignments change over time
hierarchical approach: clustering sub-blocks
add time-shifted series to detect Granger causality
References
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011), 2825–2830.
Evelyn Trautmann, Spectral clustering demo, https://github.com/metterlein/spectral_clustering, 2018.
Ulrike von Luxburg, A tutorial on spectral clustering, CoRR abs/0711.0189 (2007).
Thank You!
Questions?
evelyn.trautmann@lovoo.com
Graph Laplacian
Graph Laplacian with similarities as off-diagonal entries
negative row sums on diagonal
$$L = (l_{ij}), \qquad \begin{cases} l_{ij} \ge 0, & \text{for } i \ne j \\ l_{ii} = -\sum_{k \ne i} l_{ik}, & \text{for } i = j \end{cases}$$
matrix exponential is a stochastic matrix
results from Markov chain theory applicable
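The claim that the matrix exponential of such a Laplacian is a stochastic matrix can be checked numerically. The 4 x 4 similarity matrix below is a made-up example. Because it is symmetric, the exponential is computed from the eigendecomposition instead of a dedicated matrix-exponential routine.

```python
import numpy as np

# Hypothetical symmetric similarity matrix for four metrics.
W = np.array([[0.0, 0.8, 0.1, 0.0],
              [0.8, 0.0, 0.2, 0.1],
              [0.1, 0.2, 0.0, 0.9],
              [0.0, 0.1, 0.9, 0.0]])

# Similarities off the diagonal, negative row sums on the diagonal.
L = W - np.diag(W.sum(axis=1))

# L is symmetric, so exp(L) = V diag(exp(lambda)) V^T.
eigvals, V = np.linalg.eigh(L)
P = V @ np.diag(np.exp(eigvals)) @ V.T

row_sums = P.sum(axis=1)    # each row of a stochastic matrix sums to 1
min_entry = P.min()         # and all entries are non-negative
```

In Markov chain terms, `L` plays the role of the generator of a continuous-time chain on the metrics and `P` holds its transition probabilities after one unit of time.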
Clustering Methods
[PVG+11]
PCA
Extract Platform Dynamics by means of Cluster Centers
compute average timeseries per cluster
illustrate average cluster timeseries with variance corridor
visualize metric types and sub-dimensions entering the respective clusters
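A sketch of the first two steps with pandas, leaving the actual plotting aside. The metric names, the cluster labels and the choice of one standard deviation as corridor width are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical input: one column per metric time series, one row per day.
days = pd.Timestamp("2018-01-01") + pd.to_timedelta(np.arange(30), unit="D")
t = np.arange(30)
series = pd.DataFrame({
    "games_paying_DE": np.sin(t / 5.0),
    "games_paying_FR": np.sin(t / 5.0) + 0.2,
    "payments_DE": np.cos(t / 5.0),
    "payments_FR": np.cos(t / 5.0) - 0.2,
}, index=days)
labels = pd.Series([0, 0, 1, 1], index=series.columns, name="cluster")

# Average time series per cluster, with a corridor of one standard deviation.
by_cluster = series.T.groupby(labels)
center = by_cluster.mean().T            # rows: days, columns: cluster ids
spread = by_cluster.std().T
lower, upper = center - spread, center + spread

# Which metrics and sub-dimensions enter which cluster.
members = {int(c): list(labels.index[labels == c]) for c in labels.unique()}
```

`center`, `lower` and `upper` can then be passed to any plotting library, for example as a line with a shaded band per cluster.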

Contenu connexe

Similaire à Extracting relevant Metrics with Spectral Clustering - Evelyn Trautmann

On Contracts and Sandboxes for JavaScript
On Contracts and Sandboxes for JavaScriptOn Contracts and Sandboxes for JavaScript
On Contracts and Sandboxes for JavaScriptMatthias Keil
 
Grouping techniques for facing Volume and Velocity in the Big Data
Grouping techniques for facing Volume and Velocity in the Big DataGrouping techniques for facing Volume and Velocity in the Big Data
Grouping techniques for facing Volume and Velocity in the Big DataFacultad de Informática UCM
 
Sessione II - Estimation methods and accuracy - P.D. Falorsi F. Petrarca, P...
Sessione II - Estimation methods and accuracy  -  P.D. Falorsi F. Petrarca, P...Sessione II - Estimation methods and accuracy  -  P.D. Falorsi F. Petrarca, P...
Sessione II - Estimation methods and accuracy - P.D. Falorsi F. Petrarca, P...Istituto nazionale di statistica
 
Detection of fraud in financial blockchain-based transactions through big dat...
Detection of fraud in financial blockchain-based transactions through big dat...Detection of fraud in financial blockchain-based transactions through big dat...
Detection of fraud in financial blockchain-based transactions through big dat...CARLOS III UNIVERSITY OF MADRID
 
A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce IJECEIAES
 
Probabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering NetworksProbabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering NetworksTomaso Aste
 
Predictive Analytics for Alpha Generation and Risk Management
Predictive Analytics for Alpha Generation and Risk ManagementPredictive Analytics for Alpha Generation and Risk Management
Predictive Analytics for Alpha Generation and Risk ManagementYigal D. Jhirad
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...tuxette
 
Modelling Probability Distributions using Neural Networks: Applications to Me...
Modelling Probability Distributions using Neural Networks: Applications to Me...Modelling Probability Distributions using Neural Networks: Applications to Me...
Modelling Probability Distributions using Neural Networks: Applications to Me...Christian Baumgartner
 
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace DataMPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace DataIRJET Journal
 
Brief introduction on GAN
Brief introduction on GANBrief introduction on GAN
Brief introduction on GANDai-Hai Nguyen
 
Knowledge Graph Embeddings for Recommender Systems
Knowledge Graph Embeddings for Recommender SystemsKnowledge Graph Embeddings for Recommender Systems
Knowledge Graph Embeddings for Recommender SystemsEnrico Palumbo
 
Parallel Processing of Big Data in Finance for Alpha Generation and Risk Mana...
Parallel Processing of Big Data in Finance for Alpha Generation and Risk Mana...Parallel Processing of Big Data in Finance for Alpha Generation and Risk Mana...
Parallel Processing of Big Data in Finance for Alpha Generation and Risk Mana...Yigal D. Jhirad
 
CrossSim: exploiting mutual relationships to detect similar OSS projects
CrossSim: exploiting mutual relationships to detect similar OSS projectsCrossSim: exploiting mutual relationships to detect similar OSS projects
CrossSim: exploiting mutual relationships to detect similar OSS projectsDavide Ruscio
 
Presentation PRIDE Cluster HUPO 2014
Presentation PRIDE Cluster HUPO 2014Presentation PRIDE Cluster HUPO 2014
Presentation PRIDE Cluster HUPO 2014Juan Antonio Vizcaino
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET Journal
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET Journal
 

Similaire à Extracting relevant Metrics with Spectral Clustering - Evelyn Trautmann (20)

On Contracts and Sandboxes for JavaScript
On Contracts and Sandboxes for JavaScriptOn Contracts and Sandboxes for JavaScript
On Contracts and Sandboxes for JavaScript
 
Thermography slide
Thermography slideThermography slide
Thermography slide
 
resume_MH
resume_MHresume_MH
resume_MH
 
Grouping techniques for facing Volume and Velocity in the Big Data
Grouping techniques for facing Volume and Velocity in the Big DataGrouping techniques for facing Volume and Velocity in the Big Data
Grouping techniques for facing Volume and Velocity in the Big Data
 
Sessione II - Estimation methods and accuracy - P.D. Falorsi F. Petrarca, P...
Sessione II - Estimation methods and accuracy  -  P.D. Falorsi F. Petrarca, P...Sessione II - Estimation methods and accuracy  -  P.D. Falorsi F. Petrarca, P...
Sessione II - Estimation methods and accuracy - P.D. Falorsi F. Petrarca, P...
 
Detection of fraud in financial blockchain-based transactions through big dat...
Detection of fraud in financial blockchain-based transactions through big dat...Detection of fraud in financial blockchain-based transactions through big dat...
Detection of fraud in financial blockchain-based transactions through big dat...
 
A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce
 
Probabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering NetworksProbabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering Networks
 
Predictive Analytics for Alpha Generation and Risk Management
Predictive Analytics for Alpha Generation and Risk ManagementPredictive Analytics for Alpha Generation and Risk Management
Predictive Analytics for Alpha Generation and Risk Management
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
 
Modelling Probability Distributions using Neural Networks: Applications to Me...
Modelling Probability Distributions using Neural Networks: Applications to Me...Modelling Probability Distributions using Neural Networks: Applications to Me...
Modelling Probability Distributions using Neural Networks: Applications to Me...
 
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace DataMPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
 
Brief introduction on GAN
Brief introduction on GANBrief introduction on GAN
Brief introduction on GAN
 
Knowledge Graph Embeddings for Recommender Systems
Knowledge Graph Embeddings for Recommender SystemsKnowledge Graph Embeddings for Recommender Systems
Knowledge Graph Embeddings for Recommender Systems
 
An Evaluation of Models for Runtime Approximation in Link Discovery
An Evaluation of Models for Runtime Approximation in Link DiscoveryAn Evaluation of Models for Runtime Approximation in Link Discovery
An Evaluation of Models for Runtime Approximation in Link Discovery
 
Parallel Processing of Big Data in Finance for Alpha Generation and Risk Mana...
Parallel Processing of Big Data in Finance for Alpha Generation and Risk Mana...Parallel Processing of Big Data in Finance for Alpha Generation and Risk Mana...
Parallel Processing of Big Data in Finance for Alpha Generation and Risk Mana...
 
CrossSim: exploiting mutual relationships to detect similar OSS projects
CrossSim: exploiting mutual relationships to detect similar OSS projectsCrossSim: exploiting mutual relationships to detect similar OSS projects
CrossSim: exploiting mutual relationships to detect similar OSS projects
 
Presentation PRIDE Cluster HUPO 2014
Presentation PRIDE Cluster HUPO 2014Presentation PRIDE Cluster HUPO 2014
Presentation PRIDE Cluster HUPO 2014
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms Comparison
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms Comparison
 

Plus de PyData

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...PyData
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshPyData
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiPyData
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...PyData
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerPyData
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaPyData
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...PyData
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroPyData
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...PyData
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottPyData
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroPyData
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...PyData
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPyData
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...PyData
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydPyData
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverPyData
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldPyData
 
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...PyData
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardPyData
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...PyData
 

Plus de PyData (20)

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne Bauer
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica Puerto
 

Extracting relevant Metrics with Spectral Clustering - Evelyn Trautmann

  • 1. Extracting relevant Metrics with Spectral Clustering. Evelyn Trautmann, PyData Berlin, 6-8 July, 2018
  • 4. Introduction: platform monitoring means keeping track of various metrics; some effects are visible only in sub-dimensions such as single countries or device types; dimension reduction without losing important details
  • 7. Algorithm, Clustering: metrics are considered as points in a vector space; similarity defines an (inverse) distance measure; find clusters of closely related points
  • 8. Algorithm, Toy Example (Gaming App): metric 0, game a: count of games played by paying users; metric 1, game a: count of games played by non-paying users; metric 2, game b: count of games played by paying users; metric 3, game b: count of games played by non-paying users; ...
  • 9. Algorithm, Similarity Graph. Figure: Similarity Graph
  • 10. Algorithm, Matrix Representation. Figure: Matrix Representation
  • 11. Algorithm, Matrix Representation. Figure: Matrix Representation
  • 12. Algorithm, Toy Example: Toy Example.ipynb, https://github.com/metterlein/spectral_clustering
  • 13. Algorithm, Spectral Clustering: procedure SpectralClustering(X, k): compute the similarity matrix of pairwise similarities; transform it to the graph Laplacian $L$; $v_1, \dots, v_k = \mathrm{eig}(L, k)$, the first $k$ eigenvectors; $U = V^T$, turning $k$ $n$-dimensional vectors into $n$ $k$-dimensional vectors; clusterAssignment = kmeans(U, k); return clusterAssignment [vL07]
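The procedure on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the code from the talk's notebooks: it assumes the unnormalized Laplacian $D - W$ from [vL07] (the opposite sign of the Laplacian in the backup slides, with the same eigenvectors), and the helper names `kmeans` and `spectral_clustering` as well as the deterministic farthest-point initialisation are choices made here for reproducibility.

```python
import numpy as np


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means with a deterministic farthest-point start (assumption)."""
    centers = [points[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        dists = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster ran empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(similarity, k):
    """Cluster n items given a symmetric (n, n) similarity matrix."""
    W = np.asarray(similarity, dtype=float)
    L = np.diag(W.sum(axis=1)) - W        # unnormalized graph Laplacian D - W
    _, eigvecs = np.linalg.eigh(L)        # eigenvalues come back in ascending order
    U = eigvecs[:, :k]                    # n rows, each a k-dimensional vector
    return kmeans(U, k)


# Two blocks of three strongly related metrics, weakly connected to each other.
W = np.full((6, 6), 0.01)
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
labels = spectral_clustering(W, 2)
```

In practice the k-means step would more likely come from a library such as scikit-learn [PVG+11]; it is written out here only to keep the sketch self-contained.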
  • 17. Parameters, Similarity Function: similarity function $\varphi : \mathbb{R}^n \times \mathbb{R}^n \to [0, 1]$; radial basis function [PVG+11], $\varphi(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2)$; correlation based similarity; handle negative correlations
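Both similarity functions named on this slide can be written down directly. The sketch below is illustrative only: the slide does not say how negative correlations are handled, so the two options shown (taking the absolute value, or clipping negative values to zero) are assumptions, as are the function names and the `negative` parameter.

```python
import numpy as np


def rbf_similarity(X, gamma=1.0):
    """phi(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2) for all pairs of rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Guard against tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def correlation_similarity(X, negative="abs"):
    """Pearson correlation between rows of X, mapped into [0, 1].

    negative="abs":  anti-correlated series count as similar (assumption).
    negative="clip": anti-correlated series count as unrelated (assumption).
    """
    rho = np.corrcoef(X)
    if negative == "abs":
        return np.abs(rho)
    if negative == "clip":
        return np.clip(rho, 0.0, 1.0)
    raise ValueError("negative must be 'abs' or 'clip'")


# Rows are metrics, columns are time points.
X = np.array([[0.0, 1.0, 2.0, 3.0],
              [0.0, -1.0, -2.0, -3.0],
              [1.0, 0.0, 1.0, 0.0]])
S_rbf = rbf_similarity(X, gamma=0.5)
S_corr = correlation_similarity(X, negative="abs")
```

Which option fits depends on whether two metrics that move in opposite directions should end up in the same cluster.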
  • 20. Parameters, Number of Clusters: with a clear block structure the eigenvalue gap is obvious; if the connectivity between blocks is too large, determining the cluster count becomes complex; possible approach: PCA explained variance
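For the clear block-structure case, the eigenvalue gap can be read off programmatically. The following is a minimal sketch of the eigengap heuristic described in [vL07], assuming the unnormalized Laplacian $D - W$; the function name and the `max_k` cap are hypothetical, and the sketch does not cover the PCA explained-variance alternative.

```python
import numpy as np


def eigengap_cluster_count(similarity, max_k=10):
    """Pick k at the largest gap between consecutive small Laplacian eigenvalues."""
    W = np.asarray(similarity, dtype=float)
    L = np.diag(W.sum(axis=1)) - W          # unnormalized graph Laplacian
    eigvals = np.linalg.eigvalsh(L)         # ascending order
    upper = min(max_k, len(eigvals) - 1)
    gaps = np.diff(eigvals[: upper + 1])    # gaps[i] = lambda_{i+2} - lambda_{i+1}
    return int(np.argmax(gaps)) + 1


# Three blocks of four metrics each, with weak connectivity between blocks.
W = np.full((12, 12), 0.01)
for start in (0, 4, 8):
    W[start:start + 4, start:start + 4] = 1.0
k = eigengap_cluster_count(W)
```

As the slide notes, the gap flattens out once the connectivity between blocks grows, and the heuristic then becomes unreliable.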
  • 23. Application, Data Description: metrics represent user activities, payments, etc.; each metric is divided into several dimensions such as country, gender, etc.; the combination of metrics and dimensions generates around 300 time series in our example
  • 26. Application, Data Preparation: aggregation per day; normalization $X \mapsto \frac{1}{\sigma}(X - \mu)$, with $\mu$ the mean value and $\sigma$ the standard deviation; rolling mean of 7 days to smooth weekly periodicity
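The three preparation steps map naturally onto pandas. This is a sketch under assumptions rather than the talk's actual pipeline: it assumes the raw data is a DataFrame with a `DatetimeIndex` and one column per metric, that daily aggregation is a sum, and the name `prepare_metrics` is hypothetical.

```python
import numpy as np
import pandas as pd


def prepare_metrics(raw: pd.DataFrame, window: int = 7) -> pd.DataFrame:
    """Aggregate per day, z-score each metric, then smooth with a rolling mean."""
    daily = raw.resample("D").sum()                     # aggregation per day (sum is an assumption)
    normalized = (daily - daily.mean()) / daily.std()   # (X - mu) / sigma, per column
    smoothed = normalized.rolling(window).mean()        # 7-day window removes weekly periodicity
    return smoothed.dropna()                            # drop the incomplete leading windows


# Four weeks of hourly counts: one purely weekly pattern, one noisy metric.
hours = pd.Timestamp("2018-01-01") + pd.to_timedelta(np.arange(28 * 24), unit="h")
day = np.arange(28 * 24) // 24
rng = np.random.default_rng(0)
raw = pd.DataFrame(
    {
        "weekly": (day % 7 + 1).astype(float),
        "noise": rng.poisson(5.0, size=28 * 24).astype(float),
    },
    index=hours,
)
prepared = prepare_metrics(raw)
```

A metric that is constant over the whole period has zero standard deviation and would need special handling before the normalization step.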
  • 27. Application, Measure Clustering Quality: compute the variance at each timepoint over all cluster members; the optimal clustering minimizes the variance for each cluster
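One way to turn this quality measure into a number per cluster is sketched below. The slide does not say how the per-timepoint variances are combined, so averaging them over time is an assumption, and the function name is hypothetical.

```python
import numpy as np


def within_cluster_variance(series, labels):
    """Per cluster: variance across members at each time point, averaged over time.

    series: array of shape (n_metrics, n_timepoints)
    labels: cluster id for each metric
    """
    series = np.asarray(series, dtype=float)
    labels = np.asarray(labels)
    return {
        int(c): float(series[labels == c].var(axis=0).mean())
        for c in np.unique(labels)
    }


series = np.array([[0.0, 1.0, 2.0],
                   [0.0, 1.0, 2.0],
                   [5.0, 5.0, 5.0],
                   [7.0, 7.0, 7.0]])
good = within_cluster_variance(series, [0, 0, 1, 1])
bad = within_cluster_variance(series, [0, 1, 0, 1])
```

Comparing the summed values for different cluster counts or similarity functions gives a simple way to rank candidate clusterings.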
  • 28. Demo, Real Data Example: Real Data Example.ipynb, https://github.com/metterlein/spectral_clustering
  • 31. Conclusion: clustering metrics reduces the dimensionality of the observation space; a small number of interpretable cluster representatives helps keep track of the main platform dynamics; cluster assignments can reveal unexpected relations between several metrics
  • 34. Conclusion, Outlook: investigate how cluster assignments change over time; hierarchical approach: clustering sub-blocks; add time-shifted series to recognize Granger causalities
  • 35. Conclusion, References I: F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011), 2825–2830. Evelyn Trautmann, Spectral clustering demo, https://github.com/metterlein/spectral_clustering, 2018. Ulrike von Luxburg, A tutorial on spectral clustering, CoRR abs/0711.0189 (2007).
  • 36. Conclusion: Thank You! Questions? evelyn.trautmann@lovoo.com
  • 40. Conclusion, Graph Laplacian: graph Laplacian with similarities as off-diagonal entries and negative row sums on the diagonal, $L = (l_{ij})$ with $l_{ij} \ge 0$ for $i \ne j$ and $l_{ii} = -\sum_{k \ne i} l_{ik}$ for $i = j$; the matrix exponential is a stochastic matrix; results from Markov chain theory are applicable
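The claim that the matrix exponential of this Laplacian is a stochastic matrix can be checked numerically. The sketch below assumes a symmetric similarity matrix, so the exponential can be computed from an eigendecomposition; the function names are hypothetical.

```python
import numpy as np


def generator_laplacian(similarity):
    """Similarities off the diagonal, negative row sums on the diagonal."""
    W = np.array(similarity, dtype=float)
    np.fill_diagonal(W, 0.0)                  # self-similarity plays no role here
    L = W.copy()
    np.fill_diagonal(L, -W.sum(axis=1))       # l_ii = -sum_{k != i} l_ik
    return L


def matrix_exponential_symmetric(L, t=1.0):
    """exp(t L) for symmetric L via L = V diag(lambda) V^T."""
    eigvals, eigvecs = np.linalg.eigh(L)
    return (eigvecs * np.exp(t * eigvals)) @ eigvecs.T


W = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
L = generator_laplacian(W)
P = matrix_exponential_symmetric(L)
row_sums = P.sum(axis=1)   # each row sums to 1 up to round-off
```

Because every row of $L$ sums to zero and its off-diagonal entries are non-negative, $L$ has the form of a continuous-time Markov chain generator, which is why $\exp(tL)$ has non-negative entries and unit row sums.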
  • 41. Conclusion, Clustering Methods [PVG+11]
  • 42. Conclusion, PCA
  • 43. Conclusion, PCA
  • 44. Conclusion, Extract Platform Dynamics by means of Cluster Centers: compute the average time series per cluster; illustrate the average cluster time series with a variance corridor; visualize the metric types and subdimensions entering the respective clusters
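The first two steps on this slide can be sketched as follows. The slide does not specify the width of the variance corridor, so one standard deviation around the cluster mean is an assumption, and the function name and returned dictionary layout are hypothetical.

```python
import numpy as np


def cluster_profiles(series, labels):
    """Average time series per cluster with a +/- one standard deviation corridor.

    series: array of shape (n_metrics, n_timepoints)
    labels: cluster id for each metric
    """
    series = np.asarray(series, dtype=float)
    labels = np.asarray(labels)
    profiles = {}
    for c in np.unique(labels):
        members = series[labels == c]
        mean = members.mean(axis=0)           # cluster center per time point
        std = members.std(axis=0)             # spread of members per time point
        profiles[int(c)] = {"mean": mean, "lower": mean - std, "upper": mean + std}
    return profiles


series = np.array([[1.0, 2.0, 3.0],
                   [3.0, 4.0, 5.0],
                   [10.0, 10.0, 10.0]])
profiles = cluster_profiles(series, [0, 0, 1])
```

The `lower` and `upper` arrays can be passed to a plotting call such as matplotlib's `fill_between` to draw the corridor around the mean.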