2. 2015
Outline
Motivation for Graph-based Semi-Supervised Learning
Core Concepts of Graph-based techniques
Attribute Value Hyper-Graph Modelling
The end-to-end pipeline for the techniques
Motivation
Effective in propagating a limited amount of initial labels to a
huge pool of unlabeled data (expensive to annotate)
Many real-life data sources are available as graphs
Graphs provide a natural representation for multi-modal,
multi-format data from disparate sources
Increasingly rich, detailed and massive data sources need a
promising paradigm for dealing with high-dimensional spaces
Core Concepts
Vertices: labeled and unlabeled instances as nodes
Connections: pairwise edges between vertices, weighted by affinities (similarities)
Prediction: a small portion of vertices carrying seed labels is harnessed to predict the labels of the unlabeled vertices
Information, termed energy or heat, propagates from the labeled data points
Graph-based semi-supervised learning: labeled and unlabeled instances, together with the graph structure, are used in the inference method
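The core concepts above (instances as vertices, affinity-weighted edges, seed labels spreading to unlabeled vertices) can be illustrated with a minimal Python sketch. It assumes a Gaussian affinity `exp(-||x_i - x_j||^2 / (2 sigma^2))` for the edge weights and uses iterative harmonic label propagation (repeated neighbour averaging with the seed labels clamped), a standard GSSL scheme rather than necessarily the one used in this talk. The function names `gaussian_affinity` and `propagate_labels` and the toy data are hypothetical.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Dense affinity matrix W[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero diagonal."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def propagate_labels(W, y, labeled, n_iter=1000, tol=1e-9):
    """Harmonic label propagation: each unlabeled vertex repeatedly takes the
    affinity-weighted average of its neighbours' scores; seed vertices stay clamped."""
    P = W / W.sum(axis=1, keepdims=True)          # row-stochastic transition matrix
    f = np.where(labeled, y, 0.5).astype(float)   # unlabeled vertices start at 0.5
    for _ in range(n_iter):
        f_new = P @ f
        f_new[labeled] = y[labeled]               # re-clamp the seed labels
        if np.max(np.abs(f_new - f)) < tol:
            return f_new
        f = f_new
    return f

# Two well-separated clusters with one seed each: vertex 0 labeled 1, vertex 3 labeled 0.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [10., 10.], [10., 11.], [11., 10.]])
y = np.array([1., 0., 0., 0., 0., 0.])
labeled = np.array([True, False, False, True, False, False])
scores = propagate_labels(gaussian_affinity(X), y, labeled)
print(np.round(scores, 3))
```

Vertices 1 and 2 end up with scores near 1 and vertices 4 and 5 near 0, because almost all of their affinity mass lies inside their own cluster.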
GSSL Algorithms
Several popular GSSL algorithms exist
− Graph Cuts, Graph-based random walks,
manifold regularization and graph transduction
Important Challenges
− Large-scale data prevents adoption in practice
− Noisy, contaminated labels
Novel Technique: Spectral Graph Analytics
with Hyper-Graph Heat diffusion
Credit: Dr. Avinash Sharma
Fundamental Components of GSSL
Generalized GSSL techniques: Graph Construction → Information Propagation Framework → Label Propagation and Inference
Hypergraph-based heat diffusion technique: Hypergraph Modelling → Hyper-Graph Heat Kernel Framework & Sparsification → Label Propagation and Inference
Connectivity in Graphs and Spectral Embedding
Connectivity structure can be explored by random walks
Projecting the graph into an isometric latent space (distance preserving)
This space is spanned by the eigenvectors of the graph Laplacian matrix
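One standard way to realise a distance-preserving spectral embedding is the commute-time embedding. With the Laplacian L = D - W = U Λ U^T, vertex i is mapped to the vector of u_k(i) / sqrt(λ_k) over the non-zero eigenvalues, so squared Euclidean distances equal effective resistances, and the random walk's commute times equal vol(G) times those distances. The sketch below assumes a connected, undirected weighted graph; whether the talk uses this embedding or another one (for example a heat-kernel embedding) is not stated, and the helper names are hypothetical.

```python
import numpy as np

def laplacian(W):
    """Combinatorial graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def commute_time_embedding(W, tol=1e-10):
    """Row i holds u_k[i] / sqrt(lambda_k) over the non-zero Laplacian eigenpairs,
    so squared distances between rows equal effective resistances."""
    lam, U = np.linalg.eigh(laplacian(W))
    keep = lam > tol                     # drop the constant eigenvector (lambda = 0)
    return U[:, keep] / np.sqrt(lam[keep])

def commute_time(W, i, j):
    """Expected round-trip time of the random walk between i and j: vol(G) * R_ij."""
    e = np.zeros(len(W))
    e[i], e[j] = 1.0, -1.0
    return W.sum() * (e @ np.linalg.pinv(laplacian(W)) @ e)  # W.sum() = vol(G)

# Unit-weight path 0-1-2-3: the effective resistance between the ends is 3.
W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
X = commute_time_embedding(W)
d2 = np.sum((X[0] - X[3]) ** 2)
print(round(d2, 6), round(commute_time(W, 0, 3), 6))
```

On this path the squared embedding distance between the end vertices is 3 and their commute time is 6 × 3 = 18, so random-walk connectivity can be read off directly as Euclidean geometry.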
Got Feedback?
Rate and review the session on our mobile app – Convene
For all details visit: http://ghcindia.anitaborg.org
Speaker notes
We compute the spectrum (eigenvectors and eigenvalues) of the hypergraph Laplacian matrix
Use the Laplacian to derive the heat kernel matrix
Diffuse the churn labels using the heat kernel matrix
Infer churn labels by thresholding the diffused labels
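The four steps above (spectrum, heat kernel, diffusion, thresholding) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it accepts any symmetric Laplacian (a hypergraph Laplacian would be passed in the same way), it is illustrated on a toy path graph, and the +1 / -1 / 0 seed encoding, the threshold of 0 and the function names are hypothetical choices, not taken from the talk. The heat kernel itself is the standard H_t = U exp(-tΛ) U^T.

```python
import numpy as np

def heat_kernel(L, t):
    """H_t = U exp(-t Lambda) U^T, from the eigendecomposition L = U Lambda U^T."""
    lam, U = np.linalg.eigh(L)           # spectrum of the symmetric Laplacian
    return (U * np.exp(-t * lam)) @ U.T  # scale eigenvector k by exp(-t * lambda_k)

def diffuse_and_threshold(L, seeds, t, threshold=0.0):
    """Diffuse seed labels with H_t and threshold the result.
    seeds: +1 known churner, -1 known non-churner, 0 unlabeled (an assumed encoding).
    Returns (diffused scores, inferred labels with 1 = churn)."""
    scores = heat_kernel(L, t) @ seeds
    return scores, (scores > threshold).astype(int)

# Toy example: path graph on 6 vertices, a known churner at one end
# and a known non-churner at the other.
n = 6
W = np.diag(np.ones(n - 1), 1)
W = W + W.T                              # symmetric adjacency of the path 0-1-2-3-4-5
L = np.diag(W.sum(axis=1)) - W           # combinatorial Laplacian D - W
seeds = np.array([1., 0., 0., 0., 0., -1.])
scores, labels = diffuse_and_threshold(L, seeds, t=1.0)
print(labels)
```

Vertices closer to the churner inherit a positive score and are labeled as churn; by symmetry the other half of the path is labeled non-churn.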
Optimizations:
Sparsification
Scale-dependent heat diffusion for label propagation over hypergraphs: diffuse at a large scale when the known labels are few, and at a small scale when they are many. Small-scale diffusion confines label propagation to smaller neighborhoods, while large-scale diffusion spreads labels over larger ones.
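The scale-dependent behaviour described in this note can be checked numerically. The sketch below uses a hypothetical 30-vertex path graph with a single seed and counts how many vertices receive more than a small fraction `eps` of the seed's heat for several values of t; the graph, the `eps = 0.01` cut-off and the function names are illustrative assumptions. It does not implement the sparsification step, whose details the notes do not give.

```python
import numpy as np

def path_laplacian(n):
    """Laplacian D - W of an unweighted path 0-1-...-(n-1) (a toy graph)."""
    W = np.diag(np.ones(n - 1), 1)
    W = W + W.T
    return np.diag(W.sum(axis=1)) - W

def heat_kernel(L, t):
    """H_t = U exp(-t Lambda) U^T for a symmetric Laplacian L."""
    lam, U = np.linalg.eigh(L)
    return (U * np.exp(-t * lam)) @ U.T

def reach(L, seed, t, eps=0.01):
    """Number of vertices receiving more than eps of a unit of heat placed on `seed`."""
    source = np.zeros(L.shape[0])
    source[seed] = 1.0
    return int(np.sum(heat_kernel(L, t) @ source > eps))

L = path_laplacian(30)
for t in (0.1, 1.0, 10.0, 100.0):
    print(f"t = {t:>5}: heat reaches {reach(L, seed=0, t=t)} of 30 vertices")
```

Small t keeps the heat next to the seed, while large t spreads it over most of the graph, which is why a larger scale helps when only a few labels are known.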
Multiway (higher-order) relationships between the nodes
Hyperedges can link multiple instances/customers that share the same attribute value, e.g. customers who bought the same product or service
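The hyperedge construction in this note (one hyperedge per shared attribute value) can be sketched as an incidence matrix H. For the Laplacian, the sketch assumes the normalized hypergraph Laplacian of Zhou, Huang and Schölkopf, L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}, with unit hyperedge weights; the talk does not say which hypergraph Laplacian it uses, and the customer records and function names are hypothetical.

```python
import numpy as np

def attribute_incidence(records, attributes):
    """Incidence matrix H (records x hyperedges): one hyperedge per distinct
    (attribute, value) pair, containing every record with that value."""
    edges = sorted({(a, r[a]) for r in records for a in attributes})
    H = np.zeros((len(records), len(edges)))
    for j, (a, v) in enumerate(edges):
        for i, r in enumerate(records):
            if r[a] == v:
                H[i, j] = 1.0
    return H, edges

def hypergraph_laplacian(H, w=None):
    """Normalized hypergraph Laplacian I - Dv^-1/2 H W De^-1 H^T Dv^-1/2."""
    n, m = H.shape
    w = np.ones(m) if w is None else np.asarray(w, dtype=float)
    d_v = H @ w                    # vertex degree: total weight of incident hyperedges
    d_e = H.sum(axis=0)            # hyperedge degree: number of member vertices
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    theta = Dv_inv_sqrt @ H @ np.diag(w / d_e) @ H.T @ Dv_inv_sqrt
    return np.eye(n) - theta

customers = [
    {"plan": "prepaid", "product": "A"},
    {"plan": "prepaid", "product": "B"},
    {"plan": "postpaid", "product": "A"},
    {"plan": "postpaid", "product": "B"},
]
H, edges = attribute_incidence(customers, ["plan", "product"])
L = hypergraph_laplacian(H)
print(edges)
print(np.round(np.linalg.eigvalsh(L), 3))
```

The resulting Laplacian is symmetric with smallest eigenvalue 0, so its spectrum can feed the same heat-kernel diffusion used for ordinary graphs.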
Connectivity structure of a graph can be explored by random walks. A random walk on a graph is a stochastic process that randomly jumps from vertex to vertex.
The shortest path is not what matters; the connectivity measure should capture the average connectivity (average reach) between nodes.
Instead of simulating the exact random walk, which is an iterative process. At each node you have a probability of leaving this node or which
Arora, Akhil:
1) stochastic process
2) thus one has to perform multiple iterations for the random walk with restart (RWR) distance to converge
3) After convergence, the values represent the average connectivity between a pair of nodes
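The comment above notes that the random walk with restart (RWR) must be iterated until it converges. A minimal sketch, assuming a restart probability of 0.15 (an illustrative value, not from the talk) and hypothetical function names, runs the iteration r <- (1 - c) P^T r + c e_s and checks it against the closed-form fixed point r = c (I - (1 - c) P^T)^{-1} e_s, which is one way to avoid simulating the walk step by step.

```python
import numpy as np

def rwr_scores(W, source, restart=0.15, tol=1e-12, max_iter=10_000):
    """Random walk with restart from `source`, by fixed-point iteration of
    r <- (1 - restart) * P^T r + restart * e_source. Returns (scores, iterations)."""
    P = W / W.sum(axis=1, keepdims=True)      # row-stochastic transition matrix
    e = np.zeros(len(W))
    e[source] = 1.0
    r = e.copy()
    for it in range(1, max_iter + 1):
        r_new = (1.0 - restart) * (P.T @ r) + restart * e
        if np.abs(r_new - r).sum() < tol:     # L1 change between iterations
            return r_new, it
        r = r_new
    return r, max_iter

def rwr_closed_form(W, source, restart=0.15):
    """Same fixed point solved directly: r = restart * (I - (1 - restart) P^T)^-1 e_source."""
    P = W / W.sum(axis=1, keepdims=True)
    e = np.zeros(len(W))
    e[source] = 1.0
    return restart * np.linalg.solve(np.eye(len(W)) - (1.0 - restart) * P.T, e)

W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
r_iter, n_iter = rwr_scores(W, source=0)
r_exact = rwr_closed_form(W, source=0)
print(n_iter, np.round(r_iter, 4), np.round(r_exact, 4))
```

The iteration needs on the order of a hundred steps here, while the linear solve reaches the same scores in one call; on large graphs the trade-off depends on sparsity and on how many sources are needed.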
Scale of label propagation governed by heat diffusion parameter t
** What is small churn and large churn?
Traditional graph- and hypergraph-based diffusion techniques diffuse at a fixed scale, obtained by taking the inverse of L and using it as the diffusion matrix. Instead, we propose multi-scale heat-diffusion-based propagation, parameterized by a scale parameter t that governs the scale of diffusion.
Ans.
A small amount of training data, i.e., a small number of customers with known churn labels.
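The note contrasting fixed-scale diffusion (the inverse of L) with multi-scale heat diffusion can be made concrete by viewing both as spectral filters on L = U Λ U^T: the inverse applies 1/λ to each eigenvalue, while the heat kernel applies exp(-tλ), whose reach is tuned by t. Because L itself is singular, the sketch assumes a regularized inverse (L + μI)^{-1} with a small μ > 0; that regularization, the toy graph and the function names are illustrative assumptions.

```python
import numpy as np

def spectral_filter(L, g):
    """U g(Lambda) U^T for a symmetric Laplacian L and an elementwise filter g."""
    lam, U = np.linalg.eigh(L)
    return (U * g(lam)) @ U.T

def fixed_scale_diffusion(L, mu=1e-3):
    """Regularized inverse (L + mu I)^-1: a single, fixed diffusion scale.
    mu > 0 is an assumption, needed because L itself is singular."""
    return spectral_filter(L, lambda lam: 1.0 / (lam + mu))

def heat_diffusion(L, t):
    """Heat kernel exp(-t L): the scale parameter t controls how far labels spread."""
    return spectral_filter(L, lambda lam: np.exp(-t * lam))

def multiscale_scores(L, seeds, scales):
    """Diffused seed labels, one row per scale t."""
    return np.stack([heat_diffusion(L, t) @ seeds for t in scales])

n = 6
W = np.diag(np.ones(n - 1), 1)
W = W + W.T                                   # toy path graph 0-1-2-3-4-5
L = np.diag(W.sum(axis=1)) - W
seeds = np.array([1., 0., 0., 0., 0., 0.])    # one known label on vertex 0
print(np.round(fixed_scale_diffusion(L) @ seeds, 2))
print(np.round(multiscale_scores(L, seeds, scales=[0.1, 1.0, 10.0]), 2))
```

With a small μ the regularized inverse is dominated by the constant eigenvector and offers no scale knob, whereas the heat kernel moves from local (t = 0.1) to nearly uniform (t = 10) spreading as t grows.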
This is the last slide and must be included in the slide deck