2015
Graph Based Semi-Supervised Learning
Neeta Pande, XRCI
3-Dec-2015
#GHCI15
Outline
• Motivation for Graph-based Semi-Supervised Learning
• Core Concepts of Graph-based techniques
• Attribute Value Hyper-Graph Modelling
• The end-to-end pipeline for the techniques
Motivation
• Effective in propagating a limited number of initial labels to a huge pool of unlabeled data, which is expensive to annotate
• Many real-life data sources are available as graphs
• Graphs provide a natural representation for multi-modal, multi-format data from disparate sources
• Increasingly rich, detailed and massive data sources need a promising paradigm for dealing with high-dimensional spaces
Core Concepts
• Vertices: labeled and unlabeled instances as nodes
• Connections: pairwise edges between vertices, weighted by affinities (similarities)
• Prediction: a small portion of vertices carrying seed labels is harnessed to predict the unlabeled vertices
• Information, termed energy or heat, propagates from the labeled data points
Graph-based semi-supervised learning: labeled and unlabeled instances, together with the graph structure, are used in the inference method
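The core concepts can be illustrated with a minimal sketch. Vertices are instances with numeric feature vectors, edges carry Gaussian (RBF) affinities with a hypothetical bandwidth `sigma`, and the seed labels spread to the unlabeled vertices. The spreading uses classic iterative label propagation with clamped seeds, in the style of Zhu and Ghahramani. This is a generic GSSL baseline, not the hypergraph heat-diffusion method presented in this talk. The helper names `gaussian_affinity` and `propagate_labels` are illustrative only.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Dense affinity matrix with W[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    X = np.asarray(X, dtype=float)
    diffs = X[:, None, :] - X[None, :, :]          # pairwise differences
    sq_dists = (diffs ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                        # no self-loops
    return W


def propagate_labels(W, y, n_iter=200):
    """Iterative label propagation with clamped seeds.

    y holds +1 / -1 for seed (labeled) vertices and 0 for unlabeled ones.
    Each iteration replaces every score by the affinity-weighted average of
    its neighbours' scores, then resets the seeds to their known labels.
    """
    y = np.asarray(y, dtype=float)
    seeds = y != 0
    P = W / W.sum(axis=1, keepdims=True)            # row-normalized transitions
    f = y.copy()
    for _ in range(n_iter):
        f = P @ f
        f[seeds] = y[seeds]                         # clamp the seed labels
    return f
```

Consider two well-separated clusters, each with one seed, such as `X = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]` with `y = [1, 0, 0, -1, 0, 0]`. Taking the sign of the propagated scores recovers the cluster membership.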
GSSL Algorithms
• Several popular GSSL algorithms exist
− Graph cuts, graph-based random walks, manifold regularization and graph transduction
• Important challenges
− Large-scale data prevents adoption in practice
− Noisy, contaminated labels
Novel technique: Spectral Graph Analytics with Hyper-Graph Heat Diffusion
Credit: Dr. Avinash Sharma
Fundamental Components of GSSL
Generalized GSSL techniques: Graph Construction → Information Propagation Framework → Label Propagation and Inference
Hypergraph-based heat diffusion technique: Hypergraph Modelling → Hyper-Graph Heat Kernel Framework & Sparsification → Label Propagation and Inference
Hyper-Graph Modelling
Hyper-graph incidence matrix (rows: customers; columns: hyperedges, one per attribute value: he1 = Age (20-30), he2 = Age (30-40), he3 = Service (Voice), he4 = Service (Data))
Customer  he1  he2  he3  he4
v1        1    0    0    0
v2        1    0    1    0
v3        0    0    1    1
v4        0    0    1    0
v5        0    1    0    0
v6        0    1    1    1
v7        0    1    0    0
v8        0    1    0    1
[Figure: the hyper-graph over customers v1 to v8 with hyperedges he1 to he4, shown alongside a simple graph over the same customers with pairwise edges e1 to e6.]
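The sketch below shows one way to turn the incidence matrix on this slide into a Laplacian for the later spectral steps. It uses the normalized hypergraph Laplacian of Zhou, Huang and Schölkopf, L = I - Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2). Here Dv holds the vertex degrees, De the hyperedge sizes, and W the hyperedge weights. Whether the talk uses exactly this normalization is an assumption, and so are the unit hyperedge weights `w`.

```python
import numpy as np

# Incidence matrix from the slide: rows v1..v8, columns he1..he4.
H = np.array([
    [1, 0, 0, 0],  # v1
    [1, 0, 1, 0],  # v2
    [0, 0, 1, 1],  # v3
    [0, 0, 1, 0],  # v4
    [0, 1, 0, 0],  # v5
    [0, 1, 1, 1],  # v6
    [0, 1, 0, 0],  # v7
    [0, 1, 0, 1],  # v8
], dtype=float)


def hypergraph_laplacian(H, w=None):
    """Normalized hypergraph Laplacian (Zhou et al. style):
    L = I - Dv^{-1/2} H diag(w) De^{-1} H^T Dv^{-1/2}.
    Assumes every vertex lies in at least one hyperedge."""
    n_vertices, n_edges = H.shape
    w = np.ones(n_edges) if w is None else np.asarray(w, dtype=float)
    d_v = H @ w                    # vertex degree: total weight of incident hyperedges
    d_e = H.sum(axis=0)            # hyperedge degree: number of vertices it contains
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    theta = Dv_inv_sqrt @ H @ np.diag(w / d_e) @ H.T @ Dv_inv_sqrt
    return np.eye(n_vertices) - theta


L = hypergraph_laplacian(H)
```

The matrix is symmetric with eigenvalues in [0, 1]. This hypergraph is connected, so exactly one eigenvalue is zero.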
Connectivity in Graphs and Spectral Embedding
• The connectivity structure can be explored by random walks
• The graph is projected into an isometric (distance-preserving) latent space
• This space is spanned by the eigenvectors of the graph Laplacian matrix
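The commute-time embedding is one concrete way to realize a distance-preserving embedding spanned by Laplacian eigenvectors. It scales each non-trivial eigenvector of the unnormalized Laplacian by 1/sqrt(λ). The squared Euclidean distance between two vertices, multiplied by the graph volume, then equals the expected random-walk commute time between them. This choice is an assumption for illustration, since the slide does not say which embedding is used. The function names are hypothetical.

```python
import numpy as np


def commute_time_embedding(W, k=None):
    """Rows of X are vertex coordinates with vol(G) * ||x_i - x_j||^2 = commute time.

    Uses the unnormalized Laplacian L = D - W of a connected, undirected graph.
    If k is given, only the k smallest non-zero eigenpairs are kept (approximation).
    """
    W = np.asarray(W, dtype=float)
    degrees = W.sum(axis=1)
    L = np.diag(degrees) - W
    eigvals, eigvecs = np.linalg.eigh(L)            # ascending eigenvalues
    keep = eigvals > 1e-10                          # drop the constant eigenvector
    lam, U = eigvals[keep], eigvecs[:, keep]
    if k is not None:
        lam, U = lam[:k], U[:, :k]
    X = U / np.sqrt(lam)                            # scale column m by 1/sqrt(lambda_m)
    return X, degrees.sum()                         # embedding and graph volume


def commute_time(X, volume, i, j):
    """Expected number of steps for a random walk to go from i to j and back."""
    diff = X[i] - X[j]
    return volume * float(diff @ diff)
```

Take the path graph 0 - 1 - 2 with unit weights. Its volume is 4, the commute time between neighbours is 4, and the commute time between the two ends is 8. Truncating with `k` trades exactness for a lower-dimensional space.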
The heat-diffusion pipeline
HyperEdge Induction → Spectral Embedding → Heat-Diffusion → Label Propagation → Label Inference
Label y: 1 = churner, -1 = loyal, 0 = unknown
Label inference: apply a threshold on the diffused labels
• Scale-dependent heat diffusion: adapts to how many labels are known and to outliers
• SVD reduces the high-dimensional space, which helps the method scale
• Hypergraph modelling: better scalability and a better similarity measure based on multiway relationships
• Sparsification: noise reduction, better scalability and a better similarity measure
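The diffusion and inference stages of this pipeline can be sketched as follows. The sketch assumes a symmetric Laplacian `L` (graph or hypergraph), the label coding from the slide, and a default threshold of 0. It builds a heat kernel H_t = U exp(-tΛ) U^T from the Laplacian's eigendecomposition. The optional `k` keeps only the k smallest eigenpairs, standing in for the SVD-based dimensionality reduction. The parameter names and the default threshold are assumptions, not the talk's settings.

```python
import numpy as np


def heat_kernel(L, t, k=None):
    """H_t = U exp(-t * Lambda) U^T from the eigendecomposition of a symmetric Laplacian.

    If k is given, only the k smallest eigenpairs are kept (low-rank approximation).
    """
    eigvals, eigvecs = np.linalg.eigh(np.asarray(L, dtype=float))
    if k is not None:
        eigvals, eigvecs = eigvals[:k], eigvecs[:, :k]
    return (eigvecs * np.exp(-t * eigvals)) @ eigvecs.T


def diffuse_and_infer(L, y, t, threshold=0.0, k=None):
    """Diffuse labels y (1 = churner, -1 = loyal, 0 = unknown) with the heat
    kernel at scale t, then threshold the diffused scores into +1 / -1."""
    scores = heat_kernel(L, t, k) @ np.asarray(y, dtype=float)
    predicted = np.where(scores > threshold, 1, -1)
    return scores, predicted
```

A small t keeps the heat close to the seed labels, and a larger t spreads it over bigger neighbourhoods. This is the knob that scale-dependent diffusion adjusts.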
Got Feedback?
Rate and review the session on our mobile app – Convene
For all details visit: http://ghcindia.anitaborg.org

Speaker Notes

  1. We compute the spectrum (eigenvectors and eigenvalues) of the hypergraph Laplacian matrix, use the Laplacian to derive the heat kernel matrix, diffuse the churn labels using the heat kernel matrix, and infer churn labels by thresholding the diffused labels. Optimizations: sparsification, and scale-dependent heat diffusion for label propagation over hypergraphs. Diffuse at a large scale when few labels are known, and at a small scale when many are. Small-scale diffusion restricts label propagation to smaller neighborhoods; large-scale diffusion spreads labels over larger ones.
  2. Multiway (higher-order) relationships between nodes: hyperedges can link multiple instances/customers that share the same attribute value, e.g. customers who bought the same product or service.
  3. The connectivity structure of a graph can be explored by random walks. A random walk on a graph is a stochastic process that jumps randomly from vertex to vertex. The shortest path is not what matters; the relevant measure is average connectivity. Simulating the exact random walk is an iterative process: at each node there is a probability of leaving that node, so multiple iterations are needed for the random walk with restart (RWR) distance to converge. After convergence, the values represent the average connectivity between a pair of nodes.
  4. The scale of label propagation is governed by the heat diffusion parameter t. Traditional graph- and hypergraph-based diffusion techniques diffuse at a fixed scale, obtained by taking the inverse of L and using it as the diffusion matrix. Instead, we propose multi-scale heat diffusion based propagation, parameterized by a scale parameter t that governs the scale of diffusion. Q: What do small churn and large churn mean? A: "Small" means a small amount of training data, i.e. few customers with known churn labels.
  5. This is the last slide and must be included in the slide deck
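The random walk with restart (RWR) in note 3 can be sketched with a power iteration that repeats until the scores stop changing. This matches the note's point that multiple iterations are needed for convergence. The restart probability of 0.15 and the tolerance are illustrative assumptions. The converged vector gives the RWR proximity of every node to the seed node, and turning it into a distance is left open.

```python
import numpy as np


def random_walk_with_restart(W, seed, restart=0.15, tol=1e-12, max_iter=10_000):
    """Stationary visiting probabilities of a walker that follows an edge with
    probability 1 - restart and jumps back to `seed` with probability restart.

    Returns the score vector and the number of iterations used.
    """
    W = np.asarray(W, dtype=float)
    P = W / W.sum(axis=1, keepdims=True)      # row-stochastic transition matrix
    e = np.zeros(len(W))
    e[seed] = 1.0                              # restart distribution
    r = e.copy()
    for i in range(max_iter):
        r_next = (1.0 - restart) * (P.T @ r) + restart * e
        if np.abs(r_next - r).sum() < tol:     # L1 change below tolerance
            return r_next, i + 1
        r = r_next
    return r, max_iter
```

The iteration converges to the closed form restart * (I - (1 - restart) P^T)^(-1) e. Iterating avoids the matrix inverse, which matters on large graphs.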