MANIFOLDS IN SEMI-SUPERVISED LEARNING
Monojit Basu
Director, TechYugadi IT Solutions & Consulting, Bangalore
EXTENDED
Outline
● Semi-supervised Learning and Graph-based Algorithms
● Data Distribution on Manifold and Multi-manifold
● Classification Algorithms with Manifold Regularization
● Implementation Hints
● Closing Remarks
Semi-supervised Learning: Overview
● Training Samples consist of data with and without class label
● Images with and without captions
● Text with and without tags, ..
● Model is built with both labeled and unlabeled data
● Smoothness Property: If two data points are close, their labels
should be similar
[Figure: model based on labeled samples vs. model based on both labeled and unlabeled samples; annotations: Prob(y|x) (Label), Prob(x) (Data)]
Graph-based Algorithms For SSL
● There are many ways of exploiting the smoothness property
● A simplistic baseline approach is self-training (not graph-based)
● Graph-based Algorithms are particularly effective
● Label Propagation
● Random-Walk
● Min-Cut
● Density-based Distances
● Local and Global Consistency
● Using Graph Kernels, ..
Label Propagation
● Generates a weighted graph where edges between similar
neighbours have higher weights (Zhu and Ghahramani, 2002)
● Defines a transition matrix:
● Tij = probability of node i ‘jumping’ into node j, that is, taking up j’s label
● Repeatedly multiplies the current label matrix (which itself gets
updated at each step) by the transition matrix
● Until labels on all nodes stabilize (convergence)
● In effect labels propagate from labeled to unlabeled nodes
[Figure: graph with nodes labeled 1, nodes labeled 0, and unlabeled nodes]
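The iteration described above can be sketched in a few lines of numpy. This is a minimal illustration, not the implementation from the talk: the function name `label_propagation`, the Gaussian edge weights with bandwidth `sigma`, the row-normalised transition matrix, and the clamping of labeled nodes after each step are all assumptions made for the example.

```python
import numpy as np

def label_propagation(X, y, n_classes, sigma=1.0, max_iter=1000, tol=1e-6):
    """Propagate labels over a fully connected Gaussian-weighted graph.

    y holds a class index for labeled points and -1 for unlabeled ones.
    (Hypothetical helper; names and defaults are illustrative.)
    """
    # Edge weights: similar (close) points get higher weights.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    # Assumption: row-normalise so T[i, j] is the probability that
    # node i takes up node j's label.
    T = W / W.sum(axis=1, keepdims=True)

    labeled = y >= 0
    Y = np.full((len(y), n_classes), 1.0 / n_classes)  # uniform start
    Y[labeled] = np.eye(n_classes)[y[labeled]]          # one-hot known labels
    clamp = Y[labeled].copy()

    for _ in range(max_iter):
        Y_new = T @ Y              # one propagation step
        Y_new[labeled] = clamp     # keep the known labels fixed
        converged = np.abs(Y_new - Y).max() < tol
        Y = Y_new
        if converged:
            break
    return Y.argmax(axis=1)

# Toy data: two 1-D clusters with one labeled point each.
X_demo = np.array([[0.0], [0.2], [0.4], [3.0], [3.2], [3.4]])
y_demo = np.array([0, -1, -1, 1, -1, -1])
pred_demo = label_propagation(X_demo, y_demo, n_classes=2, sigma=0.5)
```

Each unlabeled node ends up with the label of the cluster it is strongly connected to, because almost all of its transition probability mass points at nodes in the same cluster.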
Manifold Structures
● Data (nodes) are distributed over low and high density regions
● Two nodes that are geometrically close may not be similar
● Or equivalently, the geometry / distance measure should be redefined
● Euclidean distances and weights based on them may not work
● Such data is said to lie on a manifold
● Although not necessary, manifold structures are often
observed with high-dimensional data
● More complex scenario: data may not lie on a single manifold
● This is called multi-manifold structure
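To make the point about Euclidean distances concrete, here is a small sketch on hypothetical toy data: points along three quarters of a circle. The two ends of the arc are close in Euclidean terms but far apart when distance is measured along the data, here approximated by shortest paths on a k-nearest-neighbour graph. The graph-geodesic distance is only one possible redefinition of distance and is an assumption of this example, not the method used in the talk.

```python
import numpy as np

# Hypothetical toy data: 40 points on a 315-degree arc of the unit circle.
theta = np.linspace(0.0, 1.75 * np.pi, 40)
X = np.column_stack([np.cos(theta), np.sin(theta)])

# Pairwise Euclidean distances.
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

# k-nearest-neighbour graph: keep only edges to the k closest points.
k = 2
G = np.full_like(D, np.inf)
nbrs = np.argsort(D, axis=1)[:, 1:k + 1]   # column 0 is the point itself
for i in range(len(X)):
    G[i, nbrs[i]] = D[i, nbrs[i]]
G = np.minimum(G, G.T)                      # make the graph undirected
np.fill_diagonal(G, 0.0)

# Floyd-Warshall all-pairs shortest paths (fine for small n).
for m in range(len(X)):
    G = np.minimum(G, G[:, [m]] + G[[m], :])

euclid_ends = D[0, -1]     # straight-line distance between the arc's ends
geodesic_ends = G[0, -1]   # distance along the arc through the graph
```

The straight-line distance between the two ends is well under 1, while the path through the graph is close to the full arc length, so a weight based on Euclidean distance alone would treat the two ends as near neighbours.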
Single Manifold Structures
[Figures: Swiss roll; two moons]
Multi-manifold Structures
[Figures: dollar symbol ($); surface and sphere]
Manifold Regularization
● This is the technical term for semi-supervised classification of
data distributed on a (single) manifold (Belkin et al., 2006)
● Key is to establish connectivity between similar nodes by
staying along a high-density region
● Mathematically it involves
● Computing a matrix L derived from the ordinary weight matrix W
● Taking the top n eigenvectors of L
● Computing an indicator function using the dot product of a data point
with the eigenvectors
● It is based on a theory known as Reproducing Kernel Hilbert Spaces (RKHS)
Manifold Regularization (Schematic)
DATA → W → L = D − W → Eigen(L) → dot product with data point > 0 ? (+ve / −ve) → CLASS LABELS
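The pipeline in the schematic can be sketched as follows. This is a simplified eigenvector-based classifier in the spirit of the schematic, not the LapRLS or LapSVM algorithms of Belkin et al. (2006). The function name, the Gaussian weights with bandwidth `sigma`, the choice of the eigenvectors with the smallest eigenvalues (the smoothest ones on the graph), and the least-squares fit on the labeled points are assumptions made for illustration.

```python
import numpy as np

def laplacian_eigen_classifier(X, y_labeled, labeled_idx, n_eig=2, sigma=1.0):
    """Binary (+1 / -1) classification via graph Laplacian eigenvectors.

    Hypothetical helper; names and defaults are illustrative.
    """
    # W: ordinary weight matrix from pairwise Euclidean distances.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # L = D - W, where D is the diagonal degree matrix.
    D = np.diag(W.sum(axis=1))
    L = D - W

    # eigh returns eigenvalues in ascending order for symmetric matrices;
    # assumption: keep the n_eig eigenvectors with the smallest eigenvalues.
    _, vecs = np.linalg.eigh(L)
    E = vecs[:, :n_eig]

    # Fit coefficients on the labeled rows only, then score every point.
    coef, *_ = np.linalg.lstsq(E[labeled_idx], y_labeled, rcond=None)
    scores = E @ coef

    # Indicator function: sign of the dot product.
    return np.where(scores > 0, 1, -1)

# Toy data: two tight clusters, one labeled point in each.
X_demo = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
pred_demo = laplacian_eigen_classifier(
    X_demo, np.array([1.0, -1.0]), np.array([0, 3]))
```

On well-separated clusters the smoothest eigenvectors are nearly constant within each cluster, so a single labeled point per cluster is enough to determine the sign for the rest.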
Multi-manifold Regularization
● This is the technical term for semi-supervised classification of
data distributed on a multi-manifold (Goldberg et al., 2009)
● Single manifold algorithm still starts with Euclidean distances,
but reformulates steps based on the derived matrix L
● Multi-manifold algorithm straight away changes distance
metrics
● It is based on Hellinger distances H, and
● A Mahalanobis k-nearest neighbor graph computed from H
● Complete algorithm is much longer, involving spectral
clustering and self-training on each cluster
Multi-manifold Regularization (Schematic)
DATA → Σs (sample covariance matrices) → H → kNN graph → Spectral Clustering → Self-trained Clusters
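The H step of the schematic can be illustrated with the closed-form Hellinger distance between two zero-mean Gaussians, computed from their covariance matrices. This is a sketch of that one step only. The function name is hypothetical, the covariance matrices below are made-up examples rather than local sample covariances estimated from data, and the kNN graph, spectral clustering and self-training stages are not shown.

```python
import numpy as np

def hellinger_gaussian_cov(S1, S2):
    """Hellinger distance between N(0, S1) and N(0, S2).

    Uses H^2 = 1 - |S1|^(1/4) |S2|^(1/4) / |(S1 + S2) / 2|^(1/2).
    (Hypothetical helper name.)
    """
    num = np.linalg.det(S1) ** 0.25 * np.linalg.det(S2) ** 0.25
    den = np.sqrt(np.linalg.det(0.5 * (S1 + S2)))
    # Guard against tiny negative values caused by rounding.
    return np.sqrt(max(0.0, 1.0 - num / den))

# Made-up local covariances: a flat horizontal patch and a flat vertical one.
S_horizontal = np.diag([1.0, 0.01])
S_vertical = np.diag([0.01, 1.0])

h_same = hellinger_gaussian_cov(S_horizontal, S_horizontal)
h_diff = hellinger_gaussian_cov(S_horizontal, S_vertical)
```

Two neighbourhoods with the same local orientation get a distance near 0, while neighbourhoods with different orientations get a distance close to 1 even if the points themselves are nearby. That is the property that lets the graph separate manifolds that intersect.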
Multi-view Semi-supervised Learning
● Multi-view learning involves two or more independent
projections for each data point
● Classic Example: web-page classification using
● Bag of words
● Links to other web-pages
● Instead of representing data as (X, y) where y is class label, it
may be represented as (X1, X2, y), where Xi are views
● Somewhat related to multimodal learning (like video and
audio)
Multi-view Manifold Regularization
● Can manifold regularization be extended to multi-view data?
● Yes, algorithms exist, based on strong mathematical
foundations, like Sindhwani and Rosenberg, 2008
● There is actually a generic pattern for multi-view
semi-supervised learning, called co-training
● Sindhwani and Rosenberg extend co-training with an algorithm
called co-regularization
● It reduces the problem to a convex optimization to minimize a
loss function
● The total loss function depends on individual class predictors
for each view, and a couple of regularization hyperparameters
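A heavily simplified version of such a loss can be written down for linear predictors. In this sketch the total loss is the squared error of the averaged two-view prediction on labeled data, plus a norm penalty on each view's predictor (`gamma1`, `gamma2`), plus a disagreement penalty between the views on unlabeled data (`mu`). The function name, the linear predictors and the squared loss are assumptions for illustration. The formulation in Sindhwani and Rosenberg (2008) works in an RKHS and is more general.

```python
import numpy as np

def co_regularized_ls(X1_l, X2_l, y, X1_u, X2_u,
                      gamma1=0.1, gamma2=0.1, mu=1.0):
    """Linear two-view co-regularized least squares (illustrative sketch).

    Minimises
        ||y - 0.5 * (X1_l w1 + X2_l w2)||^2
        + gamma1 * ||w1||^2 + gamma2 * ||w2||^2
        + mu * ||X1_u w1 - X2_u w2||^2
    by stacking all terms into one ordinary least-squares problem.
    """
    d1, d2 = X1_l.shape[1], X2_l.shape[1]
    A = np.vstack([
        np.hstack([0.5 * X1_l, 0.5 * X2_l]),                      # fit term
        np.sqrt(mu) * np.hstack([X1_u, -X2_u]),                   # view agreement
        np.hstack([np.sqrt(gamma1) * np.eye(d1), np.zeros((d1, d2))]),
        np.hstack([np.zeros((d2, d1)), np.sqrt(gamma2) * np.eye(d2)]),
    ])
    b = np.concatenate([y, np.zeros(len(X1_u) + d1 + d2)])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w[:d1], w[d1:]

# Toy data: each view is a single feature; view 2 is view 1 scaled by 2.
X1_l = np.array([[1.0], [-1.0]]); X2_l = np.array([[2.0], [-2.0]])
y_l = np.array([1.0, -1.0])
X1_u = np.array([[0.5], [-0.5]]); X2_u = np.array([[1.0], [-1.0]])

w1, w2 = co_regularized_ls(X1_l, X2_l, y_l, X1_u, X2_u)
pred_l = np.sign(0.5 * (X1_l @ w1 + X2_l @ w2))
```

Because the objective is a sum of squared terms, it is convex and can be solved in closed form here; the hyperparameters trade off fit on the labeled data against smoothness of each predictor and agreement between the views.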
Python Implementation
● An implementation of some of these algorithms in Python 3.x is
published on github:
https://github.com/techyugadi/manifold_ssl
● These algorithms offer an interface similar to scikit-learn
● There are some programs to generate synthetic data and also
use the MNIST handwritten digits data
● Note: scikit-learn as of now supports only the label propagation
algorithm for semi-supervised learning
● R package has more algorithms but not manifold regularization
● This is an early-access release; more algorithms are to be published!
Summary
● Manifold regularization is an improvement over the standard
label propagation algorithm for semi-supervised learning
● It may lead to better results when data is distributed over a
manifold or multi-manifold
● This class of algorithms covers a wide range of scenarios,
including multi-view datasets
● These algorithms can be implemented in Python using
common numpy and linear algebra packages (see github)
References
● Zhu and Ghahramani, 2002: Learning from Labeled and
Unlabeled Data with Label Propagation
● Belkin, Niyogi and Sindhwani, 2006: Manifold Regularization:
A Geometric Framework for Learning from Labeled and
Unlabeled Examples
● Sindhwani and Rosenberg, 2008: An RKHS for Multi-View
Learning and Manifold Co-Regularization
● Goldberg, Zhu, Singh, Xu and Nowak, 2009: Multi-Manifold
Semi-Supervised Learning
THANK YOU
monojit@techyugadi.com

NODES 2020 extended - Manifolds in semi-supervised learning

  • 1. MANIFOLDS IN SEMI-SUPERVISED LEARNING
    Monojit Basu, Director, TechYugadi IT Solutions & Consulting, Bangalore
  • 2. Outline
    ● Semi-supervised Learning and Graph-based Algorithms
    ● Data Distribution on Manifold and Multi-manifold
    ● Classification Algorithms with Manifold Regularization
    ● Implementation Hints
    ● Closing Remarks
  • 4. Semi-supervised Learning: Overview
    ● Training samples consist of data with and without class labels
      ● Images with and without captions
      ● Text with and without tags, ..
    ● The model is built with both labeled and unlabeled data
      ● Prob(y|x), the label model, is based on labeled samples
      ● Prob(x), the data model, is based on both labeled and unlabeled samples
    ● Smoothness Property: if two data points are close, their labels should be similar
  • 5. Graph-based Algorithms For SSL
    ● There are many ways of exploiting the smoothness property
    ● A simple baseline approach is self-training (not graph-based)
    ● Graph-based algorithms are particularly effective
      ● Label Propagation
      ● Random-Walk
      ● Min-Cut
      ● Density-based Distances
      ● Local and Global Consistency
      ● Using Graph Kernels, ..
  • 6. Label Propagation
    ● Generates a weighted graph in which edges between similar neighbours have higher weights (Zhu and Ghahramani, 2002)
    ● Defines a transition matrix T:
      ● Tij = probability of node i ‘jumping’ into node j, that is, taking up j’s label
    ● Repeatedly multiplies the current label matrix with the transition matrix (the label matrix is updated at each step, with labeled nodes reset to their known labels)
    ● Stops when the labels on all nodes stabilize (convergence)
    ● In effect, labels propagate from labeled to unlabeled nodes
    (Figure: a graph whose nodes are labeled 1, labeled 0, or unlabeled)
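
The iteration on the slide above can be sketched in a few lines of numpy. This is a minimal illustration, not the code from the talk's repository: the function name `label_propagation`, the Gaussian weight with bandwidth `sigma`, the fully connected graph, and the convention that `-1` marks an unlabeled point are all assumptions made for this example.

```python
import numpy as np

def label_propagation(X, y, sigma=1.0, max_iter=1000, tol=1e-6):
    """Propagate labels over a fully connected, Gaussian-weighted graph.

    X: (n, d) array of points; y: length-n integer labels, -1 for unlabeled.
    Returns a length-n array of predicted class labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labeled = y >= 0
    classes = np.unique(y[labeled])

    # Edge weights: closer (more similar) points get larger weights.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dists / sigma ** 2)

    # Row-normalised transition matrix: T[i, j] is the probability that
    # node i takes up node j's label.
    T = W / W.sum(axis=1, keepdims=True)

    # Label matrix: one-hot rows for labeled nodes, uniform rows otherwise.
    Y = np.full((len(y), len(classes)), 1.0 / len(classes))
    Y[labeled] = (y[labeled][:, None] == classes[None, :]).astype(float)
    Y_known = Y[labeled].copy()

    for _ in range(max_iter):
        Y_new = T @ Y                                  # propagate
        Y_new /= Y_new.sum(axis=1, keepdims=True)      # keep rows as distributions
        Y_new[labeled] = Y_known                       # clamp the known labels
        converged = np.abs(Y_new - Y).max() < tol
        Y = Y_new
        if converged:
            break
    return classes[Y.argmax(axis=1)]
```

Calling `label_propagation([[0.0], [0.5], [1.0], [5.0], [5.5], [6.0]], [0, -1, -1, 1, -1, -1])` spreads the two known labels to their respective clusters. A k-nearest-neighbour graph would scale better than the dense weight matrix used here.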
  • 8. Manifold Structures
    ● Data (nodes) are distributed over low- and high-density regions
    ● Two nodes that are geometrically close may not be similar
      ● Or equivalently, the geometry / distance measure should be redefined
      ● Euclidean distances and weights based on them may not work
    ● Such data is said to lie on a manifold
    ● Manifold structures are often, though not only, observed with high-dimensional data
    ● More complex scenario: data may not lie on a single manifold
      ● This is called a multi-manifold structure
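
To make the point about Euclidean distances concrete, the sketch below places points on an almost-closed circle (a one-dimensional manifold embedded in two dimensions) and compares the straight-line distance between the two ends with the distance measured along a nearest-neighbour graph. The helper names `knn_graph` and `geodesic_distance`, the choice of `k=2`, and the synthetic data are illustrative assumptions, not part of the talk.

```python
import heapq
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph as {node: {neighbour: distance}}."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph

def geodesic_distance(graph, source, target):
    """Shortest-path length between two nodes (Dijkstra's algorithm)."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf

# 40 points covering 90% of a unit circle, leaving a small gap between the ends.
n = 40
angles = [2 * math.pi * 0.9 * i / (n - 1) for i in range(n)]
points = [(math.cos(a), math.sin(a)) for a in angles]

graph = knn_graph(points, k=2)
euclidean = math.dist(points[0], points[-1])   # straight across the gap
geodesic = geodesic_distance(graph, 0, n - 1)  # the long way round the arc
print(f"Euclidean: {euclidean:.2f}, along the manifold: {geodesic:.2f}")
```

The two end points are close in the plane but far apart along the curve, which is why weights based purely on Euclidean distance can connect nodes that are not actually similar.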
  • 12. Manifold Regularization
    ● This is the technical term for semi-supervised classification of data distributed on a (single) manifold (Belkin et al., 2006)
    ● The key is to establish connectivity between similar nodes by staying within a high-density region
    ● Mathematically it involves
      ● Computing a matrix L (the graph Laplacian) derived from the ordinary weight matrix W
      ● Taking the eigenvectors of L that belong to its n smallest eigenvalues (the smoothest functions on the graph)
      ● Computing an indicator function using the dot product of a data point with these eigenvectors
    ● It is based on the theory of Reproducing Kernel Hilbert Spaces (RKHS)
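
The three mathematical steps listed above can be sketched as follows. This is a simplified spectral variant written for illustration only. It is not the Laplacian-regularized least squares or Laplacian SVM formulation of Belkin et al., and it is not taken from the talk's repository. The function name, the unnormalised Laplacian L = D - W, the Gaussian weights, and the label convention (+1 / -1 for labeled points, 0 for unlabeled) are assumptions.

```python
import numpy as np

def laplacian_eigen_classifier(X, y, n_components=2, sigma=1.0):
    """Binary classifier built from low-frequency eigenvectors of the graph Laplacian.

    X: (n, d) points; y: length-n array with +1 / -1 for labeled points, 0 for unlabeled.
    Returns a length-n array of predicted labels in {-1, +1}.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Step 1: weight matrix W (Gaussian, no self-loops) and Laplacian L = D - W.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W

    # Step 2: eigenvectors for the smallest eigenvalues.
    # np.linalg.eigh returns eigenvalues in ascending order.
    _, eigvecs = np.linalg.eigh(L)
    E = eigvecs[:, :n_components]

    # Step 3: fit coefficients on the labeled rows by least squares, then
    # score every node as the dot product of its eigenvector row with them.
    labeled = y != 0
    coeffs, *_ = np.linalg.lstsq(E[labeled], y[labeled], rcond=None)
    scores = E @ coeffs
    return np.where(scores >= 0, 1, -1)
```

Because the retained eigenvectors vary slowly along densely connected regions, the fitted function changes sign mainly across low-density gaps, which is the behaviour the smoothness property asks for.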
  • 14. Multi-manifold Regularization
    ● This is the technical term for semi-supervised classification of data distributed on a multi-manifold (Goldberg et al., 2009)
    ● The single-manifold algorithm still starts with Euclidean distances, but reformulates its steps based on the derived matrix L
    ● The multi-manifold algorithm changes the distance metric from the outset
      ● It is based on Hellinger distances H, and
      ● A Mahalanobis k-nearest-neighbor graph computed from H
    ● The complete algorithm is much longer, involving spectral clustering and self-training on each cluster
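
As an illustration of the distance involved, the sketch below computes the Hellinger distance between two Gaussians using the standard closed form. It assumes, as one plausible reading of the slide, that each data point is summarised by a local mean and a local sample covariance matrix. The function name and the example covariances are hypothetical.

```python
import numpy as np

def hellinger_gaussians(mu1, cov1, mu2, cov2):
    """Hellinger distance between N(mu1, cov1) and N(mu2, cov2); lies in [0, 1]."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    cov1, cov2 = np.asarray(cov1, dtype=float), np.asarray(cov2, dtype=float)
    cov_avg = (cov1 + cov2) / 2.0
    diff = mu1 - mu2

    # Bhattacharyya coefficient of the two Gaussians (1 means identical).
    det_term = (np.linalg.det(cov1) ** 0.25 * np.linalg.det(cov2) ** 0.25
                / np.sqrt(np.linalg.det(cov_avg)))
    exp_term = np.exp(-0.125 * diff @ np.linalg.solve(cov_avg, diff))
    bc = det_term * exp_term

    # Guard against tiny negative values caused by rounding.
    return np.sqrt(max(0.0, 1.0 - bc))

# Same location, but the local neighbourhoods are stretched in different directions.
along_x = [[1.0, 0.0], [0.0, 0.01]]
along_y = [[0.01, 0.0], [0.0, 1.0]]
print(hellinger_gaussians([0, 0], along_x, [0, 0], along_y))
```

Two points at the same location have zero Euclidean distance, yet the Hellinger distance is large when their local covariances point in different directions. That is the kind of signal that can separate manifolds which intersect.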
  • 15. Multi-manifold Regularization (Schematic)
    DATA → Σs (sample covariance matrices) → H → kNN graph → Spectral Clustering → Self-trained Clusters
  • 16. Multi-view Semi-supervised Learning
    ● Multi-view learning involves two or more independent projections (views) of each data point
    ● Classic example: web-page classification using
      ● Bag of words
      ● Links to other web-pages
    ● Instead of representing data as (X, y), where y is the class label, it may be represented as (X1, X2, y), where the Xi are views
    ● Somewhat related to multimodal learning (like video and audio)
  • 17. Multi-view Manifold Regularization
    ● Can manifold regularization be extended to multi-view data?
    ● Yes, algorithms with strong mathematical foundations exist, such as Sindhwani and Rosenberg, 2008
    ● There is a generic pattern for multi-view semi-supervised learning, called co-training
    ● Sindhwani and Rosenberg extend co-training with an algorithm called co-regularization
    ● It reduces the problem to a convex optimization that minimizes a loss function
    ● The total loss function depends on the individual class predictors for each view, and a couple of regularization hyperparameters
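
A rough sketch of what such a loss can look like is given below, assuming linear per-view predictors and a squared loss. The function name, the weight vectors `w1` and `w2`, and the hyperparameter names `gamma1`, `gamma2` and `mu` are illustrative choices for this example. The formulation in Sindhwani and Rosenberg works with functions in an RKHS rather than plain weight vectors.

```python
import numpy as np

def co_regularization_loss(w1, w2, X1_lab, X2_lab, y, X1_unlab, X2_unlab,
                           gamma1=0.1, gamma2=0.1, mu=1.0):
    """Loss for two linear view predictors f1(x) = x @ w1 and f2(x) = x @ w2."""
    # Supervised term: the average of the two views should match the labels.
    f_lab = 0.5 * (X1_lab @ w1 + X2_lab @ w2)
    fit = np.mean((f_lab - y) ** 2)

    # Complexity penalties, one hyperparameter per view.
    penalty = gamma1 * (w1 @ w1) + gamma2 * (w2 @ w2)

    # Agreement term: the two views should predict alike on unlabeled data.
    disagreement = mu * np.mean((X1_unlab @ w1 - X2_unlab @ w2) ** 2)
    return fit + penalty + disagreement

# Toy data: view 2 is view 1 scaled by two, so w1 = 1 and w2 = 0.5 agree everywhere.
X1 = np.array([[1.0], [2.0]])
X2 = np.array([[2.0], [4.0]])
y = np.array([1.0, 2.0])
w1, w2 = np.array([1.0]), np.array([0.5])

loss_agree = co_regularization_loss(w1, w2, X1, X2, y, X1, X2, gamma1=0.0, gamma2=0.0)
loss_disagree = co_regularization_loss(w1, -w2, X1, X2, y, X1, X2, gamma1=0.0, gamma2=0.0)
```

Every term is a convex quadratic in `(w1, w2)`, so in this simplified linear setting the minimizer can be found with ordinary least-squares machinery or gradient descent.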
  • 19. Python Implementation
    ● An implementation of some of these algorithms in Python 3.x is published on GitHub: https://github.com/techyugadi/manifold_ssl
    ● These implementations offer an interface similar to scikit-learn
    ● There are some programs to generate synthetic data and to use the MNIST handwritten digits data
    ● Note: scikit-learn as of now supports only the label propagation algorithm for semi-supervised learning
    ● R package has more algorithms, but not manifold regularization
    ● This is an early-access release; more algorithms are to be published!
  • 21. Summary
    ● Manifold regularization is an improvement over the standard label propagation algorithm for semi-supervised learning
    ● It may lead to better results when data is distributed over a manifold or multi-manifold
    ● This class of algorithms covers a wide range of scenarios, including multi-view datasets
    ● These algorithms can be implemented in Python using numpy and common linear algebra packages (see GitHub)
  • 22. References
    ● Zhu and Ghahramani, 2002: Learning from Labeled and Unlabeled Data with Label Propagation
    ● Belkin, Niyogi and Sindhwani, 2006: Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples
    ● Sindhwani and Rosenberg, 2008: An RKHS for Multi-View Learning and Manifold Co-Regularization
    ● Goldberg, Zhu, Singh, Xu and Nowak, 2009: Multi-Manifold Semi-Supervised Learning