Label Propagation
                           Seminar:
Semi-supervised and unsupervised learning with Applications to NLP




                                               David Przybilla
                                  davida@coli.uni-saarland.de
Outline

●   What is Label Propagation

●   The Algorithm

●   The motivation behind the algorithm

●   Parameters of Label Propagation

●   Relation Extraction with Label Propagation
Label Propagation

●   Semi-supervised

●   Shows good results compared with supervised
    alternatives when little annotated data is
    available

●   Similar to kNN
K-Nearest Neighbors (kNN)

           ●   Shares similar ideas
               with Label Propagation

           ●   Label Propagation
               (LP) uses unlabeled
               instances during the
               process of finding out
               the labels
Idea of the Problem
                    Similar, nearby instances
                    should receive similar
                    labels




       L = set of labeled instances
       U = set of unlabeled instances
We want to find a function f such that:
The Model
●   A complete graph
     ● Each Node is an instance

     ● Each arc has a weight $T_{xy}$

    ●   $T_{xy}$ is high if nodes x and y are similar.
The Model
●   Inside a Node:


               Soft Labels
Variables - Model
  ●   T is a matrix, holding all the weights of the graph

             $N_1 \dots N_l$ = labeled data
             $N_{l+1} \dots N_n$ = unlabeled data

             $T = \begin{pmatrix} T_{ll} & T_{lu} \\ T_{ul} & T_{uu} \end{pmatrix}$
Variables - Model
●   Y is a matrix holding the soft label probabilities of
    each instance

    $Y_{N_a, R_b}$ is the probability of $N_a$ being labeled as $R_b$

    $Y = \begin{pmatrix} Y_L \\ Y_U \end{pmatrix}$, where finding $Y_U$ is the problem to solve

$R_1, R_2 \dots R_k$: each of the possible labels
$N_1, N_2 \dots N_n$: each of the instances to label
Algorithm




            Y will change in
              each iteration
How to Measure T?

                                                          Distance
                                                          Measure




                                         Euclidean Distance
Important Parameter
(ignore it at the moment) we will talk about this later
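The weight formula on this slide did not survive extraction. A minimal sketch, assuming the Gaussian weighting commonly used with label propagation, $T_{xy} = \exp(-d_{xy}^2 / \sigma^2)$ with $d_{xy}$ the Euclidean distance and $\sigma$ the parameter discussed later; the function name `gaussian_weights` is hypothetical:

```python
import numpy as np

def gaussian_weights(X, sigma):
    """Pairwise weights T[x, y] = exp(-d(x, y)^2 / sigma^2), d = Euclidean distance.

    Assumption: Gaussian weighting; the slide's own formula was not recoverable.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]      # shape (n, n, d)
    sq_dist = np.sum(diff ** 2, axis=-1)      # squared Euclidean distances
    return np.exp(-sq_dist / sigma ** 2)

X = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 4.0]])
T = gaussian_weights(X, sigma=1.0)
```

Nearby instances (rows 0 and 1) get a weight close to 1, while the distant instance gets a weight close to 0.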
How to Initialize Y?
    ●   How to correctly set the values of $Y^0$?

    ●   Fill in the known values (of the labeled data)

    ●   How to fill in the values of the unlabeled data?
         → The initialization of these values can be
        arbitrary.


●   Transform T into $\bar{T}$ (row normalization)
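The two preparation steps can be sketched as follows. The helper names `row_normalize` and `init_labels` are hypothetical, and filling the unlabeled rows uniformly is just one arbitrary choice among many:

```python
import numpy as np

def row_normalize(T):
    """Return T with each row divided by its sum, so every row sums to 1."""
    T = np.asarray(T, dtype=float)
    return T / T.sum(axis=1, keepdims=True)

def init_labels(y_labeled, n_unlabeled, n_classes):
    """Build the initial Y: one-hot rows for labeled instances,
    uniform rows for unlabeled ones (an arbitrary choice for illustration)."""
    Y_l = np.eye(n_classes)[np.asarray(y_labeled)]
    Y_u = np.full((n_unlabeled, n_classes), 1.0 / n_classes)
    return np.vstack([Y_l, Y_u])

T = np.array([[1.0, 0.5, 0.1],
              [0.5, 1.0, 0.5],
              [0.1, 0.5, 1.0]])
T_bar = row_normalize(T)
Y0 = init_labels([0, 1], n_unlabeled=1, n_classes=2)
```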
Propagation Step
●   During the process Y will change

                       $Y^0 \to Y^1 \to \dots \to Y^k$

    ●   Update   Y   during each iteration
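A minimal sketch of the iteration, assuming the usual update (multiply Y by the row-normalized matrix, then reset the labeled rows to their known values). The function name `propagate` and the tolerance are illustrative choices, and the toy graph is a sparse chain rather than a complete graph to keep the numbers easy to follow:

```python
import numpy as np

def propagate(T_bar, Y0, n_labeled, max_iter=1000, tol=1e-9):
    """Iterate Y <- T_bar @ Y, resetting the labeled rows after every step."""
    Y = Y0.copy()
    Y_l = Y0[:n_labeled].copy()
    for _ in range(max_iter):
        Y_next = T_bar @ Y
        Y_next[:n_labeled] = Y_l          # clamp the labeled instances
        if np.abs(Y_next - Y).max() < tol:
            return Y_next
        Y = Y_next
    return Y

# Chain: node 0 (class 0) - node 2 - node 3 - node 1 (class 1); nodes 2 and 3 are unlabeled.
W = np.array([[0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
T_bar = W / W.sum(axis=1, keepdims=True)
Y0 = np.array([[1, 0], [0, 1], [0.5, 0.5], [0.5, 0.5]], dtype=float)
Y = propagate(T_bar, Y0, n_labeled=2)
```

Node 2 sits next to the class-0 seed and ends up with roughly two thirds of its mass on class 0; node 3 mirrors it for class 1.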
Convergence
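During the iteration, $Y_L$ is clamped:

$\begin{pmatrix} Y_L \\ Y_U \end{pmatrix} = \begin{pmatrix} \bar{T}_{ll} & \bar{T}_{lu} \\ \bar{T}_{ul} & \bar{T}_{uu} \end{pmatrix} \begin{pmatrix} Y_L \\ Y_U \end{pmatrix}$

Assuming we iterate infinitely many times:

$Y_U^1 = \bar{T}_{uu} Y_U^0 + \bar{T}_{ul} Y_L$
$Y_U^2 = \bar{T}_{uu} (\bar{T}_{uu} Y_U^0 + \bar{T}_{ul} Y_L) + \bar{T}_{ul} Y_L$
...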
Convergence
Since $\bar{T}$ is row-normalized and $\bar{T}_{uu}$ is a submatrix of $\bar{T}$:



Doing it n times will lead to:




                                   Converges to Zero
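The equations of this slide were lost in extraction. The standard argument, written here as a sketch and assuming every unlabeled row of $\bar{T}$ puts some positive weight on labeled instances (so each row of $\bar{T}_{uu}$ sums to at most some $\gamma < 1$), unrolls the recursion as follows:

```latex
% Unrolling Y_U^{k+1} = \bar{T}_{uu} Y_U^{k} + \bar{T}_{ul} Y_L for n steps:
Y_U^{n} = \bar{T}_{uu}^{\,n}\, Y_U^{0}
        + \Bigl( \sum_{i=0}^{n-1} \bar{T}_{uu}^{\,i} \Bigr) \bar{T}_{ul}\, Y_L

% Row sums of \bar{T}_{uu} are bounded by \gamma < 1, hence
\sum_{j} \bigl( \bar{T}_{uu}^{\,n} \bigr)_{ij} \le \gamma^{n} \longrightarrow 0
\quad\Longrightarrow\quad
\bar{T}_{uu}^{\,n}\, Y_U^{0} \longrightarrow 0

% The remaining geometric series converges:
Y_U = (I - \bar{T}_{uu})^{-1}\, \bar{T}_{ul}\, Y_L
```

The initial values of the unlabeled rows therefore have no effect on the result.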
After convergence
After convergence one can find $Y_U$ by solving:

               $(I - \bar{T}_{uu}) Y_U = \bar{T}_{ul} Y_L$
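The fixed point of the iteration satisfies the linear system $(I - \bar{T}_{uu}) Y_U = \bar{T}_{ul} Y_L$, so it can be computed directly instead of iterating. A minimal sketch, assuming the labeled instances come first in the matrix; `solve_unlabeled` is a hypothetical name:

```python
import numpy as np

def solve_unlabeled(T_bar, Y_l):
    """Solve (I - T_uu) Y_u = T_ul Y_l for the unlabeled block."""
    l = Y_l.shape[0]
    T_uu = T_bar[l:, l:]
    T_ul = T_bar[l:, :l]
    identity = np.eye(T_uu.shape[0])
    return np.linalg.solve(identity - T_uu, T_ul @ Y_l)

# Chain: node 0 (class 0) - node 2 - node 3 - node 1 (class 1); nodes 2 and 3 are unlabeled.
W = np.array([[0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
T_bar = W / W.sum(axis=1, keepdims=True)
Y_l = np.array([[1.0, 0.0], [0.0, 1.0]])
Y_u = solve_unlabeled(T_bar, Y_l)
```

Using `np.linalg.solve` avoids forming the inverse explicitly, which is usually the more stable choice.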
Optimization Problem


               $w_{ij}$: similarity between $i$ and $j$

   $f$ should minimize the energy function

$f(i)$ and $f(j)$ should be similar for a high $w_{ij}$
       in order to minimize the energy
The graph laplacian
Let D be a diagonal matrix where $D_{ii} = \sum_j \bar{T}_{ij}$

Rows are normalized, so $D = I$

The graph Laplacian is defined as: $\Delta = D - \bar{T}$

Since $f: V \to \mathbb{R}$, we can use the graph Laplacian to act on it.
So the energy function can be rewritten in terms of $\Delta$
Back to the optimization Problem
  The energy can be rewritten using the Laplacian

$f$ should minimize the energy function.

                                  $\Delta_{uu} = D_{uu} - \bar{T}_{uu} = I - \bar{T}_{uu}$
                                  $\Delta_{ul} = D_{ul} - \bar{T}_{ul} = -\bar{T}_{ul}$
Optimization Problem

Delta can be rewritten in terms of $\bar{T}$:

$\Delta_{uu} = D_{uu} - \bar{T}_{uu} = I - \bar{T}_{uu}$
$\Delta_{ul} = D_{ul} - \bar{T}_{ul} = -\bar{T}_{ul}$

$f_u = (I - \bar{T}_{uu})^{-1} \bar{T}_{ul} f_l$

The algorithm converges to the minimizer of the
energy function
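The link between the fixed point and the energy can be checked numerically. This sketch assumes the quadratic energy $E(f) = \frac{1}{2} \sum_{ij} w_{ij} (f(i) - f(j))^2$ with a symmetric, unnormalized weight matrix W (the energy formula itself was not recoverable from the slides); all function names are hypothetical:

```python
import numpy as np

def energy(W, f):
    """E(f) = 1/2 * sum_ij W[i, j] * (f[i] - f[j])**2 for symmetric W."""
    diff = f[:, None] - f[None, :]
    return 0.5 * np.sum(W * diff ** 2)

def laplacian_energy(W, f):
    """Same quantity written as f^T (D - W) f, with D the diagonal degree matrix."""
    lap = np.diag(W.sum(axis=1)) - W
    return f @ lap @ f

def harmonic_solution(W, f_l):
    """Minimizer over the unlabeled block: solve (D_uu - W_uu) f_u = W_ul f_l."""
    l = len(f_l)
    lap = np.diag(W.sum(axis=1)) - W
    return np.linalg.solve(lap[l:, l:], W[l:, :l] @ f_l)

W = np.array([[0.0, 0.1, 0.9, 0.2],
              [0.1, 0.0, 0.3, 0.8],
              [0.9, 0.3, 0.0, 0.5],
              [0.2, 0.8, 0.5, 0.0]])
f_l = np.array([1.0, 0.0])               # first two nodes are labeled
f_u = harmonic_solution(W, f_l)
f = np.concatenate([f_l, f_u])
```

Dividing each row of the system by its degree turns it into $(I - \bar{T}_{uu}) f_u = \bar{T}_{ul} f_l$, which is why the propagation fixed point and the energy minimizer coincide.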
Sigma Parameter




Remember the Sigma parameter?

 ●   It strongly influences the behavior of LP.

 ●   There can be:
        ● Just one σ for the whole feature vector
        ● One σ per dimension
Sigma Parameter
            ●   What happens if σ tends to:
       –   0:
            ●   The label of an unlabeled instance is given by just the
                nearest labeled instance

       –   Infinity:
             ● All the unlabeled instances receive the same influence
               from all labeled instances. The soft probabilities of each
               unlabeled instance are given by the class frequencies in the
               labeled data

●   There are heuristics for finding an appropriate value of sigma
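Both limits can be observed on a toy problem. This is a sketch under two assumptions that are not stated on the slides: Gaussian weights $\exp(-d^2/\sigma^2)$ and no self-loops; `lp_soft_labels` is a hypothetical helper:

```python
import numpy as np

def lp_soft_labels(X_l, Y_l, X_u, sigma):
    """Closed-form soft labels of the unlabeled points for a given sigma."""
    X = np.vstack([X_l, X_u])
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / sigma ** 2)
    np.fill_diagonal(W, 0.0)                 # assumption: no self-loops
    T_bar = W / W.sum(axis=1, keepdims=True)
    l = len(X_l)
    T_uu, T_ul = T_bar[l:, l:], T_bar[l:, :l]
    return np.linalg.solve(np.eye(len(X_u)) - T_uu, T_ul @ Y_l)

# Two labeled points of class 0 (x = 0, x = 1), one of class 1 (x = 10).
X_l = np.array([[0.0], [1.0], [10.0]])
Y_l = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
X_u = np.array([[8.0]])                      # closest to the class-1 point

small = lp_soft_labels(X_l, Y_l, X_u, sigma=1.0)
large = lp_soft_labels(X_l, Y_l, X_u, sigma=1000.0)
```

With the small sigma the unlabeled point takes the label of its nearest labeled neighbor (class 1); with the large sigma its soft labels approach the class frequencies of the labeled data, about 2/3 and 1/3.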
Sigma Parameter - MST

[Figure: minimum spanning tree over the data; the minimum arc connecting two components with different labels (Label1, Label2) is highlighted]

$\sigma = \frac{\text{min weight(arc)}}{3}$

where the arc connects two components with different labels
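A sketch of this heuristic, assuming the tree is grown with Kruskal's algorithm over Euclidean distances and that the arc "weight" is its length; `mst_sigma` is a hypothetical name:

```python
import math
from itertools import combinations

def mst_sigma(points, labels):
    """labels[i] is a class id for labeled points and None for unlabeled ones."""
    n = len(points)
    parent = list(range(n))
    # Labels present in each component, indexed by the component's root.
    seen = [set() if lab is None else {lab} for lab in labels]

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    for d, i, j in edges:                    # shortest arcs first
        ri, rj = find(i), find(j)
        if ri == rj:
            continue                         # would close a cycle: not a tree arc
        if seen[ri] and seen[rj] and seen[ri] != seen[rj]:
            return d / 3.0                   # first arc joining different labels
        parent[rj] = ri
        seen[ri] |= seen[rj]
    raise ValueError("need labeled points from at least two classes")

points = [(0, 0), (1, 0), (2, 0), (8, 0), (9, 0)]
labels = [0, None, None, None, 1]
sigma = mst_sigma(points, labels)
```

In this example the two clusters are joined by an arc of length 6, so sigma is 2.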
Sigma Parameter – Learning it
 How to learn sigma?
  ● Assumption:

       A good sigma will classify with
       confidence and thus minimize entropy.

How to do it?
 ● Smooth the transition matrix T

 ● Find the derivative of H (the entropy) w.r.t. sigma

  When to do it?
  ● When using one sigma per dimension; the learned sigmas can
    be used to identify irrelevant dimensions
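The entropy criterion can be illustrated without the gradient. This sketch swaps gradient descent for a plain grid search over a single sigma, and assumes Gaussian weights, smoothing with a uniform matrix ($\epsilon / n$ added to every entry, $\epsilon$ chosen arbitrarily), and $H = -\sum_{ij} Y_{ij} \log Y_{ij}$ over the unlabeled rows; all names are hypothetical:

```python
import numpy as np

def soft_labels(X, Y_l, sigma, eps=0.01):
    """Closed-form soft labels using a smoothed transition matrix."""
    l, n = len(Y_l), len(X)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / sigma ** 2)
    T_bar = W / W.sum(axis=1, keepdims=True)
    T_s = eps / n + (1.0 - eps) * T_bar      # smoothing keeps the system solvable
    A = np.eye(n - l) - T_s[l:, l:]
    return np.linalg.solve(A, T_s[l:, :l] @ Y_l)

def label_entropy(Y_u):
    """H = -sum_ij Y_u[i, j] * log Y_u[i, j], with 0 * log 0 taken as 0."""
    P = np.clip(Y_u, 1e-12, 1.0)
    return float(-np.sum(Y_u * np.log(P)))

# The first two rows are labeled (classes 0 and 1), the rest are unlabeled.
X = np.array([[0.0, 0.0], [4.0, 4.0],
              [0.5, 0.2], [0.2, 0.6], [3.8, 4.1], [4.2, 3.7]])
Y_l = np.array([[1.0, 0.0], [0.0, 1.0]])

sigmas = [0.1, 0.3, 1.0, 3.0, 10.0]
best = min(sigmas, key=lambda s: label_entropy(soft_labels(X, Y_l, s)))
```

Without smoothing, the entropy is trivially minimized as sigma goes to 0, which is why the smoothing step comes first.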
Labeling Approach
●   Once $Y_U$ is computed, how do we assign labels
    to the instances?




●   Take the most likely class
●   Class mass Normalization
●   Label Bidding
Labeling Approach
        ●   Take the most likely class




    ●   Simply, look at the rows of Yu, and choose for each instance
        the label with highest probability


●       Problem: no control on the proportion of classes
Labeling Approach
●   Class mass Normalization
●   Given some class proportions $P_1, P_2 \dots P_k$
●   Scale each column c so that its mass equals $P_c$

    ●   Then simply look at the rows of $Y_U$ and choose for each
        instance the label with the highest probability
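A sketch comparing the plain most-likely rule with class mass normalization, assuming that "scaling a column" means rescaling it so that its sum equals the target proportion; the function names are hypothetical:

```python
import numpy as np

def most_likely(Y_u):
    """Pick, for every row, the label with the highest soft probability."""
    return np.argmax(Y_u, axis=1)

def class_mass_normalization(Y_u, proportions):
    """Rescale column c so its total mass equals proportions[c], then take the argmax."""
    Y_u = np.asarray(Y_u, dtype=float)
    scaled = Y_u * (np.asarray(proportions) / Y_u.sum(axis=0))
    return np.argmax(scaled, axis=1)

Y_u = np.array([[0.60, 0.40],
                [0.55, 0.45],
                [0.70, 0.30],
                [0.52, 0.48]])
ml = most_likely(Y_u)
cmn = class_mass_normalization(Y_u, [0.5, 0.5])
```

Here the most-likely rule puts every instance in class 0, while class mass normalization with equal proportions moves the two least confident instances to class 1.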
Labeling Approach
●       Label bidding

    ●   Given some class proportions $P_1, P_2 \dots P_k$

1. Estimate the number of items per label ($C_k$)

2. Choose the label with the greatest number of items, take the $C_k$
items whose probability of having the current label is the highest,
and assign them the currently selected label.


3. Iterate through all the possible labels
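The three steps can be sketched as below. Two details are assumptions added for the sketch: items that already received a label are skipped in later rounds, and items left over after rounding fall back to their most likely class. `label_bidding` is a hypothetical name:

```python
def label_bidding(Y_u, proportions):
    n = len(Y_u)
    # Step 1: estimated number of items per label.
    counts = [round(p * n) for p in proportions]
    assigned = [None] * n
    # Steps 2-3: visit the labels from the largest quota to the smallest.
    for k in sorted(range(len(counts)), key=lambda c: -counts[c]):
        free = [i for i in range(n) if assigned[i] is None]
        free.sort(key=lambda i: -Y_u[i][k])  # most confident items first
        for i in free[:counts[k]]:
            assigned[i] = k
    # Assumption: rounding leftovers fall back to the most likely class.
    for i in range(n):
        if assigned[i] is None:
            assigned[i] = max(range(len(counts)), key=lambda c: Y_u[i][c])
    return assigned

Y_u = [[0.60, 0.40],
       [0.55, 0.45],
       [0.70, 0.30],
       [0.52, 0.48]]
result = label_bidding(Y_u, [0.5, 0.5])
```

With equal proportions each label receives two items: class 0 takes the two rows with the highest class-0 probability, and class 1 takes the rest.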
Experiment Setup
●   Artificial Data
    ●   Comparison LP vs kNN (k=1)


●   Character recognition
    ●   Recognize handwritten digits
    ●   Images 16x16 pixels, grayscale
    ●   Recognizing the digits 1, 2 and 3
    ●   256 dimensional vector
Results using LP on artificial data




●   LP finds the structure in the data while KNN fails
P1NN
●   P1NN is a baseline for comparisons
●   Simplified version of LP




    1. During each iteration, find the unlabeled instance nearest
    to a labeled instance and give it that instance's label
    2. Iterate until all instances are labeled
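A direct, unoptimized sketch of these two steps, assuming Euclidean distance; `p1nn` is a hypothetical name and `None` marks an unlabeled instance:

```python
import math

def p1nn(points, labels):
    """Propagating 1-nearest-neighbor: repeatedly label the unlabeled point
    that lies closest to any already labeled point."""
    labels = list(labels)
    if all(lab is None for lab in labels):
        raise ValueError("need at least one labeled instance")
    while any(lab is None for lab in labels):
        best = None                          # (distance, unlabeled idx, labeled idx)
        for u, lab_u in enumerate(labels):
            if lab_u is not None:
                continue
            for v, lab_v in enumerate(labels):
                if lab_v is None:
                    continue
                d = math.dist(points[u], points[v])
                if best is None or d < best[0]:
                    best = (d, u, v)
        _, u, v = best
        labels[u] = labels[v]                # newly labeled points propagate further
    return labels

points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.5,)]
labels = [0, None, None, None, None, None, 1]
result = p1nn(points, labels)
```

The point at 5.0 is closer to the class-1 seed at 6.5 than to the class-0 seed at 0.0, so plain 1NN would assign it class 1; P1NN reaches it through the chain of already labeled neighbors and assigns class 0.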
Results using LP on the handwritten digits data set
●   P1NN (BaseLine), 1NN (kNN)




    ●   Cne: Class mass normalization. Proportions from Labeled Data
    ●   Lbo: Label bidding with oracle class proportions
    ●   ML: most likely labels
Relation Extraction?
●   From natural language texts detect semantic
    relations among entities




Example:

B. Gates married Melinda French on January 1, 1994



    spouse(B.Gates, Melinda French)
Why LP to do RE?
                 Problems

●   Supervised: needs a lot of annotated data
●   Unsupervised: retrieves clusters of relations with no label
RE- Problem Definition
  ●   Find an appropriate label for an occurrence of two
      entities in a context
Example:

….. B. Gates married Melinda French on January 1, 1994

●   Context before (Cpre): …..
●   Entity 1 (e1): B. Gates
●   Context between (Cmid): married
●   Entity 2 (e2): Melinda French
●   Context after (Cpost): on January 1, 1994

   Idea: if two occurrences of entity pairs have similar
   contexts, then they have the same relation type
RE problem Definition - Features

●   Words: in the contexts
●   Entity Types: Person, Location, Org...
●   POS tagging: of Words in the contexts
●   Chunking Tag: mark which words in the
    contexts are inside chunks
●   Grammatical function of words in the contexts.
    i.e : NP-SBJ (subject)
●   Position of words:
    ●   First word of e1, second word of e1, ...
    ●   Is there any word in Cmid?
    ●   First word in Cpre, Cmid, Cpost, ...
    ●   Second word in Cpre, ...
RE problem Definition - Labels
Experiment
●   ACE 2003 data. Corpus from Newspapers


●   Assume all entities have been identified already


●   Comparison between:
          –   Different amounts of labeled samples:
              1%, 10%, 25%, 50%, 75%, 100%
          –   Different Similarity Functions
          –   LP, SVM and Bootstrapping
●   LP:
    ●   Similarity Function: Cosine, JensenShannon
    ●   Labeling Approach: Take the most likely class
    ●   Sigma: average similarity between labeled classes
Jensen-Shannon
- Similarity measure

- Measures the distance between two probability distributions

- JS is a smoothed and symmetrized version of the Kullback-Leibler divergence $D_{KL}$

Kullback-Leibler divergence:
- is not symmetric
- is not always finite
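Both properties are easy to see in code. This sketch uses the standard definitions, $D_{KL}(p \| q) = \sum_i p_i \log(p_i / q_i)$ and $JS(p, q) = \frac{1}{2} D_{KL}(p \| m) + \frac{1}{2} D_{KL}(q \| m)$ with $m = (p + q)/2$; the function names are hypothetical:

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q); infinite when q is zero somewhere p is not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue                         # 0 * log 0 is taken as 0
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL divergence to the mixture m."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
kl_pq = kl_divergence(p, q)                  # finite
kl_qp = kl_divergence(q, p)                  # infinite: p is zero where q is not
js = js_divergence(p, q)                     # finite and symmetric
```

Because the mixture m is nonzero wherever p or q is, the JS divergence is always finite and never exceeds log 2 (with natural logarithms).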
Results
Classifying relation subtypes: SVM vs LP




       SVM with linear Kernel
Bootstrapping


Seeds → Train a classifier → Classifier

Classifier → Update the set of seeds with instances whose
confidence is high enough → Seeds (repeat)
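The loop can be sketched generically. The nearest-centroid classifier on one-dimensional points is only a toy stand-in so the example runs, not the classifier used in the experiments; the names, the confidence score and the threshold are all hypothetical:

```python
def bootstrap(seeds, pool, train, threshold, max_rounds=10):
    """seeds: list of (x, label); pool: list of unlabeled x.
    train(seeds) must return a function x -> (label, confidence)."""
    seeds, pool = list(seeds), list(pool)
    for _ in range(max_rounds):
        classify = train(seeds)
        scored = [(x,) + tuple(classify(x)) for x in pool]
        confident = [(x, lab) for x, lab, conf in scored if conf >= threshold]
        if not confident:
            break                            # nothing confident enough to add
        seeds.extend(confident)              # update the set of seeds
        pool = [x for x, lab, conf in scored if conf < threshold]
    return train(seeds), seeds

def train_centroid(seeds):
    """Toy classifier: nearest class centroid, confidence from the distance ratio."""
    groups = {}
    for x, lab in seeds:
        groups.setdefault(lab, []).append(x)
    centroids = {lab: sum(xs) / len(xs) for lab, xs in groups.items()}

    def classify(x):
        ranked = sorted(centroids, key=lambda lab: abs(x - centroids[lab]))
        near, far = ranked[0], ranked[1]
        d_near, d_far = abs(x - centroids[near]), abs(x - centroids[far])
        return near, 1.0 - d_near / (d_near + d_far)

    return classify

classifier, final_seeds = bootstrap(
    seeds=[(0.0, "a"), (10.0, "b")],
    pool=[1.0, 9.0, 4.0, 6.5],
    train=train_centroid,
    threshold=0.8,
)
```

Only the two points close to a seed pass the confidence threshold; the ambiguous points in the middle stay unlabeled, which shows how bootstrapping, unlike LP, ignores the structure of low-confidence data.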
Classifying relation types: Bootstrapping vs LP




 Starting with 100 random seeds
Results
●   In general, LP performs well compared with SVM and kNN
    when little annotated data is available

●   Irrelevant dimensions can be identified by using
    LP

●   Looking at the structure of unlabeled data
    helps when little annotated data is available
Thank you

Contenu connexe

Tendances

support vector regression
support vector regressionsupport vector regression
support vector regressionAkhilesh Joshi
 
非制約最小二乗密度比推定法 uLSIF を用いた外れ値検出
非制約最小二乗密度比推定法 uLSIF を用いた外れ値検出非制約最小二乗密度比推定法 uLSIF を用いた外れ値検出
非制約最小二乗密度比推定法 uLSIF を用いた外れ値検出hoxo_m
 
Optimization in Deep Learning
Optimization in Deep LearningOptimization in Deep Learning
Optimization in Deep LearningYan Xu
 
ICLR2020の異常検知論文の紹介 (2019/11/23)
ICLR2020の異常検知論文の紹介 (2019/11/23)ICLR2020の異常検知論文の紹介 (2019/11/23)
ICLR2020の異常検知論文の紹介 (2019/11/23)ぱんいち すみもと
 
東京都市大学 データ解析入門 5 スパース性と圧縮センシング 2
東京都市大学 データ解析入門 5 スパース性と圧縮センシング 2東京都市大学 データ解析入門 5 スパース性と圧縮センシング 2
東京都市大学 データ解析入門 5 スパース性と圧縮センシング 2hirokazutanaka
 
KNN Algorithm using Python | How KNN Algorithm works | Python Data Science Tr...
KNN Algorithm using Python | How KNN Algorithm works | Python Data Science Tr...KNN Algorithm using Python | How KNN Algorithm works | Python Data Science Tr...
KNN Algorithm using Python | How KNN Algorithm works | Python Data Science Tr...Edureka!
 
BIM Data Mining Unit4 by Tekendra Nath Yogi
 BIM Data Mining Unit4 by Tekendra Nath Yogi BIM Data Mining Unit4 by Tekendra Nath Yogi
BIM Data Mining Unit4 by Tekendra Nath YogiTekendra Nath Yogi
 
Photo-realistic Single Image Super-resolution using a Generative Adversarial ...
Photo-realistic Single Image Super-resolution using a Generative Adversarial ...Photo-realistic Single Image Super-resolution using a Generative Adversarial ...
Photo-realistic Single Image Super-resolution using a Generative Adversarial ...Hansol Kang
 
数学で解き明かす深層学習の原理
数学で解き明かす深層学習の原理数学で解き明かす深層学習の原理
数学で解き明かす深層学習の原理Taiji Suzuki
 
Semi-Supervised Learning
Semi-Supervised LearningSemi-Supervised Learning
Semi-Supervised LearningLukas Tencer
 
はじめてのパターン認識8章サポートベクトルマシン
はじめてのパターン認識8章サポートベクトルマシンはじめてのパターン認識8章サポートベクトルマシン
はじめてのパターン認識8章サポートベクトルマシンNobuyukiTakayasu
 
4 Dimensionality reduction (PCA & t-SNE)
4 Dimensionality reduction (PCA & t-SNE)4 Dimensionality reduction (PCA & t-SNE)
4 Dimensionality reduction (PCA & t-SNE)Dmytro Fishman
 
ポーカーAIの最新動向 20171031
ポーカーAIの最新動向 20171031ポーカーAIの最新動向 20171031
ポーカーAIの最新動向 20171031Jun Okumura
 
Notes from Coursera Deep Learning courses by Andrew Ng
Notes from Coursera Deep Learning courses by Andrew NgNotes from Coursera Deep Learning courses by Andrew Ng
Notes from Coursera Deep Learning courses by Andrew NgdataHacker. rs
 
Deep Learning Lab 異常検知入門
Deep Learning Lab 異常検知入門Deep Learning Lab 異常検知入門
Deep Learning Lab 異常検知入門Shohei Hido
 

Tendances (20)

support vector regression
support vector regressionsupport vector regression
support vector regression
 
非制約最小二乗密度比推定法 uLSIF を用いた外れ値検出
非制約最小二乗密度比推定法 uLSIF を用いた外れ値検出非制約最小二乗密度比推定法 uLSIF を用いた外れ値検出
非制約最小二乗密度比推定法 uLSIF を用いた外れ値検出
 
Optimization in Deep Learning
Optimization in Deep LearningOptimization in Deep Learning
Optimization in Deep Learning
 
ICLR2020の異常検知論文の紹介 (2019/11/23)
ICLR2020の異常検知論文の紹介 (2019/11/23)ICLR2020の異常検知論文の紹介 (2019/11/23)
ICLR2020の異常検知論文の紹介 (2019/11/23)
 
東京都市大学 データ解析入門 5 スパース性と圧縮センシング 2
東京都市大学 データ解析入門 5 スパース性と圧縮センシング 2東京都市大学 データ解析入門 5 スパース性と圧縮センシング 2
東京都市大学 データ解析入門 5 スパース性と圧縮センシング 2
 
KNN Algorithm using Python | How KNN Algorithm works | Python Data Science Tr...
KNN Algorithm using Python | How KNN Algorithm works | Python Data Science Tr...KNN Algorithm using Python | How KNN Algorithm works | Python Data Science Tr...
KNN Algorithm using Python | How KNN Algorithm works | Python Data Science Tr...
 
BIM Data Mining Unit4 by Tekendra Nath Yogi
 BIM Data Mining Unit4 by Tekendra Nath Yogi BIM Data Mining Unit4 by Tekendra Nath Yogi
BIM Data Mining Unit4 by Tekendra Nath Yogi
 
Photo-realistic Single Image Super-resolution using a Generative Adversarial ...
Photo-realistic Single Image Super-resolution using a Generative Adversarial ...Photo-realistic Single Image Super-resolution using a Generative Adversarial ...
Photo-realistic Single Image Super-resolution using a Generative Adversarial ...
 
数学で解き明かす深層学習の原理
数学で解き明かす深層学習の原理数学で解き明かす深層学習の原理
数学で解き明かす深層学習の原理
 
Semi-Supervised Learning
Semi-Supervised LearningSemi-Supervised Learning
Semi-Supervised Learning
 
はじめてのパターン認識8章サポートベクトルマシン
はじめてのパターン認識8章サポートベクトルマシンはじめてのパターン認識8章サポートベクトルマシン
はじめてのパターン認識8章サポートベクトルマシン
 
Fuzzy c means manual work
Fuzzy c means manual workFuzzy c means manual work
Fuzzy c means manual work
 
Unsupervised Machine Learning
Unsupervised Machine LearningUnsupervised Machine Learning
Unsupervised Machine Learning
 
4 Dimensionality reduction (PCA & t-SNE)
4 Dimensionality reduction (PCA & t-SNE)4 Dimensionality reduction (PCA & t-SNE)
4 Dimensionality reduction (PCA & t-SNE)
 
Regularization
RegularizationRegularization
Regularization
 
ポーカーAIの最新動向 20171031
ポーカーAIの最新動向 20171031ポーカーAIの最新動向 20171031
ポーカーAIの最新動向 20171031
 
Random forest
Random forestRandom forest
Random forest
 
deep learning
deep learningdeep learning
deep learning
 
Notes from Coursera Deep Learning courses by Andrew Ng
Notes from Coursera Deep Learning courses by Andrew NgNotes from Coursera Deep Learning courses by Andrew Ng
Notes from Coursera Deep Learning courses by Andrew Ng
 
Deep Learning Lab 異常検知入門
Deep Learning Lab 異常検知入門Deep Learning Lab 異常検知入門
Deep Learning Lab 異常検知入門
 

En vedette

Community Detection in Social Media
Community Detection in Social MediaCommunity Detection in Social Media
Community Detection in Social MediaSymeon Papadopoulos
 
Extending Word2Vec for Performance and Semi-Supervised Learning-(Michael Mala...
Extending Word2Vec for Performance and Semi-Supervised Learning-(Michael Mala...Extending Word2Vec for Performance and Semi-Supervised Learning-(Michael Mala...
Extending Word2Vec for Performance and Semi-Supervised Learning-(Michael Mala...Spark Summit
 
Semi supervised learning
Semi supervised learningSemi supervised learning
Semi supervised learningAhmed Taha
 
Community detection in graphs
Community detection in graphsCommunity detection in graphs
Community detection in graphsNicola Barbieri
 
Language of Politics on Twitter - 03 Analysis
Language of Politics on Twitter - 03 AnalysisLanguage of Politics on Twitter - 03 Analysis
Language of Politics on Twitter - 03 AnalysisYelena Mejova
 
CVPR2010: Semi-supervised Learning in Vision: Part 3: Algorithms and Applicat...
CVPR2010: Semi-supervised Learning in Vision: Part 3: Algorithms and Applicat...CVPR2010: Semi-supervised Learning in Vision: Part 3: Algorithms and Applicat...
CVPR2010: Semi-supervised Learning in Vision: Part 3: Algorithms and Applicat...zukun
 
Semi-supervised classification for natural language processing
Semi-supervised classification for natural language processingSemi-supervised classification for natural language processing
Semi-supervised classification for natural language processingRushdi Shams
 
SocNL: Bayesian Label Propagation with Confidence
SocNL: Bayesian Label Propagation with ConfidenceSocNL: Bayesian Label Propagation with Confidence
SocNL: Bayesian Label Propagation with ConfidenceYuto Yamaguchi
 
MINING HEALTH EXAMINATION RECORDS A GRAPH-BASED APPROACH
MINING HEALTH EXAMINATION RECORDS  A GRAPH-BASED APPROACHMINING HEALTH EXAMINATION RECORDS  A GRAPH-BASED APPROACH
MINING HEALTH EXAMINATION RECORDS A GRAPH-BASED APPROACHNexgen Technology
 
GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™Databricks
 
Semi-supervised Learning
Semi-supervised LearningSemi-supervised Learning
Semi-supervised Learningbutest
 
What is Agile Software Development?
What is Agile Software Development?What is Agile Software Development?
What is Agile Software Development?Blossom IO Inc.
 
Agile Software Development with Scrum – Introduction
Agile Software Development with Scrum – IntroductionAgile Software Development with Scrum – Introduction
Agile Software Development with Scrum – IntroductionBlackvard
 
[Dl輪読会]semi supervised learning with context-conditional generative adversari...
[Dl輪読会]semi supervised learning with context-conditional generative adversari...[Dl輪読会]semi supervised learning with context-conditional generative adversari...
[Dl輪読会]semi supervised learning with context-conditional generative adversari...Deep Learning JP
 
論文紹介 Semi-supervised Learning with Deep Generative Models
論文紹介 Semi-supervised Learning with Deep Generative Models論文紹介 Semi-supervised Learning with Deep Generative Models
論文紹介 Semi-supervised Learning with Deep Generative ModelsSeiya Tokui
 
Overview of Agile Methodology
Overview of Agile MethodologyOverview of Agile Methodology
Overview of Agile MethodologyHaresh Karkar
 
Agile Software Development Overview
Agile Software Development OverviewAgile Software Development Overview
Agile Software Development OverviewStewart Rogers
 
Hierarchical Label Propagation and Discovery for Machine Generated Email
Hierarchical Label Propagation and Discovery for Machine Generated EmailHierarchical Label Propagation and Discovery for Machine Generated Email
Hierarchical Label Propagation and Discovery for Machine Generated EmailKenji Esaki
 
Semi-supervised concept detection by learning the structure of similarity graphs
Semi-supervised concept detection by learning the structure of similarity graphsSemi-supervised concept detection by learning the structure of similarity graphs
Semi-supervised concept detection by learning the structure of similarity graphsSymeon Papadopoulos
 
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)Spark Summit
 

En vedette (20)

Community Detection in Social Media
Community Detection in Social MediaCommunity Detection in Social Media
Community Detection in Social Media
 
Extending Word2Vec for Performance and Semi-Supervised Learning-(Michael Mala...
Extending Word2Vec for Performance and Semi-Supervised Learning-(Michael Mala...Extending Word2Vec for Performance and Semi-Supervised Learning-(Michael Mala...
Extending Word2Vec for Performance and Semi-Supervised Learning-(Michael Mala...
 
Semi supervised learning
Semi supervised learningSemi supervised learning
Semi supervised learning
 
Community detection in graphs
Community detection in graphsCommunity detection in graphs
Community detection in graphs
 
Language of Politics on Twitter - 03 Analysis
Language of Politics on Twitter - 03 AnalysisLanguage of Politics on Twitter - 03 Analysis
Language of Politics on Twitter - 03 Analysis
 
CVPR2010: Semi-supervised Learning in Vision: Part 3: Algorithms and Applicat...
CVPR2010: Semi-supervised Learning in Vision: Part 3: Algorithms and Applicat...CVPR2010: Semi-supervised Learning in Vision: Part 3: Algorithms and Applicat...
CVPR2010: Semi-supervised Learning in Vision: Part 3: Algorithms and Applicat...
 
Semi-supervised classification for natural language processing
Semi-supervised classification for natural language processingSemi-supervised classification for natural language processing
Semi-supervised classification for natural language processing
 
SocNL: Bayesian Label Propagation with Confidence
SocNL: Bayesian Label Propagation with ConfidenceSocNL: Bayesian Label Propagation with Confidence
SocNL: Bayesian Label Propagation with Confidence
 
MINING HEALTH EXAMINATION RECORDS A GRAPH-BASED APPROACH
MINING HEALTH EXAMINATION RECORDS  A GRAPH-BASED APPROACHMINING HEALTH EXAMINATION RECORDS  A GRAPH-BASED APPROACH
MINING HEALTH EXAMINATION RECORDS A GRAPH-BASED APPROACH
 
GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™
 
Semi-supervised Learning
Semi-supervised LearningSemi-supervised Learning
Semi-supervised Learning
 
What is Agile Software Development?
What is Agile Software Development?What is Agile Software Development?
What is Agile Software Development?
 
Agile Software Development with Scrum – Introduction
Agile Software Development with Scrum – IntroductionAgile Software Development with Scrum – Introduction
Agile Software Development with Scrum – Introduction
 
[Dl輪読会]semi supervised learning with context-conditional generative adversari...
[Dl輪読会]semi supervised learning with context-conditional generative adversari...[Dl輪読会]semi supervised learning with context-conditional generative adversari...
[Dl輪読会]semi supervised learning with context-conditional generative adversari...
 
論文紹介 Semi-supervised Learning with Deep Generative Models
論文紹介 Semi-supervised Learning with Deep Generative Models論文紹介 Semi-supervised Learning with Deep Generative Models
論文紹介 Semi-supervised Learning with Deep Generative Models
 
Overview of Agile Methodology
Overview of Agile MethodologyOverview of Agile Methodology
Overview of Agile Methodology
 
Agile Software Development Overview
Agile Software Development OverviewAgile Software Development Overview
Agile Software Development Overview
 
Hierarchical Label Propagation and Discovery for Machine Generated Email
Hierarchical Label Propagation and Discovery for Machine Generated EmailHierarchical Label Propagation and Discovery for Machine Generated Email
Hierarchical Label Propagation and Discovery for Machine Generated Email
 
Semi-supervised concept detection by learning the structure of similarity graphs
Semi-supervised concept detection by learning the structure of similarity graphsSemi-supervised concept detection by learning the structure of similarity graphs
Semi-supervised concept detection by learning the structure of similarity graphs
 
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
 

Similaire à Label propagation - Semisupervised Learning with Applications to NLP

Relaxed Utility Maximization in Complete Markets
Relaxed Utility Maximization in Complete MarketsRelaxed Utility Maximization in Complete Markets
Relaxed Utility Maximization in Complete Marketsguasoni
 
Fractional Calculus
Fractional CalculusFractional Calculus
Fractional CalculusVRRITC
 
Weatherwax cormen solutions
Weatherwax cormen solutionsWeatherwax cormen solutions
Weatherwax cormen solutionskirankoushik
 
Introduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksIntroduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksStratio
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Pierre Jacob
 
05 history of cv a machine learning (theory) perspective on computer vision
05  history of cv a machine learning (theory) perspective on computer vision05  history of cv a machine learning (theory) perspective on computer vision
05 history of cv a machine learning (theory) perspective on computer visionzukun
 
Doering Savov
Doering SavovDoering Savov
Doering Savovgh
 
NIPS2009: Sparse Methods for Machine Learning: Theory and Algorithms
NIPS2009: Sparse Methods for Machine Learning: Theory and AlgorithmsNIPS2009: Sparse Methods for Machine Learning: Theory and Algorithms
NIPS2009: Sparse Methods for Machine Learning: Theory and Algorithmszukun
 
Sienna 3 bruteforce
Sienna 3 bruteforceSienna 3 bruteforce
Sienna 3 bruteforcechidabdu
 
Classification of-signals-systems-ppt
Classification of-signals-systems-pptClassification of-signals-systems-ppt
Classification of-signals-systems-pptMayankSharma1126
 
Higher-order Factorization Machines(第5回ステアラボ人工知能セミナー)
Higher-order Factorization Machines(第5回ステアラボ人工知能セミナー)Higher-order Factorization Machines(第5回ステアラボ人工知能セミナー)
Higher-order Factorization Machines(第5回ステアラボ人工知能セミナー)STAIR Lab, Chiba Institute of Technology
 
The univalence of some integral operators
The univalence of some integral operatorsThe univalence of some integral operators
The univalence of some integral operatorsAlexander Decker
 
11.the univalence of some integral operators
11.the univalence of some integral operators11.the univalence of some integral operators
11.the univalence of some integral operatorsAlexander Decker
 
Transactional Data Mining
Transactional Data MiningTransactional Data Mining
Transactional Data MiningTed Dunning
 
P805 bourgeois
P805 bourgeoisP805 bourgeois
P805 bourgeoiskklub
 
On probability distributions
On probability distributionsOn probability distributions
On probability distributionsEric Xihui Lin
 

Similaire à Label propagation - Semisupervised Learning with Applications to NLP (20)

Relaxed Utility Maximization in Complete Markets
Relaxed Utility Maximization in Complete MarketsRelaxed Utility Maximization in Complete Markets
Relaxed Utility Maximization in Complete Markets
 
Fractional Calculus
Fractional CalculusFractional Calculus
Fractional Calculus
 
Weatherwax cormen solutions
Weatherwax cormen solutionsWeatherwax cormen solutions
Weatherwax cormen solutions
 
Introduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksIntroduction to Artificial Neural Networks
Introduction to Artificial Neural Networks
 
Neural network and mlp
Neural network and mlpNeural network and mlp
Neural network and mlp
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
 
05 history of cv a machine learning (theory) perspective on computer vision
05  history of cv a machine learning (theory) perspective on computer vision05  history of cv a machine learning (theory) perspective on computer vision
05 history of cv a machine learning (theory) perspective on computer vision
 
Doering Savov
Doering SavovDoering Savov
Doering Savov
 
Algo complexity
Algo complexityAlgo complexity
Algo complexity
 
NIPS2009: Sparse Methods for Machine Learning: Theory and Algorithms
NIPS2009: Sparse Methods for Machine Learning: Theory and AlgorithmsNIPS2009: Sparse Methods for Machine Learning: Theory and Algorithms
NIPS2009: Sparse Methods for Machine Learning: Theory and Algorithms
 
Sienna 3 bruteforce
Sienna 3 bruteforceSienna 3 bruteforce
Sienna 3 bruteforce
 
Classification of-signals-systems-ppt
Classification of-signals-systems-pptClassification of-signals-systems-ppt
Classification of-signals-systems-ppt
 
Higher-order Factorization Machines(第5回ステアラボ人工知能セミナー)
Higher-order Factorization Machines(第5回ステアラボ人工知能セミナー)Higher-order Factorization Machines(第5回ステアラボ人工知能セミナー)
Higher-order Factorization Machines(第5回ステアラボ人工知能セミナー)
 
The univalence of some integral operators
The univalence of some integral operatorsThe univalence of some integral operators
The univalence of some integral operators
 
11.the univalence of some integral operators
11.the univalence of some integral operators11.the univalence of some integral operators
11.the univalence of some integral operators
 
Transactional Data Mining
Transactional Data MiningTransactional Data Mining
Transactional Data Mining
 
P805 bourgeois
P805 bourgeoisP805 bourgeois
P805 bourgeois
 
On probability distributions
On probability distributionsOn probability distributions
On probability distributions
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Nokton theory-en
Nokton theory-enNokton theory-en
Nokton theory-en
 

Label propagation - Semisupervised Learning with Applications to NLP

  • 1. Label Propagation Seminar: Semi-supervised and unsupervised learning with Applications to NLP David Przybilla davida@coli.uni-saarland.de
  • 2. Outline ● What is Label Propagation ● The Algorithm ● The motivation behind the algorithm ● Parameters of Label Propagation ● Relation Extraction with Label Propagation
  • 3. Label Propagation ● Semi-supervised ● Performs well relative to supervised alternatives when little annotated data is available ● Similar to kNN
  • 4. K-Nearest Neighbors (kNN) ● Shares similar ideas with Label Propagation ● Unlike kNN, Label Propagation (LP) also uses the unlabeled instances when inferring the labels
  • 5. Idea of the Problem ● Nearby, similar unlabeled instances should have similar labels ● L = set of labeled instances ● U = set of unlabeled instances ● We want to find a function f such that:
  • 6. The Model ● A complete graph ● Each node is an instance ● Each arc has a weight $T_{xy}$ ● $T_{xy}$ is high if nodes x and y are similar.
  • 7. The Model ● Inside a Node: Soft Labels
  • 8. Variables - Model ● T is a matrix holding all the weights of the graph, partitioned into blocks: $T = \begin{pmatrix} T_{ll} & T_{lu} \\ T_{ul} & T_{uu} \end{pmatrix}$ ● $N_1 \dots N_l$ = labeled data ● $N_{l+1} \dots N_n$ = unlabeled data
  • 9. Variables - Model ● Y is a matrix holding the soft probabilities of each instance: $Y = \begin{pmatrix} Y_L \\ Y_U \end{pmatrix}$, where $Y_U$ is the problem to solve ● $Y_{N_a, R_b}$ is the probability of $N_a$ being labeled as $R_b$ ● $R_1, R_2 \dots R_k$: each of the possible labels ● $N_1, N_2 \dots N_n$: each of the instances to label
  • 10. Algorithm Y will change in each iteration
  • 11. How to Measure T? ● Distance measure: Euclidean distance ● Important parameter: σ (ignore it for the moment; it is discussed later)
  • 12. How to Initialize $Y^0$? ● How to correctly set the values of $Y^0$? ● Fill in the known values (of the labeled data) ● How to fill in the values of the unlabeled data? → The initialization of these values can be arbitrary. ● Transform T into T' (row normalization; written $\bar{T}$ on the following slides)
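The setup described on slides 11 and 12 can be sketched as follows. This is a minimal illustration, not the author's code: the transcript does not preserve the slide's weight formula, so the Gaussian form `exp(-d^2 / sigma^2)` is an assumption taken from the common formulation of label propagation, and the helper names (`gaussian_weights`, `row_normalize`, `init_labels`) and the toy points are hypothetical. Unlabeled rows are initialized uniformly, which is one arbitrary choice among many.

```python
import math

def gaussian_weights(points, sigma):
    """Assumed weight: w_ij = exp(-d_ij^2 / sigma^2), d_ij = Euclidean distance."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            W[i][j] = math.exp(-d2 / sigma ** 2)
    return W

def row_normalize(W):
    """Divide every row by its sum so that each row adds up to 1."""
    normalized = []
    for row in W:
        total = sum(row)
        normalized.append([w / total for w in row])
    return normalized

def init_labels(labels, n_classes):
    """One-hot rows for labeled instances; uniform rows (arbitrary) for None."""
    Y0 = []
    for lab in labels:
        if lab is None:
            Y0.append([1.0 / n_classes] * n_classes)
        else:
            Y0.append([1.0 if c == lab else 0.0 for c in range(n_classes)])
    return Y0

# Toy data: two labeled points (classes 0 and 1) and two unlabeled points.
points = [(0.0, 0.0), (4.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
labels = [0, 1, None, None]
T_bar = row_normalize(gaussian_weights(points, sigma=1.0))
Y0 = init_labels(labels, n_classes=2)
```

In this toy data the unlabeled point at x = 1 receives a larger normalized weight from the labeled point at x = 0 than from the one at x = 4, which is the behavior the weights are meant to encode.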
  • 13. Propagation Step ● During the process Y will change: $Y^0 \to Y^1 \to \dots \to Y^k$ ● Update Y during each iteration
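A minimal sketch of the propagation loop, assuming the update multiplies Y by the row-normalized matrix and then resets the labeled rows to their known values (the clamping mentioned on the convergence slide). The function name `propagate`, the fixed iteration count, and the hand-written 4-node matrix are illustrative assumptions; the matrix is sparse for readability even though the slides describe a complete graph.

```python
def propagate(T_bar, Y0, n_labeled, n_iter=100):
    """Iterate Y <- T_bar * Y, resetting (clamping) the labeled rows each time."""
    n, k = len(Y0), len(Y0[0])
    Y = [row[:] for row in Y0]
    for _ in range(n_iter):
        # Matrix product T_bar * Y, written out with plain lists.
        Y_next = [
            [sum(T_bar[i][j] * Y[j][c] for j in range(n)) for c in range(k)]
            for i in range(n)
        ]
        # Clamp: the first n_labeled rows keep their known labels.
        for i in range(n_labeled):
            Y_next[i] = Y0[i][:]
        Y = Y_next
    return Y

# Hypothetical example: nodes 0 and 1 are labeled, nodes 2 and 3 are unlabeled.
T_bar = [
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.5, 0.5, 0.0],
]
Y0 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
Y = propagate(T_bar, Y0, n_labeled=2)
```

For this matrix the unlabeled rows settle at (2/3, 1/3) and (1/3, 2/3): each unlabeled node leans toward the label of the labeled node it is directly attached to.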
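  • 14. Convergence ● During the iteration, with $Y_L$ clamped: $\begin{pmatrix} Y_L \\ Y_U \end{pmatrix} = \begin{pmatrix} \bar{T}_{ll} & \bar{T}_{lu} \\ \bar{T}_{ul} & \bar{T}_{uu} \end{pmatrix} \begin{pmatrix} Y_L \\ Y_U \end{pmatrix}$ ● Assuming we iterate infinitely many times, then: $Y_U^1 = \bar{T}_{uu} Y_U^0 + \bar{T}_{ul} Y_L$, $Y_U^2 = \bar{T}_{uu} (\bar{T}_{uu} Y_U^0 + \bar{T}_{ul} Y_L) + \bar{T}_{ul} Y_L$, ...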
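  • 15. Convergence ● Since $\bar{T}$ is row-normalized and $\bar{T}_{uu}$ is a submatrix of $\bar{T}$, each row of $\bar{T}_{uu}$ sums to less than 1 ● Doing it n times will lead to: $Y_U^n = \bar{T}_{uu}^n Y_U^0 + \left( \sum_{i=1}^{n} \bar{T}_{uu}^{i-1} \right) \bar{T}_{ul} Y_L$ ● The first term, $\bar{T}_{uu}^n Y_U^0$, converges to zero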
  • 16. After convergence ● After convergence one can find $Y_U$ by solving: $Y_U = (I - \bar{T}_{uu})^{-1} \bar{T}_{ul} Y_L$
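The converged labels can also be computed without iterating. The sketch below assumes the fixed point $Y_U = (I - \bar{T}_{uu})^{-1} \bar{T}_{ul} Y_L$ and solves the equivalent linear system $(I - \bar{T}_{uu}) Y_U = \bar{T}_{ul} Y_L$ with a small Gauss-Jordan routine. The helper names `solve` and `closed_form` and the 2 + 2 node example are hypothetical, and the code assumes $I - \bar{T}_{uu}$ is invertible.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]

def closed_form(T_uu, T_ul, Y_l):
    """Return Y_U solving (I - T_uu) Y_U = T_ul Y_L."""
    n_u, n_l, k = len(T_uu), len(Y_l), len(Y_l[0])
    A = [[(1.0 if i == j else 0.0) - T_uu[i][j] for j in range(n_u)]
         for i in range(n_u)]
    B = [[sum(T_ul[i][j] * Y_l[j][c] for j in range(n_l)) for c in range(k)]
         for i in range(n_u)]
    return solve(A, B)

# Hypothetical blocks of a row-normalized matrix: 2 labeled, 2 unlabeled nodes.
T_ul = [[0.5, 0.0], [0.0, 0.5]]
T_uu = [[0.0, 0.5], [0.5, 0.0]]
Y_l = [[1.0, 0.0], [0.0, 1.0]]
Y_u = closed_form(T_uu, T_ul, Y_l)
```

Here `Y_u` comes out as (2/3, 1/3) for the first unlabeled node and (1/3, 2/3) for the second. Solving the system directly avoids choosing an iteration count, at the cost of a dense solve over the unlabeled block.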
  • 17. Optimization Problem ● $w_{ij}$: similarity between i and j ● f should minimize the energy function ● f(i) and f(j) should be similar for a high $w_{ij}$ in order to minimize the energy
  • 18. The graph Laplacian ● Let D be a diagonal matrix where $D_{ii} = \sum_j \bar{T}_{ij}$ ● Rows are normalized, so $D = I$ ● The graph Laplacian is defined as $\Delta = D - \bar{T}$ ● Since $f: V \to \mathbb{R}$, we can use the graph Laplacian to act on it ● So the energy function can be rewritten in terms of $\Delta$
  • 19. Back to the optimization problem ● The energy can be rewritten using the Laplacian ● f should minimize the energy function ● $\Delta_{uu} = D_{uu} - \bar{T}_{uu} = I - \bar{T}_{uu}$ ● $\Delta_{ul} = D_{ul} - \bar{T}_{ul} = -\bar{T}_{ul}$
  • 20. Optimization Problem ● $\Delta$ can be rewritten in terms of $\bar{T}$: $\Delta_{uu} = D_{uu} - \bar{T}_{uu} = I - \bar{T}_{uu}$ and $\Delta_{ul} = D_{ul} - \bar{T}_{ul} = -\bar{T}_{ul}$ ● $f_u = (I - \bar{T}_{uu})^{-1} \bar{T}_{ul} f_l$ ● The algorithm converges to the minimizer of the energy function
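The step from the Laplacian blocks to the expression for $f_u$ can be written out as follows. This is a sketch that assumes the minimizer satisfies the harmonic condition $(\Delta f)_u = 0$ on the unlabeled nodes, as in the standard harmonic-function formulation, and that $\Delta_{uu}$ is invertible; the slides themselves do not spell out this step.

```latex
\begin{aligned}
\Delta f &=
  \begin{pmatrix} \Delta_{ll} & \Delta_{lu} \\ \Delta_{ul} & \Delta_{uu} \end{pmatrix}
  \begin{pmatrix} f_l \\ f_u \end{pmatrix} \\
(\Delta f)_u = 0
  &\;\Longrightarrow\; \Delta_{ul} f_l + \Delta_{uu} f_u = 0 \\
  &\;\Longrightarrow\; f_u = -\Delta_{uu}^{-1} \Delta_{ul} f_l \\
  &\;\Longrightarrow\; f_u = (I - \bar{T}_{uu})^{-1} \bar{T}_{ul} f_l
  \qquad \text{using } \Delta_{uu} = I - \bar{T}_{uu},\ \Delta_{ul} = -\bar{T}_{ul}
\end{aligned}
```

The last line has the same form as the fixed point of the propagation iteration, which is why the algorithm is said to converge to the energy minimizer.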
  • 21. Sigma Parameter Remember the Sigma parameter? ● It strongly influences the behavior of LP. ● There can be: ● just one σ for the whole feature vector ● One σ per dimension
  • 22. Sigma Parameter ● What happens if σ tends to: – 0: ● The label of an unknown instance is given by just the nearest labeled instance – Infinity: ● All the unlabeled instances receive the same influence from all labeled instances. The soft probabilities of each unlabeled instance are given by the class frequencies in the labeled data ● There are heuristics for finding an appropriate value of sigma
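The two limits can be illustrated numerically. The sketch below assumes Gaussian weights `exp(-d^2 / sigma^2)` and looks only at the direct, normalized influence of three labeled points on a single unlabeled point; it does not run the full propagation. The function name `influence` and the sample coordinates are hypothetical.

```python
import math

def influence(x_u, labeled_points, sigma):
    """Normalized Gaussian weights from one unlabeled point to each labeled point."""
    w = [math.exp(-math.dist(x_u, p) ** 2 / sigma ** 2) for p in labeled_points]
    total = sum(w)
    return [v / total for v in w]

# Labeled points with classes 0, 1, 1; the unlabeled point sits closest to the first.
labeled_points = [(0.0,), (4.0,), (5.0,)]
x_u = (1.0,)

near = influence(x_u, labeled_points, sigma=0.1)     # very small sigma
far = influence(x_u, labeled_points, sigma=1000.0)   # very large sigma
```

With the small sigma essentially all of the weight falls on the nearest labeled point, so the unlabeled point takes its label. With the large sigma the three weights are nearly equal, so class 1, which holds two of the three labeled points, receives about two thirds of the mass, matching the class frequencies.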
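  • 23. Sigma Parameter - MST ● [Figure: minimum spanning tree over instances labeled Label1 and Label2, highlighting the minimum-weight arc that connects two components with different labels] ● $\sigma = \frac{\min \text{weight}(\text{arc})}{3}$, where the arc connects two components with different labels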
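One way to realize this heuristic is to grow a minimum spanning tree with Kruskal's algorithm and stop at the first edge that would join two components containing different labels; sigma is then that edge length divided by 3. The sketch below is written under that reading of the slide. The function name `mst_sigma` and the sample points are hypothetical, and edge weights are taken to be Euclidean distances.

```python
import math
from itertools import combinations

def mst_sigma(points, labels):
    """Return d0 / 3, where d0 is the first Kruskal edge joining two
    components that contain differently labeled points (None if no such edge)."""
    n = len(points)
    parent = list(range(n))
    # Set of labels present in each component, indexed by the component root.
    comp_labels = [set() if lab is None else {lab} for lab in labels]

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # edge would close a cycle
        if comp_labels[ri] and comp_labels[rj] and comp_labels[ri] != comp_labels[rj]:
            return d / 3.0
        parent[rj] = ri
        comp_labels[ri] |= comp_labels[rj]
    return None

points = [(0.0,), (1.0,), (2.0,), (5.0,), (6.0,)]
labels = [0, None, None, 1, None]
sigma = mst_sigma(points, labels)
```

In this example the points at 0, 1, 2 merge into one component and the points at 5, 6 into another; the first edge between them has length 3, giving sigma = 1.0.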
  • 24. Sigma Parameter – Learning it ● How to learn sigma? ● Assumption: a good sigma will classify with confidence and thus minimize entropy. ● How to do it? ● Smooth the transition matrix T ● Find the derivative of H (the entropy) w.r.t. sigma ● When to do it? ● When using one sigma per dimension; the learned sigmas can be used to identify irrelevant dimensions
  • 25. Labeling Approach ● Once $Y_U$ is computed, how do we assign labels to the instances? ● Take the most likely class ● Class mass normalization ● Label bidding
  • 26. Labeling Approach ● Take the most likely class ● Simply, look at the rows of Yu, and choose for each instance the label with highest probability ● Problem: no control on the proportion of classes
  • 27. Labeling Approach ● Class mass normalization ● Given some class proportions $P_1, P_2 \dots P_k$ ● Scale each column c to $P_c$ ● Then simply look at the rows of Yu and choose for each instance the label with the highest probability
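A minimal sketch of class mass normalization, assuming "scale each column c to P_c" means rescaling column c so that its total mass equals the given proportion P_c before taking the row-wise maximum. The function names and the sample matrix are hypothetical, and the code assumes every column has nonzero mass.

```python
def class_mass_normalize(Y_u, proportions):
    """Rescale column c of Y_u so that its total mass equals proportions[c]."""
    k = len(proportions)
    col_mass = [sum(row[c] for row in Y_u) for c in range(k)]
    return [[proportions[c] * row[c] / col_mass[c] for c in range(k)]
            for row in Y_u]

def most_likely(Y):
    """Index of the largest entry in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in Y]

# Soft labels in which class 0 wins every row by a small margin.
Y_u = [[0.60, 0.40], [0.55, 0.45], [0.70, 0.30], [0.52, 0.48]]
plain = most_likely(Y_u)
balanced = most_likely(class_mass_normalize(Y_u, [0.5, 0.5]))
```

Here `plain` assigns every instance to class 0, while `balanced` is `[0, 1, 0, 1]`: once both classes are given equal mass, the two rows with the weakest class 0 scores flip to class 1.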
  • 28. Labeling Approach ● Label bidding ● Given some class proportions $P_1, P_2 \dots P_k$: 1. Estimate the number of items per label ($C_k$) 2. Choose the label with the greatest number of items, take the $C_k$ items whose probability of having that label is highest, and assign them that label 3. Iterate through all the possible labels
  • 29. Experiment Setup ● Artificial data ● Comparison LP vs kNN (k=1) ● Character recognition ● Recognize handwritten digits ● Images of 16x16 pixels, grayscale ● Recognizing 1, 2, 3 ● 256-dimensional vector
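The three steps of label bidding can be sketched as below. Several details are assumptions rather than anything stated on the slide: the item count per label is taken as `round(P_k * n)`, labels are processed from the largest count to the smallest, and any instance left over because of rounding falls back to its most likely class. The function name `label_bidding` and the sample matrix are hypothetical.

```python
def label_bidding(Y_u, proportions):
    """Assign labels so that class sizes follow the given proportions."""
    n, k = len(Y_u), len(proportions)
    counts = [round(p * n) for p in proportions]  # assumed estimate of C_k
    assigned = [None] * n
    # Labels with more items bid first.
    for c in sorted(range(k), key=lambda c: counts[c], reverse=True):
        free = [i for i in range(n) if assigned[i] is None]
        free.sort(key=lambda i: Y_u[i][c], reverse=True)
        for i in free[:counts[c]]:
            assigned[i] = c
    # Fallback (assumption): leftovers from rounding take their most likely class.
    for i in range(n):
        if assigned[i] is None:
            assigned[i] = max(range(k), key=lambda c: Y_u[i][c])
    return assigned

Y_u = [[0.60, 0.40], [0.55, 0.45], [0.70, 0.30], [0.52, 0.48]]
even = label_bidding(Y_u, [0.5, 0.5])
skewed = label_bidding(Y_u, [0.25, 0.75])
```

With equal proportions the result is `[0, 1, 0, 1]`; with proportions 0.25 and 0.75, class 1 bids first for its three strongest instances and the result is `[1, 1, 0, 1]`.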
  • 30. Results using LP on artificial data
  • 31. Results using LP on artificial data ● LP finds the structure in the data while KNN fails
  • 32. P1NN ● P1NN is a baseline for comparisons ● Simplified version of LP: 1. During each iteration, find the unlabeled instance nearest to a labeled instance and give it that instance's label 2. Iterate until all instances are labeled
  • 33. Results using LP on the handwritten dataset ● P1NN (baseline), 1NN (kNN) ● Cne: class mass normalization, with proportions from the labeled data ● Lbo: label bidding with oracle class proportions ● ML: most likely labels
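The two steps of this baseline can be sketched as follows, assuming Euclidean distance and at least one labeled instance. The function name `p1nn` and the sample points are hypothetical; the search over all unlabeled/labeled pairs is kept naive for clarity.

```python
import math

def p1nn(points, labels):
    """Repeatedly label the unlabeled point closest to any labeled point."""
    labels = list(labels)
    unlabeled = {i for i, lab in enumerate(labels) if lab is None}
    labeled = set(range(len(points))) - unlabeled
    while unlabeled:
        # Closest (unlabeled, labeled) pair over all remaining candidates.
        _, u, l = min(
            (math.dist(points[u], points[l]), u, l)
            for u in unlabeled
            for l in labeled
        )
        labels[u] = labels[l]
        unlabeled.remove(u)
        labeled.add(u)  # newly labeled points can pass their label on
    return labels

# A chain of points starting at the class 0 seed, plus a class 1 seed at x = 6.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (6.0,)]
labels = [0, None, None, None, None, 1]
result = p1nn(points, labels)
```

The result is `[0, 0, 0, 0, 0, 1]`. The point at x = 4 is closer to the class 1 seed than to the class 0 seed, so plain 1NN on the seeds alone would label it 1, but P1NN reaches it first through the chain of already labeled neighbors.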
  • 34. Relation Extraction? ● From natural language texts, detect semantic relations among entities ● Example: "B. Gates married Melinda French on January 1, 1994" → spouse(B. Gates, Melinda French)
  • 35. Why LP to do RE? ● Problem with supervised methods: they need a lot of annotated data ● Problem with unsupervised methods: they retrieve clusters of relations with no label
  • 36. RE - Problem Definition ● Find an appropriate label for an occurrence of two entities in a context ● Example: "….. B. Gates married Melinda French on January 1, 1994", where "….." is the preceding context (Cpre), "B. Gates" is entity 1 (e1), "married" is the middle context (Cmid), "Melinda French" is entity 2 (e2), and "on January 1, 1994" is the following context (Cpost) ● Idea: if two occurrences of entity pairs have similar contexts, then they have the same relation type
  • 37. RE Problem Definition - Features ● Words: in the contexts ● Entity types: Person, Location, Org... ● POS tagging: of words in the contexts ● Chunking tag: marks which words in the contexts are inside chunks ● Grammatical function of words in the contexts, e.g. NP-SBJ (subject) ● Position of words: ● First word of e1 - is there any word in Cmid - first word in Cpre, Cmid, Cpost... ● Second word of e1... - second word in Cpre...
  • 39. Experiment ● ACE 2003 data. Corpus from newspapers ● Assume all entities have been identified already ● Comparison between: – Different amounts of labeled samples: 1%, 10%, 25%, 50%, 75%, 100% – Different similarity functions – LP, SVM and bootstrapping ● LP: ● Similarity function: cosine, Jensen-Shannon ● Labeling approach: take the most likely class ● Sigma: average similarity between labeled classes
  • 40. Experiment ● Jensen-Shannon - Similarity measure - Measures the distance between two probability distributions - JS is a smoothing of the Kullback-Leibler divergence $D_{KL}$ ● Kullback-Leibler divergence - Not symmetric - Does not always have a finite value
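The relationship between the two divergences can be sketched with their standard definitions: $D_{KL}(P \| Q) = \sum_i p_i \log(p_i / q_i)$ and $JS(P, Q) = \frac{1}{2} D_{KL}(P \| M) + \frac{1}{2} D_{KL}(Q \| M)$ with $M = \frac{1}{2}(P + Q)$. The function names and the sample distributions are illustrative, and natural logarithms are an arbitrary choice of base.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q); infinite when P puts mass where Q has none."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # 0 * log(0 / q) is taken as 0
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of P and Q to their midpoint."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
disjoint_a = [1.0, 0.0]
disjoint_b = [0.0, 1.0]
```

For `p` and `q` the KL divergence differs by direction while the JS value is the same either way. For the two disjoint distributions, KL is infinite but JS stays finite at `log 2`, which illustrates the two drawbacks of KL listed on the slide.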
  • 42. Classifying relation subtypes- SVM vs LP SVM with linear Kernel
  • 43. Bootstrapping ● [Figure: loop] Seeds → Train a classifier → Classifier → Update the set of seeds with instances whose confidence is high enough → back to Seeds
  • 44. Classifying relation types Bootstrapping vs LP Starting with 100 random seeds
  • 45. Results ● When little annotated data is available, LP generally performs well compared with SVM and kNN ● Irrelevant dimensions can be identified by using LP ● Looking at the structure of the unlabeled data helps when there is little annotated data