1. Seminar: Statistical NLP Girona, June 2003 Machine Learning for Natural Language Processing Lluís Màrquez TALP Research Center Llenguatges i Sistemes Informàtics Universitat Politècnica de Catalunya
19. An Example. Classification rules: (COLOR=red) ∧ (SHAPE=circle) → positive; otherwise → negative. Equivalent decision tree: the root tests COLOR (red/blue); under red, SHAPE is tested (circle → positive, triangle → negative); under blue, the leaf is negative.
20. An Example. Classification rules: (SIZE=small) ∧ (SHAPE=circle) → positive; (SIZE=big) ∧ (COLOR=red) → positive; otherwise → negative. Equivalent decision tree: the root tests SIZE; under small, SHAPE is tested (circle → pos, triangle → neg); under big, COLOR is tested (red → pos, blue → neg).
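The second slide's rules can be written directly as a tiny classifier; this is a minimal sketch of the slide's toy concept, with illustrative function and argument names:

```python
# The decision tree of slide 20, hard-coded: SIZE at the root, then SHAPE
# (for small objects) or COLOR (for big objects).
def classify(size, shape, color):
    """Return 'positive' or 'negative' per the slide's two rules."""
    if size == "small":
        # (SIZE=small) AND (SHAPE=circle) -> positive
        return "positive" if shape == "circle" else "negative"
    # (SIZE=big) AND (COLOR=red) -> positive
    return "positive" if color == "red" else "negative"
```

Note that the tree tests a different second attribute on each branch, which is exactly what makes trees more compact than a flat rule list over all attribute combinations.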
34. An Example. [Figure: a training set in which each example is a vector of attribute values (attributes A1…A5, values v1…v7) labelled with a class (C1…C3), shown next to the decision tree it induces: SIZE at the root; small → SHAPE (circle → pos, triangle → neg); big → COLOR (red → pos, blue → neg).]
35. Learning Decision Trees. Training: a training set of (example, class) pairs is fed to the TDIDT algorithm, which outputs a decision tree (DT). Test: a new example is classified by running it through the DT.
36. General Induction Algorithm (TDIDT, top-down induction of decision trees):

    function TDIDT (X: set-of-examples; A: set-of-features): decision-tree
    var tree1, tree2: decision-tree;
        X': set-of-examples;
        A': set-of-features
    end-var
      if stopping_criterion(X) then
        tree1 := create_leaf_tree(X)
      else
        a_max := feature_selection(X, A);
        tree1 := create_tree(X, a_max);
        for-all val in values(a_max) do
          X' := select_examples(X, a_max, val);
          A' := A - {a_max};
          tree2 := TDIDT(X', A');
          tree1 := add_branch(tree1, tree2, val)
        end-for
      end-if;
      return tree1
    end-function
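The TDIDT pseudocode can be made runnable; the sketch below is an ID3-style instantiation, assuming stopping_criterion = "node is pure or no features remain", feature_selection = information gain, and create_leaf_tree = majority class (these concrete choices are not fixed by the slide):

```python
# ID3-style instantiation of the TDIDT schema (illustrative names).
from collections import Counter
import math

def entropy(examples):
    """Entropy of the class distribution of a list of (features, class) pairs."""
    counts = Counter(cls for _, cls in examples)
    n = len(examples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def tdidt(examples, features):
    """examples: list of (feature_dict, class); features: frozenset of names."""
    classes = {cls for _, cls in examples}
    if len(classes) == 1 or not features:           # stopping_criterion(X)
        return Counter(cls for _, cls in examples).most_common(1)[0][0]
    def gain(a):                                    # information gain of feature a
        groups = {}
        for x, cls in examples:
            groups.setdefault(x[a], []).append((x, cls))
        remainder = sum(len(g) / len(examples) * entropy(g)
                        for g in groups.values())
        return entropy(examples) - remainder
    a_max = max(features, key=gain)                 # feature_selection(X, A)
    tree = {a_max: {}}                              # create_tree(X, a_max)
    for val in {x[a_max] for x, _ in examples}:     # values(a_max)
        subset = [(x, c) for x, c in examples if x[a_max] == val]  # select_examples
        tree[a_max][val] = tdidt(subset, features - {a_max})       # add_branch
    return tree

def classify(tree, x):
    """Walk the tree from the root down to a leaf class label."""
    while isinstance(tree, dict):
        a = next(iter(tree))
        tree = tree[a][x[a]]
    return tree

# The toy data of slide 20: small circles and big red objects are positive.
data = [
    ({"SIZE": "small", "SHAPE": "circle",   "COLOR": "red"},  "pos"),
    ({"SIZE": "small", "SHAPE": "triangle", "COLOR": "red"},  "neg"),
    ({"SIZE": "big",   "SHAPE": "circle",   "COLOR": "red"},  "pos"),
    ({"SIZE": "big",   "SHAPE": "circle",   "COLOR": "blue"}, "neg"),
]
tree = tdidt(data, frozenset({"SIZE", "SHAPE", "COLOR"}))
```

As the editor's note below points out, many other choices of feature-selection function, stopping criterion and pruning method plug into the same recursive schema.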
45. AdaBoost: general scheme. TRAINING: maintain a probability distribution D_t over the training set TS; at each round t = 1…T, a weak learner is trained on (TS_t, D_t) to produce a weak hypothesis h_t, and the distribution is updated (D_t → D_{t+1}) to concentrate on the examples h_t gets wrong. TEST: the final classifier is a linear combination F(h_1, h_2, …, h_T) of the weak hypotheses.
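The scheme can be sketched end to end; this is a minimal runnable version assuming the weak learner is a 1-D decision stump (the names stump_learner/adaboost are illustrative, not from the talk):

```python
import math

def stump_learner(xs, ys, dist):
    """Weak learner: threshold stump with lowest weighted error under dist."""
    best_err, best_h = None, None
    for thr in set(xs):
        for sign in (1, -1):
            h = lambda x, t=thr, s=sign: s if x > t else -s
            err = sum(d for x, y, d in zip(xs, ys, dist) if h(x) != y)
            if best_err is None or err < best_err:
                best_err, best_h = err, h
    return best_err, best_h

def adaboost(xs, ys, T=10):
    n = len(xs)
    dist = [1.0 / n] * n                      # D_1: uniform over the training set
    combo = []                                # the (alpha_t, h_t) pairs
    for _ in range(T):
        err, h = stump_learner(xs, ys, dist)  # train h_t on (TS, D_t)
        err = max(err, 1e-12)
        if err >= 0.5:                        # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        combo.append((alpha, h))
        # probability distribution updating: D_{t+1} concentrates on mistakes
        dist = [d * math.exp(-alpha * y * h(x)) for x, y, d in zip(xs, ys, dist)]
        z = sum(dist)
        dist = [d / z for d in dist]
    # final classifier: sign of the linear combination F(h_1, ..., h_T)
    return lambda x: 1 if sum(a * h(x) for a, h in combo) > 0 else -1

F = adaboost([1, 2, 3, 4, 5, 6], [-1, -1, -1, 1, 1, 1])
```

The distribution update is the heart of the scheme: correctly classified examples are down-weighted by exp(-alpha) and mistakes up-weighted by exp(+alpha), forcing each new weak hypothesis to focus on what the combination so far gets wrong.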
64. Non-linear SVMs. [Figure: a degree-3 polynomial kernel applied to two data sets, one linearly separable and one linearly non-separable.]
66. Toy Examples (I). A linearly separable data set: the linear SVM finds the maximal-margin hyperplane. What happens if we add a blue training example here [marked in the figure, close to the red class]?
67. Toy Examples (I). The data set is still linearly separable. With a high value of the C parameter, the linear SVM again finds a maximal-margin hyperplane and the new example is correctly classified.
68. Toy Examples (I). With a low value of the C parameter, the SVM trades off margin width against training error: the margin widens and the new example becomes a bounded support vector.
80. POS tagging. Collocations: "as_RB much_RB as_IN", "as_RB well_RB as_IN", "as_RB soon_RB as_IN". [Figure: the "preposition-adverb" ambiguity decision tree. At the root the class distribution is P(IN)=0.81, P(RB)=0.19; tests on the word form ("As"/"as" vs. others) and on the following tags tag(+1) and tag(+2) (RB vs. others) lead to leaves such as P(IN)=0.83, P(RB)=0.17; P(IN)=0.13, P(RB)=0.87; and P(IN)=0.013, P(RB)=0.987.]
81. POS tagging. RTT (Màrquez & Rodríguez 97): raw text → morphological analysis → iterative disambiguation loop (classify → filter → update, driven by a language model) until a stopping criterion is met → tagged text. See also: A Sequential Model for Multi-class Classification: NLP/POS Tagging (Even-Zohar & Roth, 01).
82. POS tagging. STT (Màrquez & Rodríguez 97): raw text → morphological analysis → disambiguation with the Viterbi algorithm, using a language model of lexical probabilities + contextual probabilities → tagged text. See also: The Use of Classifiers in Sequential Inference: Chunking (Punyakanok & Roth, 00).
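The Viterbi decoding step of STT can be sketched over a toy model; the probabilities below are hypothetical, with lex = lexical P(word | tag) and ctx = contextual P(tag | previous tag):

```python
# Viterbi decoding for an HMM tagger: find the most probable tag sequence.
def viterbi(words, tags, lex, ctx):
    # delta[t]: probability of the best tag path ending in t at this position
    delta = {t: ctx["START"].get(t, 0.0) * lex[t].get(words[0], 0.0)
             for t in tags}
    backptrs = []
    for w in words[1:]:
        prev, delta, bp = delta, {}, {}
        for t in tags:
            # best predecessor tag for t (contextual probability)
            p, t_prev = max((prev[tp] * ctx[tp].get(t, 0.0), tp) for tp in tags)
            delta[t] = p * lex[t].get(w, 0.0)   # times lexical probability
            bp[t] = t_prev
        backptrs.append(bp)
    t = max(delta, key=delta.get)               # best final tag
    path = [t]
    for bp in reversed(backptrs):               # follow back-pointers
        t = bp[t]
        path.append(t)
    return list(reversed(path))

# Toy lexical and contextual probabilities (hypothetical numbers).
tags = ["DT", "NN"]
lex = {"DT": {"the": 1.0}, "NN": {"dog": 1.0}}
ctx = {"START": {"DT": 0.9, "NN": 0.1},
       "DT": {"DT": 0.0, "NN": 1.0},
       "NN": {"DT": 0.5, "NN": 0.5}}
```

In a real tagger the probabilities would be estimated from a tagged corpus and the products replaced by sums of log-probabilities to avoid underflow on long sentences.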
Editor's notes
ReliefF-IG: a variant of Kononenko's ReliefF function that estimates the usefulness of the different attributes while taking the interrelations among them into account.
Last point: many possible functions for attribute selection, stopping criteria, pruning methods, etc.
Maximizing the functional margin is equivalent to normalizing it to 1 (canonical hyperplanes) and minimizing the norm of the weight vector.
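Made explicit (the standard hard-margin derivation, not specific to this talk): the geometric margin of a separating hyperplane (w, b) is

```latex
\gamma \;=\; \min_i \frac{y_i(\mathbf{w}\cdot\mathbf{x}_i + b)}{\lVert\mathbf{w}\rVert},
```

and rescaling (w, b) so that the functional margin satisfies min_i y_i(w·x_i + b) = 1 (canonical hyperplane) gives γ = 1/‖w‖, so maximizing the margin is equivalent to

```latex
\min_{\mathbf{w},\,b}\; \tfrac{1}{2}\lVert\mathbf{w}\rVert^2
\quad \text{subject to} \quad
y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 \;\; \forall i .
```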