1. Seminar: Statistical NLP Girona, June 2003 Machine Learning for Natural Language Processing Lluís Màrquez TALP Research Center Llenguatges i Sistemes Informàtics Universitat Politècnica de Catalunya
19. An Example. Classification rules: (COLOR=red) ∧ (SHAPE=circle) → positive; otherwise → negative. Equivalent decision tree: the root tests COLOR (red/blue); under red, SHAPE is tested (circle → positive, triangle → negative); under blue, the leaf is negative.
20. An Example. Classification rules: (SIZE=small) ∧ (SHAPE=circle) → positive; (SIZE=big) ∧ (COLOR=red) → positive; otherwise → negative. Equivalent decision tree: the root tests SIZE; under small, SHAPE is tested (circle → pos, triangle → neg); under big, COLOR is tested (red → pos, blue → neg).
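The second slide's rules can be written directly as a tiny classifier; this is a minimal sketch of the slide's toy concept, with illustrative function and argument names:

```python
# The decision tree of slide 20, hard-coded: SIZE at the root, then SHAPE
# (for small objects) or COLOR (for big objects).
def classify(size, shape, color):
    """Return 'positive' or 'negative' per the slide's two rules."""
    if size == "small":
        # (SIZE=small) AND (SHAPE=circle) -> positive
        return "positive" if shape == "circle" else "negative"
    # (SIZE=big) AND (COLOR=red) -> positive
    return "positive" if color == "red" else "negative"
```

Note that the tree tests a different second attribute on each branch, which is exactly what makes trees more compact than a flat rule list over all attribute combinations.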
34. An Example. [Figure: a training set in which each example is a vector of attribute values (attributes A1…A5, values v1…v7) labelled with a class (C1…C3), shown next to the decision tree it induces: SIZE at the root; small → SHAPE (circle → pos, triangle → neg); big → COLOR (red → pos, blue → neg).]
35. Learning Decision Trees. Training: a training set of (example, class) pairs is fed to the TDIDT algorithm, which outputs a decision tree (DT). Test: a new example is classified by running it through the DT.
36. General Induction Algorithm (TDIDT, top-down induction of decision trees):

    function TDIDT (X: set-of-examples; A: set-of-features): decision-tree
    var tree1, tree2: decision-tree;
        X': set-of-examples;
        A': set-of-features
    end-var
      if stopping_criterion(X) then
        tree1 := create_leaf_tree(X)
      else
        a_max := feature_selection(X, A);
        tree1 := create_tree(X, a_max);
        for-all val in values(a_max) do
          X' := select_examples(X, a_max, val);
          A' := A - {a_max};
          tree2 := TDIDT(X', A');
          tree1 := add_branch(tree1, tree2, val)
        end-for
      end-if;
      return tree1
    end-function
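The TDIDT pseudocode can be made runnable; the sketch below is an ID3-style instantiation, assuming stopping_criterion = "node is pure or no features remain", feature_selection = information gain, and create_leaf_tree = majority class (these concrete choices are not fixed by the slide):

```python
# ID3-style instantiation of the TDIDT schema (illustrative names).
from collections import Counter
import math

def entropy(examples):
    """Entropy of the class distribution of a list of (features, class) pairs."""
    counts = Counter(cls for _, cls in examples)
    n = len(examples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def tdidt(examples, features):
    """examples: list of (feature_dict, class); features: frozenset of names."""
    classes = {cls for _, cls in examples}
    if len(classes) == 1 or not features:           # stopping_criterion(X)
        return Counter(cls for _, cls in examples).most_common(1)[0][0]
    def gain(a):                                    # information gain of feature a
        groups = {}
        for x, cls in examples:
            groups.setdefault(x[a], []).append((x, cls))
        remainder = sum(len(g) / len(examples) * entropy(g)
                        for g in groups.values())
        return entropy(examples) - remainder
    a_max = max(features, key=gain)                 # feature_selection(X, A)
    tree = {a_max: {}}                              # create_tree(X, a_max)
    for val in {x[a_max] for x, _ in examples}:     # values(a_max)
        subset = [(x, c) for x, c in examples if x[a_max] == val]  # select_examples
        tree[a_max][val] = tdidt(subset, features - {a_max})       # add_branch
    return tree

def classify(tree, x):
    """Walk the tree from the root down to a leaf class label."""
    while isinstance(tree, dict):
        a = next(iter(tree))
        tree = tree[a][x[a]]
    return tree

# The toy data of slide 20: small circles and big red objects are positive.
data = [
    ({"SIZE": "small", "SHAPE": "circle",   "COLOR": "red"},  "pos"),
    ({"SIZE": "small", "SHAPE": "triangle", "COLOR": "red"},  "neg"),
    ({"SIZE": "big",   "SHAPE": "circle",   "COLOR": "red"},  "pos"),
    ({"SIZE": "big",   "SHAPE": "circle",   "COLOR": "blue"}, "neg"),
]
tree = tdidt(data, frozenset({"SIZE", "SHAPE", "COLOR"}))
```

As the editor's note below points out, many other choices of feature-selection function, stopping criterion and pruning method plug into the same recursive schema.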
45. AdaBoost: general scheme. TRAINING: maintain a probability distribution D_t over the training set TS; at each round t = 1…T, a weak learner is trained on (TS_t, D_t) to produce a weak hypothesis h_t, and the distribution is updated (D_t → D_{t+1}) to concentrate on the examples h_t gets wrong. TEST: the final classifier is a linear combination F(h_1, h_2, …, h_T) of the weak hypotheses.
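The scheme can be sketched end to end; this is a minimal runnable version assuming the weak learner is a 1-D decision stump (the names stump_learner/adaboost are illustrative, not from the talk):

```python
import math

def stump_learner(xs, ys, dist):
    """Weak learner: threshold stump with lowest weighted error under dist."""
    best_err, best_h = None, None
    for thr in set(xs):
        for sign in (1, -1):
            h = lambda x, t=thr, s=sign: s if x > t else -s
            err = sum(d for x, y, d in zip(xs, ys, dist) if h(x) != y)
            if best_err is None or err < best_err:
                best_err, best_h = err, h
    return best_err, best_h

def adaboost(xs, ys, T=10):
    n = len(xs)
    dist = [1.0 / n] * n                      # D_1: uniform over the training set
    combo = []                                # the (alpha_t, h_t) pairs
    for _ in range(T):
        err, h = stump_learner(xs, ys, dist)  # train h_t on (TS, D_t)
        err = max(err, 1e-12)
        if err >= 0.5:                        # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        combo.append((alpha, h))
        # probability distribution updating: D_{t+1} concentrates on mistakes
        dist = [d * math.exp(-alpha * y * h(x)) for x, y, d in zip(xs, ys, dist)]
        z = sum(dist)
        dist = [d / z for d in dist]
    # final classifier: sign of the linear combination F(h_1, ..., h_T)
    return lambda x: 1 if sum(a * h(x) for a, h in combo) > 0 else -1

F = adaboost([1, 2, 3, 4, 5, 6], [-1, -1, -1, 1, 1, 1])
```

The distribution update is the heart of the scheme: correctly classified examples are down-weighted by exp(-alpha) and mistakes up-weighted by exp(+alpha), forcing each new weak hypothesis to focus on what the combination so far gets wrong.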
64. Non-linear SVMs. [Figure: a degree-3 polynomial kernel applied to two data sets, one linearly separable and one linearly non-separable.]
66. Toy Examples (I). A linearly separable data set: the linear SVM finds the maximal-margin hyperplane. What happens if we add a blue training example here [marked in the figure, close to the red class]?
67. Toy Examples (I). The data set is still linearly separable. With a high value of the C parameter, the linear SVM again finds a maximal-margin hyperplane and the new example is correctly classified.
68. Toy Examples (I). With a low value of the C parameter, the SVM trades off margin width against training error: the margin widens and the new example becomes a bounded support vector.
80. POS tagging. Collocations: "as_RB much_RB as_IN", "as_RB well_RB as_IN", "as_RB soon_RB as_IN". [Figure: the "preposition-adverb" ambiguity decision tree. At the root the class distribution is P(IN)=0.81, P(RB)=0.19; tests on the word form ("As"/"as" vs. others) and on the following tags tag(+1) and tag(+2) (RB vs. others) lead to leaves such as P(IN)=0.83, P(RB)=0.17; P(IN)=0.13, P(RB)=0.87; and P(IN)=0.013, P(RB)=0.987.]
81. POS tagging. RTT (Màrquez & Rodríguez 97): raw text → morphological analysis → iterative disambiguation loop (classify → filter → update, driven by a language model) until a stopping criterion is met → tagged text. See also: A Sequential Model for Multi-class Classification: NLP/POS Tagging (Even-Zohar & Roth, 01).
82. POS tagging. STT (Màrquez & Rodríguez 97): raw text → morphological analysis → disambiguation with the Viterbi algorithm, using a language model of lexical probabilities + contextual probabilities → tagged text. See also: The Use of Classifiers in Sequential Inference: Chunking (Punyakanok & Roth, 00).
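The Viterbi decoding step of STT can be sketched over a toy model; the probabilities below are hypothetical, with lex = lexical P(word | tag) and ctx = contextual P(tag | previous tag):

```python
# Viterbi decoding for an HMM tagger: find the most probable tag sequence.
def viterbi(words, tags, lex, ctx):
    # delta[t]: probability of the best tag path ending in t at this position
    delta = {t: ctx["START"].get(t, 0.0) * lex[t].get(words[0], 0.0)
             for t in tags}
    backptrs = []
    for w in words[1:]:
        prev, delta, bp = delta, {}, {}
        for t in tags:
            # best predecessor tag for t (contextual probability)
            p, t_prev = max((prev[tp] * ctx[tp].get(t, 0.0), tp) for tp in tags)
            delta[t] = p * lex[t].get(w, 0.0)   # times lexical probability
            bp[t] = t_prev
        backptrs.append(bp)
    t = max(delta, key=delta.get)               # best final tag
    path = [t]
    for bp in reversed(backptrs):               # follow back-pointers
        t = bp[t]
        path.append(t)
    return list(reversed(path))

# Toy lexical and contextual probabilities (hypothetical numbers).
tags = ["DT", "NN"]
lex = {"DT": {"the": 1.0}, "NN": {"dog": 1.0}}
ctx = {"START": {"DT": 0.9, "NN": 0.1},
       "DT": {"DT": 0.0, "NN": 1.0},
       "NN": {"DT": 0.5, "NN": 0.5}}
```

In a real tagger the probabilities would be estimated from a tagged corpus and the products replaced by sums of log-probabilities to avoid underflow on long sentences.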
Editor's notes
ReliefF-IG: a variant of Kononenko's ReliefF function that estimates the usefulness of the different attributes while taking the interrelations among them into account.
Last point: many possible functions for attribute selection, stopping criteria, pruning methods, etc.
Maximizing the functional margin is equivalent to normalizing it to 1 (canonical hyperplanes) and minimizing the norm of the weight vector.
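Made explicit (the standard hard-margin derivation, not specific to this talk): the geometric margin of a separating hyperplane (w, b) is

```latex
\gamma \;=\; \min_i \frac{y_i(\mathbf{w}\cdot\mathbf{x}_i + b)}{\lVert\mathbf{w}\rVert},
```

and rescaling (w, b) so that the functional margin satisfies min_i y_i(w·x_i + b) = 1 (canonical hyperplane) gives γ = 1/‖w‖, so maximizing the margin is equivalent to

```latex
\min_{\mathbf{w},\,b}\; \tfrac{1}{2}\lVert\mathbf{w}\rVert^2
\quad \text{subject to} \quad
y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 \;\; \forall i .
```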