This paper presents a novel way of improving POS tagging on heterogeneous data. First, two separate models (generalized and domain-specific) are trained from the same data set by controlling lexical items with different document frequencies. During decoding, one of the models is selected dynamically given the cosine similarity between each sentence and the training data. This dynamic model selection approach, coupled with a one-pass, left-to-right POS tagging algorithm, is evaluated on corpora from seven different genres. Even with this simple tagging algorithm, our system shows results comparable to other state-of-the-art systems, and gives higher accuracy when evaluated on a mixture of the data. Furthermore, our system tags about 32K tokens per second. We believe that this model selection approach can be applied to more sophisticated tagging algorithms to improve their robustness even further.
Fast and Robust Part-of-Speech Tagging Using Dynamic Model Selection
Jinho D. Choi and Martha Palmer
Institute of Cognitive Science, University of Colorado Boulder
Keywords: Supervised Learning, Domain Adaptation, Dynamic Model Selection, Part-of-Speech Tagging
Experimental setup
• Training corpus: the Wall Street Journal Sections 2-21 from OntoNotes v4.0 (731,677 tokens, 30,060 sentences).
• Tagging algorithm: a one-pass, left-to-right POS tagging algorithm.
• Machine learning algorithm: Liblinear L2-regularized, L1-loss support vector classification.
• Evaluation corpora: seven different genres (see the genre table).
Conclusion
• Our dynamic model selection approach improves the robustness of POS tagging on heterogeneous data, and shows noticeably faster tagging speed than the other two systems.
• We believe that this approach can be applied to more sophisticated tagging algorithms and improve their robustness even further.
ClearNLP
• Open source projects: clearnlp.googlecode.com, clearparser.googlecode.com
• Contact: Jinho D. Choi (choijd@colorado.edu)
Simplified word form
• In a simplified word form, all numerical expressions are replaced with 0.
• A lowercase simplified word form (LSW) is a decapitalized simplified word form.
• Simplified word forms give more generalization to lexical features than their original forms.
Regular expressions
• A simplified word form is derived by applying the following regular expressions sequentially to the original word form, w.
• 'replaceAll' is a function that replaces all matches of the regular expression in w (the 1st parameter) with the specified string (the 2nd parameter).
1. w.replaceAll(\d%, 0)                e.g., 1% → 0
2. w.replaceAll(\$\d, 0)               e.g., $1 → 0
3. w.replaceAll(^\.\d, 0)              e.g., .1 → 0
4. w.replaceAll(\d(,|:|-|/|\.)\d, 0)   e.g., 1,2 | 1:2 | 1-2 | 1/2 | 1.2 → 0
5. w.replaceAll(\d+, 0)                e.g., 1234 → 0
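As a sketch, the five rules above can be applied in order with standard regular expressions; the escaped patterns and the function names below are our reading of the rules, not code from the paper:

```python
import re

def simplify(word):
    """Simplified word form: replace all numerical expressions with 0,
    applying the five rules from the text sequentially."""
    w = re.sub(r'\d%', '0', word)            # 1.  1%  -> 0
    w = re.sub(r'\$\d', '0', w)              # 2.  $1  -> 0
    w = re.sub(r'^\.\d', '0', w)             # 3.  .1  -> 0
    w = re.sub(r'\d(,|:|-|/|\.)\d', '0', w)  # 4.  1,2 | 1:2 | 1-2 | 1/2 | 1.2 -> 0
    w = re.sub(r'\d+', '0', w)               # 5.  1234 -> 0
    return w

def lsw(word):
    """Lowercase simplified word form (LSW)."""
    return simplify(word).lower()
```

For example, `lsw("Mid-1990s")` collapses the digits and case to `mid-0s`, so lexical features generalize across numeric variants.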
Domain Adaptation
• Traditional approach: for each new target domain, the training data is adapted (Training data', Training data'', ...) and a new model (Model', Model'', ...) is built.
  - How many models do we need to build?
  - Do we always know about the target data?
• Our approach: do not assume the target data. Build a domain-specific model (Model D) and a generalized model (Model G) from the training data, and select one of the two models dynamically for each input.
     Genre                       All Tokens   Unknown Tok's   Sentences
BN   Broadcasting news               31,704           3,077       2,076
BC   Broadcasting conversation       31,328           1,284       1,969
CN   Clinical notes                  35,721           6,077       3,170
MD   Medpedia articles               34,022           4,755       1,850
MZ   Magazine                        32,120           2,663       1,409
NW   Newswire                        39,590             983       1,640
WB   Web-text                        34,707           2,609       1,738

           BC      BN      CN      MD      MZ      NW      WB      Total
Model D    91.81   95.27   87.36   90.74   93.91   97.45   93.93   92.97
Model G    92.65   94.82   88.24   91.46   93.24   97.11   93.51   93.05
G over D   50.63   36.67   68.80   40.22   21.43    9.51   36.02   41.74
Model S    92.26   95.13   88.18   91.34   93.88   97.46   93.90   93.21
Stanford   87.71   95.50   88.49   90.86   92.80   97.42   94.01   92.50
SVMTool    87.82   95.13   87.86   90.54   92.94   97.31   93.99   92.32
Tagging accuracies of all tokens (in %)
           BC      BN      CN      MD      MZ      NW      WB      Total
Model S    60.97   77.73   68.69   67.30   75.97   88.40   76.27   70.54
Stanford   19.24   87.31   71.20   64.82   66.28   88.40   78.15   64.32
SVMTool    19.08   78.35   66.51   62.94   65.23   86.88   76.47   47.65
Tagging accuracies of unknown tokens (in %)
Stanford   SVMTool   Model S
   421       1,163    31,914
Tagging speeds (tokens / sec.)
Acknowledgments
• This work was supported by the SHARP program funded by ONC: 90TR0002/01. The content is solely the responsibility of the authors and does not necessarily represent the official views of the ONC.
Training
• Separate the training data into documents (Document 1, ..., Document N).
• Extract two sets of lexical features by the document frequency (DF) of each lowercase simplified word form (LSW):
  - Domain-specific model (Model D): uses lexical features whose DF(LSW) > thD (thD = 1).
  - Generalized model (Model G): uses lexical features whose DF(LSW) > thG (thG = 2).
• Build the two models from these feature sets.
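A minimal sketch of this feature split; the helper names and the toy documents are ours, while the thresholds thD = 1 and thG = 2 follow the text:

```python
from collections import Counter

TH_D = 1  # domain-specific threshold from the text
TH_G = 2  # generalized threshold from the text

def document_frequencies(documents):
    """Count, for each LSW, how many documents it appears in.
    `documents` is a list of token lists, already converted to LSWs."""
    df = Counter()
    for doc in documents:
        df.update(set(doc))  # each document counts a form at most once
    return df

def select_features(df, threshold):
    """Keep lexical features whose DF exceeds the threshold."""
    return {w for w, freq in df.items() if freq > threshold}

# Model D keeps more lexical features (DF > 1) than Model G (DF > 2),
# so D stays closer to the training domain and G generalizes more.
docs = [["the", "dog", "barks"], ["the", "cat"], ["the", "dog"]]
features_d = select_features(document_frequencies(docs), TH_D)
features_g = select_features(document_frequencies(docs), TH_G)
```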
Decoding
• For each input sentence, measure the cosine similarity between the LSWs of the sentence and the lexical features of Model D.
• If the similarity is greater than a threshold, tag the sentence with Model D; otherwise, tag it with Model G, producing the output sentences.
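The selection step can be sketched as follows; the function names are ours, and the threshold value (0.2 here) is an arbitrary placeholder, not the one tuned for the reported system:

```python
import math
from collections import Counter

def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a if w in counts_b)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def select_model(sentence_lsws, model_d_counts, threshold=0.2):
    """Pick Model D when the sentence looks similar enough to its
    lexical profile; fall back to the generalized Model G otherwise."""
    sim = cosine_similarity(Counter(sentence_lsws), model_d_counts)
    return "D" if sim > threshold else "G"
```

Sentences with vocabulary close to the training domain are routed to the domain-specific model; out-of-domain sentences fall back to the generalized one.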