Fast and Robust Part-of-Speech Tagging
Using Dynamic Model Selection
Jinho D. Choi and Martha Palmer
Institute of Cognitive Science, University of Colorado Boulder
Supervised Learning
Domain Adaptation
Dynamic Model Selection
Part-of-speech Tagging
Training
Decoding
Dynamic Model Selection
Experimental setup
• Training corpus: The Wall Street Journal Sections 2-21 from OntoNotes v4.0 (731,677 tokens, 30,060 sentences).
• Tagging algorithm: A one-pass, left-to-right POS tagging algorithm.
• Machine learning algorithm: LIBLINEAR L2-regularized, L1-loss support vector classification.
• Evaluation corpora
Comparisons
Experiments
Conclusion
• Our dynamic model selection approach improves the robustness of POS tagging on
heterogeneous data, and tags noticeably faster than the two other systems compared (Stanford and SVMTool).
• We believe that this approach can be applied to more sophisticated tagging algorithms and
improve their robustness even further.
ClearNLP
• Open source projects: clearnlp.googlecode.com, clearparser.googlecode.com
• Contact: Jinho D. Choi (choijd@colorado.edu)
Simplified word form
• In a simplified word form, all numerical expressions are replaced with 0.
• A lowercase simplified word form (LSW) is a decapitalized simplified word form.
• Simplified word forms give more generalization to lexical features than their original forms.
Regular expressions
• A simplified word form is derived by applying the following regular expressions sequentially to
the original word-form, w.
• ‘replaceAll’ is a function that replaces all matches of the regular expression in w (the 1st
parameter) with the specified string (the 2nd parameter).
1. w.replaceAll(\d%, 0) e.g., 1% → 0
2. w.replaceAll(\$\d, 0) e.g., $1 → 0
3. w.replaceAll(^\.\d, 0) e.g., .1 → 0
4. w.replaceAll(\d(,|:|-|\/|\.)\d, 0) e.g., 1,2 | 1:2 | 1-2 | 1/2 | 1.2 → 0
5. w.replaceAll(\d+, 0) e.g., 1234 → 0
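The five replacement steps listed above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes that `d` in the listed patterns stands for the digit class `\d` and that `$`, `.` and `/` are meant literally, and the function names `simplify` and `lowercase_simplify` are hypothetical. Python's `re.sub` replaces every match by default, which corresponds to `replaceAll`.

```python
import re

# Patterns applied sequentially, in the order given on the poster.
# The backslash escapes are an assumption about the intended patterns.
RULES = [
    r"\d%",               # 1% -> 0
    r"\$\d",              # $1 -> 0
    r"^\.\d",             # .1 -> 0 (only at the start of the word)
    r"\d(,|:|-|/|\.)\d",  # 1,2  1:2  1-2  1/2  1.2 -> 0
    r"\d+",               # 1234 -> 0
]


def simplify(word: str) -> str:
    """Return the simplified word form: numerical expressions become 0."""
    for pattern in RULES:
        word = re.sub(pattern, "0", word)
    return word


def lowercase_simplify(word: str) -> str:
    """Return the lowercase simplified word form (LSW)."""
    return simplify(word).lower()


if __name__ == "__main__":
    for w in ["1%", "$1", ".1", "1,2", "1234", "10,000", "Mar-10"]:
        print(w, "->", lowercase_simplify(w))
```

Because the rules run in sequence, a token such as `10,000` is first shortened by rule 4 and then collapsed to a single `0` by rule 5.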
Pre-processing
[Figure: three approaches compared. Supervised learning: training data → Model → target data. Domain adaptation: training data′ → Model′ and training data″ → Model″, one model per target data set. Dynamic model selection: training data → Model D and Model G → target data.]
• Domain adaptation: How many models do we need to build? Do we always know about the target data?
• Dynamic model selection: Do not assume the target data. Select one of two models dynamically.
              BC     BN     CN     MD     MZ     NW     WB  Total
Model D    91.81  95.27  87.36  90.74  93.91  97.45  93.93  92.97
Model G    92.65  94.82  88.24  91.46  93.24  97.11  93.51  93.05
G over D   50.63  36.67  68.80  40.22  21.43   9.51  36.02  41.74
Model S    92.26  95.13  88.18  91.34  93.88  97.46  93.90  93.21
Stanford   87.71  95.50  88.49  90.86  92.80  97.42  94.01  92.50
SVMTool    87.82  95.13  87.86  90.54  92.94  97.31  93.99  92.32
Tagging accuracies of all tokens (in %)

    Genre                      All Tokens  Unknown Tok's  Sentences
BN  Broadcasting news              31,704          3,077      2,076
BC  Broadcasting conversation      31,328          1,284      1,969
CN  Clinical notes                 35,721          6,077      3,170
MD  Medpedia articles              34,022          4,755      1,850
MZ  Magazine                       32,120          2,663      1,409
NW  Newswire                       39,590            983      1,640
WB  Web-text                       34,707          2,609      1,738
Evaluation corpora

              BC     BN     CN     MD     MZ     NW     WB  Total
Model S    60.97  77.73  68.69  67.30  75.97  88.40  76.27  70.54
Stanford   19.24  87.31  71.20  64.82  66.28  88.40  78.15  64.32
SVMTool    19.08  78.35  66.51  62.94  65.23  86.88  76.47  47.65
Tagging accuracies of unknown tokens (in %)

Stanford  SVMTool  Model S
     421    1,163   31,914
Tagging speeds (tokens / sec.)
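To put the tagging speeds in perspective, the relative speed-up of Model S over the other two systems can be computed directly from the reported figures. The snippet below is plain arithmetic on the numbers above; the variable names are illustrative only.

```python
# Reported tagging speeds in tokens per second.
speeds = {"Stanford": 421, "SVMTool": 1163, "Model S": 31914}

# Ratio of Model S's speed to each other system's speed.
speedup = {
    name: speeds["Model S"] / tokens_per_sec
    for name, tokens_per_sec in speeds.items()
    if name != "Model S"
}

if __name__ == "__main__":
    for name, ratio in speedup.items():
        print(f"Model S vs. {name}: {ratio:.1f}x")
```

On these figures Model S tags roughly 76 times faster than Stanford and roughly 27 times faster than SVMTool.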
Acknowledgments
• This work was supported by the SHARP program funded by ONC: 90TR0002/01. The content is solely
the responsibility of the authors and does not necessarily represent the official views of the ONC.
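[Figure: training pipeline. Training data → Document 1 ... Document N → lexical features with DF(LSW) > thD and with DF(LSW) > thG → Model D and Model G.]
1. Separate documents.
2. Extract two sets of features, filtered by the document frequency (DF) of each LSW.
3. Build two models.
• Domain-specific model (Model D): uses lexical features whose DF(LSW) > 1.
• Generalized model (Model G): uses lexical features whose DF(LSW) > 2.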
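The feature-extraction step can be illustrated with a short Python sketch. It assumes each document is a list of tokens and uses a rough stand-in for the LSW (lowercasing plus collapsing digit runs to 0). The function names and the toy documents are hypothetical, and classifier training itself is not shown.

```python
import re
from collections import Counter


def lsw(word):
    """Rough stand-in for the lowercase simplified word form (assumption)."""
    return re.sub(r"\d+", "0", word).lower()


def document_frequency(documents):
    """Count, for each LSW, the number of documents it occurs in."""
    df = Counter()
    for doc in documents:
        # A set ensures each form is counted once per document.
        df.update({lsw(token) for token in doc})
    return df


def select_lexical_features(documents, threshold):
    """Keep the LSWs whose document frequency exceeds the threshold."""
    df = document_frequency(documents)
    return {form for form, count in df.items() if count > threshold}


# Toy documents, invented for illustration.
docs = [
    ["Stocks", "rose", "10", "points"],
    ["Stocks", "fell", "25", "points"],
    ["Bonds", "rose", "3", "points"],
]

features_d = select_lexical_features(docs, threshold=1)  # domain-specific
features_g = select_lexical_features(docs, threshold=2)  # generalized

if __name__ == "__main__":
    print(sorted(features_d))
    print(sorted(features_g))
```

With the higher threshold, the generalized set is always a subset of the domain-specific set, which is what makes Model G less tied to vocabulary seen in only a few training documents.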
[Figure: decoding flowchart. Input sentences → "Is Model D?" → YES: Model D, NO: Model G → output sentences.]
• "Is Model D?" asks: is the cosine similarity between the LSWs of the input sentence and those of Model D
greater than a threshold?
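The decoding-time decision can be sketched as a cosine-similarity check over bags of LSWs. This is a minimal illustration under stated assumptions: Model D is represented here by a hypothetical `Counter` of LSW counts, the threshold value 0.5 is invented, and the function names are not taken from ClearNLP.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_model(sentence_lsws, model_d_lsws, threshold):
    """Return "D" if the sentence is similar enough to Model D, else "G"."""
    sentence_vector = Counter(sentence_lsws)
    similarity = cosine_similarity(sentence_vector, model_d_lsws)
    return "D" if similarity > threshold else "G"


# Hypothetical LSW counts standing in for Model D's lexical profile.
model_d_lsws = Counter({"stocks": 5, "rose": 3, "points": 4, "0": 6})

in_domain = ["stocks", "rose", "0", "points"]
out_of_domain = ["patient", "denies", "chest", "pain"]

if __name__ == "__main__":
    print(select_model(in_domain, model_d_lsws, threshold=0.5))
    print(select_model(out_of_domain, model_d_lsws, threshold=0.5))
```

A sentence that shares no LSWs with Model D has similarity 0 and falls through to Model G, matching the NO branch of the flowchart.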
