This presentation describes a method for joint English spelling error correction and part-of-speech (POS) tagging of language learners' writing. The method deletes the word boundaries of the input, builds a lattice over it, and finds the lowest-cost path, so that spelling corrections and POS tags are chosen together. An experiment on ESL learner corpora shows that the joint method outperforms the baselines, improving POS tagging accuracy by up to 5.9% and spelling correction recall by 19.5%, which demonstrates the mutual benefit of jointly modeling spelling and syntax.
1. Keisuke SAKAGUCHI
Joint English Spelling Error Correction and
POS Tagging for Language Learners Writing
Keisuke SAKAGUCHI, Tomoya MIZUMOTO,
Mamoru KOMACHI, and Yuji MATSUMOTO
Nara Institute of Science and Technology
(NAIST), Japan
@COLING 2012, Mumbai, India.
POS tagging for misspelled texts
I am quiet good.
('I', 'PRP'), ('am', 'VBP'), ('quiet', 'JJ'), ('good', 'JJ'), ...
→ intended: ('quite', 'RB')
She felt really badly after ...
... ('really', 'RB'), ('badly', 'RB'), ('after', 'IN'), ...
→ intended: ('bad', 'JJ')
Spelling errors interfere with POS tagging
POS tagging for misspelled texts
... research and some analysys.
... ('and', 'CC'), ('some', 'DT'), ('analysys', 'NNS')
CANDIDATES: ('analysis', 'NN'), ('analyses', 'NNS')
I am hear to help you.
('I', 'PRP'), ('am', 'VBP'), ('hear', 'VB'), ...
CANDIDATES: ('hear', 'VB'), ('here', 'RB')
POS tags help spelling correction
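The homophone example can be made concrete with a toy reranker that prefers the candidate whose tag best fits the preceding tag. This is a minimal sketch, not the authors' model: the hypothetical `TRANSITION_COST` table and its values are invented for illustration.

```python
# Hypothetical tag-transition costs (lower = more plausible); values are illustrative only.
TRANSITION_COST = {
    ("VBP", "RB"): 1.0,  # "am here"
    ("VBP", "VB"): 5.0,  # "am hear": a base-form verb right after "am" is unlikely
    ("DT", "NNS"): 1.5,  # "some analyses"
    ("DT", "NN"): 2.0,   # "some analysis"
}
DEFAULT_COST = 10.0  # cost for tag pairs missing from the table


def pick_candidate(prev_tag, candidates):
    """Return the (word, tag) candidate whose tag is cheapest after prev_tag."""
    return min(
        candidates,
        key=lambda cand: TRANSITION_COST.get((prev_tag, cand[1]), DEFAULT_COST),
    )


print(pick_candidate("VBP", [("hear", "VB"), ("here", "RB")]))  # ('here', 'RB')
print(pick_candidate("DT", [("analysis", "NN"), ("analyses", "NNS")]))  # ('analyses', 'NNS')
```

Scoring a single tag pair is the simplest possible use of POS information; a joint model scores the whole sentence instead.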
The two problems are related to each other
Spelling correction
(can be improved by POS tags)
POS tagging
(hampered by spelling errors)
Joint analysis
Handling merge & split errors
We don't have a casual dresscode.
'dresscode' → ('dress', 'NN'), ('code', 'NN')
Treated as word boundary disambiguation, as in morphological analysis of languages written without spaces (Japanese, Chinese, etc.)
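Deleting spaces turns a merge error ('dresscode') and a split error (for example, a hypothetical 'with out' for 'without') into the same problem: finding lexicon words that cover the string. A minimal sketch, assuming a hypothetical toy lexicon rather than the dictionary used in the experiments; it enumerates every segmentation.

```python
from functools import lru_cache

# Hypothetical toy lexicon (not the NAIST edic used in the experiments).
LEXICON = {"a", "casual", "dress", "code", "with", "out", "without"}


def segmentations(text):
    """Return every way to cover text, with spaces deleted, by lexicon words."""
    s = text.replace(" ", "")  # delete word boundaries

    @lru_cache(maxsize=None)
    def from_pos(i):
        # All segmentations of the suffix s[i:], as tuples of words.
        if i == len(s):
            return [()]
        results = []
        for j in range(i + 1, len(s) + 1):
            if s[i:j] in LEXICON:
                results.extend((s[i:j],) + rest for rest in from_pos(j))
        return results

    return [list(seg) for seg in from_pos(0)]


print(segmentations("dresscode"))  # [['dress', 'code']]
print(segmentations("with out"))   # [['with', 'out'], ['without']]
```

The second call returns two analyses, which is why a cost model is needed to pick one path.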
Joint spelling correction and POS tagging
Algorithm
1. delete all word boundaries of the input
repeat // build a lattice of the input
  2. read the next letter and look up the lexicon
until the end of the sentence
3. find the path with the lowest cost
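The three steps can be sketched as a Viterbi-style search over a word lattice, in the spirit of MeCab's word-cost plus connection-cost model. This is a simplified stand-in under stated assumptions, not the authors' implementation: the hypothetical `LEXICON` mixes correct entries with spelling-error entries ('hear' read as 'here'), the hypothetical `CONNECTION` table holds tag-to-tag costs, and every cost value is invented.

```python
# Hypothetical lexicon: surface string -> list of (corrected form, POS tag, word cost).
# An entry whose corrected form differs from its surface models a spelling error;
# its extra cost acts as a correction penalty. All values are invented.
LEXICON = {
    "i": [("I", "PRP", 1.0)],
    "am": [("am", "VBP", 1.0)],
    "hear": [("hear", "VB", 2.0), ("here", "RB", 3.0)],  # correct entry + error entry
    "here": [("here", "RB", 2.0)],
    "to": [("to", "TO", 1.0)],
    "help": [("help", "VB", 2.0)],
    "you": [("you", "PRP", 1.0)],
}

# Hypothetical tag-connection costs between adjacent tags; unseen pairs get a default.
CONNECTION = {
    ("BOS", "PRP"): 0.5, ("PRP", "VBP"): 0.5, ("VBP", "RB"): 0.5,
    ("VBP", "VB"): 4.0, ("RB", "TO"): 0.5, ("VB", "TO"): 1.0,
    ("TO", "VB"): 0.5, ("VB", "PRP"): 0.5, ("PRP", "EOS"): 0.5,
}
DEFAULT_CONNECTION = 5.0


def joint_analyze(sentence):
    """Return the lowest-cost list of (corrected word, tag), or None if no path exists."""
    # Step 1: delete all word boundaries; lowercase for lexicon lookup.
    s = sentence.replace(" ", "").lower()
    n = len(s)
    # best[i][tag] = (cost, path) of the cheapest analysis of s[:i] ending in tag.
    best = [{} for _ in range(n + 1)]
    best[0]["BOS"] = (0.0, [])
    # Step 2: scan letters left to right, adding a lattice node for every lexicon match.
    for i in range(n):
        if not best[i]:
            continue  # no analysis reaches position i
        for j in range(i + 1, n + 1):
            for corrected, tag, word_cost in LEXICON.get(s[i:j], []):
                for prev_tag, (cost, path) in best[i].items():
                    total = cost + CONNECTION.get((prev_tag, tag), DEFAULT_CONNECTION) + word_cost
                    if tag not in best[j] or total < best[j][tag][0]:
                        best[j][tag] = (total, path + [(corrected, tag)])
    # Step 3: close each complete path with an EOS transition and keep the cheapest.
    finals = [
        (cost + CONNECTION.get((tag, "EOS"), DEFAULT_CONNECTION), path)
        for tag, (cost, path) in best[n].items()
    ]
    return min(finals)[1] if finals else None


print(joint_analyze("I am hear to help you"))
# [('I', 'PRP'), ('am', 'VBP'), ('here', 'RB'), ('to', 'TO'), ('help', 'VB'), ('you', 'PRP')]
```

Keeping one best entry per (position, last tag) is exact here because the connection cost depends only on adjacent tags. Since the lattice is built over the space-free string, the same search also covers merge and split errors once the lexicon has the relevant entries.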
Experiment
• Goal: evaluate the POS tagging and spelling correction accuracy of the proposed method
– POS tagging: BASELINE (POS tagging only) vs. PIPELINE (spelling correction with LM, then baseline POS tagging) vs. JOINT
– Spelling correction: BASELINE (spelling correction with LM) vs. JOINT
Experiment
• Dataset (ESL learner corpora)
– Cambridge Learner Corpus First Certificate in English (CLC FCE) dataset
  • Reference POS tags are automatically assigned
– Konan-JIEM learner corpus (KJ corpus)
  • Reference POS tags are annotated by hand
• Tools
– MeCab 0.98 (cost calculation + decoding)
– NAIST English Dictionary 0.2 (NAIST edic)
– GNU Aspell (speller)
– Language model (IRSTLM + Google 1T corpus)
Analysis
• True positives (typographical)
INPUT: … it is a surprice birthday party.
ANSWER: surprise (NN)
BASELINE: sur (JJ) price (NN)
PIPELINE: surprise (NN)
JOINT: surprise (NN)
Analysis
• True positives (homophone)
INPUT: I am hear to help you.
ANSWER: here (RB)
BASELINE: hear (VB)
PIPELINE: hear (VB)
JOINT: here (RB)
Analysis
• True positives (derivation)
INPUT: She felt really badly after ...
ANSWER: bad (JJ)
BASELINE: badly (RB)
PIPELINE: badly (RB)
JOINT: bad (JJ)
Analysis
• True positives (typo + merge)
INPUT: ... a big swimingpool .
ANSWER: swimming (NN) pool (NN)
BASELINE: swim (RB) ing (VBP) pool (NN)
PIPELINE: seemingly (RB)
JOINT: swimming (NN) pool (NN)
Error Analysis (Spelling)
• False positives
INPUT: After the concer finished ...
ANSWER: concert
BASELINE: cancer
JOINT: cancer
INPUT: technology has made life for ...
ANSWER: made
BASELINE: made
JOINT: mad
Error Analysis (Spelling)
• False negatives
INPUT: ... your shown one evening ...
ANSWER: show
BASELINE: shown
JOINT: shown
INPUT: I was terrible hungry ...
ANSWER: terribly
BASELINE: terrible
JOINT: terrible
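False positives and false negatives like these feed the usual precision and recall figures. A minimal sketch, assuming each correction is a hypothetical (position, original, corrected) triple; the exact scoring procedure used in the experiments is not shown on these slides. Under this assumption a wrong correction such as 'concer' → 'cancer' is a false positive, and the missed gold edit 'concer' → 'concert' is also a false negative.

```python
def correction_scores(system_edits, gold_edits):
    """Precision, recall and F1 over sets of (position, original, corrected) edits."""
    system, gold = set(system_edits), set(gold_edits)
    tp = len(system & gold)  # edits that exactly match the gold standard
    precision = tp / len(system) if system else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data modeled on the slide examples; positions are hypothetical indices.
gold = {(0, "concer", "concert"), (1, "hear", "here"), (2, "shown", "show")}
system = {(0, "concer", "cancer"), (1, "hear", "here"), (3, "made", "mad")}
print(correction_scores(system, gold))
```

Here one of three system edits matches the gold standard and one of three gold edits is found, so precision and recall are both one third.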
Summary
• Spelling Errors in ESL writing
– typographical, homophone, confusion, split, merge, inflection,
derivation
• Joint spelling correction and POS tagging
– Morphological analysis with word boundary disambiguation
• correct lexicon + lexicon of spelling errors
• Result
– The joint model outperforms both the baseline and the pipeline.