The detection and correction of grammatical errors remain very hard problems for modern error-correction systems. As an example, the top-performing systems at the CoNLL-2013 preposition correction challenge achieved an F1 score of only 17%.
In this paper, we propose and extensively evaluate a series of approaches for correcting prepositions, analyzing a large body of high-quality textual content to capture language usage. Leveraging n-gram statistics, association measures, and machine learning techniques, our system learns which words or phrases govern the usage of a specific preposition. Our approach makes heavy use of n-gram statistics generated from very large textual corpora. In particular, one of our key features is the use of n-gram association measures (e.g., Pointwise Mutual Information) between words and prepositions to generate better aggregated preposition rankings over the individual n-grams.
We evaluate the effectiveness of our approach using cross-validation with different feature combinations and on two test collections created from a set of English language exams and StackExchange forums. We also compare against state-of-the-art supervised methods. Experimental results on the CoNLL-2013 test collection show that our approach to preposition correction achieves an F1 score of ~30%, a 13% absolute improvement over the best-performing approach at that challenge.
CIKM14: Fixing grammatical errors by preposition ranking
1. Correct Me If I’m Wrong: Fixing Grammatical
Errors by Preposition Ranking
Roman Prokofyev, Ruslan Mavlyutov, Gianluca Demartini and
Philippe Cudre-Mauroux
eXascale Infolab
University of Fribourg, Switzerland
November 4th, CIKM’14
Shanghai, China
2. Motivations and Task Overview
• Grammatical error correction is important in itself
• It also matters as a component of Machine Translation or Speech Recognition
Correction of textual content written by English Learners.
I am new in android programming.
[to, at, for, …]
⇒ Rank candidate prepositions by their likelihood of being
correct in order to potentially replace the original.
3. State of the Art in Grammar Correction
Approaches:
• Multi-class classification using pre-defined candidate set
• Statistical Machine Translation
• Language modeling
Features:
• Lexical features, POS tags
• Head verbs/nouns, word dependencies
• N-gram counts from large corpora (Google Web 1-T)
• Confusion matrix
4. Key Ideas
• Usage of a particular preposition is governed by a
particular word/n-gram;
⇒ Task: select/aggregate n-grams that influence
preposition usage;
⇒ Use n-gram association measures to score each
preposition.
5. Processing Pipeline
[Pipeline diagram: for each Document, Sentence Extraction is applied; for each Sentence, Tokenization and n-gram construction (1 ... m) produce a list of n-grams; Feature Extraction draws on background N-gram Statistics; a Supervised Classifier outputs a ranked list of prepositions for each n-gram, yielding the Corrected Sentence and, finally, the Corrected Document.]
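The pipeline above can be sketched as a simple loop. This is a minimal, illustrative version only; the function and argument names (`correct_document`, `rank_prepositions`) are hypothetical stand-ins for the paper's components, and the ranking function here is a toy.

```python
def correct_document(document, candidates, rank_prepositions):
    """For each sentence: tokenize, locate preposition sites, and replace each
    preposition with the top candidate from the supervised ranking."""
    corrected = []
    for sentence in document:                         # foreach sentence
        tokens = sentence.split()                     # tokenization (simplified)
        for i, tok in enumerate(tokens):
            if tok.lower() in candidates:             # preposition site found
                ranked = rank_prepositions(tokens, i) # classifier-backed ranking
                tokens[i] = ranked[0]                 # keep the most likely one
        corrected.append(" ".join(tokens))
    return corrected

# Toy ranking that always prefers "to" (stand-in for the real classifier).
out = correct_document(["I am new in android programming ."],
                       {"in", "to", "at", "for"},
                       rank_prepositions=lambda toks, i: ["to", "in", "at"])
# out == ["I am new to android programming ."]
```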
6. Tokenization and n-gram distance
Example sentence fragment: "the force be PREP you ." (PREP marks the preposition slot; the distance is the signed offset of the context word closest to PREP).

N-gram           Type    Distance
the force PREP   3gram   -2
force be PREP    3gram   -1
be PREP you      3gram    0
PREP you .       3gram    1

N-gram     Type    Distance
be PREP    2gram   -1
PREP you   2gram    1
PREP .     2gram    2
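A minimal sketch of contiguous n-gram construction with this distance convention (the signed offset of the context word closest to PREP, 0 when PREP has context on both sides). Skip variants such as "PREP ." at distance 2 would be built separately; the function name is illustrative.

```python
def ngrams_with_distance(tokens, prep_idx, n):
    """Contiguous n-grams containing the PREP slot, each paired with the
    signed offset of the context word closest to the preposition."""
    results = []
    for start in range(max(0, prep_idx - n + 1),
                       min(prep_idx, len(tokens) - n) + 1):
        offsets = [i - prep_idx for i in range(start, start + n) if i != prep_idx]
        if min(offsets) < 0 < max(offsets):
            dist = 0                        # PREP is central in this n-gram
        else:
            dist = min(offsets, key=abs)    # closest context word's offset
        results.append((" ".join(tokens[start:start + n]), dist))
    return results

tokens = ["the", "force", "be", "PREP", "you", "."]
trigrams = ngrams_with_distance(tokens, 3, 3)
# [('force be PREP', -1), ('be PREP you', 0), ('PREP you .', 1)]
```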
7. N-gram association measures
Motivation:
use association measures to compute a score that will be
proportional to the likelihood of an n-gram appearing
together with a preposition.
N-gram PMI scores by preposition
force be PREP (with: -4.9), (under: -7.86), (at: -9.26), (in: -9.93), …
be PREP you (with: -1.86), (amongst: -1.99), (beside: -2.26), …
PREP you . (behind: -0.71), (beside: -0.82), (around: -0.84), …
Background N-gram collection: Google Books N-grams.
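The PMI computation can be sketched as follows. All counts below are toy numbers standing in for Google Books N-gram statistics (they are illustrative assumptions, not values from the paper; the slide's real scores, computed over a vastly larger corpus, come out negative).

```python
import math

# Toy counts for the context "be PREP you" (assumed, not from the paper).
ngram_counts = {("be", "with", "you"): 800,
                ("be", "amongst", "you"): 2,
                ("be", "at", "you"): 3}
context_count = 1000           # "be <prep> you" with any preposition
prep_counts = {"with": 50_000, "amongst": 300, "at": 90_000}
total = 1_000_000              # total n-gram observations in the corpus

def pmi(joint, ctx, prep, n):
    """Pointwise Mutual Information between a context n-gram and a preposition:
    log2( P(context, prep) / (P(context) * P(prep)) )."""
    return math.log2((joint / n) / ((ctx / n) * (prep / n)))

scores = {p: pmi(ngram_counts[("be", p, "you")], context_count, prep_counts[p], total)
          for p in ("with", "amongst", "at")}
ranking = sorted(scores, key=scores.get, reverse=True)
# ranking == ['with', 'amongst', 'at']: "with" associates most strongly
```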
8. N-grams with determiner skips
Around 30% of prepositions are used in proximity to determiners ("a", "the", etc.)
[Figure: correct preposition ranks for n-grams involving determiners and nouns, e.g. "one of the most" vs. "one of most".]
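Since the Google corpus contains no skip n-grams, a simple way to generate them is to drop determiner tokens before n-gram construction, so that forms like "one of the most" and "one of most" map to the same n-gram. A toy sketch (the determiner list is abbreviated):

```python
DETERMINERS = {"a", "an", "the"}

def skip_determiners(tokens):
    """Remove determiner tokens so n-grams with and without an intervening
    determiner collapse to the same skip n-gram."""
    return [t for t in tokens if t.lower() not in DETERMINERS]

skip_determiners("one of the most".split())
# ['one', 'of', 'most']
```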
9. PMI-based Features
• Average rank of a preposition among the ranks of the
considered n-grams;
• Average PMI score of a preposition among the PMI
scores of the considered n-grams;
• Number of times a given preposition occupies the first position in the
rankings of the considered n-grams.
Calculated across 2 logical groups (considered n-grams):
• N-gram size;
• N-gram distances.
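The three PMI-based features can be sketched for one candidate preposition over one group of considered n-grams. The rankings below are illustrative numbers, not values from the paper, and `pmi_features` is a hypothetical helper name.

```python
from statistics import mean

# One best-first (preposition, PMI) ranking per considered n-gram.
rankings = [
    [("with", -4.9), ("under", -7.9), ("at", -9.3)],
    [("with", -1.9), ("amongst", -2.0), ("at", -2.3)],
]

def pmi_features(rankings, prep):
    ranks  = [[p for p, _ in r].index(prep) + 1 for r in rankings]
    scores = [dict(r)[prep] for r in rankings]
    return {
        "avg_rank":  mean(ranks),    # average rank of the preposition
        "avg_pmi":   mean(scores),   # average PMI score of the preposition
        "top_count": sum(1 for r in rankings if r[0][0] == prep),  # times ranked 1st
    }

pmi_features(rankings, "with")
# avg_rank 1.0, avg_pmi ≈ -3.4, top_count 2
```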
10. Central N-grams
[Figure: distribution of correct preposition counts at the top of the PMI rankings, with respect to n-gram distance.]
11. Other features
• Confusion matrix values
• POS tags: 5 most frequent tags + “OTHER” catch-all tag;
• Preposition itself: sparse vector of the size of the
candidate preposition set.
      to     in     of     for    on     at     with   from
to    0.958  0.007  0.002  0.011  0.004  0.003  0.005  0.002
in    0.037  0.79   0.01   0.009  0.066  0.036  0.015  0.008
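The confusion-matrix feature is a lookup of how often a written preposition is confused with each candidate. A toy sketch using the two rows shown above (remaining rows omitted; the function name is illustrative):

```python
# Rows: preposition written by the learner; columns: probability that the
# column preposition was intended (values from the slide's excerpt).
CONFUSION = {
    "to": {"to": 0.958, "in": 0.007, "of": 0.002, "for": 0.011,
           "on": 0.004, "at": 0.003, "with": 0.005, "from": 0.002},
    "in": {"to": 0.037, "in": 0.79,  "of": 0.01,  "for": 0.009,
           "on": 0.066, "at": 0.036, "with": 0.015, "from": 0.008},
}

def confusion_feature(written, candidate):
    """P(candidate is intended | learner wrote `written`); 0 if unseen."""
    return CONFUSION.get(written, {}).get(candidate, 0.0)

confusion_feature("in", "on")   # 0.066
```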
12. Preposition selection
Supervised Learning algorithm.
• Two-class classification with a confidence score for every
preposition from the candidate set;
• Every candidate preposition will receive its own set of
feature values;
Classifier: random forest.
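The selection step can be sketched as follows: each candidate preposition gets its own feature vector, a binary classifier (a random forest in this work) assigns a confidence score, and the highest-confidence candidate wins. The `score` argument below is a stand-in for the classifier's confidence, and the feature values are toy numbers.

```python
def select_preposition(feature_vectors, score):
    """feature_vectors: {preposition: features}; score: features -> confidence.
    Returns the candidate with the highest classifier confidence."""
    return max(feature_vectors, key=lambda p: score(feature_vectors[p]))

# Toy example: a single average-PMI feature per candidate.
fvs = {"with": {"avg_pmi": -1.9}, "at": {"avg_pmi": -9.3}, "in": {"avg_pmi": -9.9}}
select_preposition(fvs, score=lambda fv: fv["avg_pmi"])   # 'with'
```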
13. Training/Test Collections
Training collection:
• First Certificate of English (Cambridge exams)
Test collections:
• CoNLL-2013 (50 essays written by NUS students)
• StackExchange (historical edits)
               Cambridge FCE   CoNLL-2013   StackExchange
N# sentences   27k             1.4k         6k
15. Experiments: Feature Importances
Feature name Importance score
Confusion matrix probability 0.28
Top preposition counts (3grams) 0.13
Average rank (distance=0) 0.06
Central n-gram rank 0.06
Average rank (distance=1) 0.05
All top features except “confusion matrix” are based on the
PMI scores.
16. N-gram Sizes and Distances
Distance restriction   Precision   Recall    F1 score
(0)                    0.3077*     0.3908*   0.3442*
(-1,1)                 0.3231      0.4166    0.3637
(-2,2)                 0.3214      0.4222    0.3648
(-5,5)                 0.3223      0.4028    0.3577
None                   0.3214      0.3924*   0.3532

N-gram sizes                   Precision   Recall    F1 score
{2,3}-grams                    0.3005*     0.3879*   0.3385*
{3}-grams + skip n-grams (4)   0.2931*     0.4187    0.3447
{2,3}-grams + skip n-grams     0.3231      0.4166    0.3637

* indicates a statistically significant difference to the best-performing
approach (shown in bold on the original slide)
18. Takeaways
• PMI association measures
• + Two-class preposition ranking
⇒ allow us to significantly outperform the state of the art.
• Skip n-grams contribute significantly to the final result.
Roman Prokofyev (@rprokofyev)
eXascale Infolab (exascale.info), University of Fribourg, Switzerland
http://www.slideshare.net/eXascaleInfolab/
19. Two-class vs. Multi-class classification
Multi-class:
• input: one feature vector
• output: one of N classes (1-N)
Two-class (still with N possible outcomes):
• input: N feature vectors
• output: 1/0 for each of N feature vectors
20. Training/Test Collections
Training collection:
• Cambridge Learner Corpus: First Certificate of English
(exams of 2000-2001), CLC FCE
Test collections:
• CoNLL-2013 (50 essays written by NUS students)
• StackExchange (StackOverflow + Superuser)
                          CLC FCE   CoNLL-2013   SE
N# sentences              27k       1.4k         6k
N# prepositions           60k       3.2k         15.8k
N# prepositional errors   2.9k      152          6k
% errors                  4.8       4.7          38.2
Welcome everyone, my name is Roman Prokofyev, and I’m a PhD student at the eXascale Infolab, here at the University of Fribourg.
And today I will talk about Fixing Grammatical Errors by Preposition Ranking.
I’m going to start directly with a problem and task overview, using an example sentence…
Language modeling – assigning different probabilities to candidate sentences.
A particular word/phrase – maybe it's not always the case, especially when the governing word is referred to by a pronoun (determined by another sentence).
Distance: the distance between the preposition and the closest word to it in the n-gram we are considering.
A large corpus helps to overcome the data sparsity of PMI.
The Google n-gram corpus does not contain skip n-grams, so we generated them ourselves.