Speaker: Marek Rei, Senior Research Associate, University of Cambridge
Summary: The number of people learning English around the world is currently estimated at 1.5 billion and is predicted to exceed 1.9 billion by 2020. The increasing need to communicate beyond borders has created a large unmet demand for qualified language teachers across the globe. Computational models for error detection and essay scoring can alleviate this issue by giving millions of people access to affordable learning resources. Successful systems for automated language teaching will need to analyse language at various levels of granularity and provide useful feedback to individual students. In this talk, we will explore some of the latest approaches to written language assessment, using neural architectures for composing the meaning of a sentence or text, and also discuss potential future directions in the field.
Slide 2
Automated Language Assessment
The number of people learning English around the world is currently
estimated at 1.5 billion and is predicted to exceed 1.9 billion by 2020.
Advantages for students:
• Immediate grades and feedback
• Enables self-assessment and self-tutoring
• Constant availability as an online tool
Advantages for teachers/examiners:
• Reduced teacher/examiner workload
• Can focus on more interesting or difficult content
• Cost-effective approach to assessment
Slide 3
Automated Language Assessment
Dear Mrs Brown,
I am writing you because my class want to give a
surprise birthday party for your husband Mr Brown. We
need your help for the details.
First of all could you let us know if the date of June 16th
is all right with his timetable program. We have
organised to do the party between three to six o'clock in
afternoon in College Canteen, about food we organised
a buffet, but could you also help us with the music which
he prefer, if prefer something especialy. We have invite
the student, the teachers and the Principal of school but
we appreciate if you are coming. At last would you tell
us which is the best present for him a compact disk or a
book .
We want say thanks again for your help and you must
be sure that your opinion it would be valuable to us.
I am looking forward to receiving your answer and don't
forget that it is a surprice birthday party.
Yours faithfuly,
Tom
Evaluation:
● Detect any writing errors
● Calculate a holistic writing score
● Predict language proficiency score
(IELTS, FCE)
● Detailed analytic scores (e.g.,
coherence, topic relevance)
Guidance:
● Show detailed progress reports
● Provide corrections for errors
● Suggest areas to focus on
● Generate suitable exercises
Slide 4
Talk Overview
01 Error Detection
Identifying the locations of grammatical errors
02 Error Correction
Providing an edited version of an incorrect sentence
03 Essay Scoring
Estimating a language proficiency score based on the full text
04 Applications and Future Directions
How do we make this useful, and where do we go next?
Slide 9
Error Detection in Learner Writing
Spelling error (8.6%):
I want to thak you for preparing such a nice evening .
Missing punctuation (7.4%):
I know how to cook some things like potatoes .
Incorrect punctuation (7.1%):
If you have time , why don’t you meet up .
Incorrect preposition (6.3%):
I’m looking forward to seeing you and good luck to your project .
Verb tense error (6.0%):
My friend eats two ice creams yesterday .
Slide 10
Error Detection in Learner Writing
Word order error (2.8%):
We can invite also people who are not members .
Verb agreement error (1.6%):
The main material that have been used is dark green glass .
Spelling error produces a valid word (1.5%):
I thing you should better save your money .
Incorrectly reproduced idiom (0.5%):
And at last but not the least , Captain Davidson showed him ...
Complex error (0.5%):
Specially the old castle Wawel's great .
Slide 11
Automated Error Detection
1. Experts have hand-annotated a large dataset of learner essays, marking the location of each error.
2. We create algorithms that can look at all these examples and discover regularities through machine learning.
3. We apply the resulting models to new data, where they provide predictions.
Slide 12
Deep Learning and Neural Networks
• Highly connected networks of parameters
• Randomly initialised, but optimised for a specific task during training
• Automatically discovering features that are useful for the task
• Each layer is a function of the previous layer
• Have achieved state-of-the-art results on nearly all language processing tasks
Slide 13
Neural Error Detection
• Composing words into context-specific representations.
• Predicting a probability distribution over all the possible labels for each word.
Marek Rei and Helen Yannakoudakis (2016) Compositional Sequence Labeling Models for Error Detection in Learner Writing. ACL 2016.
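The final step, turning per-word scores into a probability distribution over the possible labels, can be sketched with a softmax; the words and scores below are invented for illustration, not output from the actual model:

```python
import math

def softmax(scores):
    # Numerically stable softmax: turns raw scores into a
    # probability distribution that sums to one.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Invented per-token scores for the labels (correct, incorrect);
# a real model would produce these from its composed representations.
scores = {"I": (2.0, -1.0), "thing": (-0.5, 1.5), "so": (1.8, -0.9)}
probs = {word: softmax(list(s)) for word, s in scores.items()}
```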
Slide 14
Neural Error Detection
First Certificate in English dataset (FCE; Yannakoudakis et al., 2011):
● 1,141 manually annotated essays, containing 450K words
● Written by learners during language examinations
● In response to prompts eliciting free-text answers
● Publicly available dataset

System | FCE   | CoNLL14-1 | CoNLL14-2
BiLSTM | 41.10 | 16.40     | 23.90

Evaluating error detection using F0.5
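F0.5 combines precision and recall while weighting precision twice as heavily, since wrongly flagging a correct word is particularly unhelpful feedback for a learner. A minimal sketch (the position-pair representation is an assumption for illustration, not the official scorer):

```python
def detection_f05(predicted, gold, beta=0.5):
    # predicted, gold: sets of token positions marked as errors,
    # e.g. (sentence_id, token_index) pairs.  With beta = 0.5,
    # precision counts twice as much as recall.
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, with precision 1.0 and recall 0.5 the score is 5/6, well above the 2/3 that the precision-neutral F1 would give.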
Slide 15
Additional Training Data
More data = better performance.

System      | FCE   | CoNLL14-1 | CoNLL14-2
Public FCE  | 41.10 | 16.40     | 23.90
Private CLC | 64.30 | 34.30     | 44.00

We can generate artificial data: additional training examples for error detection.
Idea 1: Randomly generate errors in correct text
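A minimal sketch of Idea 1, assuming whitespace-tokenised correct text; the two error types (neighbour swaps and deletions) and the corruption probability are illustrative choices, not the ones used in the paper:

```python
import random

def corrupt(tokens, p=0.1, rng=None):
    # Injects simple random errors into a correct sentence: with
    # probability p swap a token with its neighbour, with probability
    # p delete it.  Affected tokens are labelled 1, giving
    # (input, label) pairs for training a detector.
    rng = rng or random.Random(0)
    out, labels = [], []
    i = 0
    while i < len(tokens):
        r = rng.random()
        if r < p and i + 1 < len(tokens):   # word order error
            out += [tokens[i + 1], tokens[i]]
            labels += [1, 1]
            i += 2
        elif r < 2 * p:                     # missing word error
            i += 1
        else:                               # keep the token unchanged
            out.append(tokens[i])
            labels.append(0)
            i += 1
    return out, labels
```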
Slide 16
Pattern-based Error Generation
Idea 2: Extract known error patterns and insert them into correct text

Annotated pair: We went shop on Saturday → We went shopping on Saturday
Extracted pattern: VVD shop_VV0 II => VVD shopping_VVG II
Applied in reverse to correct text: I was shopping on Monday → I was shop on Monday

Marek Rei, Mariano Felice, Zheng Yuan and Ted Briscoe (2017) Artificial Error Generation with Machine Translation and Syntactic Patterns. BEA 2017.
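A toy sketch of the pattern idea: in a matching three-tag context, the middle word is rewritten into an erroneous form. The tag names are CLAWS-style, but the pattern representation and the de-inflection rule are invented for illustration and are not the paper's actual format:

```python
def make_error_pattern(context_tags, transform):
    # Returns a function that scans (word, tag) pairs and, wherever the
    # three-tag window matches context_tags, replaces the middle word
    # with transform(word) to create a learner-style error.
    def apply(tagged):
        out = list(tagged)
        for i in range(1, len(out) - 1):
            window = (out[i - 1][1], out[i][1], out[i + 1][1])
            if window == context_tags:
                out[i] = (transform(out[i][0]), out[i][1])
        return out
    return apply

def strip_ing(word):
    # Toy de-inflection: "shopping" -> "shop", "going" -> "go".
    stem = word[:-3]
    if len(stem) > 1 and stem[-1] == stem[-2]:
        stem = stem[:-1]  # undo consonant doubling
    return stem

# Inserting the "shopping -> shop" error into a correct sentence.
pattern = make_error_pattern(("VBD", "VVG", "II"), strip_ing)
tagged = [("I", "PPIS1"), ("was", "VBD"), ("shopping", "VVG"),
          ("on", "II"), ("Monday", "NPD1")]
corrupted = pattern(tagged)
```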
Slide 17
Translation-based Error Generation
Idea 3: Train a machine translation model to translate from correct to incorrect text.
Machine translation normally translates between languages (e.g. English to French); here we translate English to faulty English. Off-the-shelf machine translation tools can be used.

ORIG: We are a well-mixed class with equal numbers of boys and girls, all about 20 years old.
PAT: We are a well-mixed class with equal numbers of boys an girls, all about 20 year old.
MT: We are a well-mixed class with equals numbers of boys and girls, all about 20 years old.

Marek Rei, Mariano Felice, Zheng Yuan and Ted Briscoe (2017) Artificial Error Generation with Machine Translation and Syntactic Patterns. BEA 2017.
Slide 18
Artificial Error Generation
Training on 450K words of annotated data and 4.5M words of automatically generated data.

System  | FCE   | CoNLL14-1 | CoNLL14-2
BiLSTM  | 41.10 | 16.40     | 23.90
+PAT    | 47.81 | 19.47     | 28.49
+MT     | 48.37 | 19.73     | 28.39
+PAT+MT | 49.11 | 21.87     | 30.13
Slide 20
Error Correction
Error detection identifies incorrect words; error correction modifies a sentence to remove errors.
We can formulate correction as a machine translation problem: translate from incorrect English to correct English, returning the highest-scoring possible translation.

Input: We can invite also people who are not members .
Output: We can also invite people who are not members .
Slide 21
Statistical Machine Translation
Text is separated into multi-word units (phrases).
Phrase alignments and translation tables are learned from parallel datasets.
Language models are used to ensure reasonable output.
Slide 22
Neural Machine Translation
The encoder learns to process the source sentence and produce an informative vector representation.
The decoder learns to generate a sentence in a different language based on that vector.
Bahdanau et al. (2014); figure by Stephen Merity.
Slide 23
Handling Unknown Words
Neural models have a limited, fixed vocabulary and represent other words as OOV tokens.

Input: I aren’t seen Albert since last summer .
Output: I haven’t seen OOV since last summer .

Solution:
1) Align the words between the input and output text
2) Translate OOV words in a post-processing step

Zheng Yuan and Ted Briscoe (2016) Grammatical error correction using neural machine translation. NAACL 2016.
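The post-processing step can be sketched as follows; the alignment is supplied by hand here, whereas in practice it would come from the translation model (e.g. its attention weights):

```python
def restore_oov(source, output, alignment):
    # alignment maps output positions to source positions; each OOV
    # placeholder in the output is replaced by its aligned source word.
    restored = []
    for j, token in enumerate(output):
        if token == "OOV" and j in alignment:
            restored.append(source[alignment[j]])
        else:
            restored.append(token)
    return restored

source = "I aren’t seen Albert since last summer .".split()
output = "I haven’t seen OOV since last summer .".split()
fixed = restore_oov(source, output, {3: 3})  # OOV aligned to "Albert"
```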
Slide 25
N-best List
Original sentence:
There are some informations you have asked me about.
SMT output:
1st There are some information you have asked me about.
2nd There is some information you have asked me about.
3rd There are some information you asked me about.
4th There are some information you have asked me.
5th There are some information you have asked me for.
Slide 26
Scoring Candidates
The correction system may not know how to fix an error and therefore leaves it uncorrected. How can we use the detection model to fix this problem and assign a better score to each “translation”?

+ + + + + + - -
The theatre restaurant was closed for unknown reason
Slide 27
Scoring Candidates
Using the detection model's per-token correctness probabilities:

1.0 1.0 1.0 0.9 1.0 1.0 0.3 0.1
The theatre restaurant was closed for unknown reason

1. Sentence correctness score: calculated based on the probability of each of its tokens being correct.
2. Correction recall score: select the translation that has modified the (maximum number of) words marked by the detection model as incorrect.
3. Correction agreement score: the ratio of agreed corrections to disagreed corrections.

Helen Yannakoudakis, Marek Rei, Øistein E. Andersen and Zheng Yuan (2017) Neural Sequence-Labelling Models for Grammatical Error Correction. EMNLP 2017.
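A sketch of reranking with score 1 (sentence correctness); the detector below is a hand-written stand-in for a trained detection model, and the candidate list mirrors the n-best example above:

```python
import math

def sentence_correctness(token_probs):
    # Log-probability that every token is correct, combining the
    # detection model's per-token correctness probabilities.
    return sum(math.log(max(p, 1e-12)) for p in token_probs)

def rerank(candidates, detector):
    # Pick the n-best candidate whose tokens the detection model
    # considers most likely to all be correct.
    return max(candidates, key=lambda c: sentence_correctness(detector(c)))

def toy_detector(tokens):
    # Stand-in for a trained detector: gives low correctness
    # probability to "are" when paired with singular "information".
    suspicious = "are" in tokens and "information" in tokens
    return [0.3 if (t == "are" and suspicious) else 0.95 for t in tokens]

nbest = ["There are some information you have asked me about .".split(),
         "There is some information you have asked me about .".split()]
best = rerank(nbest, toy_detector)
```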
Slide 29
Error Correction Results
Original sentence:
I work with children an the Computer help my Jop bat affeted to
MT output:
I work with children and the Computer help my Jop bat affeted to
MT+detection output:
I work with children and the computer helps my Jop bat affeted to
Slide 30
Error Correction Results
Original sentence:
It takes 25 minutes that is convenient to us
MT output:
It takes 25 minutes that is convenient for us
MT+detection output:
It takes 25 minutes , which is convenient for us
Slide 31
Error Correction Results
Original sentence:
I hope that our friend Richard Brown doesn’t have any serious willness
MT output:
I hope that our friend Richard Brown doesn’t have any serious willness
MT+detection output:
I hope that our friend Richard Brown doesn’t have any serious willingness
Slide 34
Feature-based Essay Scoring
Extract a number of features:
● Word sequences
  ○ Unigrams
  ○ Bigrams
  ○ Trigrams
● Part-of-speech tags
● Grammatical constructions
● Complexity measures
● Semantic similarity between sentences
● Estimated error count

Helen Yannakoudakis, Ted Briscoe and Ben Medlock (2011) A New Dataset and Method for Automatically Grading ESOL Texts. ACL 2011.
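The word-sequence features can be sketched as a sparse count vector that a linear model then weights; the whitespace tokenisation and the feature-key format are simplifying assumptions:

```python
from collections import Counter

def ngram_features(tokens, n_max=3):
    # Unigram, bigram and trigram counts as a sparse feature vector:
    # one slice of the feature set listed above.  Keys pair the n-gram
    # order with the token tuple so different orders never collide.
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[(n, tuple(tokens[i:i + n]))] += 1
    return feats

feats = ngram_features("my friend eats two ice creams".split())
```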
Slide 37
Score-specific Word Embeddings
Optimising word embeddings to:
1) differentiate between correct and randomly corrupted sequences
2) predict the score of the essay where the current word sequence came from
Then use these embeddings in a neural network for essay scoring.

Dimitrios Alikaniotis, Helen Yannakoudakis and Marek Rei (2016) Automatic Text Scoring Using Neural Networks. ACL 2016.
Slide 38
Score-specific Word Embeddings
Evaluating score-specific word embeddings on the ASAP dataset: 13K marked essays (150-550 words each). Using a two-layer bi-directional LSTM for essay scoring.

Pre-training | Spearman ρ (%) | RMSE
None         | 68             | 7.31
word2vec     | 79             | 3.2
SSWE         | 91             | 2.4
Slide 39
Error-specific Word Embeddings
Taking advantage of the available error annotation in the training data.
Optimising embeddings to detect real errors, as opposed to randomly corrupted sequences.
The network predicts the quality of each word sequence, based on the number of errors it contains.

Youmna Farag, Marek Rei and Ted Briscoe (2017) An Error-Oriented Approach to Word Embedding Pre-Training. BEA 2017.
Slide 40
Error-specific Word Embeddings
Evaluating error-specific word embeddings on the FCE dataset. Using a convolutional network for essay scoring.

Pre-training | Spearman ρ (%) | RMSE
word2vec     | 56.7           | 4.9
GloVe        | 51.8           | 5.2
SSWE         | 58.3           | 4.9
ESWE         | 63.7           | 4.5
Slide 43
Future Directions
Specialised systems: supervised models targeting specific error types
Multi-task learning: taking better advantage of other tasks and datasets
Multi-modal topics: students writing about images or videos
Slide 44
Summary
Error detection
Neural sequence labelling architecture
Artificial data generation
01
Error correction
Neural machine translation
Reranking with detection
02
Essay scoring
Feature-based model
Neural essay scoring
Score-specific word embeddings
03