ADAPT and a bit of my NLP work

ADAPT and a bit of my NLP
Lifeng Han (Aaron) / ADAPT @DCU, Dublin
LIFENG.HAN@adaptcentre.ie
Suda, Suzhou, 2017, May
<Unto a Full-grown Man / 養天地正氣法古今完⼈人>
The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.

www.adaptcentre.ieLifeng Han (Aaron 亞倫倫)
2016.12-on, PhD student in ADAPT Centre @ DCU
2016.10-11, RA researcher in ADAPT Centre, Dublin
2016.03-2016.07, Guest researcher in Uni. of Amsterdam
2014.09-2016.02, Student/employee in Uni. of Amsterdam
2014.08, Msc. in software engineering and Bs. in Maths
2011-2014, RA and student in NLP2CT lab/ Uni. of Macau
Codes: [github.com/poethan ]
Talk-Slides: [github.com/poethan/slides ]
Network: [www.linkedin.com/in/aaronhan ]

www.adaptcentre.ie
Group in UvA, Amsterdam

www.adaptcentre.ieContent
About ADAPT centre, Ireland
- general
- groups / topics
- activities
- corporations
A bit of my NLP
- MT Evaluation Models
- QE Models
- MWE shared task
- DL4MT phd topic
- other works
Why ‘a bit’? u guys have so many awesome papers/works! Congrats!

www.adaptcentre.ieADAPT - general
http://adaptcentre.ie/
It is a joint research centre with 4 uni. DCU/TCD/UCD/DIT
Located in Dublin, ADAPT- DCU/TCD lab
Former name: CNGL, continued research, some people left, while
coming more.
Funding applied by PI-s from different Uni.
Fundings from Irish Science Foundation and EU.

www.adaptcentre.iewww.adaptcentre.ie/about

www.adaptcentre.ieADAPT - groups/topics
broad research topics:
“ADAPT research is spearheading the development of next-generation digital
technologies that enable seamless tech-mediated interaction and
communication. The breadth of ADAPT's research expertise is unique globally
and the Centre's structure supports collaborative innovation with industry to
unlock the potential of digital content. ADAPT has attracted over €50million
research funding from Science Foundation Ireland and industry
collaborations”.

Social Media / NLP / Knowledge Management / NN / Digital Content / ML /
Multimedia Content Summary / Sentiment Analysis / Ethics and Privacy / AI /
Image and Video / Personalisation / Search and IR / DL / MT / Multimodal
Interaction / Semantic Web and Linked Data / Virtual and Augmented Reality

www.adaptcentre.iewww.adaptcentre.ie/research
Research Themes:
Understanding Global Content
Transforming Global Content
Personalising the User Experience
Interacting with Global Content
Managing the Global Conversation

www.adaptcentre.ieADAPT - activities
ADAPT has many meetings/gatherings:
Monthly 101 seminar: different topics each time
Science meeting: every two months
ADAPT Industrial showcase
Social meetup professionals: Dublin ML (host) /NLP meetup
Social meetup fun: pingpang (German/Spanish winner), etc.
Also join:
- Faculty industrial showcase
- University research open days

www.adaptcentre.ieADAPT - corporation
ADAPT has some cooperation (research/industrial projects / intern,
etc.):
Huaway / iFlytec / DID / Microsoft / Iconic / FBD / Intel
Linkedin,
IBM,
Accenture,
eBay,
etc.
http://www.adaptcentre.ie/industry
always welcome cooperation

www.adaptcentre.ieA bit of my NLP - Aaron
pic from http://www.contrib.andrew.cmu.edu/~dyafei/NLP.html

www.adaptcentre.ieA bit of my NLP
A bit of my NLP
- MT Evaluation Models (2012 -)
- QE (Quality Estimation) tasks (2013 - )
- MWE (Multi-word Expression) tasks (2017 -)
- DL4MT (Deep Learning for MT) phd topic (2017 -)
- other work (CWS/ CNER/ Uni-Treebank/ Paraphrase) (2013 - )

www.adaptcentre.ieA bit of my NLP - MT Evaluation
LEPOR MT Evaluation Metric series (2012-on):
https://en.wikipedia.org/wiki/LEPOR
Motivation:
• With the rapid development of MT, how to evaluate the MT
model?
– Whether the newly designed algorithm/feature enhance the existing
MT system, or not?
– Which MT system yields the best output for specified language pair,
or generally across languages?
• Difficulties in MT evaluation:
– language variability results in no single correct translation
– natural languages are highly ambiguous and different languages do
not always express the same content in the same way (Arnold, 2003)

www.adaptcentre.ie
• Manual MTE methods:
• Traditional Manual judgment
– Intelligibility: how understandable the sentence is
– Fidelity: how much information the translated sentence
retains as compared to the original
• by the Automatic Language Processing Advisory Committee
(ALPAC) around 1966 (Carroll, 1966)
– Adequacy (similar as fidelity)
– Fluency: whether the sentence is well-formed and fluent
– Comprehension (improved intelligibility)
• by Defense Advanced Research Projects Agency (DARPA) of US
(Church et al., 1991; White et al., 1994)

www.adaptcentre.ie
• Advanced manual judgment:
– Task oriented method (White and Taylor, 1998)
• In light of the tasks for which the output might be used
– Further developed criteria
• Bangalore et al. (2000): simple string accuracy/ generation string
accuracy/ two corresponding tree-based accuracies.
• LDC (Linguistics Data Consortium): 5-point scales fluency & adequacy
• Specia et al. (2011): design 4-level adequacy, highly adequacy/fairly
adequacy /poorly adequacy/completely inadequate
– Utilizing Post-editing
• Snover et al., (2006) HTER.
– Segment ranking (WMT 2011~2013)
• Judges are asked to provide a complete ranking over all the
candidate translations of the same source segment (Callison-Burch et
al., 2011, 2012)
• 5 systems are randomly selected for the judges (Bojar et al., 2013)

www.adaptcentre.ie
• Problems in Manual MTE
– Time consuming
• How about a document contain 3,000 sentences or more
– Expensive
• Professional translators? or other people?
– Unrepeatable
• Precious human labor can not be simply re-run
– Low agreement, sometimes (Callison-Burch et al., 2011)
• E.g. in WMT 2011 English-Czech task, multi-annotator
agreement kappa value is very low
• Even the same strings produced by two systems are ranked
differently each time by the same annotator

www.adaptcentre.ie
• How to address the problems?
– Automatic MT evaluation!
• What do we expect? (as compared with manual judgments)
– Repeatable
• Can be re-used whenever we make some change of the MT system, and plan to
have a check of the translation quality
– Fast
• several minutes or seconds for evaluating 3,000 sentences
• V.s. hours of human labor
– Cheap
• We do not need expensive manual judgments
– High agreement
• Each time of running, result in same scores for un-changed outputs
– Reliable
• Give a higher score for better translation output
• Measured by correlation with human judgments

www.adaptcentre.ieAutomatic MT Evaluation
- Lexical similarity
Edit distance: WER / PER / TER
Precision and recall: BLEU / ROUGE / F / METEOR
Word order: ATEC / PORT / LEPOR
- Linguistic Features
Syntax: POS/ Phrase / sentence structure / permutation trees
Semantics: Named entity / Synonyms / Textual entailment /
Paraphrase / Semantic roles / Language Models /
Detailed reference, see: [http://arxiv.org/abs/1605.04515]
- Machine Translation Evaluation: A Survey (draft)

www.adaptcentre.ieWER/TER/HTER
•

www.adaptcentre.ie
Weak points of MTE (at that time):
• Good performance on certain language pairs
– Perform lower with language pairs when English as source compared
with English as target
– E.g. TER (Snover et al., 2006) achieved 0.83 (Czech-English) vs 0.50
(English-Czech) correlation score with human judgments on
WMT-2011 shared tasks
• rely on many linguistic features for good performance
– E.g. METEOR rely on both stemming, and synonyms, etc.
• Employ incomprehensive factors
– E.g. BLEU (Papineni et al., 2002) based on n-gram precision score
– higher BLEU score is not necessarily indicative of better translation
(Callison-Burch et al., 2006)

www.adaptcentre.ieOur Models
Our design (to solve some of the problem):
• Our designed methods
– to make a comprehensive judgments: Enhanced/
Augmented factors (enhanced length penalty + ngram
position difference penalty + Fscore)
– to deal with language bias (perform differently across
languages) problem: Tunable parameters
• Try to make a contribution on
– evaluation with English as the source language
– some low-resource language pairs, e.g. Czech-English

www.adaptcentre.ie1.Enhanced Length Penalty
•

www.adaptcentre.ie2.N-gram Position Difference Penalty
•

www.adaptcentre.ie
• Step 1: N-gram word alignment (single reference)
• N-gram word alignment
– Alignment direction fixed: from hypothesis (output) to reference
– Considering word neighbors, higher priority shall be given for
the candidate matching with neighbor information
• As compared with the traditional nearest matching strategy, without
consider the neighbors
– If both the candidates have neighbors, we select nearest
matching as backup choice

www.adaptcentre.ie
Alignment sample with n-gram nearby content

www.adaptcentre.ieWhen multi-references
• Design the n-gram word alignment for multi-reference
situation
• N-gram alignment for multi-reference:
– The same direction, output to references
– Higher priority also for candidate with neighbor information
– Adding principle:
• If the matching candidates from different references all have
neighbors, we select the one leading to a smaller NPD value
(backup choice for nearest matching)

www.adaptcentre.ie3.Weighted harmonic mean of P and R
•

www.adaptcentre.ieInitial LEPOR
•

www.adaptcentre.ieExperiments
•

www.adaptcentre.ie
• Corpora:
– Development data for tuning of parameters
– WMT2008 (http://www.statmt.org/wmt08/)
– EN: English, ES: Spanish, DE: German, FR: French and CZ: Czech
– Two directions: EN-other and other-EN
– Testing data
– WMT2011 (http://www.statmt.org/wmt11/)
– The numbers of participated automatic MT systems in WMT 2011
– 10, 22, 15 and 17 respectively for English-to-CZ/DE/ES/FR
– 8, 20, 15 and 18 respectively for CZ/DE/ES/FR-to-EN
– The gold standard reference data consists of 3,003 sentences

www.adaptcentre.ie
The system-level Spearman correlation with human judgment on WMT11 corpora
- LEPOR yielded three top one correlation scores on CZ-EN / ES-EN / EN-ES
- LEPOR showed robust performance across langauges, resulting in top one Mean
score
COLING12: LEPOR: A Robust Evaluation Metric for Machine Translation with Augmented Factors

www.adaptcentre.ieFurther LEPOR
•

www.adaptcentre.ie
• Example of bigram (n=2) block matching for
bigram precision and bigram recall:
• Similar strategies for n>=3, block matching
– For the calculation of n-gram precision and recall

www.adaptcentre.ieLinguistics Aspacts
• Enhance the metric with concise linguistic
feature:
• Example of Part-of-speech (POS) utilization
– Sometimes perform as synonym information
– E.g. the “say” and “claim” in the example translation

www.adaptcentre.ieExperiments-2
•

www.adaptcentre.ie
• Comparison (Metrics) with related works:
– In addition to the state-of-the-art metrics METEOR /
BLEU / TER
– Compare with ROSE (Song and Cohn, 2011) and MPF
(Popovic, 2011)
– ROSE and MPF metrics both utilize the POS as external
information

www.adaptcentre.ie
Tuned parameter values of our enhanced method
System-level Spearman correlation with human judgment on WMT11 corpora
Our enhanced method yielded the highest Mean score 0.83 on eight language
pairs
MTsummit: Language-independent Model for Machine Translation Evaluation with Reinforced Factors

www.adaptcentre.ieIn Shared Task
• Performances on MT evaluation shared tasks in
ACL-WMT 2013
– The eighth international workshop of statistical
machine translation, accompanied with ACL-2013
• Corpora:
– English, Spanish, German, French, Czech, and
Russian (new)

www.adaptcentre.ieSubmission
•

www.adaptcentre.ie
System-level Pearson (left)/Spearman (right) correlation score with human judgment
WMT13: A Description of Tunable Machine Translation Evaluation Systems in WMT13 Metrics Task

www.adaptcentre.ie
• From the shared task results
– Practical performance: LEPOR methods are effective
yielding generally higher across language pairs
– Robustness: LEPOR methods achieved the first highest
score on the new language pair English-Russian
– Contribution for the existing weak point: MT
evaluation with English as the source language
• Codes: LEPOR and further models
– github.com/poethan/LEPOR
– https://github.com/poethan

www.adaptcentre.ieMy a bit NLP - II - QE
• WMT13: www.statmt.org/wmt13/
• Task 1.1 (sentence-level quality
estimation)
• Task 1.2 (system selection)
• Task 2 (word-level quality estimation).

www.adaptcentre.ie
Task 1.1 sentence-level QE
• Two variants of the results can be submitted:
– Scoring: A quality score for each sentence translation
in [0,1], to be interpreted as an HTER(Human Translation
Error Rate)
score; lower scores mean better translations.
– Ranking: A ranking of sentence translations for a
number of source sentences, produced by the same
MT system, from best to worst. The reference ranking
will be defined based on the true HTER scores.

www.adaptcentre.ie
• Contributions:
– We developed the English and Spanish POS
tagset mapping as shown in Table 1. The 75
Spanish POS tags yielded by the Treetagger
(Schmid, 1994) are mapped to the 12 universal
tags developed in (Petrov et al., 2012).
ADJ ADP ADV CONJ DET NOUN NUM PRON PRT VERB X .
ADJ PREP, PREP/DEL ADV, NEG CC, CCAD,
CCNEG,
CQUE, CSUBF,
CSUBI, CSUBX
ART NC, NMEA,
NMON, NP,
PERCT,
UMMX
CARD,
CODE, QU
DM, INT,
PPC, PPO,
PPX, REL
SE VCLIger,
VCLIinf,
VCLIfin, VEadj,
VEfin, VEger,
VEinf, VHadj,
VHfin, VHger,
VHinf, VLadj,
VLfin, VLger,
VLinf, VMadj,
VMfin, VMger,
VMinf, VSadj,
VSfin, VSger,
VSinf
ACRNM, ALFP,
ALFS, FO, ITJN,
ORD, PAL,
PDEL, PE, PNC,
SYM
BACKSLASH, CM,
COLON, DASH, DOTS,
FS, LP, QT, RP,
SEMICOLON, SLASH

www.adaptcentre.ie
– The proper application of the developed EN-ES
POS tagset mapping are shown in the sentence-level
Quality Estimation and the results are acceptable.
– We developed a new metric EBLEU to show the
applicability of traditional evaluation criteria into
the advanced Quality Estimation tasks.

www.adaptcentre.ie
• Training data:
– contains 2,254 sentences for source English and
target Spanish, post-edited Spanish, HTER scores,
post-editing scores.
• Testing data:
– contains 500 sentences for source English and
target Spanish.

www.adaptcentre.ie
• Official results:
MAE RMSE DeltaAvg
Spearman
Corr
EBLEU 16.97 21.94 2.74 0.11
Baseline SVM 14.81 18.22 8.52 0.46
poor scores than baseline SVM

www.adaptcentre.ie
Task 1.2 System selection
• Participants will be required to rank up to
five (random selected) translations for the
same source sentence produced by multiple
MT systems.

www.adaptcentre.ie
• Contributions:
– Our experiments confirm that NB performs faster
the SVM in system selection task; however SVM
yields better results than NB.
– Our results using NB-LPR achieve higher |Tau|
correlation score than baseline on EN-ES system
selection task.

www.adaptcentre.ie
• Designed methods:
–We score the five alternative translation sentences as compared to the source sentence
according to the closeness of their POS sequences.
–When we convert the absolute scores into the corresponding rank values, the variant
EBLEU-I means that we use five fixed intervals (with the span from 0 to 1) to achieve the
alignment as shown in table.
– Table 2: Convert absolute scores into ranks
–The metric EBLEU-A, “A” means average. The absolute sentence edit scores are converted
into the five rank values with the same number (average number). For instance, if there are
1000 sentence scores in total then each rank level (from 1 to 5) will gain 200 scores from
the best to the worst.
[1,0.4) [0.4, 0.3) [0.3, 0.25) [0.25, 0.2) [0.2, 0]
5 4 3 2 1

www.adaptcentre.ie
• Features
– NB-LPR model - Naïve Bayes
• Length penalty (introduced in previous section),
• Precision,
• Recall,
• Rank values: officially offered training data, and from EBLEU
metric.
– SVM-LPR model - Support Vector Machine
• Length penalty,
• Precision,
• Recall and
• Rank values.

www.adaptcentre.ie
• Training data 1.2:
– contains the English-Spanish and German-English,
WMT09,10,11,12. Each source document contains
hundreds of sentences, and in the corresponding
target document, each source sentence is aligned
with up to five target sentences. The target
sentences are ranked by human efforts.
• Testing data:
– contains the English(200+)-Spanish and
German(300+)-English. Each source sentence has up
to five target sentences.

www.adaptcentre.ie
Result:
DE-EN EN-ES
Methods
Tau(ties
penalized)
|Tau|(ties
ignored)
Tau(ties
penalized)
|Tau|(ties
ignored)
EBLEU-I -0.38 -0.03 -0.35 0.02
EBLEU-A N/A N/A -0.27 N/A
NB-LPR -0.49 0.01 N/A 0.07
Baseline -0.12 0.08 -0.23 0.03

www.adaptcentre.ie
Task 2 Word-level QE
• Participating systems will be required to produce for
each token a label in one of the following settings:
– Binary classification: a good/bad label,
where bad indicates the need for editing the token.
– Multi-class classification: a label specifying the edit action
needed for the token (keep as is, delete, or substitute).

www.adaptcentre.ie
• Contributions:
– To consider the context information, we developed
augmented and optimized features for the CRF and
NB in the word level QE.
– We achieve the highest precision score using NB, the
highest recall and F1 scores using CRF in the binary
judgment of word level QE among all the systems.

www.adaptcentre.ie
Unigram, from antecedent 4th to
subsequent 3rd token
Bigram, from antecedent 2nd to
subsequent 2nd token
Jump bigram, antecedent and
subsequent token
Trigram, from antecedent 2nd to
subsequent 2nd token
Developed features for NB and CRF algorithms

www.adaptcentre.ie
• Training data:
– contains EN-ES source and target documents,
each with 803 sentences; binary and multi-classes
annotation document.
• Testing data:
– contains EN-ES source and target documents,
each with 288 sentences.

www.adaptcentre.ie
Results:
Binary Multiclass
Methods Pre Recall F1 Acc
CNGL-dMEMM 0.7392 0.9261 0.8222 0.7162
CNGL-MEMM 0.7554 0.8581 0.8035 0.7116
LIG-All N/A N/A N/A 0.7192
LIG-FS 0.7885 0.8644 0.8247 0.7207
LIG-BOOSTING 0.7779 0.8843 0.8276 N/A
NB 0.8181 0.4937 0.6158 0.5174
CRF 0.7169 0.9846 0.8297 0.7114

www.adaptcentre.ie
• Brief summary:
• In the sentence-level QE task (Task 1.1), we develop an enhanced
version of BLEU metric (eBLEU), and this shows a potential usage
for the traditional evaluation criteria.
• In the newly proposed system selection task (Task 1.2) and word-
level QE task (Task 2), we explore the performances of several
statistical models including NB, SVM, and CRF, of which the CRF
performs best, the NB performs lower than SVM but much faster
than SVM.
• The official results show that the NB model yields the highest
Precision score 0.8181, the CRF model yields the highest Recall
score 0.9846 and the highest F1-score 0.8297 in binary classification
judgment of word-level QE (task 2).
• Codes: https://github.com/poethan/QE
WMT13: Quality Estimation for Machine Translation Using the Joint Method of Evaluation Criteria and Statistical
Modeling

www.adaptcentre.ieA bit of my NLP - III - MWE
MWE (Multi-Word Expression) Detection task:
Task intro: Verbal MWE (VMWE)
Proposed models
Performances
Thanks for Dr. Alfredo Maldonado for the slides of MWE section

www.adaptcentre.ieVMWE - Shared Task

www.adaptcentre.ieShort summary

www.adaptcentre.ieMWE note: Semantic Reranking - Erwan Moreau
It's important to distinguish the two main components, A and B below:
A) the unsupervised semantic similarity part, which uses Europarl to calculate "semantic features" for a sentence
with expressions tagged. The goal is that these features help predict whether the tagged expressions are correct
or not (note that a sentence may contain 0, 1 or several expressions). More precisely, the idea is to compute
features which represent whether a candidate expression is a real MWE, by comparing frequency and semantic
similarity between its individual words and the full expression. It works like this:
1) extracts all the sentences with expressions labeled from the CRF output.
2) For every expression, we build pseudo-expressions for each individual word in the expression as well as for
each case of "the expression minus one word". Then for every pseudo-expression and for the full expression we
compute the context vector based on Europarl, i.e. the count of every word which co-occurs with the target
expression (or word) within a fixed-size window. In the features we use the frequencies for each of these pseudo-
expressions, as well as the semantic similarity score between each pseudo-expression and the full expression.
Originally the goal was to measure compositionality (whether the meanings of the words are combined together in
the expression); but these features probably also capture how often the words appear together, which is an
indication of a real expression. There is an additional set of features which consist of comparing the current
expression to the other 9 candidate expressions.
3) Since we need a fixed number of features for every instance = sentence (for the supervised learning part), we
must "summarize": if an expression has N words, the N values are "summarized" with the min, mean and max.
Same thing for the M expressions in the sentence. In training mode we also add the probability found by the CRF
as a feature.
Thanks to Dr. Erwan Moreau for detail: erwan.moreau@adaptcentre.ie

www.adaptcentre.ie
B) the supervised regression part (we used Weka decision trees regression, but other models would certainly
work as well), which is fed with the features calculated using the above and predicts a single score in [0,1]
which represents "how correct" the labelling of the expressions is for a sentence: here an instance is a sentence
with its expressions labeled, and since for every sentence the CRF part gives us the top 10 labelling we use
each of these 10 as one instance. In training mode, we assign score 1 to the gold labelling (if found among the
CRF candidates) and 0 to other (wrong) labeling (the goal being to make the system assign low scores to wrong
answers and high scores to good answers). In testing mode, we obtain the predicted scores and for every
sentence we take the labelings which obtained the highest in the group of 10 candidates; most of the time the
first from CRF is also the highest score, but sometimes the labelling we select was ranked after the first -> that's
when the proper re-ranking happens.

www.adaptcentre.ieA bit of my NLP - DL4MT
DL4MT: PhD topic (so far)
Literature reviews
Some classifications
Ongoing work
Thanks for the NMT slides of Philipp Koehn <Neural MT web seminar.
2017.Jan.> for DL4MT section

www.adaptcentre.ieHow MT began and developed
The original idea is from ‘ the Tower of Babel’ (<Genesis>创世纪)
- 11:5 LORD came down to see the city and the tower the people were
building.
- 11:6 The LORD said, "If as one people speaking the same language
they have begun to do this, then nothing they plan to do will be
impossible for them.
The second idea is from René Descartes (1629)
- a universal language,
- with equivalent ideas in different tongues sharing one symbol.[3]
- Philosophical statement: ‘I think, therefore I am’
The third idea is from Warren Weaver "machine translation“
- appeared in <Memorandum on Translation> 1949

www.adaptcentre.ie
Note: Chis Manning used the same Babel pic later in Stanford NLP lecture :-)

www.adaptcentre.ie
How MT began and developed - developed
MT Models:
Rule-based MT (RBMT)
Statistical MT (SMT)
Example-based MT (EBMT)
Hybrid MT (HMT)
Neural MT (NMT)

www.adaptcentre.ie
How MT began and developed - SMT
SMT derivations:
- Word-based
- Phrase-based
- Hierachical phrase-based
- Syntax-based (constituency structure vs dependency structure)
- Semantic integration
Problems of syntax-based model:
- Long distance dependency is still problem
- no linguistic restrictions imposed on the variables.
- when the translated piece of text is longer than a shreshold, models
can not use syntax-based rules, instead using so-called ‘glue rules’

www.adaptcentre.ie
How MT began and developed - NMT
Neural MT:
A deep learning based approach to MT
- Radical departure from phrase-based statistical translation
approaches, in which a translation system consists of subcomponents
that are separately engineered
- all parts of the neural translation model are trained jointly (end-to-end)
to maximize the translation performance
- ‘NMT is the approach of modeling the entire MT process via one big
Artificial NN’ —Chris Manning

www.adaptcentre.ie
Began from ‘word-to-vector’, by NN
Word embedding
Neural Language model
Encoder-Decoder model
Later:
- Attention mechanism, deriving from alignment thought in SMT.
- Coverage modelling: pay attention to different important parts.
- Document level NMT:

www.adaptcentre.ie
Benefits of NMT:
Each output predicted from
- encoding of the full input sentence
- all previously produced output words (theoretically)
Word embeddings allow generalisation
- ‘cat’ and ‘cats’ can have similar representations
- similar goes to ‘home’ and ‘house’
- better fluency
- better handling sentence-level context

www.adaptcentre.ie
Disadvantages:
- limited vocabulary, allows limited vocabulary size
- no explicit modeling of coverage / bad with rare words
- development challenges / speed / hardware / process not transparent
- traditional SMT allows customization / using own terminology/
customers domain / rules for dates, units / markup tags handling etc. but
NMT not.

www.adaptcentre.iedeveloped topics
Some sub-topics:
Attentional NMT
Multimodal NMT (image+text)
Multilingual NMT (Google)
Character-level NMT (Edinburgh, Cho)
Linguistics aspect in NMT
- add syntax, source side: Li et al. (2017); target side: Aharoni et
al. (2017);
Adversarial NMT: Yang et al. (2017), Wu et al. (2017)
Tools: a lot

www.adaptcentre.ie
Ongoing work: Chinese NMT / Multimodal NLP with some colleagues
DCU / TCD / UvA Amsterdam/ 新译信息 (www.newtranx.com) Co.
China
Tools-using:
GAN / Nematus / OpenNMT

www.adaptcentre.ieA bit of my NLP - IV - other
Other works:
- Did a bit more sequence labelling tasks like
CWS (Chinese word segmentation)
CNER (Chinese Named Entity Recognition)
github.com/poethan/SeqLabel
- Developed some universal phrase structure tagset for constituency
treebanks (uni-phrase)
aligned tags from 25 treebanks covering 21 languages
tested on the treebanks (Fr/Pt/En/De/Zh)
github.com/poethan/UniPhrase
- Did some paraphrase modelling
together with Prof. Khalil Sima’an (UvA)
based on CCB (Chris Callison-Burch) statistical paraphrase work /
statistical
experiments to be done

www.adaptcentre.ieBored? :-)
A bit more of me?

www.adaptcentre.ie
Honors / Awards
# National second prize in China National Postgraduate Maths
Modeling Contest, 2011.
# First prize in Hebei Province, twice, China National
Undergraduate Maths Modeling Contest, 2009-2010
# Baidu Fellowship, Final run candidate, presentation, 2014
Services / Volunteer:
# Co-organiser - ADAPT ML Research Group
[adapt_mlrg@googlegroups.com] [https://groups.google.com/
forum/#!forum/adapt_mlrg]
# Reviewer - IEEE/ACM Trans. on ASLP

www.adaptcentre.ie
Hobbies:
- Sports: swimming / cycling / volleyball / badminton / table tennis /
football / basketball / hiking / trampoline /
- Arts: paintings / musics / draw cartoon
- Literature: novel / poem [poethan.wordpress.com/]
- Cook: pages [poethan.wordpress.com/about/]
- Photography
Notes:
- Welcome to take me for group sports @ Suda
- Welcome to visit us/me @ DCU, Dublin

www.adaptcentre.ie
I live close, Welcome to come for food :D

www.adaptcentre.ie
Your flights :D

www.adaptcentre.ieReferences & Thanks
Qun Liu. Dependence-based SMT talk. ILLC, UvA. 2014.Nov.
Philipp Koehn. Neural MT web seminar. Omniscientech. 2017.Jan.25th.
Alfredo Maldonado. Presentation <Detection of Verbal Multi-Word Expressions via Conditional Random
Fields with Syntactic Dependency Features and Semantic Re-Ranking>. Dublin CL seminar. 2017.
Lifeng Han. ‘Neural Machine Translation: Are we building 'The Tower of Babel‘ again?’ Talk. DCU,
Dublin. 2017.01.25
th
.
LEPOR: An Augmented Machine Translation Evaluation Metric
Lifeng Han (Aaron) Thesis for Master of Science in Software Engineering. University of Macau
Download [thesis] [ ppt-slides ] [bibtex]
Quality Estimation for Machine Translation Using the Joint Method of Evaluation Criteria and Statistical Modeling [bibTex]
Authors: Aaron Li-Feng Han, Yi Lu, Derek F. Wong, Lidia S. Chao, Liangye He, Anson JunWen Xing
Proceedings of the ACL 2013 EIGHTH WORKSHOP ON STATISTICAL MACHINE TRANSLATION (ACL-WMT 2013).
LEPOR: A Robust Evaluation Metric for Machine Translation with Augmented Factors [bibTex]
Authors: Aaron Li-Feng Han, Derek F. Wong and Lidia S. Chao
Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012): Posters, pages 441–450, [poster] [ppt-slides] http://
www.aclweb.org/anthology/C12-2044
https://en.wikipedia.org/wiki/LEPOR
https://en.wikipedia.org/wiki/Machine_translation

www.adaptcentre.ie
Maldonado, A., Han, L., Moreau, E., Alsulaimani, A., Chowdhury, K. D., Vogel, C., & Liu, Q. (2017).
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with Syntactic
Dependency Features and Semantic Re-Ranking. In Proceedings of The 13th Workshop on Multiword
Expressions. Valencia.
Hall, M. et al. (2009). The WEKA data mining software: an update. ACM SIGKDD Explorations, 11(1):
10–18.
Koehn, P. (2005). Europarl: A parallel corpus for statistical machine translation. In Proceedings of the
10th Machine Translation Summit, pages 79–86, Phuket.
Laferty, J., McCallum, A., & Pereira, F. C. N. (2001). Conditional random elds: Probabilistic models for
segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference
on Machine Learning. pp. 282– 289.
Maldonado, A., & Emms, M. (2011). Measuring the compositionality of collocations via word co-
occurrence vectors: Shared task system description. In Proceedings of the Distributional Semantics
and Compositionality workshop (DISCo 2011). Portland, OR.
Quinlan, J.R. (1992). Learning with continuous classes. In Proceedings of the 5th Australian Joint
Conference on Arti cial Intelligence, pages 343–348.
Sag, I. A. et al. (2002). Multiword Expressions: A Pain in the Neck for NLP. Third International
Conference on Computational Linguistics and Intelligent Text Processing (Lecture Notes in Computer
Science), 2276, 1–15.
Svary, A. et al. (2017). The PARSEME Shared Task on Automatic Identi cation of Verbal Multiword
Expressions. In Proceedings of The 13th Workshop on Multiword Expressions. Valencia.
Singleton, D. (2000). Language and the Lexicon: An Introduction. London: Arnold.

www.adaptcentre.ie
Yang et al. 2017. Improving Neural Machine Translation with Conditional Sequence Generative
Adversarial Nets.
Wu et al. 2017. Adversarial Neural Machine Translation.
Isabelle et al. 2017. A Challenge Set Approach to Evaluating Machine Translation.
Aharoni et al. 2017. Towards String-to-Tree Neural Machine Translation
Li et al. 2017. Modeling Source Syntax for Neural Machine Translation.
2016.11 Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot
Translation
2016.11 Toward Multilingual Neural Machine Translation with Universal Encoder and Decoder
Hill2017. The representational geometry of word meanings acquired by neural machine translation
models.
Suggested:
https://github.com/harvardnlp/seq2seq-attn
http://opennmt.net
Chris Manning. Stanford Lecture 10: Neural Machine Translation and Models with Attention
https://youtu.be/IxQtK2SjWWM

www.adaptcentre.ie
Q & A
LIFENG.HAN@adaptcentre.ie
ADAPT Center, DCU
github.com/poethan/slides

ADAPT and a bit of my NLP work

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

Similaire à ADAPT and a bit of my NLP work

Similaire à ADAPT and a bit of my NLP work (20)

Plus de Lifeng (Aaron) Han

Plus de Lifeng (Aaron) Han (20)

Dernier

Dernier (20)

ADAPT and a bit of my NLP work