18. Regular expression
• [a-z]+
• Colours of cats and dogs.
• [^o]{2}
• Colours of cats and dogs.
• cat|dog
• Colours of cats and dogs.
• Colou?rs?
• Colours of cats and dogs.
• Colors of cats and dogs.
• Color of a cat.
• <[A-Za-z][A-Za-z]*>
• <html>Colours of cats and dogs.</html>
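The highlighting that showed which parts of each sentence matched is lost in this text form. As a minimal sketch, the patterns above can be tried with Python's `re` module; `find_all` is a helper name introduced here for illustration, not something from the slides.

```python
import re


def find_all(pattern, text):
    """Return every non-overlapping match of pattern in text, left to right."""
    return re.findall(pattern, text)


sentence = "Colours of cats and dogs."

# One or more lowercase letters: the capital "C" is skipped.
print(find_all(r"[a-z]+", sentence))   # ['olours', 'of', 'cats', 'and', 'dogs']

# Any two consecutive characters, neither of which is "o".
print(find_all(r"[^o]{2}", sentence))

# Alternation: either "cat" or "dog".
print(find_all(r"cat|dog", sentence))  # ['cat', 'dog']

# Optional "u" and optional trailing "s" cover all three spellings.
for variant in ("Colours of cats and dogs.",
                "Colors of cats and dogs.",
                "Color of a cat."):
    print(find_all(r"Colou?rs?", variant))

# An opening tag made only of letters; "</html>" does not match
# because "/" is not a letter.
print(find_all(r"<[A-Za-z][A-Za-z]*>", "<html>Colours of cats and dogs.</html>"))
```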
19. Edit Distance
• Colors
• Delete s
• Color
• Insert u
• Colour
• Replace C with c
• colour
• Distance from Colors to colour: 3
(or 4 if the cost of replacing is 2)
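The two totals can be checked with a small dynamic-programming sketch of edit distance. The function name `edit_distance` and the parameter `replace_cost` are assumptions made for this illustration; insertions and deletions are assumed to cost 1 each.

```python
def edit_distance(source, target, replace_cost=1):
    """Minimum total cost of inserts, deletes, and replacements
    needed to turn source into target (insert = delete = 1)."""
    # prev[j] holds the cost of turning the processed prefix of source
    # into the first j characters of target.
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, 1):
        curr = [i]  # deleting all i characters of the source prefix
        for j, t in enumerate(target, 1):
            delete = prev[j] + 1
            insert = curr[j - 1] + 1
            replace = prev[j - 1] + (0 if s == t else replace_cost)
            curr.append(min(delete, insert, replace))
        prev = curr
    return prev[-1]


print(edit_distance("Colors", "colour"))                  # 3
print(edit_distance("Colors", "colour", replace_cost=2))  # 4
```

The comparison is case-sensitive, which is why replacing C with c counts as an edit.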
20. One may ask
“What if I wanted to map 1, 1, one, and ONE?”
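One possible answer, sketched under assumptions: fold Unicode compatibility variants and letter case first, then look the result up in a small table of number words. The table `NUMBER_WORDS`, the function name `normalize_token`, and the idea that one of the variants is a full-width digit are all assumptions made for this illustration.

```python
import unicodedata

# Hypothetical lookup table; a real system would need a fuller one.
NUMBER_WORDS = {"one": "1"}


def normalize_token(token):
    """Map spelling variants of a token to a single canonical form."""
    # NFKC folds compatibility characters such as the full-width
    # digit U+FF11 into their plain ASCII counterparts.
    folded = unicodedata.normalize("NFKC", token).casefold()
    return NUMBER_WORDS.get(folded, folded)


for variant in ("1", "\uff11", "one", "ONE"):
    print(repr(variant), "->", normalize_token(variant))
```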
21. Normalization
• time flies like an arrow. fruit flies like bananas.
• Case restoration
• Time flies like an arrow. Fruit flies like bananas.
• Sentence segmentation
• time flies like an arrow.
• fruit flies like bananas.
• Word normalization: stemming or lemmatization?
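The first two steps can be sketched with simple rules. This is a deliberately naive illustration, assuming sentences end with ".", "!" or "?" followed by whitespace and that case restoration only needs to capitalise sentence starts; the function names are made up here, and real text (abbreviations, proper nouns) needs more than this.

```python
import re


def segment_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def restore_case(text):
    """Capitalise the first character of every segmented sentence."""
    return " ".join(s[0].upper() + s[1:] for s in segment_sentences(text))


raw = "time flies like an arrow. fruit flies like bananas."
print(segment_sentences(raw))
print(restore_case(raw))
```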
26. Confidence Score
• Confidence interval? Confidence level?
• Not really
• But it can be
• Just a buzzword from speech recognition
• Shannon’s game
• Hidden Markov models
• Generative
• The Italian who went to Malta
• Can be any reasonable score
• Mostly probability
27. Calculate Sentence Similarity
Confident
Trusted
Doubted
[partial match]
[exact match]
[no match]
a / b < threshold, since b is higher, when
a = prob. of (
  #2(w1 w2 w3 w4)
  #1(w1 w2 w3) #1(w2 w3 w4)
  #1(w1 w2) #1(w2 w3) #1(w3 w4)
  #2(w1 w3) #2(w2 w4)
  #3(w1 w4));
b = avg. prob. of all known exact matches;
where #n: any other (n - 1) words in between.
Sentence: “w1 w2 w3 w4.”
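The #n operator can be illustrated with a small matcher. This is only a sketch of one reading of the slide: it assumes "#n" means the terms appear in order with at most n - 1 other words between adjacent terms, which resembles ordered-window operators in some retrieval query languages. The function name `ordered_window_match` is made up here, and the probabilities a and b are not computed.

```python
def ordered_window_match(terms, tokens, n):
    """Return True if terms occur in order within tokens, with at most
    n - 1 other tokens between each pair of adjacent terms."""

    def search(term_index, start, limit):
        if term_index == len(terms):
            return True
        for pos in range(start, min(limit, len(tokens))):
            if tokens[pos] == terms[term_index]:
                # The next term may sit at pos + 1 .. pos + n.
                if search(term_index + 1, pos + 1, pos + 1 + n):
                    return True
        return False

    # The first term may start anywhere in the sentence.
    return search(0, 0, len(tokens))


tokens = "w1 w2 w3 w4".split()
print(ordered_window_match(["w1", "w2", "w3"], tokens, 1))  # True
print(ordered_window_match(["w1", "w3"], tokens, 1))        # False
print(ordered_window_match(["w1", "w3"], tokens, 2))        # True
print(ordered_window_match(["w1", "w4"], tokens, 3))        # True
```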
37. There are two kinds of…
PAIN. The sort of pain that makes you strong, or useless pain. The sort of pain that's only suffering. I have no patience for useless things.
38. What might make me stronger…
(See also http://www.no-free-lunch.org)
47. Transliteration
• Alignment
• Alignment
• Alignment
• (And better be more than bilingual)
(Figure: cropped excerpt from a paper on transliteration alignment. The legible fragments say that a single letter can sometimes produce two phonemes, and show an English word aligned with its Chinese transliteration as "A BE RT" [15].)
53. Reinforcement
• Explore vs. Exploit
• Interactive
• Online
• Free Lunches
• Second moments and higher of algorithms' generalisation error
• Coevolution
• Confidence intervals can give a priori distinctions between algorithms
• People respond to incentives
54. Translate X for Y
• {restaurant AD, coupon}
• {game, credit}
• {subtitle, DRM-free video}
• {Heart Sūtra, inner peace}
• {inside news, outside support}
• Taiwanese protesters
• {anything, incentives}
• See also: Unbabel, Duolingo
55. New Types of Assistance for Translators
by Philipp Koehn
(http://www.mastar.jp/wfdtr/shiryou2013/Philipp%20Koehn.pdf
via http://www.mastar.jp/wfdtr/index-e.html)
57. Wrap up
• Where’s my pony semantics?
• Adaptation
• Chinese restaurant process
• Indian buffet process
• 信 (adequate)、達 (fluent)
• 雅 (elegant)? 貼 (pertinent)?
• Bilingual might be insufficient: 全日空 → ANA
• Pony: you can’t always get what you want
• Extrinsic evaluation
• Embrace and enjoy changes