Extraction of domain-specific bilingual lexicon
from comparable corpora:
compositional translation and ranking

Estelle Delpech¹, Béatrice Daille¹, Emmanuel Morin¹, Claire Lemaire²,³

¹ LINA, Université de Nantes
² GREMUTS, Université de Grenoble
³ Lingua et Machina

COLING’12

10/12/12

Mumbai, India
Outline

1 Context
2 Translation method
3 Ranking method
4 Results of experiments
5 Future work
Context
Translation method
Ranking method
Results of experiments
Future work

Context : comparable corpora for Computer-Aided
Translation

Aim: provide domain-specific bilingual lexicons to translators
when no parallel data is available
⇒ Comparable corpora:
sets of texts in languages L1 and L2 that are not translations
of each other but deal with the same subject matter, so that
translation pairs can still be extracted

Motivations for compositional translation

Usual context-based methods [Fung, 1997]:
51% to 88% precision on top 20 candidates with specialized
corpora [Daille and Morin, 2005]
⇒ lexicons difficult to use for translators [Delpech, 2011]

Compositional translation :
81% to 94% precision on Top1
[Robitaille et al., 2006, Cartoni, 2009, Morin and Daille, 2009]
More than 60% of terms in technical and scientific domains are
morphologically complex [Namer and Baud, 2007]
Outperforms context-based approaches for the translation of
terms with compositional meaning [Morin and Daille, 2009]

Compositional translation
Compositionality
“the meaning of the whole is a function of the meaning of the
parts” [Keenan and Faltz, 1985, 24-25]
Input: "ab"
Decompose  {a, b}
Translate  {α, β}
Reorder    {αβ, βα}
Select     αβ
Output: "αβ"

Related work

Applied to phrases, decomposed into words
[Robitaille et al., 2006, Morin and Daille, 2009]
rate of evaporation → taux d’évaporation

Applied to words, decomposed into morphemes
[Cartoni, 2009, Harastani et al., 2012]
cardiology → cardiologie
ricostruire → rebuild

⇒ No approach links bound morphemes to words :
-cyto- → cellule ’cell’
cytotoxic → toxique pour les cellules ’toxic to the cells’

Selection and ranking methods

Select translations that occur in target texts / Web
[Morin and Daille, 2009]
Select most frequent translation [Grefenstette, 1999]
Compare contexts [Garera and Yarowsky, 2008]
ML : Binary classifier [Baldwin and Tanaka, 2004]
⇒ Combination of criteria
⇒ ML : Learning-to-rank algorithms (IR)

Translation process overview
Input: "non-cytotoxic"
Decompose    {non, cyto, toxic}
Concatenate  {non, cyto, toxic}, {noncyto, toxic}, {non, cytotoxic}, {noncytotoxic}
Translate    {non, cellule, toxique}, {non, cyto, toxique}, {non, cellule, toxicité}, {non, cyto, toxicité}
Reorder      {non, toxique, cellule}, {non, cellule, toxique}, {cellule, toxique, non}
Concatenate  {non, toxique, cellule}, {nontoxique, cellule}, {non, toxiquecellule}, {nontoxiquecellule}
Match        {non, toxique, cellule}
Output: "non toxique pour les cellules" ’non toxic to the cells’

Decomposition

non-cytotoxic → {non, cyto, toxic}
Split source term into minimal components with heuristic
rules:
split on hyphens
match substrings of the source term with:
a list of morphemes
a list of lexical items

respect some length constraints on the substrings
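
The heuristics above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the toy morpheme and lexical-item lists, the minimum substring length of 3, and the longest-match-first strategy with backtracking are all illustrative assumptions.

```python
# Toy resources (hypothetical; real lists would be much larger).
MORPHEMES = {"non", "cyto", "cardio", "patho"}
LEXICAL_ITEMS = {"toxic", "toxicity", "physiological"}
MIN_LEN = 3  # assumed length constraint on substrings


def split_part(part, known, min_len=MIN_LEN):
    """Split `part` into known substrings, trying the longest prefix first.

    Backtracks to shorter prefixes when the remainder cannot be split.
    Returns None when no full segmentation exists.
    """
    if not part:
        return []
    for end in range(len(part), min_len - 1, -1):
        head = part[:end]
        if head in known:
            rest = split_part(part[end:], known, min_len)
            if rest is not None:
                return [head] + rest
    return None


def decompose(term):
    """Split on hyphens, then match substrings against the two lists."""
    known = MORPHEMES | LEXICAL_ITEMS
    components = []
    for part in term.lower().split("-"):
        pieces = split_part(part, known)
        # Keep the part whole when it cannot be fully segmented.
        components.extend(pieces if pieces is not None else [part])
    return components


print(decompose("non-cytotoxic"))  # ['non', 'cyto', 'toxic']
```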

Concatenation

Generate all possible concatenations of the minimal
components
Increases the chances of matching the components with
entries of the dictionaries
{ non, cyto, toxic} → {non, cyto, ∅ }
{non, cytotoxic} → {non, cytotoxique }
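
One way to enumerate the concatenations is to decide, for each boundary between adjacent components, whether to keep or merge it. The sketch below assumes that only adjacent components are glued and that order is preserved, which matches the four groupings shown for "non-cytotoxic"; the function name is hypothetical.

```python
from itertools import product


def concatenations(components):
    """Return every way of merging adjacent components, preserving order.

    For n components there are 2**(n-1) groupings, because each of the
    n-1 boundaries is either kept or merged.
    """
    if not components:
        return []
    results = []
    for merges in product([False, True], repeat=len(components) - 1):
        grouping = [components[0]]
        for comp, merge in zip(components[1:], merges):
            if merge:
                grouping[-1] += comp   # glue onto the previous unit
            else:
                grouping.append(comp)  # start a new unit
        results.append(grouping)
    return results


for grouping in concatenations(["non", "cyto", "toxic"]):
    print(grouping)
```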

Translation with direct dictionary look-up

Bilingual dictionary for lexical items:
toxic → toxique

Morpheme translation table for bound morphemes:
allows bound-to-free morpheme translation equivalences
-cyto- → -cyto-, cellule
{-cyto-, toxic} → {-cyto-, toxique},
{cellule, toxique}
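
The look-up step can be pictured as a Cartesian product over the candidate translations of each component. The sketch below uses two toy tables with invented entries; the names `BILINGUAL_DICT`, `MORPHEME_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Hypothetical toy resources.
BILINGUAL_DICT = {"toxic": ["toxique"]}
MORPHEME_TABLE = {"cyto": ["cyto", "cellule"], "non": ["non"]}


def translate_component(comp):
    """Collect candidates from the dictionary, then from the morpheme table."""
    return BILINGUAL_DICT.get(comp, []) + MORPHEME_TABLE.get(comp, [])


def translate(components):
    """Combine the candidate translations of all components.

    Returns an empty list as soon as one component has no translation.
    """
    options = [translate_component(c) for c in components]
    if any(not opts for opts in options):
        return []
    return [list(combo) for combo in product(*options)]


print(translate(["cyto", "toxic"]))
# [['cyto', 'toxique'], ['cellule', 'toxique']]
```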

Translation with variation

Morphological lexicon
toxic → toxique → toxicité ’toxicity’

Synonyms
toxic → toxique → vénéneux ’poisonous’

{-cyto-, toxic} → {-cyto-, toxicité}, {-cyto-, vénéneux},
{cellule, toxicité}, {cellule, vénéneux}
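
Variation can be sketched as swapping one translated component for a morphological relative or a synonym. The resources below are toy data, and replacing a single component at a time is an assumption made for the illustration.

```python
# Hypothetical toy resources keyed by target-language word.
MORPH_FAMILIES = {"toxique": ["toxicité"]}
SYNONYMS = {"toxique": ["vénéneux"]}


def variants(word):
    """Return morphological relatives, then synonyms, of a target word."""
    return MORPH_FAMILIES.get(word, []) + SYNONYMS.get(word, [])


def expand(translation):
    """Return copies of `translation` with one component replaced by a variant."""
    expanded = []
    for i, word in enumerate(translation):
        for variant in variants(word):
            expanded.append(translation[:i] + [variant] + translation[i + 1:])
    return expanded


print(expand(["cellule", "toxique"]))
# [['cellule', 'toxicité'], ['cellule', 'vénéneux']]
```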

Reordering

No translation patterns or reordering rules
Permute the translated components:
{cellule, toxique} → {cellule, toxique},
{toxique, cellule}
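
Without reordering rules, the step reduces to enumerating permutations. A minimal sketch (the function name is hypothetical); note that the number of candidates grows as n! with the number of components.

```python
from itertools import permutations


def reorder(components):
    """Return every ordering of the translated components."""
    return [list(p) for p in permutations(components)]


print(reorder(["cellule", "toxique"]))
# [['cellule', 'toxique'], ['toxique', 'cellule']]
```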

Concatenation

Recreate target words by generating all possible
concatenations of the components :
{toxique, cellule} →
{toxique cellule},
{toxiquecellule}

Selection

Match target words with the words of the target corpus
Allow at most 3 stop words between two words
{toxique cellule} → “toxique pour les cellules” ’toxic to the cells’
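
The matching rule can be sketched as a scan over the tokenized target corpus. The stop-word list is a toy assumption, and the sketch compares exact tokens, so the candidate is given as "cellules"; matching on lemmas, as the example above suggests, is left out for brevity.

```python
STOP_WORDS = {"pour", "les", "le", "la", "de", "des", "du"}  # toy list


def find_match(candidate, tokens, stop_words=STOP_WORDS, max_gap=3):
    """Return the first corpus span matching `candidate`, or None.

    Consecutive candidate words may be separated by at most `max_gap`
    tokens, all of which must be stop words.
    """
    if not candidate:
        return None
    for start in range(len(tokens)):
        if tokens[start] != candidate[0]:
            continue
        pos = start
        for word in candidate[1:]:
            nxt, gap = pos + 1, 0
            # Skip up to max_gap intervening stop words.
            while (nxt < len(tokens) and gap < max_gap
                   and tokens[nxt] != word and tokens[nxt] in stop_words):
                nxt += 1
                gap += 1
            if nxt >= len(tokens) or tokens[nxt] != word:
                break  # this start position fails
            pos = nxt
        else:
            return " ".join(tokens[start:pos + 1])
    return None


corpus = "ce composé est toxique pour les cellules saines".split()
print(find_match(["toxique", "cellules"], corpus))
# toxique pour les cellules
```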

Target term frequency

Number of occurrences of the target term divided by the total
number of occurrences in the target texts
\[ \mathrm{Freq}(t) = \frac{\mathrm{occ}(t)}{N} \]

Context similarity measure

Corresponds to context-based approaches
Collect words co-occurring with the source and target terms in a
window of 5 words
Normalize co-occurrences with the log-likelihood ratio
Compare contexts with weighted Jaccard
\[ \mathrm{Cont}(s, t) = \frac{\sum_{w \in s \cap t} \min(c(s, w), c(t, w))}{\sum_{w \in s \cup t} \max(c(s, w), c(t, w))} \]
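
A weighted Jaccard score over two context vectors can be computed as below. The sketch assumes that each context is a dict mapping a context word to a non-negative association weight, and that the source vector has already been mapped into the target language so that the keys are comparable; neither detail is stated on the slide.

```python
def weighted_jaccard(ctx_s, ctx_t):
    """Weighted Jaccard similarity between two context vectors.

    Numerator: sum of the smaller weight over shared words.
    Denominator: sum of the larger weight over all words.
    """
    shared = ctx_s.keys() & ctx_t.keys()
    union = ctx_s.keys() | ctx_t.keys()
    num = sum(min(ctx_s[w], ctx_t[w]) for w in shared)
    den = sum(max(ctx_s.get(w, 0.0), ctx_t.get(w, 0.0)) for w in union)
    return num / den if den else 0.0


source_ctx = {"cancer": 2.0, "cellule": 1.0}   # invented weights
target_ctx = {"cancer": 1.0, "tumeur": 3.0}
print(round(weighted_jaccard(source_ctx, target_ctx), 4))  # 0.1667
```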

Part-of-speech translation probability

Probability that source term with part-of-speech A translates
to target term with part of speech B
\[ \mathrm{Pos}(s, t) = P(\mathrm{pos}(t) \mid \mathrm{pos}(s)) = P(B \mid A) \]

Acquired from POS-tagged parallel corpora [Tiedemann, 2009]
with the word alignment software Anymalign [Lardilleux, 2008]
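
Given word-aligned, POS-tagged pairs, P(B|A) can be estimated by relative frequency. The slides do not say how the probability is estimated, so the maximum-likelihood estimate below is an assumption, and the input pairs are invented.

```python
from collections import Counter


def pos_translation_probs(aligned_pos_pairs):
    """Estimate P(target_pos | source_pos) from (source_pos, target_pos) pairs."""
    pair_counts = Counter(aligned_pos_pairs)
    source_counts = Counter(src for src, _ in aligned_pos_pairs)
    return {
        (src, tgt): count / source_counts[src]
        for (src, tgt), count in pair_counts.items()
    }


pairs = [("ADJ", "ADJ"), ("ADJ", "ADJ"), ("ADJ", "NOUN"), ("NOUN", "NOUN")]
probs = pos_translation_probs(pairs)
print(probs[("NOUN", "NOUN")])  # 1.0
```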

Resources reliability score

Some translation resources might give more reliable
translations than others
e.g. bilingual dictionary > synonyms
Score = mean of the reliability of the resources used for
translating the components
\[ \mathrm{Reso}(t = \{c_1, \ldots, c_n\}) = \frac{1}{n} \sum_{i=1}^{n} \text{resource reliability}(c_i) \]

Tuned on training data
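
The score is a plain average, as the sketch below shows. The reliability values are invented for the example; in the experiments they are tuned on training data.

```python
# Invented reliability values, one per translation resource.
RELIABILITY = {
    "bilingual_dict": 1.0,
    "morpheme_table": 0.9,
    "morph_family": 0.6,
    "synonyms": 0.4,
}


def reso(resources_used):
    """Mean reliability of the resources used to translate each component."""
    return sum(RELIABILITY[r] for r in resources_used) / len(resources_used)


# One resource name per translated component.
print(reso(["bilingual_dict", "synonyms"]))
```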

Combination

Linear combination of the 4 criteria: Frequency, Context,
Part-of-speech translation probability and Resources reliability
\[ \mathrm{Combi}(t, s) = \mathrm{Freq}(t) + \mathrm{Cont}(s, t) + \mathrm{Pos}(s, t) + \mathrm{Reso}(t) \]
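
Ranking with the combined score amounts to summing the four values and sorting the candidates. The scores below are invented, and the dictionary layout is a hypothetical choice for the sketch.

```python
def combi(scores):
    """Unweighted sum of the four criteria for one candidate translation."""
    return scores["freq"] + scores["cont"] + scores["pos"] + scores["reso"]


def rank(candidates):
    """Sort candidate translations by decreasing combined score."""
    return sorted(candidates, key=lambda c: combi(candidates[c]), reverse=True)


candidates = {  # invented scores for two candidates of one source term
    "cellule toxicité": {"freq": 0.0001, "cont": 0.05, "pos": 0.40, "reso": 0.75},
    "toxique pour les cellules": {"freq": 0.002, "cont": 0.31, "pos": 0.85, "reso": 0.95},
}
print(rank(candidates)[0])  # toxique pour les cellules
```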

Machine learning

Learning-to-rank algorithms used in IR for ranking documents
Tried 3 algorithms implemented in the RankLib software¹
AdaRank [Li and Xu, 2007]
Coordinate Ascent [Metzler and Croft, 2000]
LambdaMART [Wu et al., 2010]

Features: Freq, Cont, Pos, Reso
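
RankLib reads training data in the LETOR / SVMlight-style text format, one candidate per line: a relevance label, a query id, then numbered features. The sketch below writes such lines; treating each source term as a query and each candidate translation as a document is an assumption about how the task maps onto the format, and the helper name is hypothetical.

```python
def to_letor_line(label, qid, features, comment=""):
    """Format one candidate as '<label> qid:<qid> 1:<f1> 2:<f2> ... # comment'."""
    feats = " ".join(f"{i}:{v:.4f}" for i, v in enumerate(features, start=1))
    line = f"{label} qid:{qid} {feats}"
    return f"{line} # {comment}" if comment else line


# Features in the order Freq, Cont, Pos, Reso (invented values).
print(to_letor_line(1, 7, [0.002, 0.31, 0.85, 0.95], "toxique pour les cellules"))
# 1 qid:7 1:0.0020 2:0.3100 3:0.8500 4:0.9500 # toxique pour les cellules
```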

¹ http://people.cs.umass.edu/~vdang/ranklib.html

Corpora

Language pairs: English → French, English → German
Domain: breast cancer
Size: ≈ 400k words per language

Lexicons

Morpheme translation table (hand-crafted)
General language dictionary (Xelda)
Synonyms (Xelda)
Domain-specific dictionary : cognates extracted from corpus
[Hauer and Kondrak, 2011]
Morphological families [Porter, 1980]

Training and evaluation datasets
EVALUATION ≈ 100 source terms
source terms in UMLS meta-thesaurus with
translation(s) in target texts
TRAINING ≈ 600 source terms
source terms for which a translation could be
generated and whose translation(s) is in the
target texts
generated translations were scored manually
⇒ evaluation and training datasets are disjoint
⇒ source terms are morphologically complex words with no
translation in dictionary

Results for translation generation

                                     EN → FR     EN → DE
# source terms                       126         90
# at least 1 translation             86 (68%)    56 (62%)

                                     EN → FR     EN → DE
# at least 1 translation             86          56
1 trans. in UMLS                     68 (79%)    40 (71%)
1 trans. in UMLS or judged correct   81 (94%)    51 (91%)

Results for translation ranking

                 EN → FR   EN → DE   Average
Random           .83       .80       .815
Freq             .92       .84       .88
Cont             .90       .82       .86
Pos              .88       .91       .895
Reso             .92       .82       .87
Combination      .93       .89       .91
ML AdaRank       .90       .84       .87
ML CoordAsc      .93       .89       .91
ML LambdaMART    .86       .88       .87

Table: Top 1 translation in UMLS or judged correct

Silence analysis

Missing translation in resources (≈30%)
Target term is not compositional (≈30%)
breastfeeding → allaitement (FR), stillen (DE)

Lexical divergence (≈20%)
radiosensitivity → Strahlentoleranz, sensitivity = toleranz

Additional elements (≈13%)
postpartum → postpartalperiod

Error analysis

Problems in word reordering
self-examination → untersuchung selbst ’examination self’

Wrong or inappropriate translations
in-patient → pas malade ’not ill’
in → “inside” → inside patient
in → “inverse” → not a patient
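
The reordering problem can be illustrated with a toy sketch. It assumes that candidates are produced by translating each component and then trying every ordering of the translated parts, as the "permutations" mentioned under Future work suggest. The dictionary `COMPONENT_TRANSLATIONS` and the function `candidate_translations` are hypothetical and are not the authors' resources or code:

```python
from itertools import permutations, product

# Hypothetical toy component dictionary, for illustration only.
COMPONENT_TRANSLATIONS = {
    "self": ["selbst"],
    "examination": ["untersuchung"],
}

def candidate_translations(components, dictionary):
    """Translate each component, then emit every ordering of the translated parts."""
    options = [dictionary.get(component, []) for component in components]
    candidates = []
    # product() picks one translation per component; it yields nothing
    # if any component has no translation at all.
    for choice in product(*options):
        for ordering in permutations(choice):
            candidate = " ".join(ordering)
            if candidate not in candidates:
                candidates.append(candidate)
    return candidates

candidates = candidate_translations(["self", "examination"], COMPONENT_TRANSLATIONS)
print(candidates)
```

Both orderings are generated, so unless a later selection or ranking step prefers the right one, the ill-ordered candidate can surface as the proposed translation.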


Impact of fertile translations

            exact translations   wrong translations
EN → FR     21%                  50%
EN → DE     10%                  80%

Table: % of fertile translations

German (Germanic language): tendency to agglutination
oestrogen-independent → Östrogen-unabhängige
French (Romance language): creates phrases more easily
oestrogen-independent → indépendant des œstrogènes
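
A minimal sketch of the fertility distinction, assuming that a translation counts as fertile when the target term has more whitespace-separated words than the source term. The function name `is_fertile` is illustrative, and the examples are taken from the slides:

```python
def is_fertile(source: str, target: str) -> bool:
    """Assumed definition: the target term has more words than the source term."""
    return len(target.split()) > len(source.split())

examples = [
    ("oestrogen-independent", "Östrogen-unabhängige"),        # German: single agglutinated word
    ("oestrogen-independent", "indépendant des œstrogènes"),  # French: three-word phrase
    ("cardiotoxicity", "toxicité cardiaque"),
    ("pathophysiological", "physiopathologique"),
]

for source, target in examples:
    label = "fertile" if is_fertile(source, target) else "non-fertile"
    print(f"{source} → {target}: {label}")
```

Hyphenated forms are counted as one word here, which is a simplification chosen for this sketch.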


Future work

Improve quality of linguistic resources
morphological derivation rules instead of stemming
use of a thesaurus

Try translation patterns on top of permutations
Try learning morpheme translation equivalences from
cognates
bilingual dictionaries
out-of-domain parallel data

Thank you for your attention.

estelle.delpech@univ-nantes.fr
beatrice.daille@univ-nantes.fr
emmanuel.morin@univ-nantes.fr
cl@lingua-et-machina.com
ADDITIONAL SLIDES
Exact translations

Non-fertile:
pathophysiological → physiopathologique
overactive → überaktiv

Fertile:
cardiotoxicity → toxicité cardiaque ’cardiac toxicity’
mastectomy → ablation der brust ’ablation of the breast’

Morphological variants

Non-fertile:
dosimetry → dosimétrique ’dosimetric’
radiosensitivity → strahlenempfindlich ’radiosensitive’

Fertile:
milk-producing → production de lait ’production of milk’
self-examination → selbst untersuchen ’self examine’

Inexact but semantically related

Non-fertile:
oncogene → oncogénèse ’oncogenesis’
breakthrough → durchbrechen ’break’

Fertile:
chemoradiotherapy → chemotherapie oder strahlen ’chemotherapy or radiation’
treatable → pouvoir le traiter ’can treat it’

Wrong translations

Non-fertile:
immunoscore → immunomarquer ’immunostain’
check-in → unkontrollieren ’uncontrolled’

Fertile:
bloodstream → fliessen mehr blut ’more blood flow’
risk-reducing → risque de réduire ’risk of reducing’

References I
Baldwin, T. and Tanaka, T. (2004).
Translation by machine of complex nominals.
In Proceedings of the ACL 2004 Workshop on Multiword expressions: Integrating Processing, pages 24–31,
Barcelona, Spain.
Bo, L. and Gaussier, E. (2010).
Improving corpus comparability for bilingual lexicon extraction from comparable corpora.
In 23rd International Conference on Computational Linguistics, pages 23–27, Beijing, China.
Cartoni, B. (2009).
Lexical morphology in machine translation: A feasibility study.
In Proceedings of the 12th Conference of the European Chapter of the ACL, pages 130–138, Athens, Greece.
Daille, B. and Morin, E. (2005).
French-English terminology extraction from comparable corpora.
In Proceedings, 2nd International Joint Conference on Natural Language Processing, volume 3651 of
Lecture Notes in Computer Science, pages 707–718, Jeju Island, Korea. Springer.
Delpech, E. (2011).
Evaluation of terminologies acquired from comparable corpora: an application perspective.
In Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011), volume 11
of NEALT Proceedings Series, pages 66–73, Riga, Latvia. Pedersen B.S., Nešpore G., Skadiņa I.
Fung, P. (1997).
Finding terminology translations from non-parallel corpora.
pages 192–202, Hong Kong.
Garera, N. and Yarowsky, D. (2008).
Translating compounds by learning component gloss translation via multiple languages.
In Proceedings of the 3rd International Joint Conference on Natural Language Processing, volume 1, pages
403–410, Hyderabad, India.
References II
Grefenstette, G. (1999).
The world wide web as a resource for example-based machine translation tasks.
ASLIB’99 Translating and the computer, 21.
Harastani, R., Daille, B., and Morin, E. (2012).
Neoclassical compound alignments from comparable corpora.
In Proceedings of the 13th International Conference on Computational Linguistics and Intelligent Text
Processing, volume 2, pages 72–82, New Delhi, India.
Hauer, B. and Kondrak, G. (2011).
Clustering semantically equivalent words into cognate sets in multilingual lists.
In Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 865–873,
Chiang Mai, Thailand.
Keenan, E. L. and Faltz, L. M. (1985).
Boolean semantics for natural language.
D. Reidel, Dordrecht, Holland.
Lardilleux, A. (2008).
A truly multilingual, high coverage, accurate, yet simple, sub-sentential alignment method.
Li, H. and Xu, J. (2007).
AdaRank: A boosting algorithm for information retrieval.
In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in
information retrieval, pages 391–398, Amsterdam, The Netherlands.
Metzler, D. and Croft, W. B. (2000).
Linear feature-based models for information retrieval.
Information Retrieval, 10(3):257–274.
References III
Morin, E. and Daille, B. (2009).
Compositionality and lexical alignment of multi-word terms.
In Language Resources and Evaluation (LRE), volume 44 of Multiword expression: hard going or plain
sailing, pages 79–95. P. Rayson, S. Piao, S. Sharoff, S. Evert, B. Villada Moirón, Springer Netherlands
edition.
Morin, E. and Daille, B. (2010).
Compositionality and lexical alignment of multi-word terms.
In Rayson, P., Piao, S., Sharoff, S., Evert, S., and Villada Moirón, B., editors, Language Resources and Evaluation
(LRE), volume 44 of Multiword expression: hard going or plain sailing, pages 79–95. Springer Netherlands.
Namer, F. and Baud, R. (2007).
Defining and relating biomedical terms: Towards a cross-language morphosemantics-based system.
International Journal of Medical Informatics, 76(2-3):226–33.
Porter, M. F. (1980).
An algorithm for suffix stripping.
Program, 14(3):130–137.
Robitaille, X., Sasaki, X., Tonoike, M., Sato, S., and Utsuro, S. (2006).
Compiling French-Japanese terminologies from the web.
In Proceedings of the 11th Conference of the European Chapter of the Association for Computational
Linguistics, pages 225–232, Trento, Italy.
Tiedemann, J. (2009).
News from opus - a collection of multilingual parallel corpora with tools and interfaces.
Wu, Q., Burges, J. C., Svore, K., and Gao, J. (2010).
Adapting boosting for information retrieval measures.
Journal of Information Retrieval, 13(3):254–270.

Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 

Recently uploaded (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 

Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking

  • 1. Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking. Estelle Delpech (1), Béatrice Daille (1), Emmanuel Morin (1), Claire Lemaire (2, 3). (1) LINA, Université de Nantes; (2) GREMUTS, Université de Grenoble; (3) Lingua et Machina. COLING'12, 10/12/12, Mumbai, India
  • 6. Context: comparable corpora for Computer-Aided Translation
      - Aim: provide domain-specific bilingual lexicons to translators when no parallel data is available
      - ⇒ Comparable corpora: sets of texts in languages L1 and L2 that are not translations of each other but deal with the same subject matter, so that translation pairs can still be extracted
  • 14. Motivations for compositional translation
      - Usual context-based methods [Fung, 1997]:
          - 51% to 88% precision on the top 20 candidates with specialized corpora [Daille and Morin, 2005]
          - ⇒ lexicons difficult for translators to use [Delpech, 2011]
      - Compositional translation:
          - 81% to 94% precision on Top 1 [Robitaille et al., 2006, Cartoni, 2009, Morin and Daille, 2009]
          - More than 60% of terms in technical and scientific domains are morphologically complex [Namer and Baud, 2007]
          - Outperforms context-based approaches for the translation of terms with compositional meaning [Morin and Daille, 2009]
  • 21. Compositional translation
      - Compositionality: "the meaning of the whole is a function of the meaning of the parts" [Keenan and Faltz, 1985, 24-25]
      - Input: "ab"
      - Decompose: {a, b}
      - Translate: {α, β}
      - Reorder: {αβ, βα}
      - Select: αβ
      - Output: "αβ"
  • 25. Related work
      - Applied to phrases, decomposed into words [Robitaille et al., 2006, Morin and Daille, 2009]: rate of evaporation → taux d'évaporation
      - Applied to words, decomposed into morphemes [Cartoni, 2009, Harastani et al., 2012]: cardiology → cardiologie; ricostruire → rebuild
      - ⇒ No approach links bound morphemes to words: -cyto- → cellule 'cell'; cytotoxic → toxique pour les cellules 'toxic to the cells'
  • 32. Selection and ranking methods
      - Select translations that occur in target texts / Web [Morin and Daille, 2009]
      - Select the most frequent translation [Grefenstette, 1999]
      - Compare contexts [Garera and Yarowsky, 2008]
      - ML: binary classifier [Baldwin and Tanaka, 2004]
      - ⇒ Combination of criteria
      - ⇒ ML: learning-to-rank algorithms (IR)
  • 46. Translation process overview
      - Input: "non-cytotoxic"
      - Decompose: {non, cyto, toxic}
      - Concatenate: {non, cyto, toxic}, {noncyto, toxic}, {non, cytotoxic}, {noncytotoxic}
      - Translate: {non, cellule, toxique}, {non, cyto, toxique}, {non, cellule, toxicité}, {non, cyto, toxicité}
      - Reorder: {non, toxique, cellule}, {non, cellule, toxique}, {cellule, toxique, non}
      - Concatenate: {non, toxique, cellule}, {nontoxique, cellule}, {non, toxiquecellule}, {nontoxiquecellule}
      - Match: {non, toxique, cellule}
      - Output: "non toxique pour les cellules" 'non toxic to the cells'
  • 51. Decomposition: non-cytotoxic → {non, cyto, toxic}
      - Split the source term into minimal components with heuristic rules:
          - split on hyphens
          - match substrings of the source term with a list of morphemes and a list of lexical items
          - respect some length constraints on the substrings
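The decomposition heuristics listed on this slide could be sketched roughly as follows. This is only an illustration under assumptions: the function name `decompose`, the `min_len` threshold, and the preference for the segmentation with the most components are hypothetical choices, not details given in the talk.

```python
def decompose(term, morphemes, lexical_items, min_len=3):
    """Split a term into minimal components (illustrative heuristic only)."""
    known = set(morphemes) | set(lexical_items)

    def segment(piece):
        # Return the full segmentation of `piece` into known substrings that
        # has the most components, or None if no full segmentation exists.
        if not piece:
            return []
        best = None
        for end in range(min_len, len(piece) + 1):
            head = piece[:end]
            if head in known:
                rest = segment(piece[end:])
                if rest is not None:
                    candidate = [head] + rest
                    if best is None or len(candidate) > len(best):
                        best = candidate
        return best

    components = []
    # Rule 1: split on hyphens; rules 2 and 3: match substrings of each piece
    # against the morpheme and lexical lists, subject to a minimum length.
    for piece in term.lower().split("-"):
        parts = segment(piece)
        # Assumption: an unanalysable piece is kept whole.
        components.extend(parts if parts is not None else [piece])
    return components


print(decompose("non-cytotoxic", {"cyto"}, {"non", "toxic"}))
```

With the toy lists above, the call reproduces the slide's example split into non, cyto and toxic.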
  • 54. Concatenation
      - Generate all possible concatenations of the minimal components
      - Increases the chances of matching the components with entries of the dictionaries:
          - {non, cyto, toxic} → {non, cyto, ∅}
          - {non, cytotoxic} → {non, cytotoxique}
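One way to enumerate "all possible concatenations" is to decide, for each boundary between adjacent components, whether to keep it or merge across it, which gives 2^(n-1) groupings for n components. The sketch below assumes that only adjacent components are merged and that order is preserved; the function name and the `joiner` parameter are hypothetical.

```python
from itertools import product


def concatenations(components, joiner=""):
    """Return every grouping obtained by merging adjacent components."""
    if not components:
        return []
    results = []
    # One boolean per boundary: True keeps the boundary, False merges across it.
    for cuts in product([True, False], repeat=len(components) - 1):
        groups = [components[0]]
        for cut, comp in zip(cuts, components[1:]):
            if cut:
                groups.append(comp)
            else:
                groups[-1] = groups[-1] + joiner + comp
        results.append(groups)
    return results


for grouping in concatenations(["non", "cyto", "toxic"]):
    print(grouping)
```

For three components this yields the four groupings shown in the process overview. Passing `joiner=" "` would give space-separated variants on the target side.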
  • 58. Translation with direct dictionary look-up
      - Bilingual dictionary for lexical items: toxic → toxique
      - Morpheme translation table for bound morphemes, allowing bound-to-free morpheme translation equivalences: -cyto- → -cyto-, cellule
      - {-cyto-, toxic} → {-cyto-, toxique}, {cellule, toxique}
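The look-up step amounts to taking the Cartesian product of the translations available for each component. A minimal sketch, assuming both resources are plain Python dictionaries mapping a source component to a list of target strings (the data layout and function name are hypothetical):

```python
from itertools import product


def translate_components(components, bilingual_dict, morpheme_table):
    """Return every combination of component translations, or [] if any
    component has no translation in either resource."""
    options = []
    for comp in components:
        candidates = morpheme_table.get(comp, []) + bilingual_dict.get(comp, [])
        if not candidates:
            return []  # assumption: an untranslatable component blocks the grouping
        options.append(candidates)
    return [list(combo) for combo in product(*options)]


bilingual_dict = {"toxic": ["toxique"]}
morpheme_table = {"-cyto-": ["-cyto-", "cellule"]}
print(translate_components(["-cyto-", "toxic"], bilingual_dict, morpheme_table))
```

With the two toy entries above, the result is the pair of candidate sets given on the slide.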
  • 62. Translation with variation
      - Morphological lexicon: toxic → toxique → toxicité 'toxicity'
      - Synonyms: toxic → toxique → vénéneux 'poisonous'
      - {-cyto-, toxic} → {-cyto-, toxicité}, {-cyto-, vénéneux}, {cellule, toxicité}, {cellule, vénéneux}
  • 65. Reordering
      - No translation patterns or reordering rules
      - Permute the translated components: {cellule, toxique} → {cellule, toxique}, {toxique, cellule}
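Since no reordering rules are used, the reordering step can be illustrated with a plain enumeration of permutations. The sketch below is an assumption about how this might be coded; it also drops duplicate orders, which only matters when a component is repeated.

```python
from itertools import permutations


def reorderings(components):
    """Return all distinct orderings of the translated components."""
    seen = set()
    ordered = []
    for perm in permutations(components):
        if perm not in seen:
            seen.add(perm)
            ordered.append(list(perm))
    return ordered


print(reorderings(["cellule", "toxique"]))
```

The number of orderings grows factorially with the number of components, so a sketch like this is only practical for the short component lists typical of single terms.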
  • 67. Concatenation
      - Recreate target words by generating all possible concatenations of the components: {toxique, cellule} → {toxique cellule}, {toxiquecellule}
  • 71. Selection
      - Match target words with the words of the target corpus
      - Allow at most 3 stop words between two words
      - {toxique cellule} → "toxique pour les cellules" 'toxic to the cells'
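A possible sketch of the matching step is given below. It assumes the target corpus is available as a list of (surface form, lemma) pairs, so that "cellules" can match the candidate "cellule"; the slides do not say how inflection is handled, so this representation, the stop-word set and the function name are all hypothetical.

```python
def match_in_corpus(candidate, corpus, stop_words, max_gap=3):
    """Find corpus spans matching the candidate words in order, allowing up
    to `max_gap` stop words between two consecutive candidate words.

    candidate: list of lemmas; corpus: list of (surface, lemma) pairs.
    """
    if not candidate:
        return []
    matches = []
    for start in range(len(corpus)):
        pos, ok = start, True
        for k, word in enumerate(candidate):
            if k > 0:
                # Skip at most `max_gap` stop words before the next word.
                gap = 0
                while (pos < len(corpus) and gap < max_gap
                       and corpus[pos][1] in stop_words
                       and corpus[pos][1] != word):
                    pos += 1
                    gap += 1
            if pos >= len(corpus) or corpus[pos][1] != word:
                ok = False
                break
            pos += 1
        if ok:
            matches.append(" ".join(surface for surface, _ in corpus[start:pos]))
    return matches


corpus = [("produit", "produit"), ("toxique", "toxique"), ("pour", "pour"),
          ("les", "le"), ("cellules", "cellule")]
print(match_in_corpus(["toxique", "cellule"], corpus, {"pour", "le", "de"}))
```

On this toy corpus the candidate {toxique cellule} is matched to the span "toxique pour les cellules", as in the slide's example.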
  • 74. Target term frequency
      - Number of occurrences of the target term divided by the total number of occurrences in the target texts: $\mathrm{Freq}(t) = \dfrac{\mathrm{occ}(t)}{N}$
  • 79. Context similarity measure
      - Corresponds to context-based approaches
      - Collect words co-occurring with the source and target terms in a window of 5 words
      - Normalize co-occurrences with the log-likelihood ratio
      - Compare contexts with weighted Jaccard: $\mathrm{Cont}(s, t) = \dfrac{\sum_{w \in s \cap t} \min(c(s, w), c(t, w))}{\sum_{w \in s \cup t} \max(c(s, w), c(t, w))}$
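The weighted Jaccard comparison can be written in a few lines. The sketch below assumes that each context is a dictionary from context word to a non-negative association weight (for example a log-likelihood score) and that both dictionaries are already expressed in the same vocabulary. How source context words are mapped to the target language is not shown here and is not specified on the slide.

```python
def weighted_jaccard(source_ctx, target_ctx):
    """Weighted Jaccard similarity between two context vectors (dicts)."""
    shared = source_ctx.keys() & target_ctx.keys()
    union = source_ctx.keys() | target_ctx.keys()
    # Numerator: minimum weight over words present in both contexts.
    numerator = sum(min(source_ctx[w], target_ctx[w]) for w in shared)
    # Denominator: maximum weight over words present in either context.
    denominator = sum(max(source_ctx.get(w, 0.0), target_ctx.get(w, 0.0))
                      for w in union)
    return numerator / denominator if denominator else 0.0


print(weighted_jaccard({"a": 2.0, "b": 1.0}, {"a": 1.0, "c": 3.0}))
```

In the toy call, the only shared word contributes min(2, 1) = 1 to the numerator and the denominator is 2 + 1 + 3 = 6, so the score is 1/6.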
  • 82. Part-of-speech translation probability
      - Probability that a source term with part of speech A translates to a target term with part of speech B: $\mathrm{Pos}(s, t) = P(\mathrm{pos}(t) \mid \mathrm{pos}(s)) = P(B \mid A)$
      - Acquired from POS-tagged parallel corpora [Tiedemann, 2009] with the word alignment software AnyMalign [Lardrilleux, 2008]
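One plausible way to obtain P(B|A) from aligned, POS-tagged data is a relative-frequency estimate over aligned word pairs. The slides do not state the estimator, so the sketch below is an assumption. The input format (a list of (source POS, target POS) pairs taken from the word alignments) and the function name are hypothetical.

```python
from collections import Counter


def pos_translation_probs(aligned_pos_pairs):
    """Estimate P(target POS | source POS) by relative frequency.

    aligned_pos_pairs: list of (source_pos, target_pos) tuples.
    """
    pair_counts = Counter(aligned_pos_pairs)
    source_counts = Counter(src for src, _ in aligned_pos_pairs)
    return {(src, tgt): n / source_counts[src]
            for (src, tgt), n in pair_counts.items()}


pairs = [("ADJ", "ADJ"), ("ADJ", "ADJ"), ("ADJ", "NOUN"), ("NOUN", "NOUN")]
probs = pos_translation_probs(pairs)
print(probs[("ADJ", "NOUN")])
```

A pair of tags that never occurs in the alignments simply has no entry, so a caller would need to choose a default (such as 0) for unseen pairs.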
  • 86. Resources reliability score
      - Some translation resources might give more reliable translations than others, e.g. bilingual dictionary > synonyms
      - Score = mean of the reliability of the resources used for translating the components: $\mathrm{Reso}(t = \{c_1, \ldots, c_n\}) = \dfrac{\sum_{i=1}^{n} \text{resource reliability}(c_i)}{n}$
      - Tuned on training data
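The reliability score is a plain arithmetic mean, which can be illustrated as below. The reliability values in the example are invented for illustration; in the talk they are tuned on training data and their actual values are not given here.

```python
def resource_reliability_score(resources_used, reliability):
    """Mean reliability of the resources used to translate each component.

    resources_used: one resource name per translated component.
    reliability: dict mapping resource name to a reliability weight.
    """
    if not resources_used:
        return 0.0
    return sum(reliability[r] for r in resources_used) / len(resources_used)


# Hypothetical weights, not the tuned values from the experiments.
reliability = {"dictionary": 1.0, "synonyms": 0.5}
print(resource_reliability_score(["dictionary", "synonyms"], reliability))
```

A candidate whose two components came from the dictionary and the synonym list would score 0.75 under these made-up weights.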
  • 88. Combination
      - Linear combination of the 4 criteria (frequency, context, part-of-speech translation probability and resources reliability): $\mathrm{Combi}(t, s) = \mathrm{Freq}(t) + \mathrm{Cont}(s, t) + \mathrm{Pos}(s, t) + \mathrm{Reso}(t)$
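Ranking with the unweighted linear combination can be sketched as a sort on the sum of the four scores. The data layout (a dictionary from candidate translation to a tuple of its four scores) and the score values are hypothetical and serve only to show the mechanics.

```python
def rank_candidates(candidates):
    """Sort candidate translations by the sum of their four criterion scores.

    candidates: dict mapping a translation to (freq, cont, pos, reso).
    """
    return sorted(candidates, key=lambda t: sum(candidates[t]), reverse=True)


# Invented scores for two candidate translations of one source term.
candidates = {
    "toxique pour les cellules": (0.02, 0.40, 0.70, 1.00),
    "toxicité cellule": (0.01, 0.10, 0.20, 0.75),
}
print(rank_candidates(candidates))
```

Because the criteria are summed without weights, a criterion with a larger numeric range dominates the ranking, which is one reason a learned ranking function may be preferable.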
  • 95. Machine learning
      - Learning-to-rank algorithms used in IR for ranking documents
      - Tried 3 algorithms implemented in the RankLib software (http://people.cs.umass.edu/~vdang/ranklib.html):
          - AdaRank [Li and Xu, 2007]
          - Coordinate Ascent [Metzler and Croft, 2000]
          - LambdaMART [Wu et al., 2010]
      - Features: Freq, Cont, Pos, Reso
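To feed the four features to a learning-to-rank tool, each candidate translation can be written as one line of the LETOR / SVM-rank style text format (`<label> qid:<id> 1:<value> 2:<value> ... # comment`), which RankLib is understood to read; this should be checked against the RankLib documentation. The sketch assumes that each source term plays the role of a query and that the label is 1 for a correct translation and 0 otherwise. Both are assumptions about the setup, not details from the slides.

```python
def to_letor_line(label, query_id, features, comment=""):
    """Format one candidate as a LETOR-style line with 1-based feature ids."""
    feats = " ".join(f"{i}:{v:.6f}" for i, v in enumerate(features, start=1))
    line = f"{label} qid:{query_id} {feats}"
    return f"{line} # {comment}" if comment else line


# Feature order assumed here: Freq, Cont, Pos, Reso (invented values).
print(to_letor_line(1, 7, [0.5, 0.25, 1.0, 0.75], "toxique pour les cellules"))
```

All candidates generated for the same source term would share one `qid`, so that the ranker compares them only with each other.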
  • 100. Corpora
      English → French, German
      breast cancer
      ≈ 400k words per language
  • 106. Lexicons
      Morpheme translation table (hand-crafted)
      General language dictionary (Xelda)
      Synonyms (Xelda)
      Domain-specific dictionary: cognates extracted from corpus [Hauer and Kondrak, 2011]
      Morphological families [Porter, 1980]
  • 114. Training and evaluation datasets
      EVALUATION: ≈ 100 source terms
        source terms in UMLS meta-thesaurus with translation(s) in target texts
      TRAINING: ≈ 600 source terms
        source terms for which a translation could be generated and whose translation(s) is in the target texts
        generated translations were scored manually
      ⇒ evaluation and training datasets are disjoint
      ⇒ source terms are morphologically complex words with no translation in dictionary
  • 115. Results for translation generation
                 # source terms   # at least 1 translation
      EN → FR    126              86 (68%)
      EN → DE    90               56 (62%)

                 # at least 1 translation   1 trans. in UMLS   1 trans. in UMLS or judged correct
      EN → FR    86                         68 (79%)           81 (94%)
      EN → DE    56                         40 (71%)           51 (91%)
  • 116. Results for translation ranking
                       EN → FR   EN → DE   Average
      Random           .83       .80       .815
      Freq             .92       .84       .88
      Cont             .90       .82       .86
      Pos              .88       .91       .895
      Reso             .92       .82       .87
      Combination      .93       .89       .91
      ML AdaRank       .90       .84       .87
      ML CoordAsc      .93       .89       .91
      ML LambdaMart    .86       .88       .87
      Table: Top1 translation in UMLS or judged correct
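The Top1 figures above can be read as the share of source terms whose first-ranked candidate is acceptable. A minimal sketch of that computation follows, assuming each source term maps to its candidates in ranked order together with a correctness flag (in UMLS or judged correct); the function name and the toy data are illustrative and do not come from the experiments.

```python
from typing import Dict, List, Tuple

# source term -> candidates in ranked order, each with a correctness flag
Ranking = Dict[str, List[Tuple[str, bool]]]


def top1_precision(ranked: Ranking) -> float:
    """Fraction of source terms whose first-ranked candidate is correct.

    Terms with no generated candidate are ignored here, which is an
    assumption of this sketch.
    """
    evaluated = [cands for cands in ranked.values() if cands]
    if not evaluated:
        return 0.0
    hits = sum(1 for cands in evaluated if cands[0][1])
    return hits / len(evaluated)


if __name__ == "__main__":
    # Toy data for illustration only.
    toy = {
        "term_a": [("cand_1", True), ("cand_2", False)],
        "term_b": [("cand_3", False), ("cand_4", True)],
        "term_c": [("cand_5", True)],
        "term_d": [],  # silence: nothing was generated
    }
    print(f"Top1 precision: {top1_precision(toy):.2f}")
```

With this definition, a random baseline can still score high when most source terms have only one or two generated candidates.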
  • 121. Silence analysis
      Missing translation in resources (≈30%)
      Target term is not compositional (≈30%)
        breastfeeding → allaitement (FR), stillen (DE)
      Lexical divergence (≈20%)
        radiosensitivity → Strahlentoleranz, sensitivity = toleranz
      Additional elements (≈13%)
        postpartum → postpartalperiod
  • 124. Error analysis
      Problems in word reordering
        self-examination → untersuchung selbst 'examination self'
      Wrong or inappropriate translations
        in-patient → pas malade 'not ill'
        in → "inside" → inside patient
        in → "inverse" → not a patient
  • 126. Impact of fertile translations
                 exact translations   wrong translations
      EN → FR    21%                  50%
      EN → DE    10%                  80%
      Table: % of fertile translations
      German, Germanic language: tendency to agglutination
        oestrogen-independent → Ostrogen-unabhängige
      French, Romance language: creates phrases more easily
        oestrogen-independent → indépendant des œstrogènes
  • 128. Future work
      Improve quality of linguistic resources
        morphological derivation rules instead of stemming
        use of a thesaurus
      Try translation patterns on top of permutations
      Try learning morpheme translation equivalences from
        cognates
        bilingual dictionaries
        out-of-domain parallel data
  • 129. Thank you for your attention.
      estelle.delpech@univ-nantes.fr
      beatrice.daille@univ-nantes.fr
      emmanuel.morin@univ-nantes.fr
      cl@lingua-et-machina.com
  • 131. Exact translations
      Non-fertile:
        pathophysiological → physiopathologique
        overactive → überaktiv
      Fertile:
        cardiotoxicity → toxicité cardiaque 'cardiac toxicity'
        mastectomy → ablation der brust 'ablation of the breast'
  • 132. Morphological variants
      Non-fertile:
        dosimetry → dosimétrique 'dosimetric'
        radiosensitivity → strahlenempfindlich 'radiosensitive'
      Fertile:
        milk-producing → production de lait 'production of milk'
        self-examination → selbst untersuchen 'self examine'
  • 133. Inexact but semantically related
      Non-fertile:
        oncogene → oncogénèse 'oncogenesis'
        breakthrough → durchbrechen 'break'
      Fertile:
        chemoradiotherapy → chemotherapie oder strahlen 'chemotherapy or radiation'
        treatable → pouvoir le traiter 'can treat it'
  • 134. Wrong translations
      Non-fertile:
        immunoscore → immunomarquer 'immunostain'
        check-in → unkontrollieren 'uncontrolled'
      Fertile:
        bloodstream → fliessen mehr blut 'more blood flow'
        risk-reducing → risque de réduire 'risk of reducing'
  • 135. References I
      Baldwin, T. and Tanaka, T. (2004). Translation by machine of complex nominals. In Proceedings of the ACL 2004 Workshop on Multiword expressions: Integrating Processing, pages 24–31, Barcelona, Spain.
      Bo, L. and Gaussier, E. (2010). Improving corpus comparability for bilingual lexicon extraction from comparable corpora. In 23rd International Conference on Computational Linguistics, pages 23–27, Beijing, China.
      Cartoni, B. (2009). Lexical morphology in machine translation: A feasibility study. In Proceedings of the 12th Conference of the European Chapter of the ACL, pages 130–138, Athens, Greece.
      Daille, B. and Morin, E. (2005). French-English terminology extraction from comparable corpora. In Proceedings, 2nd International Joint Conference on Natural Language Processing, volume 3651 of Lecture Notes in Computer Sciences, pages 707–718, Jeju Island, Korea. Springer.
      Delpech, E. (2011). Evaluation of terminologies acquired from comparable corpora: an application perspective. In Pedersen, B. S., Nešpore, G., and Skadiņa, I., editors, Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011), volume 11 of NEALT Proceedings Series, pages 66–73, Riga, Latvia.
      Fung, P. (1997). Finding terminology translations from non-parallel corpora. pages 192–202, Hong Kong.
      Garera, N. and Yarowsky, D. (2008). Translating compounds by learning component gloss translation via multiple languages. In Proceedings of the 3rd International Joint Conference on Natural Language Processing, volume 1, pages 403–410, Hyderabad, India.
  • 136. References II
      Grefenstette, G. (1999). The world wide web as a resource for example-based machine translation tasks. ASLIB'99 Translating and the computer, 21.
      Harastani, R., Daille, B., and Morin, E. (2012). Neoclassical compound alignments from comparable corpora. In Proceedings of the 13th International Conference on Computational Linguistics and Intelligent Text Processing, volume 2, pages 72–82, New Delhi, India.
      Hauer, B. and Kondrak, G. (2011). Clustering semantically equivalent words into cognate sets in multilingual lists. In Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 865–873, Chiang Mai, Thailand.
      Keenan, E. L. and Faltz, L. M. (1985). Boolean semantics for natural language. D. Reidel, Dordrecht, Holland.
      Lardrilleux, A. (2008). A truly multilingual, high coverage, accurate, yet simple, sub-sentential alignment method.
      Li, H. and Xu, J. (2007). AdaRank: A boosting algorithm for information retrieval. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pages 391–398, Amsterdam, The Netherlands.
      Metzler, D. and Croft, W. B. (2000). Linear feature-based models for information retrieval. Information Retrieval, 10(3):257–274.
  • 137. References III
      Morin, E. and Daille, B. (2009). Compositionality and lexical alignment of multi-word terms. In Language Resources and Evaluation (LRE), volume 44 of Multiword expression: hard going or plain sailing, pages 79–95. P. Rayson, S. Piao, S. Sharoff, S. Evert, B. Villada Moirón, Springer Netherlands edition.
      Morin, E. and Daille, B. (2010). Compositionality and lexical alignment of multi-word terms. In Rayson, P., Piao, S., Sharoff, S., Evert, S., and B., V. M., editors, Language Resources and Evaluation (LRE), volume 44 of Multiword expression: hard going or plain sailing, pages 79–95. Springer Netherlands.
      Namer, F. and Baud, R. (2007). Defining and relating biomedical terms: Towards a cross-language morphosemantics-based system. International Journal of Medical Informatics, 76(2-3):226–33.
      Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3):130–137.
      Robitaille, X., Sasaki, X., Tonoike, M., Sato, S., and Utsuro, S. (2006). Compiling French-Japanese terminologies from the web. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, pages 225–232, Trento, Italy.
      Tiedemann, J. (2009). News from OPUS - a collection of multilingual parallel corpora with tools and interfaces.
      Wu, Q., Burges, J. C., Svore, K., and Gao, J. (2010). Adapting boosting for information retrieval measures. Journal of Information Retrieval, 13(3):254–270.