Material presented at the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai, India.
Paper download at http://hal.archives-ouvertes.fr/hal-00743807.
Institutions: Laboratoire d'Informatique de Nantes Atlantique (LINA), Lingua et Machina, Gremuts.
Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking
1. Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking
Estelle Delpech (1), Béatrice Daille (1), Emmanuel Morin (1), Claire Lemaire (2,3)
(1) LINA, Université de Nantes
(2) GREMUTS, Université de Grenoble
(3) Lingua et Machina
COLING'12, 10/12/12, Mumbai, India
Outline: Context, Translation method, Ranking method, Results of experiments, Future work
6. Context: comparable corpora for Computer-Aided Translation
Aim: provide domain-specific bilingual lexicons to translators when no parallel data is available
⇒ Comparable corpora: sets of texts in languages L1 and L2 which are not translations of each other but deal with the same subject matter, so that translation pairs can still be extracted
14. Motivations for compositional translation
Usual context-based methods [Fung, 1997]:
  51% to 88% precision on top 20 candidates with specialized corpora [Daille and Morin, 2005]
  ⇒ lexicons difficult to use for translators [Delpech, 2011]
Compositional translation:
  81% to 94% precision on Top1 [Robitaille et al., 2006, Cartoni, 2009, Morin and Daille, 2009]
  More than 60% of terms in technical and scientific domains are morphologically complex [Namer and Baud, 2007]
  Outperforms context-based approaches for the translation of terms with compositional meaning [Morin and Daille, 2009]
21. Compositional translation
Compositionality: "the meaning of the whole is a function of the meaning of the parts" [Keenan and Faltz, 1985, 24-25]
Input: "ab"
  Decompose → {a, b}
  Translate → {α, β}
  Reorder → {αβ, βα}
  Select → αβ
Output: "αβ"
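The decompose / translate / reorder / select steps can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the function name `compositional_translate`, the toy lexicon and the `attested` set (standing in for the target corpus) are assumptions made for the example.

```python
from itertools import permutations, product

def compositional_translate(term, decompose, lexicon, attested):
    """Decompose, translate, reorder, then keep candidates attested in the target corpus."""
    parts = decompose(term)                          # "ab" -> ["a", "b"]
    options = [lexicon.get(p, []) for p in parts]    # possible translations of each part
    candidates = set()
    for choice in product(*options):                 # pick one translation per part
        for order in permutations(choice):           # try every ordering of the parts
            candidates.add(" ".join(order))
    return sorted(c for c in candidates if c in attested)

# Toy data (hypothetical): each source component has a single translation.
toy_lexicon = {"a": ["α"], "b": ["β"]}
result = compositional_translate("ab", list, toy_lexicon, attested={"α β"})
```

If any component has no entry in the lexicon, `product` yields nothing and no candidate is generated, which mirrors a term for which no translation can be produced.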
25. Related work
Applied to phrases, decomposed into words [Robitaille et al., 2006, Morin and Daille, 2009]
  rate of evaporation → taux d'évaporation
Applied to words, decomposed into morphemes [Cartoni, 2009, Harastani et al., 2012]
  cardiology → cardiologie
  ricostruire → rebuild
⇒ No approach links bound morphemes to words:
  -cyto- → cellule 'cell'
  cytotoxic → toxique pour les cellules 'toxic to the cells'
32. Selection and ranking methods
Select translations that occur in target texts / Web [Morin and Daille, 2009]
Select most frequent translation [Grefenstette, 1999]
Compare contexts [Garera and Yarowsky, 2008]
ML: binary classifier [Baldwin and Tanaka, 2004]
⇒ Combination of criteria
⇒ ML: learning-to-rank algorithms (IR)
51. Decomposition
non-cytotoxic → {non, cyto, toxic}
Split source term into minimal components with heuristic rules:
  split on hyphens
  match substrings of the source term with:
    a list of morphemes
    a list of lexical items
  respect some length constraints on the substrings
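A minimal sketch of such heuristic rules, assuming a recursive prefix-matching strategy. The slides do not give the actual rules or the length constraints, so `decompose`, `_split_piece` and the `min_len=3` threshold are hypothetical.

```python
def _split_piece(piece, known, min_len):
    """Recursively split a hyphen-free piece into known substrings, if possible."""
    for i in range(min_len, len(piece) - min_len + 1):
        prefix, rest = piece[:i], piece[i:]
        if prefix in known:
            tail = _split_piece(rest, known, min_len)
            if all(t in known for t in tail):
                return [prefix] + tail
    return [piece]  # no valid split found: keep the piece whole

def decompose(term, morphemes, lexical_items, min_len=3):
    """Split on hyphens, then match substrings against morphemes and lexical items."""
    known = set(morphemes) | set(lexical_items)
    components = []
    for piece in term.lower().split("-"):
        components.extend(_split_piece(piece, known, min_len))
    return components

# Toy resources (hypothetical).
components = decompose("non-cytotoxic", morphemes={"cyto"}, lexical_items={"non", "toxic"})
```

The length constraint keeps very short substrings from being treated as components, which limits spurious splits.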
54. Concatenation
Generate all possible concatenations of the minimal components
Increases the chances of matching the components with entries of the dictionaries
  {non, cyto, toxic} → {non, cyto, ∅}
  {non, cytotoxic} → {non, cytotoxique}
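One way to enumerate the concatenations is to decide, at each boundary between adjacent components, whether to keep it or merge across it. The sketch below assumes that only adjacent components are merged and that their order is preserved; the function name `concatenations` is hypothetical.

```python
def concatenations(components):
    """Return every way of merging adjacent components, preserving their order."""
    if len(components) <= 1:
        return [list(components)]
    head = components[0]
    results = []
    for rest in concatenations(components[1:]):
        results.append([head] + rest)                  # keep the boundary after head
        results.append([head + rest[0]] + rest[1:])    # merge head with the next unit
    return results

groupings = concatenations(["non", "cyto", "toxic"])
```

For n minimal components this produces 2^(n-1) groupings, each of which can then be looked up in the dictionaries.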
62. Translation with variation
Morphological lexicon
  toxic → toxique → toxicité 'toxicity'
Synonyms
  toxic → toxique → vénéneux 'poisonous'
{-cyto-, toxic} → {-cyto-, toxicité}, {-cyto-, vénéneux}, {cellule, toxicité}, {cellule, vénéneux}
67. Concatenation
Recreate target words by generating all possible concatenations of the components:
  {toxique, cellule} → {toxique cellule}, {toxiquecellule}
71. Selection
Match target words with the words of the target corpus
Allow at most 3 stop words between two words
  {toxique cellule} → "toxique pour les cellules" 'toxic to the cells'
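The matching step might look like the sketch below. It assumes the target corpus is already tokenized and lemmatized (so that "les cellules" appears as "le cellule") and uses a toy stop-word list; `find_matches` and `max_gap` are hypothetical names.

```python
def find_matches(candidate, corpus_tokens, stop_words, max_gap=3):
    """Return corpus spans matching the candidate words, allowing stop-word gaps."""
    words = candidate.split()
    matches = []
    for start in range(len(corpus_tokens)):
        if corpus_tokens[start] != words[0]:
            continue
        pos = start
        for word in words[1:]:
            pos += 1
            gap = 0
            # Skip up to max_gap stop words before the next candidate word.
            while (pos < len(corpus_tokens) and gap < max_gap
                   and corpus_tokens[pos] in stop_words):
                pos += 1
                gap += 1
            if pos >= len(corpus_tokens) or corpus_tokens[pos] != word:
                break
        else:
            # Every candidate word was found in order: record the whole span.
            matches.append(" ".join(corpus_tokens[start:pos + 1]))
    return matches

stop_words = {"pour", "le", "de"}                    # toy list, for illustration only
tokens = ["toxique", "pour", "le", "cellule"]        # lemmatized corpus excerpt
spans = find_matches("toxique cellule", tokens, stop_words)
```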
74. Target term frequency
Number of occurrences of the target term divided by the total number of occurrences in the target texts
$$\mathrm{Freq}(t) = \frac{\mathrm{occ}(t)}{N}$$
79. Context similarity measure
Corresponds to context-based approaches
Collect words co-occurring with source and target term in a window of 5 words
Normalize co-occurrences with log-likelihood ratio
Compare contexts with weighted Jaccard
$$\mathrm{Cont}(s, t) = \frac{\sum_{w \in s \cap t} \min(c(s, w), c(t, w))}{\sum_{w \in s \cup t} \max(c(s, w), c(t, w))}$$
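A minimal sketch of the weighted Jaccard comparison, assuming each context is a dictionary mapping a context word to its non-negative association weight and that both contexts are expressed over the same vocabulary (for instance after translating the source context words). The function name and the toy weights are illustrative.

```python
def weighted_jaccard(ctx_s, ctx_t):
    """Sum of minima over shared words divided by sum of maxima over all words."""
    shared = ctx_s.keys() & ctx_t.keys()
    union = ctx_s.keys() | ctx_t.keys()
    numerator = sum(min(ctx_s[w], ctx_t[w]) for w in shared)
    denominator = sum(max(ctx_s.get(w, 0.0), ctx_t.get(w, 0.0)) for w in union)
    return numerator / denominator if denominator else 0.0

# Toy context vectors (hypothetical weights).
ctx_source = {"cell": 2.0, "dose": 1.0}
ctx_target = {"cell": 1.0, "drug": 3.0}
similarity = weighted_jaccard(ctx_source, ctx_target)
```

Here the only shared word is "cell", so the numerator is 1.0 and the denominator is 2.0 + 1.0 + 3.0 = 6.0.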
82. Part-of-speech translation probability
Probability that a source term with part-of-speech A translates to a target term with part-of-speech B
$$\mathrm{Pos}(s, t) = P(\mathrm{pos}(t) \mid \mathrm{pos}(s)) = P(B \mid A)$$
Acquired from POS-tagged parallel corpora [Tiedemann, 2009] with word alignment software AnyMalign [Lardrilleux, 2008]
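Such probabilities could be estimated by relative frequency over aligned word pairs. The sketch below assumes the alignment output has already been reduced to a list of (source POS, target POS) pairs; the slides do not state how the estimate was computed, so this is one plausible reading, and the function name and tag set are hypothetical.

```python
from collections import Counter

def pos_translation_probs(aligned_pos_pairs):
    """Estimate P(B | A) as count(A, B) / count(A) over aligned POS pairs."""
    pair_counts = Counter(aligned_pos_pairs)
    source_counts = Counter(a for a, _ in aligned_pos_pairs)
    return {(a, b): n / source_counts[a] for (a, b), n in pair_counts.items()}

# Toy aligned pairs (hypothetical).
pairs = [("ADJ", "ADJ"), ("ADJ", "ADJ"), ("ADJ", "NOUN"), ("NOUN", "NOUN")]
probs = pos_translation_probs(pairs)
```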
86. Resources reliability score
Some translation resources might give more reliable translations than others
  e.g. bilingual dictionary > synonyms
Score = mean of the reliability of the resources used for translating the components
$$\mathrm{Reso}(t = \{c_1, \ldots, c_n\}) = \frac{\sum_{i=1}^{n} \text{resource reliability}(c_i)}{n}$$
Tuned on training data
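As a worked example, the score is a plain average over the resources used for each component. The reliability values below are made up for illustration (the actual values were tuned on training data), and `reso` is a hypothetical function name.

```python
def reso(component_resources, reliability):
    """Mean reliability of the resources used to translate each component."""
    scores = [reliability[r] for r in component_resources]
    return sum(scores) / len(scores)

# Hypothetical reliability values; only the ordering dictionary > synonyms comes from the slide.
reliability = {"dictionary": 1.0, "synonyms": 0.5}
score = reso(["dictionary", "synonyms"], reliability)
```

A candidate whose two components come from the dictionary and from the synonym list respectively gets (1.0 + 0.5) / 2 = 0.75 under these toy values.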
95. Machine learning
Learning-to-rank algorithms used in IR for ranking documents
Tried 3 algorithms implemented in the RankLib software (1)
  AdaRank [Li and Xu, 2007]
  Coordinate Ascent [Metzler and Croft, 2000]
  LambdaMART [Wu et al., 2010]
Features: Freq, Cont, Pos, Reso
(1) http://people.cs.umass.edu/~vdang/ranklib.html
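To feed the four features to a learning-to-rank tool, each candidate translation can be serialized as one line in the LETOR / SVMlight-style format (`<label> qid:<id> <feature>:<value> ... # comment`) that RankLib reads. The sketch below is an assumption about how the data could be prepared, not the authors' script; the feature order (Freq, Cont, Pos, Reso), the feature values and the function name are hypothetical.

```python
def to_letor_line(label, query_id, features, comment=""):
    """Format one candidate translation as a LETOR-style training line."""
    feats = " ".join(f"{i}:{v:.4f}" for i, v in enumerate(features, start=1))
    line = f"{label} qid:{query_id} {feats}"
    return f"{line} # {comment}" if comment else line

# One source term = one query; each candidate translation = one line.
# Feature values are made up for illustration: [Freq, Cont, Pos, Reso].
line = to_letor_line(1, 7, [0.02, 0.35, 0.6, 0.75],
                     comment="cytotoxic -> toxique pour les cellules")
```

Grouping candidates by source term through `qid` lets the ranker compare only translations of the same term.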
114. Training and evaluation datasets
EVALUATION ≈ 100 source terms
  source terms in UMLS meta-thesaurus with translation(s) in target texts
TRAINING ≈ 600 source terms
  source terms for which a translation could be generated and whose translation(s) is in the target texts
  generated translations were scored manually
⇒ evaluation and training datasets are disjoint
⇒ source terms are morphologically complex words with no translation in dictionary
115. Results for translation generation
                                      EN → FR     EN → DE
# source terms                        126         90
# at least 1 translation              86 (68%)    56 (62%)

                                      EN → FR     EN → DE
# at least 1 translation              86          56
1 trans. in UMLS                      68 (79%)    40 (71%)
1 trans. in UMLS or judged correct    81 (94%)    51 (91%)
116. Results for translation ranking
                 EN → FR   EN → DE   Average
Random           .83       .80       .815
Freq             .92       .84       .88
Cont             .90       .82       .86
Pos              .88       .91       .895
Reso             .92       .82       .87
Combination      .93       .89       .91
ML AdaRank       .90       .84       .87
ML CoordAsc      .93       .89       .91
ML LambdaMart    .86       .88       .87
Table: Top1 translation in UMLS or judged correct
124. Error analysis
Problems in word reordering
  self-examination → untersuchung selbst 'examination self'
Wrong or inappropriate translations
  in-patient → pas malade 'not ill'
    in → "inside" → inside patient
    in → "inverse" → not a patient
126. Impact of fertile translations
                      EN → FR   EN → DE
exact translations    21%       10%
wrong translations    50%       80%
Table: % of fertile translations
German, a Germanic language: tendency to agglutination
  oestrogen-independent → Östrogen-unabhängige
French, a Romance language: creates phrases more easily
  oestrogen-independent → indépendant des œstrogènes
128. Future work
Improve quality of linguistic resources
  morphological derivation rules instead of stemming
  use of a thesaurus
Try translation patterns on top of permutations
Try learning morpheme translation equivalences from
  cognates
  bilingual dictionaries
  out-of-domain parallel data
129. Thank you for your attention.
estelle.delpech@univ-nantes.fr
beatrice.daille@univ-nantes.fr
emmanuel.morin@univ-nantes.fr
cl@lingua-et-machina.com
131. Exact translations
Non-fertile:
  pathophysiological → physiopathologique
  overactive → überaktiv
Fertile:
  cardiotoxicity → toxicité cardiaque 'cardiac toxicity'
  mastectomy → ablation der brust 'ablation of the breast'
132. Morphological variants
Non-fertile:
  dosimetry → dosimétrique 'dosimetric'
  radiosensitivity → strahlenempfindlich 'radiosensitive'
Fertile:
  milk-producing → production de lait 'production of milk'
  selfexamination → selbst untersuchen 'self examine'
133. Inexact but semantically related
Non-fertile:
  oncogene → oncogénèse 'oncogenesis'
  breakthrough → durchbrechen 'break'
Fertile:
  chemoradiotherapy → chemotherapie oder strahlen 'chemotherapy or radiation'
  treatable → pouvoir le traiter 'can treat it'
134. Wrong translations
Non-fertile:
  immunoscore → immunomarquer 'immunostain'
  check-in → unkontrollieren 'uncontrolled'
Fertile:
  bloodstream → fliessen mehr blut 'more blood flow'
  risk-reducing → risque de réduire 'risk of reducing'
135. References I
Baldwin, T. and Tanaka, T. (2004).
Translation by machine of complex nominals.
In Proceedings of the ACL 2004 Workshop on Multiword expressions: Integrating Processing, pages 24–31,
Barcelona, Spain.
Bo, L. and Gaussier, E. (2010).
Improving corpus comparability for bilingual lexicon extraction from comparable corpora.
In 23rd International Conference on Computational Linguistics, pages 23–27, Beijing, China.
Cartoni, B. (2009).
Lexical morphology in machine translation: A feasibility study.
In Proceedings of the 12th Conference of the European Chapter of the ACL, pages 130–138, Athens, Greece.
Daille, B. and Morin, E. (2005).
French-English terminology extraction from comparable corpora.
In Proceedings, 2nd International Joint Conference on Natural Language Processing, volume 3651 of
Lecture Notes in Computer Science, pages 707–718, Jeju Island, Korea. Springer.
Delpech, E. (2011).
Evaluation of terminologies acquired from comparable corpora : an application perspective.
In Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011), volume 11
of NEALT Proceedings Series, pages 66–73, Riga, Latvia. Pedersen B.S., Nešpore G., Skadiņa I.
Fung, P. (1997).
Finding terminology translations from non-parallel corpora.
pages 192–202, Hong Kong.
Garera, N. and Yarowsky, D. (2008).
Translating compounds by learning component gloss translation via multiple languages.
In Proceedings of the 3rd International Joint Conference on Natural Language Processing, volume 1, pages
403–410, Hyderabad, India.
136. References II
Grefenstette, G. (1999).
The world wide web as a resource for example-based machine translation tasks.
ASLIB’99 Translating and the computer, 21.
Harastani, R., Daille, B., and Morin, E. (2012).
Neoclassical compound alignments from comparable corpora.
In Proceedings of the 13th International Conference on Computational Linguistics and Intelligent Text
Processing, volume 2, pages 72–82, New Delhi, India.
Hauer, B. and Kondrak, G. (2011).
Clustering semantically equivalent words into cognate sets in multilingual lists.
In Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 865–873,
Chiang Mai, Thailand.
Keenan, E. L. and Faltz, L. M. (1985).
Boolean semantics for natural language.
D. Reidel, Dordrecht, Holland.
Lardrilleux, A. (2008).
A truly multilingual, high coverage, accurate, yet simple, sub-sentential alignment method.
Li, H. and Xu, J. (2007).
AdaRank: A boosting algorithm for information retrieval.
In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in
information retrieval, pages 391–398, Amsterdam, The Netherlands.
Metzler, D. and Croft, W. B. (2000).
Linear feature-based models for information retrieval.
Information Retrieval, 10(3):257–274.
137. References III
Morin, E. and Daille, B. (2009).
Compositionality and lexical alignment of multi-word terms.
In Language Resources and Evaluation (LRE), volume 44 of Multiword expression: hard going or plain
sailing, pages 79–95. P. Rayson, S. Piao, S. Sharoff, S. Evert, B. Villada Moirón, Springer Netherlands
edition.
Morin, E. and Daille, B. (2010).
Compositionality and lexical alignment of multi-word terms.
In Rayson, P., Piao, S., Sharoff, S., Evert, S., and B., V. M., editors, Language Resources and Evaluation
(LRE), volume 44 of Multiword expression: hard going or plain sailing, pages 79–95. Springer Netherlands.
Namer, F. and Baud, R. (2007).
Defining and relating biomedical terms: Towards a cross-language morphosemantics-based system.
International Journal of Medical Informatics, 76(2-3):226–33.
Porter, M. F. (1980).
An algorithm for suffix stripping.
Program, 14(3):130–137.
Robitaille, X., Sasaki, X., Tonoike, M., Sato, S., and Utsuro, S. (2006).
Compiling French-Japanese terminologies from the web.
In Proceedings of the 11th Conference of the European Chapter of the Association for Computational
Linguistics, pages 225–232, Trento, Italy.
Tiedemann, J. (2009).
News from opus - a collection of multilingual parallel corpora with tools and interfaces.
Wu, Q., Burges, J. C., Svore, K., and Gao, J. (2010).
Adapting boosting for information retrieval measures.
Journal of Information Retrieval, 13(3):254–270.