Material of the 4th Intensive Summer school and collaborative workshop on Natural Language Processing (NAIST Franco-Thai Workshop 2010).
Bangkok, Thailand.
Institution: Institut de Recherche en Informatique de Toulouse (IRIT), Lingua et Machina
2. About me
● Estelle Delpech
● Research engineer at Lingua et Machina, France
    ● CAT tools provider
    ● ed(at)lingua-et-machina(dot)com
    ● www.lingua-et-machina.com
● Ph.D. candidate at LINA, France
    ● TALN team: specialises in NLP
    ● estelle.delpech(at)univ-nantes(dot)fr
3. LINGUA ET MACHINA
● French company
● Founded by Dr E. Planas
● Led by Dr F. De Colstoun
● Small but innovative
    ● 8 people
    ● 2 R&D engineers / Ph.D. candidates
        ● NLP
        ● Computational Linguistics
        ● Translation Studies
4. LINGUA ET MACHINA
● 2002: SIMILIS
    ● 2nd generation translation memories
    ● Based on Ph.D. work
● 2007: LIBELLEX
    ● Access to TM for non-professionals
    ● Translation and terminology management platform
9. SIMILIS technology
Based on the Ph.D. work of E. Planas
● First generation translation memory
    ● Works with segments, sentences
● Second generation translation memory
    ● Works with chunks
    ● [the driver] [steps] [on the gas pedal]
● Chunking
    ● Rules written by linguists
● Fuzzy matching
    ● Modified edit-distance
    ● Several linguistic levels
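The difference between matching on words and matching on chunks can be sketched in a few lines of Python. This is only an illustration: it uses the plain Levenshtein distance with unit costs, not the modified edit-distance of SIMILIS (which the slides do not detail), and the chunked sentences and the names `edit_distance`, `similarity` and `flatten` are hypothetical.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Fuzzy-match score in [0, 1]: 1 minus the length-normalised distance."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


def flatten(chunks):
    """Turn a list of chunks into a flat list of words."""
    return [word for chunk in chunks for word in chunk]


# Hand-chunked sentences (assumption: the chunker itself is not shown here).
query = [("the", "driver"), ("steps",), ("on", "the", "gas", "pedal")]
memory = [("the", "driver"), ("steps",), ("on", "the", "brake", "pedal")]

chunk_score = similarity(query, memory)                   # compares whole chunks
word_score = similarity(flatten(query), flatten(memory))  # compares single words
```

At the chunk level two of the three chunks match exactly and can be reused as they are; at the word level the same pair differs by a single word. Comparing at several levels is one simple way to read the "several linguistic levels" bullet.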
10. From SIMILIS to LIBELLEX
[Diagram. Labels: Source Text; French Documents; English Documents; Memory (TMX); Glossary (lexicon); Translated Text; Moderator; Translators / linguists; Business Experts]
11. LIBELLEX
● Translation memories meet corporate content management
● Target: global companies
    ● Many languages
        ● Customers
        ● Partners
        ● Employees
    ● Speakers
        ● Non-native
        ● Not language professionals
    ● Terminology and translation needs
        ● Official documentation
        ● Day-to-day internal communication
12. Libellex
● Terminology management platform
    ● Builds corporate TM
    ● Extracts / checks terminology
    ● Helps employees communicate
● Translation management platform
    ● Manages translation jobs
    ● Terminologies for translation agencies
    ● Chunk matches for MT
14. R-D-I at Lingua et Machina
Ongoing
● Statistical term extraction
    ● « Cheap and quick » addition of new languages
● Consider hybridization with rule-based methods
● Term alignment in comparable corpora
● Model the translation process
Planned
● Development of rule-based chunking for Chinese
● Extraction of « knowledge-rich contexts » for terminologies
15. Research partnerships
● Statistical term extraction and alignment
    ● A. Lardilleux, Y. Lepage (Caen/Waseda)
● Chinese processing
    ● EDF, Kinep
● Comparable corpora
    ● National project + Ph.D. candidate
● KRC extraction
    ● European project submission
● Translation studies
    ● Ph.D. candidate: Stendhal University
16. Statistical term extraction and alignment
● Algorithm developed by A. Lardilleux in his Ph.D. thesis
    ● http://users.info.unicaen.fr/~alardill/
● Uses “perfect alignments”
    ● Source and target words that only occur in the same source and target sentences
[Figure: toy parallel corpus (adf ↔ AD, b ↔ BE, b ↔ CF, a e ↔ AE) with the perfect alignment d ↔ D]
● Randomly build small samples of the corpus
● Perfect alignments add up
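The idea of perfect alignments counted over random samples can be sketched as follows. This is a simplified illustration of the principle stated on the slide, not A. Lardilleux's implementation: the function names, the whitespace tokenisation and the sampling strategy are assumptions, and the toy corpus is written with one letter per word.

```python
import random
from collections import Counter, defaultdict


def perfect_alignments(pairs):
    """Group source and target words that occur in exactly the same sentence pairs."""
    src_occ, tgt_occ = defaultdict(set), defaultdict(set)
    for i, (src, tgt) in enumerate(pairs):
        for word in src.split():
            src_occ[word].add(i)
        for word in tgt.split():
            tgt_occ[word].add(i)

    # Index words by their "signature": the set of sentence pairs they occur in.
    src_by_sig, tgt_by_sig = defaultdict(list), defaultdict(list)
    for word, occ in src_occ.items():
        src_by_sig[frozenset(occ)].append(word)
    for word, occ in tgt_occ.items():
        tgt_by_sig[frozenset(occ)].append(word)

    return {
        (tuple(sorted(words)), tuple(sorted(tgt_by_sig[sig])))
        for sig, words in src_by_sig.items()
        if sig in tgt_by_sig
    }


def count_alignments(pairs, n_samples, sample_size=None, seed=0):
    """Add up perfect alignments found in randomly drawn sub-corpora."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        k = sample_size or rng.randint(1, len(pairs))
        counts.update(perfect_alignments(rng.sample(pairs, k)))
    return counts


# Toy corpus in the spirit of the slide (assumption: one letter = one word).
corpus = [("a d f", "A D"), ("b", "B E"), ("b", "C F"), ("a e", "A E")]
full = perfect_alignments(corpus)
counts = count_alignments(corpus, n_samples=3, sample_size=len(corpus))
```

On the full toy corpus, `a` and `A` share the same two sentence pairs, so they form a perfect alignment. Smaller random samples expose alignments that the full corpus hides, and the counts accumulated over many samples give a rough confidence score for each alignment.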
17. Chinese and other languages
● Chinese processing
    ● EDF uses Libellex
    ● Needs ZH↔FR, ZH↔EN translation
● Currently:
    ● Statistical term alignment and extraction
● Planned:
    ● Chinese chunking rules
    ● Develop hybrid statistical/rule-based chunk alignment
● Other languages:
    ● Asian
    ● Northern European
    ● Eastern European
19. Metricc: term alignment in comparable corpora
● Based on the distributional analysis hypothesis
    ● Words that appear in similar contexts have similar meanings
● Represent the context of a word as a vector:
    ● Word co-occurrents + normalized frequencies
● Translate the context vector with a seed lexicon
● Compute the distance between source and target vectors
● The closer, the better
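The steps above can be sketched on a toy example. Everything here is an assumption made for illustration: the window size, the relative-frequency normalisation, the cosine measure, the tiny corpora and the seed lexicon. The slides do not say which choices Metricc makes.

```python
import math
from collections import Counter


def context_vector(word, sentences, window=2):
    """Co-occurrence counts within a window, normalised to relative frequencies."""
    counts = Counter()
    for sent in sentences:
        for i, tok in enumerate(sent):
            if tok == word:
                lo, hi = max(0, i - window), min(len(sent), i + window + 1)
                counts.update(t for j, t in enumerate(sent[lo:hi], lo) if j != i)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def translate_vector(vec, seed_lexicon):
    """Map each dimension to the target language; drop words missing from the lexicon."""
    out = Counter()
    for w, weight in vec.items():
        if w in seed_lexicon:
            out[seed_lexicon[w]] += weight
    return dict(out)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(k, 0.0) for k, weight in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_candidates(src_word, src_sents, tgt_sents, candidates, seed_lexicon):
    """Rank target candidates by similarity to the translated source vector."""
    translated = translate_vector(context_vector(src_word, src_sents), seed_lexicon)
    scored = [(c, cosine(translated, context_vector(c, tgt_sents))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy comparable corpora and seed lexicon (hypothetical data).
french = [["le", "chat", "boit", "du", "lait"], ["le", "chat", "boit"]]
english = [["the", "cat", "drinks", "milk"], ["the", "dog", "eats", "meat"]]
seed = {"le": "the", "boit": "drinks", "lait": "milk", "mange": "eats"}

ranking = rank_candidates("chat", french, english, ["cat", "dog"], seed)
```

Here "cat" ranks above "dog" because its context shares both "the" and "drinks" with the translated context of "chat". Coverage of the seed lexicon matters: context words it does not contain are simply lost in translation.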
20. Knowledge-Rich Contexts Extraction
● Project under submission
● Scope: European
● Partners:
    ● Inbenta, BEO
    ● Ljubljana University, LINA
● Knowledge-rich contexts
    ● Help understand the term
    ● Indicate how to use the term
21. Knowledge-Rich Contexts Extraction
● Examples of KRC:
    ● Contains a definition
    ● Describes a relation between two terms
    ● Indicates a collocation
    ● Illustrates the term
● KRC linguistic description
    ● Examples, definitions in dictionaries
    ● Corpus study
● KRC automatic identification
    ● Morphosyntactic patterns
    ● Statistical clues
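Pattern-based KRC identification can be sketched as follows. This is a deliberately small illustration: the two patterns are hypothetical and purely lexical (regular expressions over raw text), whereas morphosyntactic patterns would run over part-of-speech tagged text. They are not the patterns of the project described here.

```python
import re

# Hypothetical lexical patterns; "{term}" is replaced by the escaped term.
KRC_PATTERNS = {
    "definition": r"\b{term}\b (?:is|are) (?:a|an|the) \w+",
    "relation": r"\b{term}\b,? (?:also called|also known as) \w+",
}


def find_krcs(term, sentences):
    """Return (label, sentence) pairs for sentences matching a KRC pattern."""
    hits = []
    for sentence in sentences:
        for label, template in KRC_PATTERNS.items():
            pattern = template.format(term=re.escape(term))
            if re.search(pattern, sentence, flags=re.IGNORECASE):
                hits.append((label, sentence))
    return hits


sentences = [
    "A translation memory is a database of aligned segments.",
    "The weather was nice.",
    "Translation memory, also known as TM, is widely used.",
]
krcs = find_krcs("translation memory", sentences)
```

The first sentence is tagged as a definition and the third as a relation between two terms; the second contains no pattern and is ignored. Statistical clues could then be used to rank or filter these candidates.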
22. Modelling the translation process
● Research engineer / Ph.D. thesis
    ● Department of translation studies
    ● Université Stendhal, Grenoble
● How do we translate?
● What knowledge is helpful to translators?
● What is a good translation?
● Do non-professionals translate differently?
● How do you improve software usability?
23. More information
● Lingua et Machina
    ● www.lingua-et-machina.com/
    ● contact(a)lingua-et-machina.com
● Libellex
    ● http://libellex.fr/
● Download Similis
    ● http://similis.org/Download/SimilisFreelance-2.16.04-Setup.exe