Franco-Thai Workshop 2010

Lingua et Machina
Research & Development

About me
Estelle Delpech
● Research engineer at Lingua et Machina, France
  ● CAT tools provider
  ● ed(at)lingua-et-machina(dot)com
  ● www.lingua-et-machina.com
● Ph. D. candidate at LINA, France
  ● TALN team: specialises in NLP
  ● estelle.delpech(at)univ-nantes(dot)fr

LINGUA ET MACHINA
● French company
● Founded by Dr. E. Planas
● Led by Dr. F. De Colstoun
● Small but innovative
  ● 8 people
  ● 2 R&D engineers / Ph. D. candidates
    ● NLP
    ● Computational Linguistics
    ● Translation Studies

LINGUA ET MACHINA
● 2002: SIMILIS
  ● 2nd-generation translation memories
  ● Based on Ph. D. work
● 2007: LIBELLEX
  ● Access to TM for non-professionals
  ● Translation and terminology management platform

They trust us

Partners

SIMILIS
● Computer-aided translation
  ● Freelance translators
  ● Translation agencies
● Translation memories
● Pre-translations
● Terminology extraction
● 7 languages: FR, EN, IT, ES, PT, DE, NL
  → rule-based

Similis

SIMILIS technology
Based on the Ph. D. work of E. Planas
● First-generation translation memory
  ● Works with segments (sentences)
● Second-generation translation memory
  ● Works with chunks
  ● [the driver] [steps] [on the gas pedal]
● Chunking
  ● Rules written by linguists
● Fuzzy matching
  ● Modified edit distance
  ● Several linguistic levels
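The slide only names the ingredients of the SIMILIS fuzzy matcher. As a rough illustration (not Similis's actual algorithm), here is a weighted edit distance in which substituting one unit for another gets cheaper as more linguistic levels agree. The `Token` type, the three levels (surface form, lemma, part of speech) and the costs 0.3 / 0.6 / 1.0 are assumptions made for the example; the same dynamic programme could be run over chunks instead of words.

```python
from typing import NamedTuple, Sequence


class Token(NamedTuple):
    form: str   # surface form as it appears in the text
    lemma: str  # dictionary form
    pos: str    # part-of-speech tag


def substitution_cost(a: Token, b: Token) -> float:
    """Hypothetical graded cost: cheaper when deeper linguistic levels agree."""
    if a.form == b.form:
        return 0.0
    if a.lemma == b.lemma:
        return 0.3  # e.g. "steps" vs "stepped"
    if a.pos == b.pos:
        return 0.6  # same word class, different word
    return 1.0


def fuzzy_distance(src: Sequence[Token], tm: Sequence[Token], indel: float = 1.0) -> float:
    """Levenshtein-style dynamic programme using the graded substitution cost above."""
    n, m = len(src), len(tm)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,  # delete a unit of the new segment
                d[i][j - 1] + indel,  # insert a unit of the memory segment
                d[i - 1][j - 1] + substitution_cost(src[i - 1], tm[j - 1]),
            )
    return d[n][m]


def fuzzy_score(src: Sequence[Token], tm: Sequence[Token]) -> float:
    """Similarity in [0, 1]: 1 minus the distance normalised by the longer sequence."""
    longest = max(len(src), len(tm))
    return 1.0 if longest == 0 else 1.0 - fuzzy_distance(src, tm) / longest
```

With these costs, a memory segment that differs from "the driver steps on the gas pedal" only by "stepped" for "steps" has a distance of 0.3 and a score of about 0.96, whereas a plain surface-form comparison would count a full substitution.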

From SIMILIS to LIBELLEX
[Diagram: source text, French documents, English documents, memory (TMX), glossary (lexicon), translated text; roles: moderators, translators/linguists, business experts]

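The diagram shows the memory stored as TMX. As a sketch of what consuming such a memory involves, the function below extracts bilingual segment pairs from a TMX document with Python's standard `xml.etree.ElementTree`. The name `read_tmx` and the sample memory are made up for illustration; the function keeps only the text of each `<seg>` (inline tags are dropped) and matches language codes exactly, so `fr` and `fr-FR` are treated as different languages.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx(xml_text: str, src_lang: str, tgt_lang: str) -> list[tuple[str, str]]:
    """Return (source, target) segment pairs from translation units containing both languages."""
    root = ET.fromstring(xml_text)
    src_lang, tgt_lang = src_lang.lower(), tgt_lang.lower()
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; some older files use a plain "lang" attribute instead.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # itertext() keeps the text inside inline markup but drops the tags.
                segments[lang.lower()] = "".join(seg.itertext()).strip()
        if src_lang in segments and tgt_lang in segments:
            pairs.append((segments[src_lang], segments[tgt_lang]))
    return pairs


# A made-up one-unit memory, for illustration only.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0.1" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="fr" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="fr"><seg>Le conducteur appuie sur la pédale.</seg></tuv>
      <tuv xml:lang="en"><seg>The driver steps on the pedal.</seg></tuv>
    </tu>
  </body>
</tmx>"""
```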
LIBELLEX
● Translation memories meet corporate content management
● Target: global companies
  ● Many languages
    ● Customers
    ● Partners
    ● Employees
  ● Speakers
    ● Non-native
    ● Not language professionals
  ● Terminology and translation needs
    ● Official documentation
    ● Day-to-day internal communication

Libellex
● Terminology management platform
  ● Builds a corporate TM
  ● Extracts / checks terminology
  ● Helps employees communicate
● Translation management platform
  ● Manages translation jobs
  ● Terminologies for translation agencies
  ● Chunk matches for MT

Libellex
● Look up a word, a term, an expression
● Manage terminology
● Have a document translated
● Check translations
● Check text
● Add new documents

R-D-I at Lingua et Machina
Ongoing
● Statistical term extraction
  ● "Cheap and quick" addition of new languages
● Consider hybridisation with rule-based methods
● Term alignment in comparable corpora
● Modelling the translation process
Planned
● Development of rule-based chunking for Chinese
● Extraction of "knowledge-rich contexts" for terminologies

Research partnerships
● Statistical term extraction and alignment
  ● A. Lardilleux, Y. Lepage (Caen/Waseda)
● Chinese processing
  ● EDF, Kinep
● Comparable corpora
  ● National project + Ph. D. candidate
● KRC extraction
  ● European project submission
● Translation studies
  ● Ph. D. candidate: Stendhal University

Statistical term extraction and alignment
● Algorithm developed in A. Lardilleux's Ph. D. thesis
  ● http://users.info.unicaen.fr/~alardill/
● Uses "perfect alignments"
  ● Source and target words that occur only in the same source and target sentences
  [Figure: toy corpus of four aligned sentence pairs (adf ↔ AD, b ↔ BE, b ↔ CF, a e ↔ AE) and the pair d ↔ D]
● Perfect alignments add up

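To make the definition concrete, here is a small sketch that finds perfect alignments in a sentence-aligned corpus: source words and target words are grouped by the exact set of sentence pairs they occur in, and groups with identical sets are paired. Counting these pairs over several subcorpora illustrates how perfect alignments "add up". This is a simplification of our own, not Lardilleux's implementation; the function names are hypothetical, and choosing the subcorpora (drawn at random in his sampling-based approach, as we understand it) is left to the caller. Reading each letter of the slide's toy corpus as a separate word, it pairs a with A, and pairs the group d f with D because d and f both occur only in the first sentence pair.

```python
from collections import Counter, defaultdict

Alignment = tuple[tuple[str, ...], tuple[str, ...]]


def _group_by_occurrences(sentences: list[list[str]]) -> dict[frozenset[int], list[str]]:
    """Map each set of sentence indices to the words occurring in exactly those sentences."""
    occurrences: dict[str, set[int]] = defaultdict(set)
    for i, words in enumerate(sentences):
        for w in words:
            occurrences[w].add(i)
    groups: dict[frozenset[int], list[str]] = defaultdict(list)
    for w, occ in occurrences.items():
        groups[frozenset(occ)].append(w)
    return groups


def perfect_alignments(corpus: list[tuple[str, str]]) -> list[Alignment]:
    """Pair source and target word groups that occur in exactly the same sentence pairs."""
    src_groups = _group_by_occurrences([s.split() for s, _ in corpus])
    tgt_groups = _group_by_occurrences([t.split() for _, t in corpus])
    return [
        (tuple(sorted(src_words)), tuple(sorted(tgt_groups[occ])))
        for occ, src_words in src_groups.items()
        if occ in tgt_groups
    ]


def accumulate(subcorpora: list[list[tuple[str, str]]]) -> Counter:
    """Count how often each perfect alignment is found across the given subcorpora."""
    counts: Counter = Counter()
    for sub in subcorpora:
        counts.update(perfect_alignments(sub))
    return counts
```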
Chinese and other languages
● Chinese processing
  ● EDF uses Libellex
  ● Needs ZH ↔ FR and ZH ↔ EN translation
● Currently:
  ● Statistical term alignment and extraction
● Planned:
  ● Chinese chunking rules
  ● Develop hybrid statistical/rule-based chunk alignment
● Other languages:
  ● Asian
  ● Northern European
  ● Eastern European

Metricc project
● Scope: national
● Bilingual terminology mining from comparable corpora
  ● CAT
  ● Translation memories
  ● CLIR
● Partners
  ● Syllabs, Sinéqua, LM
  ● IMAG, Valoria
http://www.metricc.com

Metricc: term alignment in comparable corpora
● Based on the distributional hypothesis
  ● Words that appear in similar contexts have similar meanings
● Represent the context of a word as a vector:
  ● Word co-occurrents + normalised frequencies
● Translate the context vector with a seed lexicon
● Compute the distance between source and target vectors
● The closer, the better
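The steps on this slide follow the context-vector approach to lexicon extraction from comparable corpora. The sketch below walks through them on toy data: it counts co-occurrents in a fixed window, normalises the counts to relative frequencies, translates the source vector through a seed lexicon, and ranks target candidates by similarity (higher similarity meaning closer vectors). The window size, relative frequency as the normalisation and cosine as the comparison measure are assumptions, since the slide does not say which measures Metricc uses; the function names are hypothetical.

```python
import math
from collections import Counter


def context_vector(tokens: list[str], word: str, window: int = 2) -> dict[str, float]:
    """Co-occurrents of `word` within +/- `window` tokens, normalised to relative frequencies."""
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != word:
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[tokens[j]] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def translate_vector(vec: dict[str, float], seed_lexicon: dict[str, str]) -> dict[str, float]:
    """Map source context words to target words; words missing from the seed lexicon are dropped."""
    out: Counter = Counter()
    for w, v in vec.items():
        if w in seed_lexicon:
            out[seed_lexicon[w]] += v
    return dict(out)


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_candidates(src_tokens, src_word, tgt_tokens, candidates, seed_lexicon, window=2):
    """Rank target candidates by similarity to the translated context vector of `src_word`."""
    src_vec = translate_vector(context_vector(src_tokens, src_word, window), seed_lexicon)
    scored = [(c, cosine(src_vec, context_vector(tgt_tokens, c, window))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

On a tiny made-up pair of corpora where French "insuline" and English "insulin" share the neighbours "patient", "injection" and "dose", "insulin" is ranked above an unrelated candidate. Real comparable corpora are of course far larger, and the seed lexicon covers only part of the context words.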

Knowledge-Rich Contexts Extraction
● Project under submission
● Scope: European
● Partners:
  ● Inbenta, BEO
  ● Ljubljana University, LINA
● Knowledge-rich contexts
  ● Help understand the term
  ● Indicate how to use the term

Knowledge-Rich Contexts Extraction
● Examples of KRCs:
  ● Contains a definition
  ● Describes a relation between two terms
  ● Indicates a collocation
  ● Illustrates the term
● KRC linguistic description
  ● Examples, definitions in dictionaries
  ● Corpus study
● KRC automatic identification
  ● Morpho-syntactic patterns
  ● Statistical clues

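Automatic KRC identification relies on patterns. A genuinely morpho-syntactic system would match over part-of-speech-tagged text; the sketch below simplifies this to a few surface regular expressions, one per KRC type listed on the slide (definition, relation, illustration), so that it runs without a tagger. The patterns and function names are illustrative assumptions, not the project's rules, and no statistical clues are used. A sentence can match several types, so each hit carries a list.

```python
import re


def krc_patterns(term: str) -> dict[str, re.Pattern]:
    """Hypothetical surface patterns signalling knowledge-rich contexts for `term`."""
    t = re.escape(term)
    return {
        # "X is a ...", "X is defined as ..."
        "definition": re.compile(rf"\b{t}\b\s+(?:is|are)\s+(?:defined as|an?|the)\b", re.IGNORECASE),
        # "X consists of ...", "X is part of ..."
        "relation": re.compile(rf"\b{t}\b\s+(?:is a kind of|is part of|consists of)\b", re.IGNORECASE),
        # "such as (up to three words) X", "for example X"
        "illustration": re.compile(
            rf"\b(?:such as|for example|e\.g\.)\s+(?:\w+\s+){{0,3}}{t}\b", re.IGNORECASE
        ),
    }


def find_krcs(sentences: list[str], term: str) -> list[tuple[str, list[str]]]:
    """Return each sentence matching at least one pattern, with the KRC types it matched."""
    patterns = krc_patterns(term)
    hits = []
    for sentence in sentences:
        kinds = [kind for kind, pattern in patterns.items() if pattern.search(sentence)]
        if kinds:
            hits.append((sentence, kinds))
    return hits
```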
Modelling the translation process
● Research engineer / Ph. D. thesis
  ● Department of Translation Studies
  ● Université Stendhal, Grenoble
● How do we translate?
● What knowledge is helpful to translators?
● What is a good translation?
● Do non-professionals translate differently?
● How can software usability be improved?

More information
● Lingua et Machina
  ● www.lingua-et-machina.com/
  ● contact(a)lingua-et-machina.com
● Libellex
  ● http://libellex.fr/
● Download Similis
  ● http://similis.org/Download/SimilisFreelance-2.16.04-Setup.exe

Thank you
ed(a)lingua-et-machina.com

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 

R&D Lingua et Machina

  • 1. Franco-Thai Workshop 2010: Lingua et Machina, Research & Development
  • 2. About me
    ● Estelle Delpech
    ● Research engineer at Lingua et Machina, France
      ● CAT tools provider
      ● ed(at)lingua-et-machina(dot)com
      ● www.lingua-et-machina.com
    ● Ph.D. candidate at LINA, France
      ● TALN team: specialises in NLP
      ● estelle.delpech(at)univ-nantes(dot)fr
  • 3. LINGUA ET MACHINA
    ● French company
    ● Founded by Dr E. Planas
    ● Led by Dr F. De Colstoun
    ● Small but innovative
      ● 8 people
      ● 2 R&D engineers / Ph.D. candidates
        ● NLP
        ● Computational linguistics
        ● Translation studies
  • 4. LINGUA ET MACHINA
    ● 2002: SIMILIS
      ● 2nd-generation translation memories
      ● Based on Ph.D. work
    ● 2007: LIBELLEX
      ● Access to TMs for non-professionals
      ● Translation and terminology management platform
  • 7. SIMILIS
    ● Computer-aided translation
      ● Freelance translators
      ● Translation agencies
    ● Translation memories
      ● Pre-translation
    ● Terminology extraction
    ● 7 languages: FR, EN, IT, ES, PT, DE, NL → rule-based
  • 9. SIMILIS technology
    ● Based on the Ph.D. work of E. Planas
    ● First-generation translation memory
      ● Works with segments (sentences)
    ● Second-generation translation memory
      ● Works with chunks, e.g. [the driver] [steps] [on the gas pedal]
    ● Chunking
      ● Rules written by linguists
    ● Fuzzy matching
      ● Modified edit distance
      ● Several linguistic levels
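A minimal sketch of chunk-level fuzzy matching, assuming each chunk carries a surface form, a lemma and a chunk category, and that a substitution costs less the deeper the linguistic level on which two chunks agree. The `Chunk` type, the cost values (0, 0.25, 0.5, 1) and `fuzzy_score` are hypothetical illustrations; the slide does not specify the modified edit distance SIMILIS actually uses.

```python
from typing import List, NamedTuple


class Chunk(NamedTuple):
    surface: str   # surface string, e.g. "the driver"
    lemma: str     # hypothetical lemmatised form of the chunk
    category: str  # chunk category, e.g. "NP", "VP", "PP"


def substitution_cost(a: Chunk, b: Chunk) -> float:
    """Assumed costs: cheaper when chunks match on a deeper linguistic level."""
    if a.surface == b.surface:
        return 0.0
    if a.lemma == b.lemma:
        return 0.25
    if a.category == b.category:
        return 0.5
    return 1.0


def chunk_edit_distance(src: List[Chunk], mem: List[Chunk]) -> float:
    """Levenshtein distance over chunk sequences with level-aware substitution."""
    n, m = len(src), len(mem)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = float(i)  # delete i chunks
    for j in range(1, m + 1):
        d[0][j] = float(j)  # insert j chunks
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,  # deletion
                d[i][j - 1] + 1.0,  # insertion
                d[i - 1][j - 1] + substitution_cost(src[i - 1], mem[j - 1]),
            )
    return d[n][m]


def fuzzy_score(src: List[Chunk], mem: List[Chunk]) -> float:
    """Similarity in [0, 1]: 1 minus the distance normalised by the longer sequence."""
    longest = max(len(src), len(mem))
    return 1.0 - chunk_edit_distance(src, mem) / longest if longest else 1.0


new = [Chunk("the driver", "the driver", "NP"),
       Chunk("steps", "step", "VP"),
       Chunk("on the gas pedal", "on the gas pedal", "PP")]
memory = [Chunk("the driver", "the driver", "NP"),
          Chunk("stepped", "step", "VP"),
          Chunk("on the brake", "on the brake", "PP")]
print(fuzzy_score(new, memory))  # 0.75
```

Matching at chunk rather than sentence level lets a memory entry that differs only in verb inflection and one prepositional phrase still score highly, which is the motivation the slide gives for second-generation memories.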
  • 10. From SIMILIS to LIBELLEX
    [Diagram with labels: Source Text, French Documents, English Documents, Memory (TMX), Glossary (lexicon), Translated Text; roles: Moderator, Translators / linguists, Business Experts]
  • 11. LIBELLEX
    ● Translation memories meet corporate content management
    ● Target: global companies
      ● Many languages
        ● Customers
        ● Partners
        ● Employees
      ● Speakers
        ● Non-native
        ● Not language professionals
      ● Terminology and translation needs
        ● Official documentation
        ● Day-to-day internal communication
  • 12. Libellex
    ● Terminology management platform
      ● Builds the corporate TM
      ● Extracts / checks terminology
      ● Helps employees communicate
    ● Translation management platform
      ● Manages translation jobs
      ● Terminologies for translation agencies
      ● Chunk matches for MT
  • 13. Libellex
    ● Look up a word, a term, an expression
    ● Manage terminology
    ● Have a document translated
    ● Check translations
    ● Check text
    ● Add new documents
  • 14. R-D-I at Lingua et Machina
    ● Ongoing
      ● Statistical term extraction
        ● "Cheap and quick" addition of new languages
        ● Considering hybridisation with rule-based methods
      ● Term alignment in comparable corpora
      ● Modelling the translation process
    ● Planned
      ● Development of rule-based chunking for Chinese
      ● Extraction of "knowledge-rich contexts" for terminologies
  • 15. Research partnerships
    ● Statistical term extraction and alignment
      ● A. Lardilleux, Y. Lepage (Caen/Waseda)
    ● Chinese processing
      ● EDF, Kinep
    ● Comparable corpora
      ● National project + Ph.D. candidate
    ● KRC extraction
      ● European project submission
    ● Translation studies
      ● Ph.D. candidate: Stendhal University
  • 16. Statistical term extraction and alignment
    ● Algorithm from A. Lardilleux's Ph.D. thesis
      ● http://users.info.unicaen.fr/~alardill/
    ● Uses "perfect alignments"
      ● Source and target words that occur only in the same source and target sentences
      ● [Figure: toy alignments adf ↔ AD, b ↔ BE, b ↔ CF, a e ↔ AE, d ↔ D; label: "Randomly build small samples of corpus"]
    ● Perfect alignments add up
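The notion of perfect alignment can be sketched as follows, assuming a sentence-aligned corpus of tokenised pairs: words are grouped by the exact set of sentence pairs they occur in, and a source group and a target group sharing the same set form a perfect alignment. Counting these over random subcorpora illustrates how perfect alignments "add up". This is a simplified sketch; the function names, sample sizes and seed are hypothetical, and Lardilleux's actual algorithm may differ in its details.

```python
import random
from collections import Counter, defaultdict


def _group_by_occurrences(sentences):
    """Map each exact set of sentence indices to the words occurring in exactly those sentences."""
    occ = defaultdict(set)
    for idx, tokens in enumerate(sentences):
        for tok in tokens:
            occ[tok].add(idx)
    groups = defaultdict(list)
    for tok, ids in occ.items():
        groups[frozenset(ids)].append(tok)
    return groups


def perfect_alignments(corpus):
    """corpus: list of (source_tokens, target_tokens) pairs.
    Returns (source_words, target_words) groups that occur in exactly the same pairs."""
    src_groups = _group_by_occurrences([src for src, _ in corpus])
    tgt_groups = _group_by_occurrences([tgt for _, tgt in corpus])
    return {
        (tuple(sorted(src_groups[ids])), tuple(sorted(tgt_groups[ids])))
        for ids in src_groups.keys() & tgt_groups.keys()
    }


def sampled_alignments(corpus, n_samples=100, max_size=3, seed=0):
    """Count perfect alignments found in many small random subcorpora (hypothetical parameters)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        size = rng.randint(1, min(max_size, len(corpus)))
        counts.update(perfect_alignments(rng.sample(corpus, size)))
    return counts
```

Small subcorpora are used because, in a large corpus, few words share exactly the same occurrence set; in a handful of sentences many do, and alignments that keep reappearing across samples accumulate higher counts.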
  • 17. Chinese and other languages
    ● Chinese processing
      ● EDF uses Libellex
      ● Needs ZH ↔ FR and ZH ↔ EN translation
    ● Currently:
      ● Statistical term alignment and extraction
    ● Planned:
      ● Chinese chunking rules
      ● Develop hybrid statistical/rule-based chunk alignment
    ● Other languages:
      ● Asian
      ● Northern European
      ● Eastern European
  • 18. Metricc project
    ● Scope: national
    ● Mining bilingual terminologies from comparable corpora
      ● CAT
      ● Translation memories
      ● CLIR
    ● Partners
      ● Syllabs, Sinéqua, LM
      ● IMAG, Valoria
    ● http://www.metricc.com
  • 19. Metricc: term alignment in comparable corpora
    ● Based on the distributional hypothesis
      ● Words that appear in similar contexts have similar meanings
    ● Represent the context of a word as a vector:
      ● Word co-occurrents + normalised frequencies
    ● Translate the context vector with a seed lexicon
    ● Compute the distance between source and target vectors
    ● The closer, the better
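The steps on this slide can be sketched as below, assuming tokenised monolingual corpora, a window of ±2 words for co-occurrence, frequencies normalised to sum to 1 per word, and cosine similarity as the closeness measure. The window size, normalisation, similarity measure and all function names are assumptions; the slide does not say which ones Metricc uses.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Map each word to the normalised frequencies of words within +/- window positions."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][sent[j]] += 1
    vectors = {}
    for word, ctx in counts.items():
        total = sum(ctx.values())
        vectors[word] = {c: n / total for c, n in ctx.items()}
    return vectors


def translate_vector(vec, seed_lexicon):
    """Project a source context vector into the target language.
    Context words missing from the seed lexicon are dropped."""
    out = defaultdict(float)
    for ctx, weight in vec.items():
        for translation in seed_lexicon.get(ctx, []):
            out[translation] += weight
    return dict(out)


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_translations(word, src_vectors, tgt_vectors, seed_lexicon, top_n=5):
    """Rank target words by similarity to the translated context of `word`."""
    translated = translate_vector(src_vectors[word], seed_lexicon)
    scored = sorted(((cosine(translated, tv), cand) for cand, tv in tgt_vectors.items()),
                    reverse=True)
    return [cand for _, cand in scored[:top_n]]
```

For example, with a French corpus containing "chat mange poisson" and "chat dort", an English corpus containing "cat eats fish", "cat sleeps" and "dog barks", and a seed lexicon covering mange, poisson and dort, the top-ranked candidate for "chat" is "cat". The corpora are comparable rather than parallel, so only the seed lexicon links the two vector spaces.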
  • 20. Knowledge-Rich Contexts Extraction
    ● Project under submission
    ● Scope: European
    ● Partners:
      ● Inbenta, BEO
      ● Ljubljana University, LINA
    ● Knowledge-rich contexts
      ● Help understand the term
      ● Indicate how to use the term
  • 21. Knowledge-Rich Contexts Extraction
    ● Examples of KRCs:
      ● Contains a definition
      ● Describes a relation between two terms
      ● Indicates a collocation
      ● Illustrates the term
    ● KRC linguistic description
      ● Examples, definitions in dictionaries
      ● Corpus study
    ● KRC automatic identification
      ● Morpho-syntactic patterns
      ● Statistical clues
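As a rough illustration of pattern-based KRC identification, the sketch below matches a few hypothetical lexical patterns (definition, synonymy, hypernymy) around a given term. Real morpho-syntactic patterns would operate on POS-tagged, lemmatised text and be combined with statistical clues; the `PATTERNS` table and `find_krc` are purely illustrative assumptions, not the project's method.

```python
import re

# Hypothetical surface patterns standing in for morpho-syntactic ones.
# "{term}" is replaced by the escaped term before compiling.
PATTERNS = {
    "definition": r"\b{term}\s+(?:is|are)\s+(?:a|an|the)\s+\w+",
    "synonymy": r"\b{term}\s*,?\s+(?:also\s+)?(?:called|known as)\s+\w+",
    "hypernymy": r"\b\w+\s+such as\s+(?:\w+\s+)*?{term}\b",
}


def find_krc(term, sentences):
    """Return (pattern_kind, sentence) pairs for sentences that look like KRCs for `term`."""
    escaped = re.escape(term)
    compiled = {
        kind: re.compile(pattern.format(term=escaped), re.IGNORECASE)
        for kind, pattern in PATTERNS.items()
    }
    hits = []
    for sentence in sentences:
        for kind, regex in compiled.items():
            if regex.search(sentence):
                hits.append((kind, sentence))
    return hits
```

Purely lexical patterns like these over-generate and miss paraphrases, which is one reason to pair them with POS information and statistical clues as the slide suggests.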
  • 22. Modelling the translation process
    ● Research engineer / Ph.D. thesis
      ● Department of Translation Studies
      ● Université Stendhal, Grenoble
    ● How do we translate?
    ● What knowledge is helpful to translators?
    ● What is a good translation?
    ● Do non-professionals translate differently?
    ● How do you improve software usability?
  • 23. More information
    ● Lingua et Machina
      ● www.lingua-et-machina.com/
      ● contact(a)lingua-et-machina.com
    ● Libellex
      ● http://libellex.fr/
    ● Download Similis
      ● http://similis.org/Download/SimilisFreelance-2.16.04-Setup.exe
  • 24. Franco-Thai Workshop 2010: Thank you! ed(a)lingua-et-machina.com