Dealing with Lexicon Acquired from
Comparable Corpora
Post-edition and Exchange
Estelle Delpech, Lingua et Machina
Béatrice Daille, U. de Nantes - LINA

1/23
Working with lexicon acquired from
comparable corpora
I. Terminology acquisition from comparable corpora : quick overview
II. A tool for terminology post-edition
III. Data exchange : a TBX variant for automatically acquired lexicons
IV. Future work

2/23
Part I
Terminology Acquisition from
Comparable Corpora

3/23
Terminology acquisition from
comparable corpora




Comparable corpora:
“Two corpora, respectively in two languages l1 and l2, are said
to be “comparable” if there exists a substantial part of the
vocabulary of the corpus in language l1 whose translation can
be found in the corpus in language l2.”
(my translation of [Déjean and Gaussier, 2002])



Advantages :


Availability



Real usages

4/23
Terminology acquisition from
comparable corpora




Terminology extraction : a contextual analysis







Compare the contexts of source and target terms
If the contexts are similar, there's a good chance
the source and target terms are translations of each
other, e.g. :
mastectomy : reconstruction, prophylactic, treat,
undergo, removal
mastectomie : reconstruction, prophylactique,
traiter, subir, ablation
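
The contextual comparison above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the source context is first mapped into the target language through a seed dictionary and then compared by cosine similarity. The names `SEED_DICT`, `translate_context`, `cosine` and `rank_candidates` are hypothetical, all weights are set to 1, and the context given for "opération" is invented for the example.

```python
import math

# Hypothetical English -> French seed dictionary, for illustration only.
SEED_DICT = {
    "reconstruction": "reconstruction",
    "prophylactic": "prophylactique",
    "treat": "traiter",
    "undergo": "subir",
    "removal": "ablation",
}


def translate_context(context, seed_dict):
    """Map a source context vector into the target language, dropping unknown words."""
    translated = {}
    for word, weight in context.items():
        if word in seed_dict:
            target_word = seed_dict[word]
            translated[target_word] = translated.get(target_word, 0.0) + weight
    return translated


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[word] for word, weight in u.items() if word in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(source_context, target_contexts, seed_dict):
    """Return (score, target term) pairs, best candidate first."""
    translated = translate_context(source_context, seed_dict)
    scored = [(cosine(translated, ctx), term) for term, ctx in target_contexts.items()]
    return sorted(scored, reverse=True)


# Context words from the slide, with made-up uniform weights.
source = {"reconstruction": 1.0, "prophylactic": 1.0, "treat": 1.0,
          "undergo": 1.0, "removal": 1.0}
targets = {
    "mastectomie": {"reconstruction": 1.0, "prophylactique": 1.0,
                    "traiter": 1.0, "subir": 1.0, "ablation": 1.0},
    # Invented context, only there to provide a weaker candidate.
    "opération": {"subir": 1.0, "chirurgien": 1.0},
}
ranking = rank_candidates(source, targets, SEED_DICT)
```

With these toy vectors, "mastectomie" ranks first because its context matches the translated source context word for word.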
5/23
Terminology acquisition from
comparable corpora




Outputs one-to-many alignments, e.g. for mastectomy :
0,92 ablation
0,89 mastectomie
0,48 opération
– Evaluation : precision on the TopNBest alignments

Results
Not as good as acquisition from parallel corpora !
Fung (1997) : 30 % accuracy on the Top20 candidates
Morin et al. (2004) : translation is usually the 34th for complex terms
6/23
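
Precision on the N best candidates, as used for evaluation above, could be computed along these lines. This is a hedged sketch: the function name `top_n_precision` and the reference set are assumptions made for illustration (here "mastectomie" is taken to be the expected translation), while the scores are the ones shown on the slide.

```python
def top_n_precision(alignments, reference, n):
    """Share of source terms with a reference translation among their n best candidates."""
    if not alignments:
        return 0.0
    hits = 0
    for source_term, candidates in alignments.items():
        # Candidates are (score, target term) pairs; keep the n highest scores.
        best = sorted(candidates, key=lambda pair: pair[0], reverse=True)[:n]
        expected = reference.get(source_term, set())
        if any(target in expected for _, target in best):
            hits += 1
    return hits / len(alignments)


# One-to-many alignment taken from the slide.
alignments = {
    "mastectomy": [(0.92, "ablation"), (0.89, "mastectomie"), (0.48, "opération")],
}
# Assumed reference translation, for illustration only.
reference = {"mastectomy": {"mastectomie"}}

precision_top1 = top_n_precision(alignments, reference, 1)
precision_top2 = top_n_precision(alignments, reference, 2)
```

On this single entry the expected translation is only the second-best candidate, so the Top1 precision is 0 and the Top2 precision is 1.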
Part II
A Tool for Post-edition

7/23
A tool for post-edition


Existing Tools :



ArayaTermExtractor (Waldhör 2006)





iView (Merkel and Foo, 2007)
Xerox Terminology Suite ®

Our needs :


Deal with one-to-many alignments



Non-aligned contexts



Allow non binary annotation



Display useful information to help find the right
candidate in the corpus
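
One possible data structure for these needs is sketched below. It is an assumption, not the design of the actual prototype: the `Judgement` scale is hypothetical (the slides only say the annotation is non-binary), and the class and field names are invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Judgement(Enum):
    # Hypothetical non-binary annotation scale.
    UNREVIEWED = "unreviewed"
    CORRECT = "correct"
    ACCEPTABLE = "acceptable"
    WRONG = "wrong"


@dataclass
class Candidate:
    target_term: str
    score: float
    judgement: Judgement = Judgement.UNREVIEWED
    # Contexts come from the target corpus and are not aligned with source contexts.
    contexts: list = field(default_factory=list)


@dataclass
class AlignmentEntry:
    source_term: str
    candidates: list = field(default_factory=list)  # one-to-many alignment

    def annotate(self, target_term, judgement):
        """Record the post-editor's judgement on one candidate."""
        for candidate in self.candidates:
            if candidate.target_term == target_term:
                candidate.judgement = judgement
                return
        raise KeyError(target_term)

    def approved(self):
        """Candidates judged correct or acceptable, in their original order."""
        kept = (Judgement.CORRECT, Judgement.ACCEPTABLE)
        return [c.target_term for c in self.candidates if c.judgement in kept]


entry = AlignmentEntry("mastectomy", [
    Candidate("ablation", 0.92),
    Candidate("mastectomie", 0.89),
    Candidate("opération", 0.48),
])
# Illustrative judgements only.
entry.annotate("mastectomie", Judgement.CORRECT)
entry.annotate("opération", Judgement.WRONG)
```

Keeping the judgement on each candidate rather than on the entry lets a post-editor leave some candidates unreviewed while approving or rejecting others.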
8/23
“Useful” information
→ Knowledge that helps capture the in vivo
behavior of terms
→ Text-driven, term-oriented approach


Useful information :


Variants



Collocations



Distributional neighbors



Contexts

→ To be harvested during the term extraction /
alignment process

9/23
Useful information : example
Mastectomy
risk-reducing ~
simple ~
Tumorectomy
Lumpectomy
Oophorectomy
...patient may choose to have risk-reducing bilateral mastectomy if they have a strong family history of breast cancer...

Mastectomie
~ préventive
~ simple
Tumorectomie
Ablation
Opération
...la mastectomie préventive pourrait supprimer la grande majorité du risque de développer un cancer...
10/23
Post-edition interface
http://80.82.238.151/Metricc/InterfaceValidation, user “test”, no password

11/23
Part III
Data Exchange :
a TBX variant for
automatically acquired
lexicon

12/23
Quick introduction to TBX (1)





TBX : TermBase eXchange
Open, XML-based standard for exchanging
structured terminological data
Approved as an international standard by LISA
and ISO (ISO 30042)



Maps to TMF data model



Subset of MARTIF



Designed for various use cases



Customizable
13/23
Quick introduction to TBX (2)


2 components :




Structure : core structure based on TMF
metamodel
Content : formalism to express data-categories
and their constraints
[Figure: the core DTD/Schema (form) is combined with an XCS file (content). Default XCS yields Default TBX; XCS1 ... XCSn yield TBX variant 1 ... TBX variant n. Adapted from ISO norm 30042:2008, Fig. 4, p.30]

14/23
Quick introduction to TBX (3)


Form defined in DTD

Content defined in XCS :
respPerson
responsibility
reliabilityCode
partOfSpeech
corpusTrace
termType
usageNote
Taken from ISO norm 30042:2008, Fig. 1, p.9

15/23
TBX variant for lexicon acquired from
comparable corpora


Default TBX data-categories


termType : entryTerm, variant



externalCrossReference, usageNote



partOfSpeech, frequency, reliabilityCode...



transactionType, responsibility

+ Customized data-categories :


occurrences, occurrenceCount



relatedTerm



termDefinition, definitionRelevance



ntigReference
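
To make the mix of default and customized data-categories concrete, here is a rough Python sketch that builds a TBX-like term entry with the standard library. The element names (`termEntry`, `langSet`, `ntig`, `termGrp`, `term`, `termNote`) follow the TBX core structure, but where the customized `occurrenceCount` category is attached is an assumption, since the slides showing the actual entries are not reproduced here. The function name `build_term_entry`, the entry id and the count are made up.

```python
import xml.etree.ElementTree as ET

# Qualified name that ElementTree serializes as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_term_entry(entry_id, lang, term, part_of_speech, occurrence_count):
    """Build a minimal TBX-like entry holding a single term."""
    entry = ET.Element("termEntry", {"id": entry_id})
    lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
    ntig = ET.SubElement(lang_set, "ntig")
    term_grp = ET.SubElement(ntig, "termGrp")
    ET.SubElement(term_grp, "term").text = term
    # Default TBX data-categories.
    ET.SubElement(term_grp, "termNote", {"type": "termType"}).text = "entryTerm"
    ET.SubElement(term_grp, "termNote", {"type": "partOfSpeech"}).text = part_of_speech
    # Customized data-category; attaching it as a termNote is an assumption.
    ET.SubElement(term_grp, "termNote", {"type": "occurrenceCount"}).text = str(occurrence_count)
    return entry


# Made-up id and occurrence count, for illustration only.
term_entry = build_term_entry("entry-1", "fr", "mastectomie", "noun", 12)
term_entry_xml = ET.tostring(term_entry, encoding="unicode")
```

A real export would wrap such entries in the full TBX document structure and declare the customized categories in the variant's XCS file.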

16/23
TBX variant : A term entry

17/23
TBX variant : 1-to-n alignments

18/23
TBX variant : approved alignment

19/23
Feed-back on TBX
TBX is made for stable terminologies with little
uncertainty about the status of translations, not for
machine-generated lexicons of “candidate
translations” :



difficult to separate a term and its properties from its
alignments



no data category specific to automatically estimated
reliability





Difficult to make text-driven, term-oriented
knowledge fit in a concept oriented format


no definition category that would apply to a single term
and not the whole concept
Conclusion
Future work

21/23
Future work


Integration of prototype in Libellex


TBX import / export



edition of linguistic properties



User testing (ergonomics)



Evaluation of added-value for translation



Explore new ways of :


aligning terms



selecting contexts
22/23
References


Post-edition prototype on line : http://80.82.238.151/Metricc/InterfaceValidation/ user “test”,
no password



Metricc project : http://www.metricc.com/



Lingua et Machina : http://www.lingua-et-machina.com/



Comparable corpora : Déjean, H., Gaussier, É. (2002) : “Une nouvelle approche à
l'extraction de lexiques bilingues à partir de corpus comparables”, In Lexicometrica,
Alignement Lexical dans les corpus multilingues, pp.1-22.



ArayaTermExtractor : http://www.heartsome.de



Xerox Terminology Suite : http://www.temis.com/









iView : Nyström, M., Merkel, M., Ahrenberg, L., Zweigenbaum, P., Petersson, H. and
Åhlfeldt H. (2006) : “Creating a medical English-Swedish dictionary using interactive word
alignment”, In BMC Medical Informatics and Decision Making, 2006, pp. 6-35
TMF : ISO 16642 - Terminological markup framework
TBX : ISO 30042 - Systems to manage terminology, knowledge and content -- TermBase
eXchange (TBX)
Data categories : ISO 12620 - Terminology and other language and content resources -- Specification of data categories and management of a Data Category Registry for language
resources

 
Experimenting the TextTiling Algorithm
Experimenting the TextTiling AlgorithmExperimenting the TextTiling Algorithm
Experimenting the TextTiling AlgorithmEstelle Delpech
 
Text Processing for Procedural Question Answering
Text Processing for Procedural Question AnsweringText Processing for Procedural Question Answering
Text Processing for Procedural Question AnsweringEstelle Delpech
 

Plus de Estelle Delpech (14)

Génération automatique de texte
Génération automatique de texteGénération automatique de texte
Génération automatique de texte
 
Identification de compatibilités entre tages descriptifs de lieux
Identification de compatibilités entre tages descriptifs de lieuxIdentification de compatibilités entre tages descriptifs de lieux
Identification de compatibilités entre tages descriptifs de lieux
 
Découverte du Traitement Automatique des Langues
Découverte du Traitement Automatique des LanguesDécouverte du Traitement Automatique des Langues
Découverte du Traitement Automatique des Langues
 
Invited speaker, ATALA 2014 Ph. D. Thesis award
Invited speaker, ATALA 2014 Ph. D. Thesis awardInvited speaker, ATALA 2014 Ph. D. Thesis award
Invited speaker, ATALA 2014 Ph. D. Thesis award
 
Corpus comparables et traduction assistée par ordinateur, contributions à la ...
Corpus comparables et traduction assistée par ordinateur, contributions à la ...Corpus comparables et traduction assistée par ordinateur, contributions à la ...
Corpus comparables et traduction assistée par ordinateur, contributions à la ...
 
Identification de compatibilites sémantiques entre descripteurs de lieux
Identification de compatibilites sémantiques entre descripteurs de lieuxIdentification de compatibilites sémantiques entre descripteurs de lieux
Identification de compatibilites sémantiques entre descripteurs de lieux
 
Usage du TAL dans des applications industrielles : gestion des contenus multi...
Usage du TAL dans des applications industrielles : gestion des contenus multi...Usage du TAL dans des applications industrielles : gestion des contenus multi...
Usage du TAL dans des applications industrielles : gestion des contenus multi...
 
Nomao: data analysis for personalized local search
Nomao: data analysis for personalized local searchNomao: data analysis for personalized local search
Nomao: data analysis for personalized local search
 
Nomao: carnet de bonnes adresses (entre amis)
Nomao: carnet de bonnes adresses (entre amis)Nomao: carnet de bonnes adresses (entre amis)
Nomao: carnet de bonnes adresses (entre amis)
 
Nomao: local search and recommendation engine
Nomao: local search and recommendation engineNomao: local search and recommendation engine
Nomao: local search and recommendation engine
 
Évaluation applicative des terminologies destinées à la traduction spécialisée
Évaluation applicative des terminologies destinées à la traduction spécialiséeÉvaluation applicative des terminologies destinées à la traduction spécialisée
Évaluation applicative des terminologies destinées à la traduction spécialisée
 
R&D Lingua et Machina
R&D Lingua et MachinaR&D Lingua et Machina
R&D Lingua et Machina
 
Experimenting the TextTiling Algorithm
Experimenting the TextTiling AlgorithmExperimenting the TextTiling Algorithm
Experimenting the TextTiling Algorithm
 
Text Processing for Procedural Question Answering
Text Processing for Procedural Question AnsweringText Processing for Procedural Question Answering
Text Processing for Procedural Question Answering
 

Dernier

How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
Valere | Digital Solutions & AI Transformation Portfolio | 2024
Valere | Digital Solutions & AI Transformation Portfolio | 2024Valere | Digital Solutions & AI Transformation Portfolio | 2024
Valere | Digital Solutions & AI Transformation Portfolio | 2024Alexander Turgeon
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 

Dernier (20)

How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
Valere | Digital Solutions & AI Transformation Portfolio | 2024
Valere | Digital Solutions & AI Transformation Portfolio | 2024Valere | Digital Solutions & AI Transformation Portfolio | 2024
Valere | Digital Solutions & AI Transformation Portfolio | 2024
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 

Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange

  • 1. Dealing with Lexicon Acquired from Comparable Corpora: Post-edition and Exchange
       Estelle Delpech, Lingua et Machina
       Béatrice Daille, U. de Nantes - LINA
  • 2. Working with lexicon acquired from comparable corpora
       I. Terminology acquisition from comparable corpora: quick overview
       II. A tool for terminology post-edition
       III. Data exchange: a TBX variant for automatically acquired lexicons
       IV. Future work
  • 3. Part I: Terminology Acquisition from Comparable Corpora
  • 4. Terminology acquisition from comparable corpora
       Comparable corpora: “Two corpora, respectively in two languages l1 and l2, are said to be ‘comparable’ if there exists a substantial part of the vocabulary of the corpus in language l1 whose translation can be found in the corpus in language l2.” (my translation of [Déjean and Gaussier, 2002])
       Advantages:
         - Availability
         - Real usages
  • 5. Terminology acquisition from comparable corpora
       Terminology extraction: a contextual analysis
         - Compare contexts of source and target terms
         - If contexts are similar, there's a good chance source and target terms are translations of each other, e.g.:
             mastectomy: reconstruction, prophylactic, treat, undergo, removal
             mastectomie: reconstruction, prophylactique, traiter, subir, ablation
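The contextual comparison on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the seed dictionary, the cosine measure, the context of "opération" and the names `SEED_DICT`, `translate_context` and `cosine` are all assumptions introduced here. Only the context words of "mastectomy" and "mastectomie" come from the slide.

```python
import math
from collections import Counter

# Hypothetical seed dictionary (English -> French), for illustration only.
SEED_DICT = {
    "reconstruction": "reconstruction",
    "prophylactic": "prophylactique",
    "treat": "traiter",
    "undergo": "subir",
    "removal": "ablation",
}

def translate_context(context, seed_dict):
    """Map source context words into the target language; unknown words are dropped."""
    translated = Counter()
    for word, count in context.items():
        if word in seed_dict:
            translated[seed_dict[word]] += count
    return translated

def cosine(a, b):
    """Cosine similarity between two bag-of-words context vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Context of the source term "mastectomy" (words taken from the slide).
source_context = Counter(["reconstruction", "prophylactic", "treat", "undergo", "removal"])

# Contexts of two target candidates; the second one is invented for contrast.
target_contexts = {
    "mastectomie": Counter(["reconstruction", "prophylactique", "traiter", "subir", "ablation"]),
    "opération": Counter(["subir", "chirurgien", "bloc"]),
}

translated = translate_context(source_context, SEED_DICT)
ranked = sorted(
    ((cosine(translated, ctx), term) for term, ctx in target_contexts.items()),
    reverse=True,
)
for score, term in ranked:
    print(f"{score:.2f} {term}")
```

The candidate whose context overlaps most with the translated source context comes first, which is the intuition behind the one-to-many ranked output discussed on the next slide.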
  • 6. Terminology acquisition from comparable corpora
       Outputs one-to-many alignments, e.g. for "mastectomy":
         0,92 ablation
         0,89 mastectomie
         0,48 opération
       Evaluation: precision on the Top-N best alignments
       Results: not as good as acquisition from parallel corpora!
         - Fung (1997): 30 % accuracy on the Top20 candidates
         - Morin et al. (2004): translation is usually the 34th for complex terms
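Precision on the Top-N best alignments can be computed roughly as follows. This is a sketch under assumptions: the function name `top_n_precision` and the reference translation set are hypothetical, and a source term is counted as correct when at least one reference translation appears among its N best candidates. The candidate scores are those shown on the slide, written with decimal points.

```python
def top_n_precision(candidates, reference, n):
    """Share of source terms with at least one reference translation in their top n."""
    if not candidates:
        return 0.0
    hits = 0
    for source, scored in candidates.items():
        # Sort candidates by decreasing score and keep the n best.
        best = sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]
        if any(target in reference.get(source, set()) for target, _ in best):
            hits += 1
    return hits / len(candidates)

# One-to-many alignment from the slide.
candidates = {
    "mastectomy": [("ablation", 0.92), ("mastectomie", 0.89), ("opération", 0.48)],
}

# Hypothetical reference lexicon used as gold standard.
reference = {"mastectomy": {"mastectomie"}}

print(top_n_precision(candidates, reference, 1))
print(top_n_precision(candidates, reference, 2))
```

With this toy data the expected translation is missed at Top1 and found at Top2, which illustrates why a post-editing step over ranked candidates is needed.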
  • 7. Part II: A Tool for Post-edition
  • 8. A tool for post-edition
       Existing tools:
         - ArayaTermExtractor (Waldhör 2006)
         - iView (Merkel and Foo, 2007)
         - Xerox Terminology Suite ®
       Our needs:
         - Deal with one-to-many alignments
         - Non-aligned contexts
         - Allow non-binary annotation
         - Display useful information to help find the right candidate in the corpus
  • 9. “Useful” information
       → Knowledge that helps capture the in vivo behavior of terms
       → Text-driven, term-oriented approach
       Useful information:
         - Variants
         - Collocations
         - Distributional neighbors
         - Contexts
       → To be harvested during the term extraction / alignment process
  • 10. Useful information: example
       Mastectomy:
         - risk-reducing ~, simple ~
         - Tumorectomy, Lumpectomy, Oophorectomy
         - "...patient may choose to have risk-reducing bilateral mastectomy if they have a strong family history of breast cancer..."
       Mastectomie:
         - ~ préventive, ~ simple
         - Tumorectomie, Ablation, Opération
         - "...la mastectomie préventive pourrait supprimer la grande majorité du risque de développer un cancer..."
  • 12. Part III: Data Exchange: a TBX variant for automatically acquired lexicon
  • 13. Quick introduction to TBX (1)
       TBX: Term Base eXchange
         - Open, XML-based standard for exchanging structured terminological data
         - Approved as an international standard by LISA and ISO (norm 30042)
         - Maps to TMF data model
         - Subset of MARTIF
         - Designed for various use cases
         - Customizable
  • 14. Quick introduction to TBX (2)
       2 components:
         - Structure: core structure based on TMF metamodel
         - Content: formalism to express data-categories and their constraints
       [Figure: form (core DTD/Schema) combined with content (default XCS, XCS1 ... XCSn) yields default TBX, TBX variant 1 ... TBX variant n. Adapted from ISO norm 30042:2008, Fig. 4, p.30]
  • 15. Quick introduction to TBX (3)
         - Form defined in DTD
         - Content defined in XCS
       [Figure: sample entry showing the data categories respPerson, responsibility, reliabilityCode, partOfSpeech, corpusTrace, termType, usageNote. Taken from ISO norm 30042:2008, Fig. 1, p.9]
  • 16. TBX variant for lexicon acquired from comparable corpora
       Default TBX data-categories:
         - termType: entryTerm, variant
         - externalCrossReference, usageNote
         - partOfSpeech, frequency, reliabilityCode...
         - transactionType, responsibility
       + Customized data-categories:
         - occurrences, occurrenceCount
         - relatedTerm
         - termDefinition, definitionRelevance
         - ntigReference
  • 17. TBX variant: a term entry
  • 18. TBX variant: 1-to-n alignments
  • 19. TBX variant: approved alignment
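The markup shown on slides 17 to 19 did not survive in this transcript. The sketch below only illustrates what a term entry with 1-to-n alignments might look like when built with Python's standard `xml.etree.ElementTree`. The element names `termEntry`, `langSet`, `ntig`, `termGrp`, `term`, `termNote` and `ref` are standard TBX elements, and `ntigReference` is one of the customized data-categories listed on slide 16. Everything else is an assumption: the helper `term_entry`, the identifiers, the use of `ref` to carry `ntigReference`, and the storage of the alignment score as its text.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def term_entry(entry_id, lang, term, pos, candidates=()):
    """Build a TBX-like termEntry; candidates is a list of (target ntig id, score)."""
    entry = ET.Element("termEntry", {"id": entry_id})
    lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
    ntig = ET.SubElement(lang_set, "ntig", {"id": f"{entry_id}-ntig"})
    term_grp = ET.SubElement(ntig, "termGrp")
    ET.SubElement(term_grp, "term").text = term
    ET.SubElement(term_grp, "termNote", {"type": "partOfSpeech"}).text = pos
    # Assumption: each candidate translation is a reference to another ntig,
    # with the automatically estimated score stored as the element text.
    for target_id, score in candidates:
        ref = ET.SubElement(ntig, "ref", {"type": "ntigReference", "target": target_id})
        ref.text = f"{score:.2f}"
    return entry

entry = term_entry(
    "e1", "en", "mastectomy", "noun",
    candidates=[("e2-ntig", 0.92), ("e3-ntig", 0.89), ("e4-ntig", 0.48)],
)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

Keeping each term in its own entry and pointing to candidates by reference is one possible way to hold several uncertain translations for a single source term; the layout actually used in the variant may differ.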
  • 20. Feedback on TBX
       TBX is made for stable terminologies with little uncertainty about the status of translations, not for machine-generated lexicons of “candidate translations”:
         - difficult to separate a term and its properties from its alignments
         - no data category specific to automatically estimated reliability
       Difficult to make text-driven, term-oriented knowledge fit in a concept-oriented format:
         - no definition category that would apply to a single term and not the whole concept
  • 22. Future work
       Integration of prototype in Libellex:
         - TBX import / export
         - edition of linguistic properties
       User testing (ergonomics)
       Evaluation of added value for translation
       Explore new ways of:
         - aligning terms
         - selecting contexts
  • 23. References
         - Post-edition prototype online: http://80.82.238.151/Metricc/InterfaceValidation/ (user “test”, no password)
         - Metricc project: http://www.metricc.com/
         - Lingua et Machina: http://www.lingua-et-machina.com/
         - Comparable corpora: Déjean, H., Gaussier, É. (2002): “Une nouvelle approche à l'extraction de lexiques bilingues à partir de corpus comparables”, In Lexicometrica, Alignement Lexical dans les corpus multilingues, pp. 1-22.
         - ArayaTermExtractor: http://www.heartsome.de
         - Xerox Terminology Suite: http://www.temis.com/
         - iView: Nyström, M., Merkel, M., Ahrenberg, L., Zweigenbaum, P., Petersson, H. and Åhlfeldt, H. (2006): “Creating a medical English-Swedish dictionary using interactive word alignment”, In BMC Medical Informatics and Decision Making, 2006, pp. 6-35
         - TMF: ISO 16642 - Terminological markup framework
         - TBX: ISO 30042 - Systems to manage terminology, knowledge and content -- TermBase eXchange (TBX)
         - Data categories: ISO 12620 - Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources