This paper presents usage scenarios of the platform being developed within the TTC project (Terminology Extraction, Translation Tools and Comparable Corpora) along with the first feedback from potential users. The TTC project aims at leveraging translation tools, computer-assisted translation tools, and terminology management tools by automatically generating bilingual terminologies from comparable corpora in several languages of the European Union (English, French, German, Latvian and Spanish), as well as in Chinese and Russian. The TTC platform includes a web crawler and a corpora management tool, as well as tools for monolingual term extraction and bilingual terminology alignment, online terminology management, and terminology export into CAT tools and MT systems.
Overall, the paper focuses on the language activities to be carried out with the TTC tools, on issues concerning the availability of the required language resources and linguistic knowledge, and on different user profiles and needs. To identify potential user needs, we discuss the results of an online questionnaire-based survey on terminology and corpora issues conducted in the translation and localization industry. Furthermore, we present the envisaged usage scenarios as well as first feedback from potential users. The expected TTC inputs and outputs are also outlined. Finally, since the amount of available data and resources will clearly not be the same for all languages, we discuss technical solutions to achieve language coverage: the TTC tools will offer different approaches depending on the amount and type of linguistic knowledge available.
1. User-centered Views on Terminology Extraction Tools: Usage Scenarios and Integration into MT/CAT
Helena Blancafort, Ulrich Heid, Tatiana Gornostay, Claude Méchoulam, Béatrice Daille, Serge Sharoff
Project TTC: Terminology Extraction, Translation Tools and Comparable Corpora
Tralogy - 3rd of March 2011
5. The idea behind TTC
Tool Functions and Applications
Web crawling
• Corpus in source language (SL), corpus in target language (TL)
Term extraction
• SL terms: wind energy, wind turbine
• TL terms: aérogénérateur, énergie éolienne
Term alignment
Export to
• CAT tools
• MT: rule-based (Systran), statistical MT (Moses)
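The data flow on this slide can be sketched as a chain of stages. This is only an illustration of the architecture, not the TTC implementation: `run_pipeline`, the stub stages, and the canned data are all hypothetical names introduced here, and the term pairs in `KNOWN_PAIRS` are hand-written standard translations of the slide's example terms rather than output of any aligner.

```python
from typing import Callable, Dict, List, Tuple

Corpus = List[str]                 # one string per document
TermPairs = List[Tuple[str, str]]  # (source term, target term)


def run_pipeline(
    seeds: List[str],
    crawl: Callable[[List[str], str], Corpus],
    extract: Callable[[Corpus], List[str]],
    align: Callable[[List[str], List[str]], TermPairs],
    sl: str = "en",
    tl: str = "fr",
) -> TermPairs:
    """Chain the stages from the slide: crawl, extract per language, align."""
    sl_terms = extract(crawl(seeds, sl))
    tl_terms = extract(crawl(seeds, tl))
    return align(sl_terms, tl_terms)


# Stub stages (hypothetical): canned data stands in for real crawling
# and real term extraction.
CANNED: Dict[str, Corpus] = {
    "en": ["wind energy", "wind turbine"],
    "fr": ["aérogénérateur", "énergie éolienne"],
}
# Hand-written pairs for illustration only.
KNOWN_PAIRS = {
    "wind energy": "énergie éolienne",
    "wind turbine": "aérogénérateur",
}


def stub_crawl(seeds: List[str], lang: str) -> Corpus:
    return CANNED[lang]  # ignores the seeds; a real crawler would use them


def stub_extract(corpus: Corpus) -> List[str]:
    return list(corpus)  # pretends every document is exactly one term


def stub_align(sl_terms: List[str], tl_terms: List[str]) -> TermPairs:
    # Keep a pair only if its target was actually extracted from the TL corpus.
    return [(s, KNOWN_PAIRS[s]) for s in sl_terms if KNOWN_PAIRS.get(s) in tl_terms]


pairs = run_pipeline(["wind power"], stub_crawl, stub_extract, stub_align)
print(pairs)
```

Passing the stages in as functions keeps the sketch independent of any particular crawler, extractor, or aligner, so each one can be swapped without touching the chain itself.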
6. First Interaction with Users
Needs and Expectations
Online survey among the translation industry (March 2010)
• 139 respondents from 31 countries
Workshop with experts (Oct 2010)
• Users, deployers, developers
• Feedback on TTC specifications
Topics
• Relevance of terminology work
• User and application types
• Input to TTC tools
• Output of TTC tools
7. User Needs Survey
Relevance of Terminology Work
Stable since the 2004 LISA survey
• 75% systematic terminology work (LISA)
• Over 50% spend 10-30% time on terminology (TTC)
Use of tools
• 74% use CAT tools (TTC), 27% terminology tools (SDL)
• 66% are interested in new solutions (TTC)
• 50% collect corpora manually
• 40% agree to share their terminology within an online database
Need / opportunity for terminology tools
8. Types of Potential TTC tool users
Requests of Oct Workshop
Standard users
• Little time, small amount of information
• Translators, technical writers
Advanced users
• Terminology specialists, translation proofreaders
• Interest in broad documentation of output
MT users
• Interest in specific solutions
• Focus on workflow integration
9. Input to TTC tools
Feedback of Domain Experts
Web crawling
• Awareness of mixed quality: genres, text types
• TTC output to include metadata
Seeds
• Users have different input: company data, existing terminologies, or just keywords
10. Output of TTC tools by User Types
Standard users
• Equivalents: max. 5 candidates
• Format for CAT tools: TBX, tables (Excel)
Advanced users
• Term origin metadata (Dublin Core based)
• Reliability: confidence values
• Term variants
MT users
• Output adapted to the respective system
• RBMT vs. SMT vs. CAT tools
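The output requests for standard users (at most 5 candidate equivalents, delivered as a table or as TBX) can be sketched as follows. This is a minimal illustration under stated assumptions, not the TTC export code: the function names are hypothetical, the candidate list and its confidence values are invented for the example, and the XML fragment uses only the `termEntry` / `langSet` / `tig` / `term` nesting of TBX as a simplified stand-in. A complete TBX file needs a header and further structure that is omitted here.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import List, Tuple

Candidate = Tuple[str, float]  # (target term, confidence value)
MAX_CANDIDATES = 5             # from the slide: max. 5 candidates


def top_candidates(candidates: List[Candidate], limit: int = MAX_CANDIDATES) -> List[Candidate]:
    """Return the `limit` candidates with the highest confidence, best first."""
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:limit]


def to_csv(source_term: str, candidates: List[Candidate]) -> str:
    """Render the candidates as a table that a spreadsheet tool can open."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source_term", "target_term", "confidence"])
    for target, conf in candidates:
        writer.writerow([source_term, target, f"{conf:.2f}"])
    return buf.getvalue()


def to_tbx_like(source_term: str, candidates: List[Candidate],
                sl: str = "en", tl: str = "fr") -> str:
    """Build one simplified, TBX-style term entry (not a validated TBX file)."""
    entry = ET.Element("termEntry")
    for lang, terms in ((sl, [source_term]), (tl, [t for t, _ in candidates])):
        lang_set = ET.SubElement(entry, "langSet", {"xml:lang": lang})
        for term in terms:
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term
    return ET.tostring(entry, encoding="unicode")


# Invented candidates and scores, for illustration only.
raw = [
    ("aérogénérateur", 0.91), ("éolienne", 0.88), ("turbine éolienne", 0.74),
    ("turbine à vent", 0.40), ("énergie éolienne", 0.22), ("moulin à vent", 0.15),
]
best = top_candidates(raw)
print(to_csv("wind turbine", best))
print(to_tbx_like("wind turbine", best))
```

Keeping the confidence value next to each candidate means the same list can serve both user types: standard users see only the truncated list, while the values remain available for advanced users who asked for reliability information.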
11. Next steps
Integrate lessons learnt from users into TTC prototype
• Metadata in focused crawler
• Provide term variants
• Different output formats
Test of TTC tool prototype with Advisory Board members
2nd Users Workshop: Spring 2012
13. TTC Output for Advanced Users
Requests from Potential Users
1. Equivalents
2. Example sentences
3. Definitions x
4. Style/usage
5. Frequency
6. Subject field
7. Synonyms x
8. Word classes