2. Computer-assisted translation xenotext
Computer-assisted translation (CAT)
or computer-aided translation is a
translation process in which a
human translator uses software to obtain
a higher degree of precision and efficiency.
Computer-Assisted Translation Introduction
3. Computer-assisted translation
Typical components of a CAT-solution include:
• Translation memory (TM)
• Data mining tools: alignment and term extraction
• Translation editor
• Quality assurance
• Termbase
• Translation management system (TMS)
5. Translation memory
A translation memory (TM) is a database that
stores sentences and their translations for reuse in
new translation projects.
Example: This is a sentence. → Ceci est une phrase.
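The idea of storing and reusing sentences can be sketched in a few lines of Python. This is a minimal illustration only: a plain dictionary stands in for the database, and the names `translation_memory`, `lookup` and `store` are hypothetical, not part of any CAT tool.

```python
# Minimal in-memory translation memory: source sentence -> translation.
translation_memory = {
    "This is a sentence.": "Ceci est une phrase.",
}

def lookup(segment):
    """Return the stored translation of an identical source sentence, or None."""
    return translation_memory.get(segment)

def store(source, target):
    """Save a newly translated sentence so that later projects can reuse it."""
    translation_memory[source] = target

print(lookup("This is a sentence."))
print(lookup("This is a new sentence."))
```

The second lookup finds nothing because this sketch only reuses identical sentences.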
6. Translation unit
A record in the translation memory is called a
translation unit (TU).
source segment: This is a sentence.
target segment: Ceci est une phrase.
information fields:
  Created on: 18/09/2006
  Created by: Gerrit
  Customer: ACME
  Project: Training
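A translation unit maps naturally onto a small record type. The sketch below assumes a hypothetical `TranslationUnit` class and `filter_units` helper; real tools define their own field names and storage format.

```python
from dataclasses import dataclass, field

@dataclass
class TranslationUnit:
    source: str                                 # source segment
    target: str                                 # target segment
    info: dict = field(default_factory=dict)    # information fields

tu = TranslationUnit(
    source="This is a sentence.",
    target="Ceci est une phrase.",
    info={
        "Created on": "18/09/2006",
        "Created by": "Gerrit",
        "Customer": "ACME",
        "Project": "Training",
    },
)

def filter_units(units, **criteria):
    """Keep only the units whose information fields match every criterion."""
    return [u for u in units
            if all(u.info.get(key) == value for key, value in criteria.items())]

print(filter_units([tu], Customer="ACME"))
```

Information fields are what make it possible to restrict reuse to, for example, one customer or one project.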
7. Segmentation
Segmentation is the process of splitting the new
source text into logical, reusable units.
Segmentation can be either sentence-based or
paragraph-based.
Paragraph-based segmentation:
1 Welcome to Brussels
2 Brussels is the capital of Belgium. It is officially bilingual.
Sentence-based segmentation:
1 Welcome to Brussels
2 Brussels is the capital of Belgium.
3 It is officially bilingual.
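Both segmentation modes can be reproduced with a short Python sketch. It assumes that paragraphs are separated by line breaks and that a sentence ends at ".", "!" or "?" followed by whitespace; real tools use more elaborate rules (abbreviations, numbers, markup). The function names are hypothetical.

```python
import re

def segment_paragraphs(text):
    """Paragraph-based: one segment per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def segment_sentences(text):
    """Sentence-based: split each paragraph after '.', '!' or '?' plus whitespace."""
    segments = []
    for paragraph in segment_paragraphs(text):
        segments.extend(re.split(r"(?<=[.!?])\s+", paragraph))
    return segments

text = ("Welcome to Brussels\n"
        "Brussels is the capital of Belgium. It is officially bilingual.")

for number, segment in enumerate(segment_sentences(text), start=1):
    print(number, segment)
```

Applied to the example text, `segment_paragraphs` yields two segments and `segment_sentences` yields three.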
8. Match types
Match types returned by the translation memory (TM):
• No match (0%): the new source segment is not found in the TM.
• Fuzzy match (99% or lower): the new source segment is similar (but not identical) to a source segment found in the TM.
• Exact match (100%): the new source segment is identical to a source segment found in the TM.
• Context match (101% ??): the new source segment is identical to a source segment found in the TM and they both have the same context.
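The four match types can be illustrated with a rough Python sketch. Several things here are assumptions, not how any particular tool works: similarity is computed with `difflib.SequenceMatcher` from the standard library, "context" is reduced to the preceding segment, and `FUZZY_THRESHOLD` is a hypothetical cut-off below which a candidate is treated as no match.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 70  # hypothetical cut-off; weaker candidates count as "no match"

def classify(segment, previous, tm):
    """Return (match type, score, target) for the best candidate in the TM.

    tm is a list of (source, target, previous_source) tuples.
    """
    best = ("No match", 0, None)
    for source, target, previous_source in tm:
        if source == segment:
            if previous_source == previous:
                candidate = ("Context match", 101, target)
            else:
                candidate = ("Exact match", 100, target)
        else:
            ratio = SequenceMatcher(None, segment, source).ratio()
            score = min(99, int(ratio * 100))
            if score < FUZZY_THRESHOLD:
                continue
            candidate = ("Fuzzy match", score, target)
        if candidate[1] > best[1]:
            best = candidate
    return best

sample_tm = [("This is a sentence.", "Ceci est une phrase.", "Welcome to Brussels")]
print(classify("This is a sentence.", "Welcome to Brussels", sample_tm))
print(classify("This is a sentence!", None, sample_tm))
```

Commercial tools use their own similarity measures and penalties, so the percentages from this sketch are not comparable with theirs.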
9. TMX
• Most translation memory tools support TMX
(Translation Memory eXchange), an XML-based
open standard for the exchange of translation
memory data.
• TMX is developed and maintained by LISA
(www.lisa.org).
TMX does not ensure 100% compatibility between
different translation tools: e.g. segmentation or
formatting may be handled in different ways.
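To show what TMX data looks like, the sketch below reads a small hand-written TMX 1.4 document with Python's standard `xml.etree.ElementTree` module. The header values (`ExampleTool`, `example`) are placeholders and `read_tmx` is a hypothetical helper; only the element names (`tmx`, `header`, `body`, `tu`, `tuv`, `seg`) come from the TMX format itself.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

TMX_SAMPLE = """<tmx version="1.4">
  <header creationtool="ExampleTool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>This is a sentence.</seg></tuv>
      <tuv xml:lang="fr"><seg>Ceci est une phrase.</seg></tuv>
    </tu>
  </body>
</tmx>"""

def read_tmx(xml_text):
    """Return one {language: segment text} dictionary per translation unit."""
    root = ET.fromstring(xml_text)
    return [
        {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        for tu in root.iter("tu")
    ]

print(read_tmx(TMX_SAMPLE))
```

The sketch ignores inline formatting elements inside `seg`, which is one of the places where tools differ in practice.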
10. SRX
• SRX (Segmentation Rules eXchange) is an
XML-based open standard for the exchange of
segmentation rules.
• Without SRX, TMX leverage may be lower than
expected, because tools that segment the same text
differently find fewer matches.
• SRX is developed and maintained by LISA
(www.lisa.org).
SRX is currently not supported by SDL Trados.
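The effect of segmentation rules on leverage can be sketched in Python. The rule model below loosely follows the SRX idea of break and no-break rules, each with a pattern for the text before and after a position. It is a simplified illustration, not an SRX parser; the rule lists, the `segment` function and the sample sentence are all hypothetical.

```python
import re

# Each rule: (break here?, regex for the text before, regex for the text after).
# Rules are tried in order and the first one that matches decides.
RULES_WITH_EXCEPTION = [
    (False, r"\b(?:Dr|Mr|e\.g|i\.e)\.", r"\s"),  # no break after these abbreviations
    (True, r"[.!?]", r"\s"),                     # break after sentence-final punctuation
]
RULES_DEFAULT_ONLY = RULES_WITH_EXCEPTION[1:]

def segment(text, rules):
    """Split text at every position where the first matching rule says 'break'."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        for do_break, before, after in rules:
            if re.search(f"(?:{before})$", text[:pos]) and re.match(after, text[pos:]):
                if do_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # first matching rule wins
    segments.append(text[start:].strip())
    return [s for s in segments if s]

text = "Dr. Smith lives in Brussels. It is officially bilingual."
print(segment(text, RULES_WITH_EXCEPTION))
print(segment(text, RULES_DEFAULT_ONLY))
```

With the abbreviation exception the text yields two segments; without it, "Dr." becomes a segment of its own. A TM built with one rule set therefore produces fewer exact matches in a tool that uses the other.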
12. Translation editor
• A translation editor is the translator's working
environment, offering easy access to source and target
segments.
• Translation editors typically include spelling checkers
in a wide variety of languages, and may enable the
user to add comments or status indications to
each translation.
• File filters convert the source document to a
translatable (or localizable) format, such as XLIFF.
13. File filters
Source document → file filters → translation editor (XLIFF) → file filters → target document
Source and target document formats: HTML, DLL, EXE, PowerPoint, InDesign, PHP, SGML, FrameMaker, DOCX, PDF, RTF, QuarkXPress, OpenOffice, Excel, TXT, XML, DITA, PageMaker
14. XLIFF
• XLIFF (XML Localization Interchange File Format)
is an XML-based open standard for translatable (or
localizable) files.
• XLIFF is developed and maintained by OASIS
(www.oasis-open.org).
There are various "flavours" of XLIFF (e.g. SDLXLIFF),
which in practice complicates the interchange of XLIFF
data between different tools.
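As an illustration of the format, the sketch below reads a small hand-written XLIFF 1.2 file with Python's standard `xml.etree.ElementTree` module and lists the units that still lack a translation. The file name `welcome.txt` and the helper `untranslated` are hypothetical; tool-specific flavours add their own elements and attributes, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 elements live in this namespace.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

XLIFF_SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="welcome.txt" source-language="en" target-language="fr"
        datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>This is a sentence.</source>
        <target>Ceci est une phrase.</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Welcome to Brussels</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""

def untranslated(xml_text):
    """Return the ids of all trans-units that have no <target> element yet."""
    root = ET.fromstring(xml_text)
    return [
        unit.get("id")
        for unit in root.iterfind(".//x:trans-unit", NS)
        if unit.find("x:target", NS) is None
    ]

print(untranslated(XLIFF_SAMPLE))
```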
17. Alignment
Alignment is the process in which specialized
software compares a source text with its
translation, matching equivalent segments, e.g. for
the purpose of creating a translation memory.
In a semi-automatic alignment process, the
alignment results are reviewed and misalignments
are corrected by a human linguist.
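A semi-automatic workflow can be sketched as follows. The sketch makes strong simplifying assumptions: segments are paired strictly in document order, and a pair is flagged for human review when the two lengths differ by more than a hypothetical factor `max_ratio`. Real alignment software also handles segments that were merged or split in translation and uses more evidence than length alone.

```python
def align(source_segments, target_segments, max_ratio=2.0):
    """Pair segments in document order and flag suspicious pairs for review.

    Note: zip() stops at the shorter list, so surplus segments are ignored.
    """
    pairs = []
    for source, target in zip(source_segments, target_segments):
        longer = max(len(source), len(target))
        shorter = max(1, min(len(source), len(target)))
        pairs.append({
            "source": source,
            "target": target,
            "needs_review": longer / shorter > max_ratio,
        })
    return pairs

source = ["Welcome to Brussels", "Brussels is the capital of Belgium."]
target = ["Bienvenue à Bruxelles", "Bilingue."]

for pair in align(source, target):
    print(pair)
```

Only the pairs a linguist has accepted or corrected would then be written to the translation memory.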
20. Example entry structure
Entry
  Subject
  Note
  English
    Definition
      Source
    Term
      Gender
      Source
    Term
      Gender
      Source
  French
    Definition
      Source
    Term
      Gender
      Source
21. Concept-oriented termbases
Your concept may
look like this
All terms and synonyms referring to the same concept
should be stored in the same entry:
car, motorcar, automobile, voiture, bagnole, ...
This will ensure that each language in your termbase
can be used as source or target language.
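The benefit of one entry per concept can be shown with a small Python sketch. The entry layout and the function `translate_term` are hypothetical, and the language codes `en` and `fr` are an assumption; the terms are the ones listed above.

```python
# One entry per concept: all terms and synonyms, grouped by language.
car_entry = {
    "languages": {
        "en": ["car", "motorcar", "automobile"],
        "fr": ["voiture", "bagnole"],
    },
}
termbase = [car_entry]

def translate_term(term, source_lang, target_lang, entries):
    """Return all target-language terms stored in the same entry as the term."""
    for entry in entries:
        if term in entry["languages"].get(source_lang, []):
            return entry["languages"].get(target_lang, [])
    return []

print(translate_term("car", "en", "fr", termbase))
print(translate_term("bagnole", "fr", "en", termbase))
```

Because the entry is organised around the concept and not around one source term, the same data answers lookups in both directions.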
22. TBX
• TBX (TermBase eXchange) is an XML-based
open standard for exchanging structured
terminological data.
• The TBX standard is developed by LISA
(www.lisa.org) and has also been published as an
ISO standard.
23. Term extraction
Term extraction (or terminology extraction)
is the process of extracting mono- or bilingual lists
of potentially interesting terms from a selection of
electronic texts.
24. Terminology extraction
Linguistic term extraction:
• uses grammatical information to identify
term candidates (and their translations)
• language dependent
Statistical term extraction:
• looks for repeated sequences of lexical items
• language independent
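Statistical extraction, in its simplest form, amounts to counting repeated word sequences. The Python sketch below is one possible illustration, not a production method: it counts word pairs that occur at least twice. The function name, the parameters `n` and `min_count`, and the sample text are hypothetical.

```python
import re
from collections import Counter

def extract_candidates(text, n=2, min_count=2):
    """Return (word sequence, count) pairs for sequences of n words seen repeatedly."""
    tokens = re.findall(r"\w+", text.lower())
    ngrams = Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return [(gram, count) for gram, count in ngrams.most_common()
            if count >= min_count]

sample = ("The translation memory stores segments. "
          "A translation memory is a database. "
          "Each translation unit is a record.")

print(extract_candidates(sample))
```

On the sample text the candidates are "translation memory" and "is a". The second is noise, which is why the output is a list of potentially interesting terms that still needs human review, and why linguistic extraction adds grammatical filters.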